Written by Nickolay Shmyrev
Wav2Vec and other audio embeddings
Reading the recent Facebook paper on audio embeddings, "wav2vec 2.0: A
Framework for Self-Supervised Learning of Speech Representations" by
Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli,
I wonder how accurate the embeddings for speech recognition would be
if one learned them from a large collection of music instead of a
large collection of speech. And whether they would simply turn out
to be wavelets.