Wav2Vec and other audio embeddings

Reading the recent Facebook paper on audio embeddings, wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli, I wonder how accurate...

Opus and MP3 for speech recognition

I recently had a discussion about which is worse for telephony audio compression: Opus or MP3. I was under the impression that Opus is unconditionally better than MP3, but it doesn’t seem the...

Kaldi models testing

Many models and datasets have become available recently, so testing models against datasets is becoming more complicated and at the same time more fun. Recently the Kaldi Active Grammar project released some new models...

Lookahead composition in Kaldi and Vosk

In 2019 AlphaCephei made quite good progress. We introduced a project called Vosk, which is meant to be a portable API for speech recognition for a variety of...

Spectre and deep learning

I recently noticed a big slowdown in ReLU layer performance: the ReLU operation can now take up to 10% of the total CPU time. This is with kernel 4.15....

Selected Papers Interspeech 2019 Wednesday

A Highly Efficient Distributed Deep Learning System for Automatic Speech Recognition Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2700.pdf...

Selected Papers Interspeech 2019 Tuesday

Spatial and Spectral Fingerprint in The Brain: Speaker Identification from Single Trial MEG Signals Oral; 1000–1020 Debadatta Dash (The University of Texas at Dallas), Paul Ferrari (University of Texas at...

Selected Papers Interspeech 2019 Monday

Overall, it is going pretty well. Many very good papers, diarization is joining with decoding, and everything is moving in the right direction. RadioTalk: a large-scale corpus of talk radio transcripts Doug Beeferman...

Information flows of the future

It is interesting how similar ideas arise here and there in seemingly unrelated contexts. A recent quote from Actionable Book Summary: The Inevitable by Kevin Kelly: And what’s next probably...

The masking problem - capsules, specaug, bert

An important issue with modern neural networks is their vulnerability to masked corruption, that is, the random corruption of a small number of samples in the image or...