Interspeech 2020 Thursday

Hooray, this year I made an effort to review all Interspeech papers, something I haven't managed for quite a few years. Speaker recognition, emotion recognition, ASR for language learning, transformers...

Interspeech 2020 Wednesday

Wednesday is very promising, with many interesting papers, challenges and enlightenments. Multimodal learning is gaining more and more attention. Semi-supervised learning is everywhere. The important DNS suppression challenge and the wonderful Asteroid...

Interspeech 2020 Tuesday

Returning to Monday, I’d like to mention a wonderful keynote by Prof. Janet B. Pierrehumbert, “The cognitive status of simple and complex models”, which covered some very interesting details of...

Interspeech 2020 Monday

Interspeech is overwhelming as usual. Thousands of papers and ideas, lives and thoughts. On one hand, I kind of like the online format, where you can participate in discussions sitting at...

Vosk/Kaldi French model

Recently there was some good news in the Kaldi world: the LINTO project released their French model 2.0. According to the documentation, this model is trained on 7100 hours and looks a bit...

Status of Vosk in October 2020

When you work on things day to day, you lose the overall picture very quickly. We’ve been actively training models, fixing things here and there, and adding new platforms...

ML datasets are not relevant anymore

We started promoting data collection for open source speech recognition with the Voxforge project in 2007. It was a great time before the speech recognition revolution, but even then...

Vosk/Kaldi German acoustic model for call center and broadcast transcription

There are many open source German models already around; unfortunately, most of them are not well trained. Here is a review of the current state and some information about the new German...

Wav2Vec and other audio embeddings

Reading the recent Facebook paper on audio embeddings, “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations” by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed and Michael Auli, I wonder how accurate...

Opus and MP3 for speech recognition

Recently I got into a discussion about which is worse for telephony audio compression, Opus or MP3. I was under the impression that Opus is unconditionally better than MP3, but it doesn’t seem the...