Interspeech 2020 Tuesday

Returning to Monday, I’d like to mention the wonderful keynote by Prof. Janet B. Pierrehumbert, “The cognitive status of simple and complex models”, which covered some very interesting details of language that developers always forget.

The Sunday tutorial by Prof. Roger K. Moore, “Speech 101 - What Everyone Working on Spoken Language Processing Needs to Know about Spoken Language”, is also very interesting in that regard.

People always mention the famous “When I fire the linguist…” phrase. I would say it is a somewhat harmful phrase, since it destroyed connections between engineers and linguists and promoted purely technology-oriented development, while some great advances are possible if we truly understand the language. A good example of using domain knowledge is LPCNet, which enabled good synthesis quality at pretty high speed, and the recently presented Neural Homomorphic Vocoder.
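For what it is worth, the domain knowledge inside LPCNet is classic linear prediction: each sample is modeled as a linear combination of the previous samples, so the network only has to generate the small residual. Here is a minimal numpy sketch of that DSP part (autocorrelation method plus Levinson-Durbin recursion), purely as an illustration of the idea, not of LPCNet itself.

```python
import numpy as np

def lpc(frame, order):
    """Linear prediction coefficients via autocorrelation + Levinson-Durbin."""
    r = np.array([np.dot(frame[:len(frame) - i], frame[i:]) for i in range(order + 1)])
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a, err = new_a, err * (1.0 - k * k)
    return a

# Toy "speech" frame: a decaying sinusoid plus a little noise
n = np.arange(400)
x = np.sin(2 * np.pi * 0.03 * n) * np.exp(-n / 300) + 0.01 * np.random.randn(400)

a = lpc(x, order=16)
# Predict x[t] from the 16 previous samples; the residual is what the network would model
pred = np.array([-np.dot(a[1:], x[t - 16:t][::-1]) for t in range(16, len(x))])
residual = x[16:] - pred
print("signal energy  :", np.sum(x[16:] ** 2))
print("residual energy:", np.sum(residual ** 2))  # much smaller: linear prediction captures most of the frame
```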

As for the paper of the day, it is not from Interspeech but a recent Google submission to arXiv:

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

They trained on 3 MILLION HOURS of YouTube data and improved accuracy by 30%. Well done!
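Below is a minimal sketch of the distillation idea, assuming hypothetical TeacherEncoder/StudentEncoder modules: a full-context (non-streaming) teacher transcribes unlabeled audio, and a causal streaming student is trained on those pseudo-labels. The paper distills RNN-T models; I use a plain CTC objective here just to keep the example short.

```python
import torch
import torch.nn as nn

VOCAB = 32   # hypothetical grapheme vocabulary (0 = CTC blank)
FEAT = 80    # log-mel feature dimension


class TeacherEncoder(nn.Module):
    """Full-context (non-streaming) encoder: bidirectional LSTM."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(FEAT, 256, num_layers=2, bidirectional=True, batch_first=True)
        self.out = nn.Linear(512, VOCAB)

    def forward(self, feats):
        h, _ = self.rnn(feats)
        return self.out(h)  # (batch, time, vocab) logits


class StudentEncoder(nn.Module):
    """Causal (streaming) encoder: unidirectional LSTM."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(FEAT, 256, num_layers=2, batch_first=True)
        self.out = nn.Linear(256, VOCAB)

    def forward(self, feats):
        h, _ = self.rnn(feats)
        return self.out(h)


def pseudo_label(teacher, feats):
    """Greedy CTC decode of the teacher output: collapse repeats, drop blanks."""
    with torch.no_grad():
        ids = teacher(feats).argmax(-1)  # (batch, time)
    labels = []
    for seq in ids:
        prev, out = 0, []
        for t in seq.tolist():
            if t != prev and t != 0:
                out.append(t)
            prev = t
        labels.append(torch.tensor(out, dtype=torch.long))
    return labels


# Distillation step on unlabeled audio (dummy features stand in for YouTube data)
teacher, student = TeacherEncoder(), StudentEncoder()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

feats = torch.randn(4, 200, FEAT)
targets = pseudo_label(teacher, feats)  # teacher transcripts used as labels

log_probs = student(feats).log_softmax(-1).transpose(0, 1)  # (time, batch, vocab)
in_lens = torch.full((4,), 200, dtype=torch.long)
tgt_lens = torch.tensor([max(len(t), 1) for t in targets], dtype=torch.long)
flat_tgts = torch.cat([t if len(t) else torch.tensor([1]) for t in targets])

loss = ctc(log_probs, flat_tgts, in_lens, tgt_lens)
opt.zero_grad()
loss.backward()
opt.step()
print(f"student CTC loss on teacher pseudo-labels: {loss.item():.3f}")
```

The appeal of the recipe is that the expensive part, the full-context teacher, never needs to run at inference time; only the causal student is deployed for streaming recognition.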