Written by Nickolay Shmyrev on Interspeech 2020 Tuesday
Returning to Monday, I’d like to mention the wonderful keynote by Prof. Janet B.
Pierrehumbert, “The cognitive status of simple and complex models”,
which covered some very interesting details of language that
developers always forget.
The Sunday tutorial by Prof. Roger K. Moore, “Speech 101 - What Everyone Working
on Spoken Language Processing Needs to Know about Spoken Language”,
is also very interesting in that regard.
People always mention the famous “When I fire the linguist…” phrase. I would
say it is a somewhat harmful phrase, since it destroyed connections between
engineers and linguists and promoted purely technology-oriented development,
while some great advances are possible if we truly understood the
language. A good example of using domain knowledge is LPCNet, which
enabled good synthesis quality at pretty high speed, and the recently
presented Neural Homomorphic Vocoder.
As for the paper of the day, it is not from Interspeech but a recent Google submission to arXiv:
Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data
They trained on 3 MILLION HOURS of YouTube data and improved accuracy by 30%. Well done!
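The core idea of that distillation setup can be sketched in a few lines (a minimal illustration, not the paper’s actual implementation): a full-context, non-streaming teacher produces soft labels on unsupervised audio, and the streaming student is trained to match the teacher’s per-frame output distribution, for example with a KL-divergence term. The array shapes and logit values here are hypothetical stand-ins.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student), averaged over frames.

    teacher_logits: per-frame outputs of the offline, full-context teacher
    student_logits: per-frame outputs of the streaming student
    Both are arrays of shape (frames, classes); the epsilon guards log(0).
    """
    p = softmax(teacher_logits)   # teacher's soft labels
    q = softmax(student_logits)   # student's predictions
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return kl.mean()

# If the student exactly matches the teacher, the loss is zero;
# any mismatch gives a positive penalty to minimize during training.
t = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(t, t))                  # ~0.0
print(distillation_loss(t, np.zeros_like(t)))   # > 0
```

In the actual system the "labels" come from the teacher decoding millions of hours of unlabeled YouTube audio, so the student never needs human transcripts for that data.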