Interspeech 2020 Monday
Interspeech is overwhelming as usual. Thousands of papers and ideas, lives and thoughts.
On one hand, I kind of like the online format, where you can participate in
discussions sitting at home with a cup of tea. You can visit a
presentation without running across the floors, no need to hurry. On the
other hand, I miss Shanghai, which I really wanted to visit.
Monday seems to be a very active day, Tuesday not so much. I bookmarked
more than 50 entries myself; I won't list them here. Here are just the major points.
It is still not clear which approach to ASR is the best: transformers,
encoder-decoders, hybrid networks, and so on. There is no best direction
to take yet, just many small but not so critical improvements. I haven't
got the full picture yet, but it seems that the core ideas have not
stabilized. There are problems in E2E, and there are problems in hybrid.
In hybrid networks we still have quite a few things to do. It is sad
that Kaldi doesn't make it easy to experiment with architectures. The
first thing is to get rid of the context dependency tree:
Mon-2-11-3 Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces
E2E teams mostly work on context integration; it seems they have a big problem with context and with proper recognition of special cases (names, etc.).
Streaming ASR is a big problem. Some research is going on, but in general the situation is
clear: there will be a significant drop in accuracy from the missing right context.
Mon-2-3-2 Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer
Looks like SpeechBrain is not going to be announced at Interspeech as planned. Not sure.
Non-native children's speech recognition is a very interesting task with very cool results from the Cambridge HTK team,
as well as many opportunities:
Mon-SS-1-6-3 Non-Native Children’s Automatic Speech Recognition: the INTERSPEECH 2020 Shared Task ALTA Systems
Many presentations on semi-supervised and self-supervised learning, and so on. Hopefully this area will grow even further.
Speech is not a frontier in modern AI, unfortunately. Most of the
ideas first appear in NLP (memory-augmented transformers, noisy
student augmentation, distillation, etc.) and are only then applied in speech. From that
point of view, Interspeech is not the leading conference.
And the paper of the day:
Mon-3-7-1 Continual Learning in Automatic Speech Recognition
Very important research and a step in the right direction. Unfortunately,
this work misses the core component that we implemented in Vosk, so it is
not as efficient as it might be.