Interspeech 2020 Wednesday

Wednesday is very promising, with many interesting papers, challenges and enlightenments. Multimodal learning is gaining more and more attention. Semi-supervised learning is everywhere. There is the important DNS (Deep Noise Suppression) challenge and the wonderful Asteroid tool. There are so many challenges around that some of them have just a single participant.

For me it was quite exciting that the TTS area continues its very impressive growth, from new vocoders to practical applications. The session on new paradigms and methods is extremely interesting. My 3 papers of the day:

Wed-3-4-3 Towards Universal Text-to-Speech

After training on 1200 hours of speech, a new speaker can be learned from 20 seconds of speech and a new language from 6 minutes.

Wed-3-4-7 Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

You can reuse millions of hours of untranscribed speech data to build a much better TTS, not just a much better ASR.

Wed-1-3-7 High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Streaming TTS is a long-awaited feature. It is very sad that none of the popular TTS implementations support streaming, which is really critical for a responsive VUI.