Interspeech 2020 Wednesday
Wednesday is very promising, with many interesting papers, challenges and insights. Multimodal learning is gaining more and more attention. Semi-supervised learning is everywhere. There is the important DNS noise suppression challenge and the wonderful Asteroid tool. There are so many challenges around that some of them have just a single participant.
For me it was quite exciting that the TTS area continues its very impressive growth, from new vocoders to practical applications. The session on new paradigms and methods is extremely interesting. Three papers of the day:
After training on 1200 hours of speech, a new speaker is learned from 20 seconds of speech and a new language from 6 minutes.
You can reuse millions of hours of speech data to build much better TTS, not just much better ASR.
And streaming TTS is such a long-awaited feature. It is very sad that none of the popular TTS implementations support streaming, which is really critical for a responsive VUI.