Written by Nickolay Shmyrev
on October 27, 2020

Interspeech 2020 Wednesday

Wednesday is very promising, with many interesting papers, challenges and insights. Multimodal learning is gaining more and more attention. Semi-supervised learning is everywhere. There is the important DNS (Deep Noise Suppression) challenge and the wonderful Asteroid tool. There are so many challenges around that some of them have just a single participant.

For me it was quite exciting that the TTS area continues its very impressive growth, from new vocoders to practical applications. The session on new paradigms and methods is extremely interesting. Three papers of the day:

Wed-3-4-3 Towards Universal Text-to-Speech

After training on 1200 hours of speech, a new speaker can be learned from 20 seconds of speech and a new language from 6 minutes.

Wed-3-4-7 Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

You can reuse millions of hours of speech data to build much better TTS, not just much better ASR.

Wed-1-3-7 High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

And streaming TTS is such a long-awaited feature. It is very sad that none of the popular TTS implementations support streaming, which is really critical for a responsive VUI.
