Written by Nickolay Shmyrev
Interspeech 2020 Wednesday
Wednesday looks very promising, with many interesting papers, challenges and
enlightenments. Multimodal learning is gaining more and more attention.
Semi-supervised learning is everywhere. There was the important Deep Noise
Suppression (DNS) challenge and the wonderful Asteroid tool. There are so many
challenges around that some of them have just a single participant.
For me it was quite exciting to see that the TTS area continues its very
impressive growth, from new vocoders to practical applications. The session on
new paradigms and methods is extremely interesting. My 3 papers of the day:
Wed-3-4-3 Towards Universal Text-to-Speech
After training on 1200 hours of speech, the model learns a new speaker from 20 seconds of speech and a new language from 6 minutes.
Wed-3-4-7 Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation
You can reuse millions of hours of speech data to build much better TTS, not just much better ASR.
Wed-1-3-7 High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency
Streaming TTS is a long-awaited feature. It is very sad that none of the popular TTS implementations support streaming, which is
really critical for a responsive VUI.