Some features are small and take only a few lines of code, so they are not really worth a conference paper or a poster. Still, they are fairly widespread. A blog post about them...
Discrete units have been making a splash probably since HuBERT (2021, four years ago already), and then with Tortoise TTS and its successors. Before that there were many attempts too, like the very old system...
Recently I've spent some time with [Matcha](https://github.com/shivammehta25/Matcha-TTS) by Shivam Mehta. Some related papers: [Matcha-TTS: A fast TTS architecture with conditional flow matching](https://arxiv.org/abs/2309.03199), [Should you use a probabilistic duration model in...
We spent last year working mostly on TTS just as in the good old Festival times. Here are some more random thoughts I have on the subject. Rants follow, I...
We recently evaluated Russian open source and proprietary TTS models. Here are the results:

{:class="table table-bordered"}
|Engine | Voice | CER | xRT GPU | xRT CPU | UTMOS |...
There are two extremes these days: one party claims that LLMs have magical emergent abilities, another claims that AI is overhyped and will end soon. The real situation is...
Whisper ASR is a great technology with many innovations. For example, multi-objective transcription/translation training, a huge 680k-hour training dataset, and long-context decoding were really revolutionary at the time...
Updated 15.12.2024: * Added GigaAM2 RNNT Updated 03.10.2024: * Added Whisper V3 Turbo Updated 01.06.2024: * Added GigaAM, Whisper V3, GigaAM RNNT Previous version [here](https://alphacephei.com/nsh/2023/01/22/russian-models.html) We tested the available models for...
The recently published NaturalSpeech paper attracted some attention. While the ideas discussed there are fairly straightforward, it is nice to see a solid implementation from a reputable institution and great results. It...
There are many TTS engines around; here are some notes about them.

## Speed

According to https://arxiv.org/pdf/2210.15975.pdf the decoder takes most of the time in TTS, so decoder speed optimization is...