We spent the last year working mostly on TTS, just as in the good old Festival times. Here are some more random thoughts on the subject. Rants follow, I...
We recently evaluated Russian open-source and proprietary TTS models. Here are the results:

{:class="table table-bordered"}
| Engine | Voice | CER | xRT GPU | xRT CPU | UTMOS |...
There are two extremes these days: one camp claims that LLMs have magical emergent abilities, the other claims that AI is overhyped and will end soon. The real situation is...
Whisper ASR is a great technology with many innovations. For example, multi-objective transcription/translation training, a huge 600k-hour training dataset, and long-context decoding were truly revolutionary at the time...
Updated 15.12.2024: added GigaAM2 RNNT. Updated 03.10.2024: added Whisper V3 Turbo. Updated 01.06.2024: added GigaAM, Whisper V3, GigaAM RNNT. The previous version is here. We tested the available models for Russian speech recognition...
The recently published NaturalSpeech paper attracted some attention. While the ideas discussed there are fairly straightforward, it is nice to see a solid implementation from a reputable institution and great results. It...
There are many TTS engines around; here are some notes about them.

## Speed

According to https://arxiv.org/pdf/2210.15975.pdf, the decoder takes most of the time in TTS, so decoder speed optimization is...
Speech technology is continuously disrupted by neural networks and generative AI approaches. A good example is TTS. In recent years, a hundred methods and models have...
The longer we study reality, the more unusual it seems to us. For example, according to current understanding, the brain has the following properties:

1. the brain is a highly parallel system,
1. information in the brain is transmitted by means of...
It is interesting that the longer we study reality, the more unusual it appears to us. For example, if we think about the brain, there are two important ideas we...