We spent last year working mostly on TTS, just as in the good old Festival times. Here are some more random thoughts I have on the subject. Rants follow, I...
We recently evaluated Russian open-source and proprietary TTS models. Here are the results: | Engine | Voice | CER | xRT GPU | xRT CPU | UTMOS |...
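The CER and xRT columns can be read with the small sketch below. It assumes the common definitions, which the excerpt does not spell out. CER is the character-level edit distance divided by the reference length. For TTS it is usually computed by transcribing the synthesized audio with an ASR model and comparing the transcript to the input text. xRT is processing time divided by audio duration, so lower is faster. The helper names are hypothetical, and text normalization (case, punctuation) is deliberately left out.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumes both strings are already normalized."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def xrt(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: seconds of compute per second of generated audio."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    print(cer("привет", "превет"))  # one substitution out of six characters
    print(xrt(0.5, 10.0))           # 0.5 s to synthesize 10 s of audio
```

With these definitions, an xRT of 0.05 means the engine produces audio twenty times faster than real time.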
There are two extremes these days: one side claims that LLMs have magical emergent abilities, the other claims that AI is overhyped and will end soon. The real situation is...
Whisper ASR is a great technology with many innovative features. For example, multi-objective transcription/translation training, a huge 600k-hour training dataset, and long-context decoding were truly revolutionary at the time...
Updated 03.10.2024: added Whisper V3 Turbo. Updated 01.06.2024: added GigaAM, Whisper V3, GigaAM RNNT. The previous version is [here](https://alphacephei.com/nsh/2023/01/22/russian-models.html). We tested the available Russian speech recognition models on various datasets...
The recently published NaturalSpeech paper has attracted some attention. While the ideas discussed there are fairly straightforward, it is nice to see a solid implementation and great results from a reputable institution. It...
There are many TTS engines around; here are some notes about them. Speed: according to https://arxiv.org/pdf/2210.15975.pdf, the decoder takes most of the time in TTS, so decoder speed optimization is...
Speech technology is continuously being disrupted by neural networks and generative AI approaches. A good example is TTS. In recent years, a hundred methods and models have...
The longer we study reality, the more unusual it seems to us. For example, according to current understanding, the brain has the following properties: 1. the brain is a highly parallel system, 2. information in the brain is transmitted by means of...
It is interesting that the longer we study reality, the more unusual it appears to us. For example, if we think about the brain, there are two important ideas we...