Open models for Russian speech recognition 2025

April 18, 2025

Updated 21.07.2025: added the streaming models Vosk Small Streaming 0.54 and t-tech/T-One. Previous versions: 2023, 2024. We tested the available models for Russian speech recognition on various datasets. Interesting models are fairly...

Experiments with correction of speech recognition output with LLMs

March 15, 2025

Generative error correction has become popular recently: there are many papers on it, and even a challenge: Some notable papers: * Large language model based generative error correction: A challenge and...

Experiments with solvers and decoding-time guidance in flow matching

January 17, 2025

Some features are fairly small and require only a few lines of code, so they are not really worth a conference paper or a poster. Still, they are fairly widespread. A blog post about them...

Why discrete units

January 12, 2025

Discrete units have been making a splash probably since HuBERT (2021, four years ago already), and then with Tortoise TTS and its successors. Before that there were many attempts too, like the very old system...

Matcha TTS notes

January 03, 2025

Recently I've spent some time with [Matcha](https://github.com/shivammehta25/Matcha-TTS) by Shivam Mehta. Some related papers: [Matcha-TTS: A fast TTS architecture with conditional flow matching](https://arxiv.org/abs/2309.03199) [Should you use a probabilistic duration model in...

TTS Design Thoughts

October 18, 2024

We spent last year working mostly on TTS just as in the good old Festival times. Here are some more random thoughts I have on the subject. Rants follow, I...

Evaluation of Russian TTS models

July 12, 2024

We recently evaluated Russian open-source and proprietary TTS models. Here are the results: | Engine | Voice | CER | xRT GPU | xRT CPU | UTMOS |...

Emergent abilities in LLMs

July 07, 2024

There are two extremes these days: one party claims that LLMs have magical emergent abilities, the other claims that AI is overhyped and will end soon. The real situation is...

Status of Whisper ASR Libraries

April 20, 2024

Whisper ASR is a great technology with many innovations. For example, multiobjective transcription/translation training, a huge 600k-hour training dataset, and long-context decoding were really revolutionary at the time...

Open models for Russian speech recognition 2024

April 14, 2024

Updated 19.04.2025: * The latest version for 2025 is here: Updated 15.12.2024: * Added GigaAM2 RNNT Updated 03.10.2024: * Added Whisper V3 Turbo Updated 01.06.2024: * Added GigaAM, Whisper V3, GigaAM...