Factorizing E2E into acoustic and language models

February 23, 2026

While end-to-end speech recognition systems dominate the leaderboards, it’s still valuable to consider the acoustic and language models separately. This separation is present in the network as the lower layers of...

Failure of SSL

December 13, 2025

The recent release of the FAIR omnilingual model, the LeCun news, and the active use of wav2vec in "semantics" made me think again about SSL in speech. This is going to be...

Open models for Russian speech recognition 2025

April 18, 2025

Updated 14.09.2025: * Added the Vikhr Borealis model Updated 17.08.2025: * Added Nemo Canary V2 and Whisper Podlodka Turbo Updated 15.08.2025: * Added Nemo Parakeet V3 Updated 21.07.2025: * Added streaming...

Experiments with correction of speech recognition output with LLMs

March 15, 2025

Generative error correction has become a thing recently; there are many papers on it, and even a challenge. Some notable papers: * Large language model based generative error correction: A challenge and...

Experiments with solvers and decoding-time guidance in flow matching

January 17, 2025

Some features are fairly small and require only a few lines of code, so they are not really worth a conference paper or a poster. Still, they are fairly widespread. A blog post about them...

Why discrete units

January 12, 2025

Discrete units have been making a splash probably since HuBERT (2021, four years already), and then with Tortoise TTS and its successors. Before that there were many attempts too, like the very old system...

Matcha TTS notes

January 03, 2025

Recently I've spent some time with [Matcha](https://github.com/shivammehta25/Matcha-TTS) by Shivam Mehta. Some related papers: [Matcha-TTS: A fast TTS architecture with conditional flow matching](https://arxiv.org/abs/2309.03199) [Should you use a probabilistic duration model in...

TTS Design Thoughts

October 18, 2024

We spent last year working mostly on TTS just as in the good old Festival times. Here are some more random thoughts I have on the subject. Rants follow, I...

Evaluation of Russian TTS models

July 12, 2024

We recently evaluated Russian open source and proprietary TTS models. Here are the results: | Engine | Voice | CER | xRT GPU | xRT CPU | UTMOS |...

Emergent abilities in LLMs

July 07, 2024

There are two extremes these days: one party claims that LLMs have magical emergent abilities, the other claims that AI is overhyped and will end soon. The real situation is...