In-depth evaluation of ASR engines

May 24, 2026

While everyone focuses on latency, there are many measurable aspects of ASR that are easy to evaluate and have a significant impact on user experience. Here are some of them:...

Factorizing E2E on acoustic and language models

February 23, 2026

While end-to-end speech recognition systems are dominating leaderboards, it's still valuable to consider the separate acoustic and language models. This separation is present in the network as the lower layers of...

Failure of SSL

December 13, 2025

The recent release of the FAIR omnilingual model, LeCun news and active use of wav2vec in "semantics" made me think again about SSL in speech. This is going to be...

Open models for Russian speech recognition 2025

April 18, 2025

Updated 14.09.2025: * Added the Vikhr Borealis model Updated 17.08.2025: * Added Nemo Canary V2 and Whisper Podlodka Turbo Updated 15.08.2025: * Added Nemo Parakeet V3 Updated 21.07.2025: * Added streaming...

Experiments with correction of speech recognition output with LLMs

March 15, 2025

Generative error correction has become popular recently: there are many papers on it, and even a challenge. Some notable papers: * Large language model based generative error correction: A challenge and...

Experiments with solvers and decoding-time guidance in flow matching

January 17, 2025

Some features are fairly small and require only a few lines of code, so they are not really worth a conference paper or a poster. Still, they are fairly widespread. A blog post about them...

Why discrete units

January 12, 2025

Discrete units have made a splash probably since HuBERT (2021, four years already), and then again with Tortoise TTS and its successors. Before that there were many attempts too, like the very old system...

Matcha TTS notes

January 03, 2025

Recently I've spent some time with [Matcha](https://github.com/shivammehta25/Matcha-TTS) by Shivam Mehta. Some related papers [Matcha-TTS: A fast TTS architecture with conditional flow matching](https://arxiv.org/abs/2309.03199) [Should you use a probabilistic duration model in...

TTS Design Thoughts

October 18, 2024

We spent last year working mostly on TTS just as in the good old Festival times. Here are some more random thoughts I have on the subject. Rants follow, I...

Evaluation of Russian TTS models

July 12, 2024

We recently evaluated Russian open source and proprietary TTS models. Here are the results: | Engine | Voice | CER | xRT GPU | xRT CPU | UTMOS |...
