Speech factorization in NaturalSpeech3

March 08, 2024

Recently published NaturalSpeech paper attracted some attention. While ideas discussed there are somewhat straight, it is nice to see a solid implementation from a reputable institution and great results. It...

Modern TTS - details

January 19, 2024

There are many TTS engines around, here are some notes about them ## Speed According to https://arxiv.org/pdf/2210.15975.pdf decoder takes most of the time in TTS, so decoder speed optimization is...

Modern TTS - approaches and requirements

November 29, 2023

Speech technology is continuously disrupted by neural network things and generative AI approaches. A good example is the TTS area. In the last years a hundred methods and models have...

Мозг, время, потоковая обработка и пустые состояния в модели CTC

October 07, 2023

Чем дольше мы изучаем реальность, тем необычней она нам кажется. Например, по текущим представлениям, мозг обладает следующими свойствами: 1. мозг -- высокопараллельная система, 1. информация в мозгу передаётся с помощью...

Brain, Time, CTC blank states and streaming

September 22, 2023

It is interesting that the longer we study the reality the more unusual it appears to us. For example, if we think about brain, there are two important ideas we...

Тестирование сервисов распознавания русской речи

September 17, 2023

По аналогии с [тестом открытых русских моделей](https://alphacephei.com/nsh/2023/01/22/russian-models.html) мы протестировали популярные сервисы для распознавания речи на записях телефонии. Результаты на сентябь 2023: {:class="table table-bordered"} | Dataset | Vosk 0.52 | Яндекс...

Testing Facebook MMS and SeamlessMT4 Word Error Rate

August 24, 2023

Recently Facebook released MMS and Seamless models for multilingual ASR and translation Releases got some coverage in media and frequently mentioned as a cool model on Github. Claims from the...

Открытые модели для распознавания русской речи

January 22, 2023

Обновлено 15.04.2024: Последняя версия за 2025 год здесь: Обновлено 10.04.2023: * добавлены 3 набора данных - телевещание, медицина (спасибо Александре Антоновой), русский librispeech * добавлены 2 модели - vosk 0.42,...

Whisper Fine-Tuning

January 15, 2023

Whisper is very popular these days, so here are some more observations on it. Whisper has many cool properties like very good generic transcription accuracy or accurate punctuation, but Whisper...

Boosting and Stable Diffusion

December 24, 2022

_"The wind blows to the south and turns to the north; round and round it goes, ever returning on its course." - Ecclesiastes 1:6_ _"Hundred years day and night spins...