Whisper ASR is a great technology with many innovations. For example, multi-objective transcription/translation training, a huge 600k-hour training dataset, and long-context decoding were really revolutionary at the time...
Updated 19.04.2025: * The latest version for 2025 is here: Updated 15.12.2024: * Added GigaAM2 RNNT Updated 03.10.2024: * Added Whisper V3 Turbo Updated 01.06.2024: * Added GigaAM, Whisper V3, GigaAM...
The recently published NaturalSpeech paper attracted some attention. While the ideas discussed there are fairly straightforward, it is nice to see a solid implementation from a reputable institution with great results. It...
There are many TTS engines around; here are some notes about them. ## Speed According to https://arxiv.org/pdf/2210.15975.pdf, the decoder takes most of the time in TTS, so decoder speed optimization is...
Speech technology is continuously disrupted by neural networks and generative AI approaches. A good example is the TTS area. In recent years, a hundred methods and models have...
The longer we study reality, the more unusual it seems to us. For example, according to the current understanding, the brain has the following properties: 1. the brain is a highly parallel system, 1. information in the brain is transmitted by means of...
It is interesting that the longer we study reality, the more unusual it appears to us. For example, if we think about the brain, there are two important ideas we...
Following our [test of open Russian models](https://alphacephei.com/nsh/2023/01/22/russian-models.html), we tested popular speech recognition services on telephony recordings. Results as of September 2023: {:class="table table-bordered"} | Dataset | Vosk 0.52 | Yandex...
Recently Facebook released the MMS and Seamless models for multilingual ASR and translation. The releases got some media coverage and are frequently mentioned as cool models on GitHub. Claims from the...
Updated 15.04.2024: The latest version for 2025 is here: Updated 10.04.2023: * added 3 datasets - TV broadcasts, medicine (thanks to Alexandra Antonova), Russian LibriSpeech * added 2 models - vosk 0.42,...