Everyone is crazy about OpenAI Whisper. Trained on 680 thousand hours of multilingual data, it indeed marks a new stage in speech recognition. We tested it and some other recent...
While people have recently been arguing whether Google's model [is sentient](https://news.ycombinator.com/item?id=31721584), we must admit that another important property of living creatures has emerged in recent AI models: they started to have...
We haven't updated the news here for almost a year. Time goes fast and new things keep us busy. There is some news to discuss, but it is mostly worth a...
What I really like about speech recognition, and what keeps me excited about it, is the active, ongoing development of the technology, which boosts both speech recognition results and,...
As dataset sizes grow beyond 10 thousand hours (Gigaspeech), the compute requirements for speech recognition research grow too. Any research, even simple architecture testing, gets harder and harder because...
Not long after Citrinet, Nvidia NeMo released the Conformer-CTC model. As usual, forget about Citrinet now; Conformer-CTC is way better. The model is available for download [here](https://ngc.nvidia.com/catalog/models/nvidia:nemo:stt_en_conformer_ctc_large), and the latest NeMo repo supports...
ICASSP 2021 starts online this week. It is a bit late in the year, and everyone has already looked at the publications. Many papers have been on arXiv for some time, some...
The race for the biggest model continues. Recently NVIDIA came out with the Citrinet model, a bigger and more advanced version of Quartznet. The publication is: [Citrinet: Closing the Gap between...
With the development of neural network toolkits, it seems that the technology has reached the point where a huge network can remember and recognize almost everything as long as it was properly...
We continue testing the most advanced ASR models; here we try the famous Wav2Vec2.0, an impressive work by Facebook. Here are the previous posts: Nvidia Nemo, Wav2Letter, RASR. The ideas behind...