Status of Vosk in October 2020

When you work on things day to day, you lose the overall picture very quickly. We’ve been actively training models, fixing things here and there, and adding new platforms. Vosk now supports 16 languages (recent additions are German, Catalan and Farsi) and several important platforms: Asterisk, Freeswitch and even Jigasi. It is reasonably easy to get started, although the API and the Windows support both need more love, and so on and so forth.

Two big events and a few small ones brought me back down to Earth recently.

The first one was the Kaldi meeting, a really exciting event with very wide representation from academia, industry and even big corporations. It is amazing how many people use Kaldi in practice and follow its development. There were many interesting things on the agenda, but the major ones were:

A good example of a modern, fairly simple decoder is Google’s ContextNet.

Most of those issues are going to be solved by the new K2, which is really promising, but there are a few things I would like to try in the original Kaldi itself (I have tried some of them already):

The second thing was that I tried to run Vosk on OSX with a microphone. To be honest, the experience is awful: the accuracy is pretty low and the response time is about 2 seconds, even with a very small model. As I have been reminded again and again, it is very important to “eat your own dog food”. Sadly, I rarely use speech recognition in daily life; I definitely need to use it more often.

Things to do in Vosk:

Lots of fun ahead.