Vosk/Kaldi German acoustic model for call center and broadcast transcription

There are already many open source German models around, but unfortunately most of them are not trained as well as they could be. Here is a review of the current state and some information about the new German model for Vosk.

Zamia

Zamia provides scripts for training Kaldi models as well as pretrained models. The pretrained models are pretty good, in particular the mobile one. Some points:

Tuda-de

Tuda provides a corpus for training German models as well as a smooth Kaldi training setup. The overall training process is OK, but there are some small issues, for example in the dictionary:

   überzuckert ? y: b 6 'ts U k 6 t

Here ' is the stress mark. It is attached to the consonant ts, but it should be on the vowel, since the stressed vowel is the sound affected most of all. The corrected entry is:

   überzuckert ? y: b 6 ts 'U k 6 t
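The correction above can be automated across a whole lexicon. Here is a minimal sketch, not part of the Tuda recipe. It assumes an X-SAMPA-like phone set in which a phone is a vowel when its first character is in the VOWELS set below. That set and the function names are our own illustrative choices and should be checked against the real phone list.

```python
# Assumed vowel inventory for the X-SAMPA-like phone set in the example.
# This set is a guess for illustration; adjust it to the lexicon's phone list.
VOWELS = {"a", "e", "i", "o", "u", "y", "2", "9", "6", "@",
          "E", "I", "O", "U", "Y"}


def is_vowel(phone):
    """True for vowel phones, including long (y:) and diphthong (aI) forms."""
    return bool(phone) and phone[0] in VOWELS


def fix_stress(phones):
    """Move a stress mark from a consonant to the next vowel in the word.

    A stress mark with no vowel after it is dropped.
    """
    fixed = []
    pending = False  # a stress mark is waiting for the next vowel
    for phone in phones:
        if phone.startswith("'"):
            phone = phone[1:]
            pending = True
        if pending and is_vowel(phone):
            phone = "'" + phone
            pending = False
        fixed.append(phone)
    return fixed


def fix_entry(line):
    """Fix one lexicon line of the form 'word phone phone ...'."""
    word, *phones = line.split()
    return " ".join([word] + fix_stress(phones))
```

Entries whose stress mark is already on a vowel pass through unchanged, so the function can be applied to every line of the dictionary.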

German ASR

German ASR is another project that trains on M-AILABS, SwC and Tuda data. Its scripts are more straightforward since they are based on the LibriSpeech recipe.

Deepspeech

There are several Deepspeech models: one is Deepspeech German, with a model for Deepspeech 0.6, and another is the Jaco Deepspeech Polyglot model, with a model for Deepspeech 0.7.

Vosk model

We have recently trained a Vosk German model, mostly following the Tuda recipe. Our model uses a proper big language model and a narrowband acoustic model, so it fits telephony. You can download the model here.

Vosk-server is also updated, so you can simply run:

docker run -d -p 2700:2700 alphacep/kaldi-de:latest
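Once the container is running, a client can stream audio to it over a WebSocket. The sketch below is a minimal, hedged example rather than an official client. It assumes the server speaks the usual vosk-server protocol on port 2700 (an optional config message, raw PCM chunks, then an eof message) and that the third-party websockets package is installed. The helper names and the default URI are our own choices for illustration.

```python
import asyncio
import json
import wave


def pcm_chunks(wav_file, seconds=0.2):
    """Yield raw PCM chunks of about `seconds` each from an open wave reader."""
    frames_per_chunk = int(wav_file.getframerate() * seconds)
    while True:
        data = wav_file.readframes(frames_per_chunk)
        if not data:
            break
        yield data


async def transcribe(path, uri="ws://localhost:2700"):
    """Stream a mono 16-bit PCM WAV file to the server and return the text."""
    # Imported here so the helper above works without the package installed.
    import websockets

    results = []
    with wave.open(path, "rb") as wf:
        async with websockets.connect(uri) as ws:
            # Tell the server the sample rate of the audio we are about to send.
            await ws.send(json.dumps({"config": {"sample_rate": wf.getframerate()}}))
            for chunk in pcm_chunks(wf):
                await ws.send(chunk)
                results.append(json.loads(await ws.recv()))
            # Signal end of stream and collect the last result.
            await ws.send('{"eof" : 1}')
            results.append(json.loads(await ws.recv()))
    # Finished utterances carry a "text" field; partial ones carry "partial".
    return " ".join(r["text"] for r in results if r.get("text"))
```

To try it, call asyncio.run(transcribe("test.wav")), where test.wav is a placeholder for your own recording.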

We also have a small model for mobile applications, derived from the Zamia small model with an updated lookahead graph.

Test results

Here are the error rates on the Tuda-De test set and on our internal podcast transcription test:

Model                 Tuda Test WER (%)   Podcast WER (%)   Speed
Zamia                 11.48               31.12             0.33xRT
Tuda pretrained       13.21               27.78             0.9xRT
German ASR            12.80               ???               ???
Deepspeech German     39.79               55.89             Very slow
Deepspeech Polyglot   29.07               52.72             Very slow
Vosk                  11.07               27.45             0.33xRT
Vosk Rescoring        9.31                26.26             0.33xRT
Vosk Mobile           14.81               37.46             0.14xRT
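For readers who want to reproduce such numbers on their own data, here is a small sketch of how the two metrics are usually computed. It assumes the standard definitions: WER as word-level edit distance divided by the number of reference words, and speed (xRT) as decoding time divided by audio duration. It is not the scoring tool used for the table above, and the function names are ours.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def wer(pairs):
    """Corpus-level WER in percent over (reference, hypothesis) pairs."""
    errors = sum(word_errors(r, h) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return 100.0 * errors / words


def real_time_factor(decode_seconds, audio_seconds):
    """Speed in xRT: values below 1 mean faster than real time."""
    return decode_seconds / audio_seconds
```

Note that the errors are summed over the whole test set before dividing, so long utterances weigh more than short ones.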

Discussion

As you can see, all Kaldi German models perform more or less the same, since they use about the same data. The models could still be improved significantly, and we will post updates soon. Feel free to test and comment.