Written by Nickolay Shmyrev
on October 21, 2020

Vosk/Kaldi French model

Recently some good news arrived in the Kaldi world: the LINTO project released their French model 2.0. According to the documentation, this model is trained on 7100 hours of data, and it looks a bit better than the previously widely used model from Paul Guyot.

Neither model is perfect. The LM needs an update, possibly to an RNNLM. The LINTO model also uses some text postprocessing to split out articles, which is not very nice for interoperability with commonly used LMs, and the graph LINTO creates is too large (6 GB). In general, though, these models are pretty useful.

We fixed the language model issues with a better graph and proper CARPA rescoring, and made the resulting model available for download for Vosk. Get it here:

https://alphacephei.com/vosk/models/vosk-model-fr-0.6-linto.zip
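
As a minimal sketch of using the downloaded model offline from Python: the snippet below assumes the archive was unzipped into a directory named vosk-model-fr-0.6-linto, that the vosk package is installed (pip install vosk), and that the input is a mono 16-bit PCM WAV file. The helper names extract_text and transcribe are illustrative and not part of Vosk.

```python
import json
import wave


def extract_text(result_json):
    """Pull the recognized text out of a Vosk JSON result string."""
    return json.loads(result_json).get("text", "")


def transcribe(wav_path, model_dir="vosk-model-fr-0.6-linto"):
    """Transcribe a mono 16-bit PCM WAV file with a local Vosk model.

    model_dir is an assumed location: wherever the zip was unpacked.
    """
    # Imported here so extract_text stays usable without vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM WAV audio")
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        pieces = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(extract_text(recognizer.Result()))
        # Flush whatever is still buffered at the end of the file.
        pieces.append(extract_text(recognizer.FinalResult()))
    return " ".join(p for p in pieces if p)
```

Calling transcribe("test.wav") with a hypothetical recording would return the recognized French text as a single string.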

You can also use this model with the Vosk server via Docker:

docker run -p 2700:2700 alphacep/kaldi-fr
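
Once the container is running, a client can stream audio to it over a websocket. The sketch below follows my understanding of the sample client shipped with vosk-server: a JSON config message with the sample rate, then raw PCM chunks, then an eof message. Treat the message format, the ws://localhost:2700 address, and the helper names as assumptions to check against the server version you run. It needs the third-party websockets package.

```python
import asyncio
import json
import wave


def config_message(sample_rate):
    """Build the initial JSON message announcing the audio sample rate."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


# Message that asks the server to finalize and return the last result.
EOF_MESSAGE = json.dumps({"eof": 1})


async def recognize(wav_path, uri="ws://localhost:2700"):
    """Stream a mono 16-bit PCM WAV file to the server, collect replies."""
    import websockets  # third-party: pip install websockets

    results = []
    with wave.open(wav_path, "rb") as wf:
        async with websockets.connect(uri) as ws:
            await ws.send(config_message(wf.getframerate()))
            chunk_frames = int(wf.getframerate() * 0.2)  # 0.2 s of audio
            while True:
                data = wf.readframes(chunk_frames)
                if not data:
                    break
                await ws.send(data)
                # Assumed: the server answers every chunk with a JSON reply.
                results.append(json.loads(await ws.recv()))
            await ws.send(EOF_MESSAGE)
            results.append(json.loads(await ws.recv()))
    return results


# Example (needs the running container and a hypothetical test.wav):
# results = asyncio.run(recognize("test.wav"))
```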

Here are some results for the models. Please try them and comment on any issues you encounter.

Model	CV Test WER (%)	Podcast WER (%)	Speed	Memory
Pguyot	27.98	30.25	0.30xRT	500 MB
LINTO original	14.24	27.04	0.22xRT	7 GB
LINTO Vosk LM	16.25	24.36	0.23xRT	1.7 GB
DeepSpeech FR Polyglot	24.74	43.75	1.0xRT	700 MB
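
For readers who want to compare their own recordings against these numbers, here is a small sketch of the standard word error rate: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The post does not say which scoring tool or text normalization produced the table, so results from this function are not guaranteed to match it exactly. The function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two whitespace-tokenized strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.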
