Vosk/Kaldi French model
Some good news arrived recently in the Kaldi world: the LINTO project released its French model 2.0. According to the documentation, the model is trained on 7100 hours of audio, and it looks somewhat better than the previously widely used model from Paul Guyot.
Neither model is perfect. The LM needs an update, possibly to an RNNLM. The LINTO model also relies on text postprocessing to split off articles, which is not very nice for interoperability with commonly used LMs. And the graph LINTO builds is far too large (6 GB). In general, though, these models are pretty useful.
We fixed the language model issues with a better graph and proper CARPA rescoring, and made the resulting Vosk model available for download. Get it here:
https://alphacephei.com/vosk/models/vosk-model-fr-0.6-linto.zip
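For readers who want to try the downloaded model from Python, here is a minimal sketch using the `vosk` package. It assumes the archive was unpacked into a directory named `vosk-model-fr-0.6-linto` (an assumption based on the file name) and that the input is a 16-bit mono PCM WAV file. The helper names `join_results` and `transcribe` are hypothetical.

```python
import json
import wave


def join_results(json_results):
    """Join the "text" fields of Vosk JSON results, skipping empty ones."""
    texts = (json.loads(r).get("text", "") for r in json_results)
    return " ".join(t for t in texts if t)


def transcribe(wav_path, model_dir="vosk-model-fr-0.6-linto"):
    """Transcribe a 16-bit mono PCM WAV file with the unpacked model.

    model_dir is an assumed directory name; point it at wherever
    the archive was actually extracted.
    """
    # Imported here so join_results works without the vosk package installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    results = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if rec.AcceptWaveform(data):
                results.append(rec.Result())
        # Flush whatever is left in the decoder.
        results.append(rec.FinalResult())
    return join_results(results)
```

Calling `transcribe("test.wav")` should return the recognized text as one string. The file name is only a placeholder.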
You can also use this model with Vosk server via docker:
docker run -p 2700:2700 alphacep/kaldi-fr
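Once the container is running, a client can stream audio to it over a WebSocket. The sketch below assumes the server listens on `ws://localhost:2700` (the port mapped above), accepts a JSON `config` message followed by raw PCM chunks, and replies to each chunk with a JSON result. It relies on the third-party `websockets` package, and the function names are hypothetical.

```python
import json
import wave


def config_message(sample_rate):
    """Build the JSON config message sent before any audio."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


async def recognize(wav_path, uri="ws://localhost:2700"):
    """Stream a 16-bit mono PCM WAV file to the server and collect the text."""
    # Third-party dependency, imported lazily: pip install websockets
    import websockets

    transcripts = []
    async with websockets.connect(uri) as ws:
        with wave.open(wav_path, "rb") as wf:
            await ws.send(config_message(wf.getframerate()))
            while True:
                data = wf.readframes(4000)
                if not data:
                    break
                await ws.send(data)
                # Assumed protocol: one JSON reply per audio chunk,
                # either a partial result or a finished utterance.
                reply = json.loads(await ws.recv())
                if reply.get("text"):
                    transcripts.append(reply["text"])
        # Tell the server the stream is over and read the final result.
        await ws.send(json.dumps({"eof": 1}))
        final = json.loads(await ws.recv())
        if final.get("text"):
            transcripts.append(final["text"])
    return " ".join(transcripts)
```

To run it, use `import asyncio` followed by `asyncio.run(recognize("test.wav"))`, where `test.wav` is a placeholder file name.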
Here are some results for the models. Please try them and comment on any issues you encounter.
Model | CV Test WER | Podcast WER | Speed | Memory |
---|---|---|---|---|
Pguyot | 27.98 | 30.25 | 0.30xRT | 500 MB |
Linto original | 14.24 | 27.04 | 0.22xRT | 7 GB |
Linto VOSK LM | 16.25 | 24.36 | 0.23xRT | 1.7 GB |
Deepspeech FR Polyglot | 24.74 | 43.75 | 1.0xRT | 700 MB |
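
For anyone who wants to reproduce this kind of comparison on their own data, here is a minimal sketch of the two metrics. It uses the standard definition of word error rate (word-level edit distance divided by the number of reference words) and reads "xRT" as processing time divided by audio duration. The exact scoring setup and text normalization behind the numbers above are not described here, so both functions are illustrative assumptions.

```python
def wer(reference, hypothesis):
    """Word error rate in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,              # deletion
                cur[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),   # substitution or match
            ))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time relative to audio length; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds
```

For example, `wer("le chat dort", "le chien dort")` counts one substitution over three reference words.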