Written by Nickolay Shmyrev
Building interpolated language model
With a little poking around I managed to build a combined model from the database prompts and a generic model. The accuracy jumped significantly.
Sadly, cmuclmtk requires a lot of magic passes over the models to get lm_combine to work. Many thanks to Bayle Shanks from
voicekey for writing up a recipe. So if you want to give it a try:
- Download voice-keyboard
- Unpack it
- Train both language models
- Process them with the scripts lm_combine_workaround
- Process both with lm_fix_ngram_counts
- Create a weight file like this (the weights could be different of course):
first_model 0.5
second_model 0.5
- Combine the models with lm_combine.
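Under the hood, lm_combine linearly interpolates the two models according to the weight file. Here is a minimal Python sketch of that math; the model names match the weight file above, but the probabilities are made up for illustration:

```python
# Conceptual sketch of what lm_combine does with the weight file:
# P(w) = sum over models i of weight_i * P_i(w).
# Probabilities below are invented unigram values, not real model output.

weights = {"first_model": 0.5, "second_model": 0.5}

first_model = {"yes": 0.4, "no": 0.3, "maybe": 0.3}
second_model = {"yes": 0.2, "no": 0.2, "maybe": 0.6}

def interpolate(models, weights):
    """Combine models word by word using the per-model weights."""
    combined = {}
    for name, model in models.items():
        w = weights[name]
        for word, p in model.items():
            combined[word] = combined.get(word, 0.0) + w * p
    return combined

combined = interpolate(
    {"first_model": first_model, "second_model": second_model},
    weights,
)
print(combined["yes"])  # 0.5 * 0.4 + 0.5 * 0.2 = 0.3
```

With weights of 0.5 each this is a plain average; shifting the weights toward the prompt-specific model biases the combined model toward the dialog domain.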
After all these steps you can enjoy a good language model suitable for dialog transcription.