Building interpolated language model

Yo, with a little poking around I managed to build a combined model from the database prompts and a generic model. The accuracy jumped significantly.
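The combined model is a linear interpolation of the two input models. Conceptually it looks like the sketch below; the function and probabilities are made up for illustration and are not cmuclmtk code:

```python
def interpolate(p_first, p_second, w_first=0.5, w_second=0.5):
    """Interpolated probability: P(w) = w_first * P1(w) + w_second * P2(w).

    Hypothetical helper showing the idea behind model combination;
    the weights correspond to the weight file used by lm_combine.
    """
    return w_first * p_first + w_second * p_second

# A word that is frequent in the domain prompts but rare in the
# generic model still gets a healthy combined probability:
combined = interpolate(0.02, 0.0001)
```

This is why the combined model helps: domain words keep their high probability from the prompts model while the generic model covers everything else.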

Sadly, cmuclmtk requires a lot of magic passes over the models to get lm_combine to work. Many thanks to Bayle Shanks from voice-keyboard for writing up a recipe. So if you want to give it a try:

  • Download voice-keyboard
  • Unpack it
  • Train both language models
  • Process them with the lm_combine_workaround script
  • Process both with lm_fix_ngram_counts
  • Create a weight file like this (the weights could be different, of course):

        first_model 0.5
        second_model 0.5
  • Combine the models with lm_combine.
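The weight file is plain text with one "model_name weight" pair per line. A quick sanity check that the file parses and the weights sum to one might look like this (a hypothetical helper, not part of cmuclmtk):

```python
def read_weights(lines):
    """Parse 'model_name weight' pairs, skipping blank lines."""
    weights = {}
    for line in lines:
        if line.strip():
            name, w = line.split()
            weights[name] = float(w)
    return weights

w = read_weights(["first_model 0.5", "second_model 0.5"])
assert abs(sum(w.values()) - 1.0) < 1e-9  # interpolation weights should sum to 1
```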
After all these steps you can enjoy a good language model suitable for dialog transcription.