Written by Nickolay Shmyrev
Building interpolated language model
With a little poking around I managed to build a combined model from the database prompts and a generic model. The accuracy jumped significantly.
Sadly, cmuclmtk requires a lot of magic passes over the models to get lm_combine to work. Many thanks to Bayle Shanks from
voicekey for writing up a recipe. So if you want to give it a try:
- Download voice-keyboard
- Unpack it
- Train both language models
- Process them with the scripts lm_combine_workaround
- Process both with lm_fix_ngram_counts
- Create a weight file like this (the weights could be different of course):
first_model 0.5
second_model 0.5
- Combine the models with lm_combine.
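Under the hood, lm_combine linearly interpolates the two models according to the weight file. Here is a minimal Python sketch of that math; the model names match the weight file above, but the probabilities are made up for illustration:

```python
# Conceptual sketch of what lm_combine does with the weight file:
# P(w) = sum over models i of weight_i * P_i(w).
# Probabilities below are invented unigram values, not real model output.

weights = {"first_model": 0.5, "second_model": 0.5}

first_model = {"yes": 0.4, "no": 0.3, "maybe": 0.3}
second_model = {"yes": 0.2, "no": 0.2, "maybe": 0.6}

def interpolate(models, weights):
    """Combine models word by word using the per-model weights."""
    combined = {}
    for name, model in models.items():
        w = weights[name]
        for word, p in model.items():
            combined[word] = combined.get(word, 0.0) + w * p
    return combined

combined = interpolate(
    {"first_model": first_model, "second_model": second_model},
    weights,
)
print(combined["yes"])  # 0.5 * 0.4 + 0.5 * 0.2 = 0.3
```

With weights of 0.5 each this is a plain average; shifting the weights toward the prompt-specific model biases the combined model toward the dialog domain.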
After all these steps you can enjoy a good language model suitable for dialog transcription.