Cleanup strategies for acoustic models

An interesting discussion is going on at Voxforge about cleanup of the acoustic database. It seems to me that we are really different from the usual research acoustic databases, which are mostly properly balanced. We have a load of unchecked contributions, non-native accents and so on. But we still have to work with such a database and get sensible models out of it. The Fisher experience showed that even loosely checked data can be useful for training. Although our transcriptions aren't as clean as Fisher's, the data can still be useful, but only if we apply training methods that take the nature of the collected data into account.
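To make "methods that take the nature of the data into account" a bit more concrete, one obvious candidate is confidence-based filtering: force-align every contribution against its prompt and drop utterances whose alignment score is suspiciously low. Below is a minimal sketch in Python; the scores file format, the file names and the threshold value are all assumptions for illustration, not anything Voxforge actually produces.

```python
# Minimal sketch: filter unchecked contributions by forced-alignment score.
# Assumes a scores file with lines "utterance_id per_frame_log_score",
# e.g. dumped by an aligner; the exact format here is hypothetical.

def load_scores(path):
    """Parse "utt_id score" lines into a dict."""
    scores = {}
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            utt_id, score = line.split()
            scores[utt_id] = float(score)
    return scores

def filter_transcripts(transcript_path, scores, threshold=-8.0):
    """Keep only utterances whose alignment score is above the threshold.

    The transcript format "<s> words </s> (utt_id)" follows the usual
    SphinxTrain convention; the threshold of -8.0 is a made-up example
    and would have to be tuned on held-out checked data.
    """
    kept, dropped = [], []
    with open(transcript_path) as f:
        for line in f:
            utt_id = line.rsplit("(", 1)[1].rstrip(")\n")
            if scores.get(utt_id, float("-inf")) >= threshold:
                kept.append(line)
            else:
                dropped.append(line)
    return kept, dropped

if __name__ == "__main__":
    scores = load_scores("align_scores.txt")
    kept, dropped = filter_transcripts("voxforge_train.transcription", scores)
    with open("voxforge_train_filtered.transcription", "w") as out:
        out.writelines(kept)
    print("kept %d utterances, dropped %d" % (len(kept), len(dropped)))
```

The threshold is the tricky part: it would need tuning on a held-out set of manually checked submissions, otherwise we would just end up filtering out the non-native accents we actually want to keep.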

I tried to find some articles about training acoustic models on incomplete data, but it seems that most such research is devoted to other domains like web classification. Web data is by definition incomplete and contains errors. We could reuse their unsupervised learning methods, but I failed to find information on this. Links are welcome.
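The web-mining people usually call this kind of thing self-training: label the unlabeled data with the current model, keep only the confident labels, retrain, repeat. Transferred to speech it would look roughly like the sketch below. Note that decode, confidence and train are passed in as placeholders standing in for a real decoder, a confidence estimator and a SphinxTrain run; this is a shape of the loop, not an existing API.

```python
# Rough sketch of a self-training loop for acoustic model training.
# The callables decode, confidence and train are hypothetical hooks
# supplied by the caller, not part of any real toolkit.

def self_train(model, checked_data, unchecked_audio,
               decode, confidence, train,
               iterations=3, min_confidence=0.9):
    """Iteratively grow the training set with confident automatic transcripts."""
    training_set = list(checked_data)  # start from the verified core
    for _ in range(iterations):
        for utt in unchecked_audio:
            hyp = decode(model, utt)              # transcribe with current model
            if confidence(model, utt, hyp) >= min_confidence:
                training_set.append((utt, hyp))   # adopt confident hypotheses
        model = train(training_set)               # retrain on the enlarged set
    return model
```

The min_confidence value of 0.9 is a made-up starting point; set it too low and the model happily learns from its own mistakes.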

Another interesting read today was about performance on the Fisher database. Articles mention that the baseline is around 22% WER at 20xRT decoding speed. 20xRT is unacceptably slow I think, but even at 5xRT we are close to this barrier. The thing that makes me wonder is that in sphinx4 wider beams make decoding slower but don't improve accuracy. It must be a bug, I think.
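The way I would check that is a beam sweep: decode the same test set at several beam settings and record WER and xRT for each point. A rough harness could look like the following; the command line here is a hypothetical wrapper around the decoder, not the real sphinx4 interface, though relativeBeamWidth itself is a genuine sphinx4 configuration property.

```python
import subprocess
import time

# Minimal beam-sweep harness: decode a fixed test set at several relative
# beam widths and record decoding speed.  "decoder.jar" is a hypothetical
# wrapper around sphinx4, and the audio duration of the test set is
# assumed known (in seconds).

TEST_SET_SECONDS = 3600.0  # one hour of audio, for example

for beam in (1e-60, 1e-80, 1e-100, 1e-120):
    start = time.time()
    subprocess.run(
        ["java", "-jar", "decoder.jar",           # hypothetical wrapper
         "-relativeBeamWidth", str(beam),
         "-testSet", "fisher_dev.fileids",
         "-hyp", "hyp_%g.txt" % beam],
        check=True)
    elapsed = time.time() - start
    xrt = elapsed / TEST_SET_SECONDS  # decoding time over audio duration
    # WER then comes from scoring hyp_%g.txt against the reference
    # transcripts with a tool like sclite.
    print("beam=%g  %.1fxRT" % (beam, xrt))
```

If WER stays flat while xRT grows with the beam, the pruner really is keeping hypotheses that never win, which would support the bug theory.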