Written by
Nickolay Shmyrev
on
Fillers in WFST
Another practical question is how to integrate fillers. A silence class is introduced in "A Generalized Construction of Integrated Speech Recognition Transducers" by Cyril Allauzen, Mehryar Mohri, Michael Riley and Brian Roark, and implemented in transducersaurus.
But every practical model has more than just silence. Fillers like noise, silence, breath, and laughter all map to specific senones in the model. I usually try to minimize their number during training, for example by merging all the ums, hmms, and mhms into a single phone, but I still think they are needed. How do you integrate them when you build a WFST recognizer?
So I tried a few approaches. For example, instead of adding just a <sil> class in the T transducer, I tried creating a separate branch for each filler. As a result, the final cascade expands into a huge monster. If the cascade was 50 MB, after combination with one silence class it is 100 MB, but after 3-4 classes it is 300 MB. Not a nice thing to do.
So I ended up with dynamic expansion of silence transitions, like this:
if edge is silence:
    for filler in fillers:
        from_node.add_edge(filler)
This seems to work well.
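To make the idea concrete, here is a minimal Python sketch of that expansion. It is not the actual decoder code: the `Arc` type, the dictionary-based graph, the `expanded_arcs` function and the filler labels are all made up for illustration, and it assumes every filler simply reuses the weight and destination of the silence arc. The point is that the compiled graph keeps a single <sil> arc and the extra filler arcs are generated only when the decoder visits a state.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List

SILENCE = "<sil>"
# Hypothetical filler labels; a real model defines its own set.
FILLERS = ["<sil>", "+noise+", "+breath+", "+laugh+"]


@dataclass(frozen=True)
class Arc:
    ilabel: str      # input label (phone or filler)
    olabel: str      # output label (word or epsilon)
    weight: float    # arc cost
    nextstate: int   # destination state


def expanded_arcs(graph: Dict[int, List[Arc]], state: int,
                  fillers: List[str] = FILLERS) -> Iterator[Arc]:
    """Yield the arcs leaving `state`, replacing each silence arc
    with one arc per filler, generated on the fly."""
    for arc in graph.get(state, []):
        if arc.ilabel == SILENCE:
            # Assumption: fillers share the silence arc's weight and target.
            for filler in fillers:
                yield Arc(filler, arc.olabel, arc.weight, arc.nextstate)
        else:
            yield arc


# The stored graph holds only one silence arc per state.
graph = {0: [Arc(SILENCE, "<eps>", 0.5, 1), Arc("AH", "a", 1.0, 2)]}
for arc in expanded_arcs(graph, 0):
    print(arc.ilabel, arc.olabel, arc.weight, arc.nextstate)
```

The stored cascade stays the size it was with one silence class, and the cost of the fillers moves into the search, where each silence arc fans out into a handful of arcs at decode time.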