Using HTK models in sphinx4
Written by Nickolay Shmyrev
As of yesterday, the long-awaited cool patch by Christophe Cerisara, with the help of the super fast Yaniv Kunda, has landed in svn trunk. Now you can use HTK models directly from sphinx4. It's not entirely straightforward, though; I spent a few hours today figuring out the required steps, so here is a little step-by-step howto:
1. Update to sphinx4 trunk
2. Download a small model, because binary loading is unfortunately not supported yet, and loading a model from a huge text file takes a lot of resources. Get a model from Keith Vertanen: http://www.inference.phy.cam.ac.uk/kv227/htk/htk_wsj_si84_2750_8.zip
3. Convert the model to text format with HTK's HHEd:
mkdir out
touch empty
HHEd -H hmmdefs -H macros -M out empty tiedlist
4. Replace the model in the Lattice demo configuration file:
<component name="wsj" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
<property name="loader" value="wsjLoader"/>
<property name="unitManager" value="unitManager"/>
</component>
<component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.HTKLoader">
<property name="logMath" value="logMath"/>
<property name="modelDefinition" value="/home/shmyrev/sphinx4/wsj/out/hmmdefs"/>
<property name="unitManager" value="unitManager"/>
</component>
Please note that the modelDefinition property points to the location of the newly created hmmdefs file.
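By the way, if you want to make sure the converted model actually parses before wiring up the whole demo, a tiny standalone loader does the trick. This is just a sketch, not part of the demo: the class name and the config file path are placeholders, while the component names match the configuration above.
import java.io.File;
import edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class HtkModelCheck {
    public static void main(String[] args) throws Exception {
        // Path to the demo configuration edited above (placeholder name)
        ConfigurationManager cm =
            new ConfigurationManager(new File("lattice.config.xml").toURI().toURL());
        TiedStateAcousticModel model = (TiedStateAcousticModel) cm.lookup("wsj");
        model.allocate();   // parses the text hmmdefs, so this can take a while
        System.out.println("Loaded acoustic model: " + model.getName());
    }
}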
5. Replace the frontend configuration to load HTK features from a file. Unfortunately it's impossible to create HTK-compatible features with the sphinx4 frontend right now, though I hope this will be implemented soon. Some bits are already present, like the DCT-II transform in frontend.transform.DiscreteCosineTransform2, some are easy to set up, like the proper filter coefficients, and some are missing. So for now we'll recognize an MFC file instead. Note that the cepstrumLength of 39 below matches the MFCC_D_A_Z_0 features we'll extract in step 6: 13 cepstral coefficients (including c0) plus their deltas and accelerations.
<component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
<propertylist name="pipeline">
<item> streamHTKSource </item>
</propertylist>
</component>
<component name="streamHTKSource" type="edu.cmu.sphinx.frontend.util.StreamHTKCepstrum">
<property name="cepstrumLength" value="39"/>
</component>
and let's change the demo's Java source (you'll also need the imports for StreamHTKCepstrum and the java.io stream classes):
// Feed the extracted HTK cepstrum file to the frontend instead of audio input
StreamHTKCepstrum source = (StreamHTKCepstrum) cm.lookup("streamHTKSource");
InputStream stream = new FileInputStream(new File("input.mfc"));
source.setInputStream(stream);
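For context, here is roughly where that snippet sits: the demo then allocates the recognizer and decodes as usual. This is a simplified sketch using the standard sphinx4 API, not the exact LatticeDemo source, and it assumes a component named "recognizer" in the configuration.
// Simplified recognition flow; the real demo also builds and prints the lattice
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
recognizer.allocate();
Result result = recognizer.recognize();
if (result != null) {
    System.out.println("I heard: " + result.getBestFinalResultNoFiller());
}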
6. Now let's extract the MFC file. Create a config file for HCopy:
SOURCEFORMAT = WAV
TARGETKIND = MFCC_D_A_Z_0
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = T
ZMEANSOURCE = T
USEPOWER = T
and run it
HCopy -C config 10001-90210-01803.wav input.mfc
Make sure input.mfc ends up in the top sphinx4 folder, since that is where the demo will read it from.
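If you want to double-check that HCopy produced what sphinx4 expects, you can read the 12-byte header of the HTK feature file with a few lines of Java. Again, this is only a sanity-check sketch with a made-up class name:
import java.io.DataInputStream;
import java.io.FileInputStream;

public class MfcHeaderCheck {
    public static void main(String[] args) throws Exception {
        // HTK feature files start with a 12-byte big-endian header:
        // nSamples (int), sampPeriod (int, in 100 ns units), sampSize (short,
        // bytes per frame) and parmKind (short, encodes MFCC_D_A_Z_0 and friends)
        DataInputStream in = new DataInputStream(new FileInputStream("input.mfc"));
        int nSamples = in.readInt();
        int sampPeriod = in.readInt();
        short sampSize = in.readShort();
        short parmKind = in.readShort();
        in.close();
        System.out.println(nSamples + " frames, " + (sampSize / 4)
                + " coefficients per frame, parmKind " + parmKind);
        // For MFCC_D_A_Z_0 we expect 39 coefficients per frame, matching the
        // cepstrumLength configured for streamHTKSource above
    }
}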
7. Now everything is ready
ant && java -jar bin/LatticeDemo.jar
Check the result
I heard: once or a zero zero one nine oh to one oh say or oil days or a jury
It's not very precise, but still ok for such a small model and limited language model.
This is still a work in progress and a lot of things are still pending. The most important are reading binary HTK files, frontend adaptation, cleanup and unification. But I really look forward to the results, since it's a promising approach. There are not so many BSD-licensed HTK decoders out there.