Using HTK models in sphinx4

As from yesterday long waited cool patch by Christophe Cerisara with the help of super fast Yaniv Kunda has landed in svn trunk. Now you can use HTK model directly from sphinx4. Though it's not easy since I spend a few hours today figuring the required issues, so here is a little step-by-step howto:

1. Update to sphinx4 trunk

2. Download small model, because currently binary loading is not supported unfortunately and it takes a lot of resources to load the model from a huge text file. Get a model from Keith Vertanen

http://www.inference.phy.cam.ac.uk/kv227/htk/htk_wsj_si84_2750_8.zip

3. Convert model to text format with HTK HHEd

mkdir out
touch empty
HHEd -H hmmdefs -H macros -M out empty tiedlist


4. Replace model in Lattice demo in configuration file:

<component name="wsj" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
<property name="loader" value="wsjLoader"/>
<property name="unitManager" value="unitManager"/>
</component>
<component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.HTKLoader">
<property name="logMath" value="logMath"/>
<property name="modelDefinition" value="/home/shmyrev/sphinx4/wsj/out/hmmdefs"/>
<property name="unitManager" value="unitManager"/>
</component>


Please note here that modelDefinition property points to the location of the newly created hmmdefas file.

5. Replace the frontend configuration to load HTK features from a file. Unfortunately it's impossible to create HTK features with sphinx4 frontend right now, but this will be implemented soon I hope. Some bits are already present like DCT-II transform with frontend.transform.DiscreteCosineTransform2, some are easy to setup like proper filter coefficients, some are missing. So for now we'll recognize MFC file instead.

<component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
<propertylist name="pipeline">
<item> streamHTKSource </item>
</propertylist>
</component>
<component name="streamHTKSource" type="edu.cmu.sphinx.frontend.util.StreamHTKCepstrum">
<property name="cepstrumLength" value="39"/>
</component>


and let's change the Java file

StreamHTKCepstrum source = (StreamHTKCepstrum) cm.lookup ("streamHTKSource");
InputStream stream = new FileInputStream(new File ("input.mfc"));
source.setInputStream(stream);


6. Now let's extract mfc. Create a config file for HCopy

SOURCEFORMAT = WAV
TARGETKIND = MFCC_D_A_Z_0
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = T
ZMEANSOURCE = T
USEPOWER = T


and run it

HCopy -C config 10001-90210-01803.wav input.mfc


make sure input.mfc is located in top sphinx4 folder now since this is the place we'll take it.

7. Now everything is ready

ant && java -jar bin/LatticeDemo.jar


Check the result

I heard: once or a zero zero one nine oh to one oh say or oil days or a jury


It's not very precise, but still ok for such a small model and limited language model.

This is still a work in progress and a lot of things still pending. The most important are reading the binary HTK files, frontend adaptation, cleanup and unification. But I really look forward on the results, since it's really a promising approach. There are not so many BSD-licensed HTK decoders out there.