Written by Nickolay Shmyrev
on June 03, 2012

How to Choose an Embedded Speech Recognizer

There are many open source options for building a speech recognition system on a low-resource device, and it is hard to choose among them. For example, you may need speech recognition on a platform like the Raspberry Pi and be weighing HTK, CMUSphinx, Julius, and many other implementations.

To make an informed decision, you need to consider a set of features specifically required to run speech recognition in a low-resource environment. Without them, your system may well be accurate, but it will consume too many resources to be useful. Some of them are:

Features for a small memory footprint:

Support for semi-continuous models
Quantized and pruned data structures: mixture weights quantized to 4 bits and pruned, acoustic scores quantized to 16 bits
Fixed-point arithmetic
Bitvector structures
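
To make the 4-bit mixture weight idea concrete, here is a minimal sketch of how such quantization could work. It is not taken from the Pocketsphinx sources: the function names, the uniform log-scale codebook, and the 1e-7 floor are all assumptions chosen for illustration. It stores two weights per byte, which is where the memory saving comes from.

```python
import math

def quantize_weights_4bit(weights, floor=1e-7):
    """Map mixture weights to 4-bit codes (0..15) on a uniform log scale.

    Hypothetical scheme for illustration; a real engine may use a
    different codebook.
    """
    logs = [math.log(max(w, floor)) for w in weights]
    lo, hi = min(logs), max(logs)
    step = (hi - lo) / 15 or 1.0  # avoid a zero step when all weights are equal
    codes = [min(15, round((x - lo) / step)) for x in logs]
    return codes, lo, step

def pack_nibbles(codes):
    """Pack two 4-bit codes into each byte (high nibble first)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def unpack_nibbles(packed, count):
    """Recover the first `count` 4-bit codes from packed bytes."""
    out = []
    for b in packed:
        out.append(b >> 4)
        out.append(b & 0x0F)
    return out[:count]

def dequantize(codes, lo, step):
    """Approximate the original weights from their codes."""
    return [math.exp(lo + c * step) for c in codes]

weights = [0.5, 0.25, 0.125, 0.0625, 0.0625]
codes, lo, step = quantize_weights_4bit(weights)
packed = pack_nibbles(codes)
print(len(packed), "bytes for", len(weights), "weights")
```

Five weights fit in three bytes here, compared with 20 bytes as 32-bit floats, at the cost of some precision in the recovered values.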

Features for fast computation:

Top-N Gaussian selection
Simplified lextree search without cross-word context
Multipass processing with tunable performance on each step
Cache access optimization for increased memory throughput
Downsampling
Phone lookahead
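
Two of the items above, top-N Gaussian selection and downsampling, can be sketched together. The code below is an illustration under stated assumptions, not the Pocketsphinx implementation: it uses diagonal-covariance Gaussians, runs the full selection only on every `ds`-th frame, and on the frames in between scores just the shortlist from the last full pass. All names (`log_gauss`, `top_n`, `score_frames`) are hypothetical.

```python
import heapq
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def top_n(frame, means, variances, n):
    """Indices of the n best-scoring Gaussians for this frame, best first."""
    scores = [log_gauss(frame, m, v) for m, v in zip(means, variances)]
    return heapq.nlargest(n, range(len(scores)), key=scores.__getitem__)

def score_frames(frames, means, variances, n=2, ds=2):
    """Score frames using only a shortlist of Gaussians.

    The shortlist is recomputed from all Gaussians on every ds-th frame
    and reused for the frames in between (assumed behaviour).
    """
    shortlist = []
    out = []
    for t, frame in enumerate(frames):
        if t % ds == 0:
            shortlist = top_n(frame, means, variances, n)
        out.append({i: log_gauss(frame, means[i], variances[i]) for i in shortlist})
    return out

means = [[0.0], [5.0], [10.0]]
variances = [[1.0], [1.0], [1.0]]
frames = [[0.1], [4.9], [9.8], [9.9]]
for t, scores in enumerate(score_frames(frames, means, variances)):
    print(t, sorted(scores))
```

With `n=2` and `ds=2`, only half of the frames pay for evaluating every Gaussian, and each frame carries scores for two Gaussians instead of three. The trade-off is visible at frame 1, which reuses the shortlist chosen for frame 0.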

Support for popular mobile platforms:

Out-of-box support for Android
Out-of-box support for iPhone
Out-of-box support for embedded Linux systems like BeagleBoard

There are many other features that help with speech recognition as well. Apart from commercial engines, the only engine that implements the features above is Pocketsphinx:

http://cmusphinx.sourceforge.net

You can learn more about Pocketsphinx features from the publication:

http://www.cs.cmu.edu/~dhuggins/Publications/pocketsphinx.pdf

You can learn how to optimize Pocketsphinx for a low-resource environment from the wiki page:

http://cmusphinx.sourceforge.net/wiki/pocketsphinxhandhelds

Training acoustic models for embedded devices also has some Pocketsphinx-specific requirements, so Sphinxtrain is the best fit here.

There are also demos for Android and iPhone.
