Written by
Nickolay Shmyrev
How To Choose Embedded Speech Recognizer
There are many open source options for building a speech recognition system on a low-resource device, and it is hard to choose between them. For example, you might need speech recognition on a platform like the Raspberry Pi and be weighing HTK, CMUSphinx, Julius and many other implementations.
To make an informed decision, you need to consider the set of features specifically required to run speech recognition in a low-resource environment. Without them your system will probably be accurate, but it will also consume too many resources to be useful. Some of them are:
Features for a small memory footprint:
- Support for semi-continuous models
- Quantized and pruned data structures: mixture weights are quantized to 4 bits and pruned, and acoustic scores are quantized to 16 bits
- Fixed-point arithmetic
- Bitvector structures
- Top Gaussian selection
- Simplified lextree search without cross-word context
- Multipass processing with tunable performance on each step
- Cache access optimization for increased memory throughput
- Downsampling
- Phone lookahead
- Out-of-the-box support for Android
- Out-of-the-box support for iPhone
- Out-of-the-box support for embedded Linux systems like BeagleBoard
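To make the memory-footprint items more concrete, here is a minimal sketch of what quantizing mixture weights to 4 bits can look like. It is only an illustration under stated assumptions: the uniform negative-log scale, the storage layout and all function names (quantize_4bit, pack_nibbles, unpack_nibbles, dequantize) are hypothetical and are not taken from any recognizer's actual model format.

```python
import math


def quantize_4bit(weights):
    """Map each weight to a 4-bit code (0..15) on a uniform negative-log scale.

    Hypothetical scheme for illustration; real decoders may use other tables.
    """
    # Decoders usually work with log scores, so quantize in the log domain.
    neg_logs = [-math.log(max(w, 1e-7)) for w in weights]
    lo, hi = min(neg_logs), max(neg_logs)
    # Guard against a zero step when all weights are identical.
    step = (hi - lo) / 15 or 1.0
    codes = [round((v - lo) / step) for v in neg_logs]
    return codes, lo, step


def pack_nibbles(codes):
    """Store two 4-bit codes in each byte, padding with 0 if the count is odd."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def unpack_nibbles(packed, count):
    """Recover the first `count` 4-bit codes from the packed bytes."""
    out = []
    for byte in packed:
        out.append(byte >> 4)
        out.append(byte & 0x0F)
    return out[:count]


def dequantize(codes, lo, step):
    """Turn 4-bit codes back into approximate weights."""
    return [math.exp(-(lo + c * step)) for c in codes]


if __name__ == "__main__":
    weights = [0.5, 0.25, 0.125, 0.125]
    codes, lo, step = quantize_4bit(weights)
    packed = pack_nibbles(codes)
    restored = dequantize(unpack_nibbles(packed, len(weights)), lo, step)
    print(len(packed), "bytes instead of", 4 * len(weights), "bytes as 32-bit floats")
    print(restored)
```

With this layout four weights take 2 bytes instead of the 16 bytes they would need as 32-bit floats, at the cost of a small approximation error in every weight except the smallest and largest ones.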
http://cmusphinx.sourceforge.net
You can learn more about Pocketsphinx features from the publication:
http://www.cs.cmu.edu/~dhuggins/Publications/pocketsphinx.pdf
You can learn how to optimize Pocketsphinx for a low-resource environment from the wiki page:
http://cmusphinx.sourceforge.net/wiki/pocketsphinxhandhelds
Training acoustic models for embedded devices also has specific requirements when the models are meant for Pocketsphinx, so Sphinxtrain is the best choice here.
There are also demos for Android and iPhone.