Dealing with pruning issues

I spent a holiday looking into issues with pocketsphinx decoding in fwdflat mode. Initially I thought it was a bug, but it turned out to be just a pruning issue. The result looked like this:

INFO: ngram_search.c(1045): bestpath 0.00 wall 0.000 xRT
INFO: <s> 0 5 1.000 -94208 0 1
INFO: par..grafo 6 63 1.000 -472064 -467 2
INFO: terceiro 64 153 1.000 -1245184 -115 3
INFO: as 154 176 0.934 -307200 -172 3
INFO: emendas 177 218 1.000 -452608 -292 3
INFO: ao 219 226 1.000 -208896 -181 3
INFO: projeto 227 273 1.000 -342016 -152 3
INFO: de 274 283 1.000 -115712 -75 3
INFO: lei 284 3059 1.000 -115712 -79 3


Speech recognition is essentially a search for the globally best path in a graph. Beam pruning is used to drop nodes during the search when a node's score is worse than the best node's score by more than the beam width, like in this picture
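As a rough sketch in C (the names below are illustrative, not the actual PocketSphinx code), the per-frame pruning step looks roughly like this:

#include <limits.h>
#include <stddef.h>

typedef struct {
    int score;    /* log-domain path score of this node */
    int active;   /* is the node still in the search? */
} node_t;

/* Illustrative beam pruning over one frame: drop every node whose
 * score is more than 'beam' below the best score.  'beam' is a
 * negative log value, so the threshold is best + beam. */
static void
prune_frame(node_t *nodes, size_t n, int beam)
{
    int best = INT_MIN;
    size_t i;

    for (i = 0; i < n; i++)
        if (nodes[i].active && nodes[i].score > best)
            best = nodes[i].score;

    for (i = 0; i < n; i++)
        if (nodes[i].active && nodes[i].score < best + beam)
            nodes[i].active = 0;
}

A narrower (more negative) beam makes the search faster, but it also increases the risk that a node on the globally best path gets dropped.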


If the beam is too narrow, the result might not be the globally best one, even though it is locally the best. In practice this can lead to complex issues like the one described above. Note that the word "lei" spans almost 2800 frames (the second and third columns are the start and end frames), which is about 28 seconds. Another sign of overpruning is the number of words scored per frame:




INFO: ngram_search_fwdflat.c(940): 2931 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(942): 48013 senones evaluated (16/fr)
INFO: ngram_search_fwdflat.c(944): 9586 channels searched (3/fr)
INFO: ngram_search_fwdflat.c(946): 3849 words searched (1/fr)
INFO: ngram_search_fwdflat.c(948): 9602 word transitions (3/fr)


If only about one word is searched per frame, it is most likely a sign of a problem.
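The usual fix is to widen the relevant beams. For the fwdflat pass those are the -fwdflatbeam and -fwdflatwbeam parameters; the values below are only illustrative starting points for experimentation, not tuned recommendations:

pocketsphinx_continuous -infile utterance.wav -hmm <model> -lm <lm> -dict <dict> \
    -fwdflatbeam 1e-80 -fwdflatwbeam 1e-40

Wider beams slow the decoder down, so after changing them it is worth re-checking both the accuracy and the words-searched-per-frame counters shown above.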

More detailed behaviour can be seen if debugging is enabled in the sources:
#define __CHAN_DUMP__ 1


You'll see something like

BEFORE:
SSID 2866 610 611 (2608)
SENSCR -604 -215 -371
SCORES -1014874 -583095 -583097 -583223
HISTID 170 170 170 170
AFTER:
SSID 2866 610 611 (2608)
SENSCR -604 -215 -371
SCORES -1015481 -583315 -583317 -583489
HISTID 170 170 170 170
BEFORE:
SSID 2866 610 611 (2608)
SENSCR -568 -122 -358
SCORES -1015481 -583315 -583317 -583489
HISTID 170 170 170 170
AFTER:
SSID 2866 610 611 (2608)
SENSCR -568 -122 -358
SCORES -1016052 -583442 -583444 -583696
HISTID 170 170 170 170


So you can see that only one HMM is scored per frame, and it does not generate any other HMMs.

Since those issues are hard to notice, starting today the decoder will also print a warning in the log. It will look like this:

WARNING: "ngram_search.c", line 404: Word 'lei' survived for 2764 frames, potential overpruning
WARNING: "ngram_search.c", line 404: Word 'lei' survived for 2765 frames, potential overpruning


So you'll be warned if something goes wrong.
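The check behind such a warning is simple; a minimal sketch of the idea (the names and the threshold are illustrative, not the actual ngram_search.c code) could look like this:

#include <stdio.h>

/* Illustrative overpruning check: if a word entry keeps surviving for an
 * implausibly long stretch of frames, the rest of the search space was
 * probably pruned away.  The threshold is a made-up example value. */
#define MAX_WORD_FRAMES 100   /* ~1 second at 100 frames per second */

static void
check_word_span(const char *word, int start_frame, int current_frame)
{
    int span = current_frame - start_frame;

    if (span > MAX_WORD_FRAMES)
        fprintf(stderr,
                "WARNING: Word '%s' survived for %d frames, "
                "potential overpruning\n", word, span);
}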

It's very easy to forget about pruning issues because they are not really visible. You'll only get a drop in accuracy, and you might not notice it, or you might attribute it to the model rather than the search. In practice you always need to remember the following:

Search space configuration and settings have a significant effect on the final accuracy and speed.

Default settings are often wrong for modified models. If you have a new model, you need to review all the configuration parameters to make sure they still work, however many of them there are (see the parameter overview after this list).

If pruning errors have only a very small effect in your decoder, it means you haven't optimized your search space properly: the beams are probably wider than they need to be, and you can definitely do better on speed.
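For PocketSphinx specifically, the search-space parameters worth reviewing include at least the following (as I recall them; check pocketsphinx_continuous -help or the decoder's argument list for the exact set in your version):

-beam, -pbeam, -wbeam            beams for the fwdtree pass
-fwdflatbeam, -fwdflatwbeam      beams for the fwdflat pass
-maxhmmpf, -maxwpf               absolute limits on HMMs and words evaluated per frame
-lw, -wip                        language weight and word insertion penalty, which
                                 change how path scores compare against the beams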



At the very least, we might want to report more useful metrics about pruning in the future.