Dealing with pruning issues

I spent a holiday looking into issues with pocketsphinx decoding in fwdflat mode. Initially I thought it was a bug, but it turned out to be just a pruning issue. The result looked like this:

INFO: ngram_search.c(1045): bestpath 0.00 wall 0.000 xRT
INFO: <s> 0 5 1.000 -94208 0 1
INFO: par..grafo 6 63 1.000 -472064 -467 2
INFO: terceiro 64 153 1.000 -1245184 -115 3
INFO: as 154 176 0.934 -307200 -172 3
INFO: emendas 177 218 1.000 -452608 -292 3
INFO: ao 219 226 1.000 -208896 -181 3
INFO: projeto 227 273 1.000 -342016 -152 3
INFO: de 274 283 1.000 -115712 -75 3
INFO: lei 284 3059 1.000 -115712 -79 3


Speech recognition is essentially a search for the globally best path in a graph. Beam pruning is used to drop nodes during the search when a node's score is worse than the best node's score by more than the beam width, like in this picture
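As a rough sketch in C (the names below are illustrative, not the actual PocketSphinx code), the per-frame pruning step looks roughly like this:

#include <limits.h>
#include <stddef.h>

typedef struct {
    int score;    /* log-domain path score of this node */
    int active;   /* is the node still in the search? */
} node_t;

/* Illustrative beam pruning over one frame: drop every node whose
 * score is more than 'beam' below the best score.  'beam' is a
 * negative log value, so the threshold is best + beam. */
static void
prune_frame(node_t *nodes, size_t n, int beam)
{
    int best = INT_MIN;
    size_t i;

    for (i = 0; i < n; i++)
        if (nodes[i].active && nodes[i].score > best)
            best = nodes[i].score;

    for (i = 0; i < n; i++)
        if (nodes[i].active && nodes[i].score < best + beam)
            nodes[i].active = 0;
}

A narrower (more negative) beam makes the search faster, but it also increases the risk that a node on the globally best path gets dropped.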


If the beam is too narrow, the result might not be the globally best one, even though it is locally the best. In practice this can lead to complex issues like the one described above. Note that the word "lei" spans almost 2800 frames (the second and third columns are the start and end frames), which is about 28 seconds. Another sign of overpruning is the number of words scored per frame:




INFO: ngram_search_fwdflat.c(940): 2931 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(942): 48013 senones evaluated (16/fr)
INFO: ngram_search_fwdflat.c(944): 9586 channels searched (3/fr)
INFO: ngram_search_fwdflat.c(946): 3849 words searched (1/fr)
INFO: ngram_search_fwdflat.c(948): 9602 word transitions (3/fr)


If only about one word is searched per frame, it is most likely a sign of a problem.
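The usual fix is to widen the relevant beams. For the fwdflat pass those are the -fwdflatbeam and -fwdflatwbeam parameters; the values below are only illustrative starting points for experimentation, not tuned recommendations:

pocketsphinx_continuous -infile utterance.wav -hmm <model> -lm <lm> -dict <dict> \
    -fwdflatbeam 1e-80 -fwdflatwbeam 1e-40

Wider beams slow the decoder down, so after changing them it is worth re-checking both the accuracy and the words-searched-per-frame counters shown above.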

More detailed behaviour can be seen if debugging is enabled in the sources:
#define __CHAN_DUMP__ 1


You'll see something like

BEFORE:
SSID 2866 610 611 (2608)
SENSCR -604 -215 -371
SCORES -1014874 -583095 -583097 -583223
HISTID 170 170 170 170
AFTER:
SSID 2866 610 611 (2608)
SENSCR -604 -215 -371
SCORES -1015481 -583315 -583317 -583489
HISTID 170 170 170 170
BEFORE:
SSID 2866 610 611 (2608)
SENSCR -568 -122 -358
SCORES -1015481 -583315 -583317 -583489
HISTID 170 170 170 170
AFTER:
SSID 2866 610 611 (2608)
SENSCR -568 -122 -358
SCORES -1016052 -583442 -583444 -583696
HISTID 170 170 170 170


So you can see that only one HMM is scored per frame, and it does not generate any other HMMs.

Since those issues are hard to notice, starting today the decoder will also print a warning in the log. It will look like this:

WARNING: "ngram_search.c", line 404: Word 'lei' survived for 2764 frames, potential overpruning
WARNING: "ngram_search.c", line 404: Word 'lei' survived for 2765 frames, potential overpruning


So you'll be warned if something goes wrong.
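The check behind such a warning is simple; a minimal sketch of the idea (the names and the threshold are illustrative, not the actual ngram_search.c code) could look like this:

#include <stdio.h>

/* Illustrative overpruning check: if a word entry keeps surviving for an
 * implausibly long stretch of frames, the rest of the search space was
 * probably pruned away.  The threshold is a made-up example value. */
#define MAX_WORD_FRAMES 100   /* ~1 second at 100 frames per second */

static void
check_word_span(const char *word, int start_frame, int current_frame)
{
    int span = current_frame - start_frame;

    if (span > MAX_WORD_FRAMES)
        fprintf(stderr,
                "WARNING: Word '%s' survived for %d frames, "
                "potential overpruning\n", word, span);
}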

It's very easy to forget about pruning issues because they are not really visible. You'll only get a drop in accuracy, and you might not notice it, or you might attribute it to the model rather than the search. In practice you always need to remember the following:

Search space configuration and settings have a significant effect on the final accuracy and speed.

Default settings are often wrong for modified models. If you have a new model, you need to review all the configuration parameters to make sure they still work, however many of them there are (see the parameter overview after this list).

If pruning errors have only a very small effect in your decoder, it means you haven't optimized your search space properly: the beams are probably wider than they need to be, and you can definitely do better on speed.
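For PocketSphinx specifically, the search-space parameters worth reviewing include at least the following (as I recall them; check pocketsphinx_continuous -help or the decoder's argument list for the exact set in your version):

-beam, -pbeam, -wbeam            beams for the fwdtree pass
-fwdflatbeam, -fwdflatwbeam      beams for the fwdflat pass
-maxhmmpf, -maxwpf               absolute limits on HMMs and words evaluated per frame
-lw, -wip                        language weight and word insertion penalty, which
                                 change how path scores compare against the beams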



At the very least, we might want to report more useful metrics about pruning in the future.