Campaign For Decoding Performance

We have spent some time making the speech recognition backend faster. Ben reports in his blog the results of moving scoring to the GPU with CUDA/jCUDA, which reduced scoring time dramatically. That's an improvement we are happy to apply in our production environment.

We consider the GPU to be not just a computational speedup but a paradigm shift. Historically, search has been optimized to make the number of scored tokens smaller, since it affected accuracy. Now scoring is immediate, but that means other parts have to change. There are a few issues to smash along the way:

We aim to make it faster still; in particular, we would really like to solve the problem with the grow step.


In the classical score/prune/grow scheme, unfortunately, scoring is not the only part that takes significant time. In sphinx4 in particular, growing branches is also a bottleneck. When sphinx4 was first optimized for LVCSR, grow time was a problem too. That's why a whole number of workarounds were developed: grow skipping, skew pruning, arc caching and acoustic lookahead. They were successful at the time, but not as successful as they could have been. At the very least, they don't scale to the GPU.
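To make the score/prune/grow scheme concrete, here is a toy sketch of one such loop. It is not sphinx4's implementation: the names Token, score, prune, grow and decode, the dictionary of arcs and the fixed relative beam are all assumptions made for illustration. It only shows the shape of the three steps and why grow is a separate cost: score is plain arithmetic over the active tokens, while grow has to walk each token's outgoing arcs and merge candidates that land in the same state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    state: int    # index of the search-graph state the token sits in (hypothetical layout)
    score: float  # accumulated log-score of the best path reaching that state


def score(tokens, frame_scores):
    """Add the acoustic log-score of the current frame to every active token."""
    return [Token(t.state, t.score + frame_scores[t.state]) for t in tokens]


def prune(tokens, beam):
    """Keep only the tokens whose score is within `beam` of the best one."""
    if not tokens:
        return []
    best = max(t.score for t in tokens)
    return [t for t in tokens if t.score >= best - beam]


def grow(tokens, arcs):
    """Expand surviving tokens along their outgoing arcs, keeping the best token per state."""
    best_per_state = {}
    for t in tokens:
        for next_state, arc_logprob in arcs.get(t.state, []):
            candidate = Token(next_state, t.score + arc_logprob)
            previous = best_per_state.get(next_state)
            if previous is None or candidate.score > previous.score:
                best_per_state[next_state] = candidate
    return list(best_per_state.values())


def decode(frames, arcs, start_state=0, beam=5.0):
    """Run score, prune and grow once per frame; return the best token after the last grow."""
    tokens = [Token(start_state, 0.0)]
    for frame_scores in frames:
        tokens = score(tokens, frame_scores)
        tokens = prune(tokens, beam)
        tokens = grow(tokens, arcs)
    return max(tokens, key=lambda t: t.score) if tokens else None
```

In this sketch, moving score to a GPU would leave prune and grow untouched, which is the situation described above: the faster scoring gets, the larger the share of time spent in grow.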

Among the papers I've found there are several publications about GPU-based speech recognition; in particular I would like to note the interesting research by Jike Chong and his colleagues. Thanks to Tao for the link! But the issue is that complex grow algorithms are not considered there either. They write about bigram search, which basically means they explore a very simple state space. In their results they compare themselves with HVite. That's the same situation as with WFST, where the attempt to bring the complexity of large vocabulary into the first pass fails and one has to stick with bigrams and hope that all the important things will be done later, in subsequent passes. I kind of think that it's a waste of processor resources.
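A rough way to see why bigram search means a simple state space: an n-gram model conditions on the previous n-1 words, so the number of distinct word histories the search may have to tell apart is at most the vocabulary size raised to the power n-1. The helper below is a hypothetical illustration of that upper bound only. Real models are sparse and use backoff, so the actual number is far smaller, and the vocabulary size is an arbitrary example value, not a figure from the papers mentioned above.

```python
def max_lm_histories(vocab_size, ngram_order):
    """Upper bound on distinct (n-1)-word histories for an n-gram language model."""
    return vocab_size ** (ngram_order - 1)


vocab_size = 1000  # arbitrary illustrative value
bigram_histories = max_lm_histories(vocab_size, 2)   # one previous word
trigram_histories = max_lm_histories(vocab_size, 3)  # two previous words
```

With these numbers the bound grows from a thousand histories for bigrams to a million for trigrams, which is why the grow step gets so much heavier once the first pass goes beyond bigrams.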

The next issue is more technical. We haven't found a good GPU cloud service yet. Such services are certainly very promising because GPUs are more energy-efficient, but they aren't common yet. Let's hope this situation improves soon.