Written by
Nickolay Shmyrev
on
Greetings and Random Thoughts
So 2010 is here, Happy New Year everyone. Wish you all success and happiness and of course increased decoder accuracy! Now we have a long 10 days vacation in Russia, time to travel, eat, drink and sort out bookmarks, read books on the shelf and watch pending google tech talks. Santa also promised me to do some great changes in sphinx4, waiting for that as well.
Though
Ohloh doesn't confirm that, I have a strong feeling that last year the activity around CMUSphinx definitely increased and it's usage is going to grow.
I was thinking a little what should be the direction of sphinx4 development, I think we should consider several factors here. I would be happy to see it as widely-used enterprise level speech recognition engine with a great list of features, but I completely understand that due to the lack of resources it's naive to think we'll be able to do it all. We definitely need to find a market sector for the sphinx project and grow using it. There are already well established projects like HTK that are used widely with their own set of strong and weak features. Julius is used widely as a large vocabulary speech recognition engine with HTK models. It's hard to compete with HTK for us just because it will take years to add that flexibility we probably don't even need. Consider variable of adjustable number of states per phone, something that is only proven to be useful for a small vocabulary task, something we aren't really interested in and I hope will not be interested in a near future. What could be different is our practical orientation.
Many project in speech domain and releated areas are often grown from the research projects and though flexible sometimes, often really unusable in applications since they aren't really designed for that. Usually a research project isn't well documented, has a lot of ways to implement the same thing and some of them are sometimes obsolete. Bugs are rarely fixed and documentation almost missing. Releases are not stable. It's definitely a large field for a commercial support company.
There is a different side, many projects are created in order to solve the user needs, more or less well documented and have stable interfaces, large open community but they are doing so wrong internally I always wonder how they are used at all.
Espeak with it amazingly bad speech synthesis quality and even more amazing popularity. Out-of-date synthesis method doesn't let it be good with any possible modifications. Another example of this is strikingly
Lucene. Unlike
lucidimagination blog states states lucene community is thriving, it's definitely not true. The research articles like
Lucene and Juru at Trec 2007: 1-Million Queries Track definitely shows there is something wrong with Lucene. Basically it lists several trivial changes well known in research community that make Lucene perform two times better on a standard test. I can't understand why this wasn't integrated into stock after three years since article was published.
Let's hope CMUSphinx will find it's place somewhere in the middle. Also, let's hope this year will bring more useful posts decreasing information overload that is certainly going to be a problem in a near future.