Written by Nickolay Shmyrev
The First Glance On The Interspeech 2009 Papers
Interspeech 2009 in Brighton ended today. Unfortunately, I wasn't able to participate for various reasons. Still, it was very interesting to review the list of sessions and abstracts and to read some of the available articles.
The amount of activity in modern speech research is amazing, and the number of articles and groups is enormous: in total I counted 459 abstracts with grep. It was a pleasure to go through them all. So far I've cut the list down to half its original size, so I still need a few more passes to find the most interesting work. A few random thoughts:
Sphinx is mentioned twice and HTK only once :), that's a win. Of course, many researchers still use HTK for their experiments, so the win is more about being more open.
A lot of machine learning research, and quite a significant share of it is dedicated to yet another adjustment of the target space representation, classifier, or cost function. Unfortunately, at first glance I didn't find anything interesting here. Discriminative training is probably the most recent real advance in ASR.
Still an enormous amount of old-style phonetic research. Is vowel length a feature? How do Zulu speakers produce clicks? Sometimes it's interesting to read, though.
Almost all TTS work is about HMM-based speech synthesis, and the audio quality of TTS is still a problem. I recently read a good and very detailed review by Dr. Zen; even adherents of the approach acknowledge that a hybrid of HMM and unit selection works better.
Surprisingly, the section on new methods and paradigms was short.
New trends include emotions, machine speech-to-speech translation, and language acquisition. The combination of visual and speech recognition is surprisingly common.
No Russians at all. Well, that's not strange: Russian speech technology doesn't really exist.
The RWTH Aachen University Open Source Speech Recognition System is terrific news. The source code is available, and I've already downloaded it for investigation.
"Improvements to the LIUM French ASR system based on CMU Sphinx: what helps to significantly reduce the word error rate". Unfortunately, no link is available yet. It should be very interesting reading. The only problem is that someone has to do the merge: even though the source is available, it's really hard to integrate with a research-oriented system.
I'm also waiting for the Blizzard Challenge 2009 results, which should have been presented but are still not available.
"A Self-Labeling Speech Corpus: Collecting Spoken Words with an Online Educational Game": we have wanted something like this for Voxforge for a long time.
In the next few posts I'll probably cover some of the interesting topics in more detail. If you were at the conference or saw something interesting, comments are appreciated.