Written by Nickolay Shmyrev
on June 21, 2011

ICASSP 2011 Part 1 - Thoughts

It seems ICASSP this year was a great event; it is a pity I missed it. Just comparing the keynote lists, ICASSP beats Interspeech 4:0. ICASSP is very technical, Interspeech is for linguists. Compare the two:

Making Sense of a Zettabyte World vs Neural Representations of Word Meanings

New session formats like technical tracks and trends discussions are interesting, though I am not sure how they felt in practice.

So this was a reason to spend a few days reading. 1000 papers on speech technology! Huh. Thanks to all the authors for their hard work! Well, I found several duplicates in the end.

The main thing I noted is that the research topics are very sparse, for example:

Everyone does speaker recognition. An appealing problem statement here is detecting a synthetic speaker. The paper titled "DETECTION OF SYNTHETIC SPEECH FOR THE PROBLEM OF IMPOSTURE" by De Leon et al. hints that there is no solution for that.
I got tired of skipping pursuits, bandits and compressive sensing.
On the other hand, the increased share of papers on non-speech signals, the cocktail party problem and signal recovery is very interesting to read.
Things like DBN features or the SCARF decoder are widely represented. You can read about applications of CRFs from g2p algorithms to dialogs. But traditional things like search algorithms and adaptation are barely covered.
It was surprising to find a session dedicated to multimedia security, which must be a gold mine of ideas, in particular if you need a topic for a paper. Is there a company selling such products?

Overall I found several original problem statements as well as inspiring ideas covering very important technology issues. For example, it would be nice to implement a meeting transcription application that combines audio streams from several iPhones and later transcribes them using multichannel environment compensation. Several meeting transcription setups and channel separation methods are described in the conference proceedings.

After reading a fair number of papers I found that conference papers are too short. When you see a nice title and abstract, you expect a deep investigation of the problem, with historical background and everything explained in detail. But you get just a description of the technology and a few figures from experiments. On the other hand, I would not be able to read 100 papers of 20 pages each.

It is very interesting that this year's awards are not related to speech technology. That will be the content of Part 2. I just need to go through the last 50 papers.
