When information is already lost

April 09, 2018

In speech recognition we frequently deal with noisy or simply corrupted recordings. For example, in call center recordings you still get error rates like 50% or 60% even with the...

Learning with huge memory

January 03, 2017

Recently a set of papers was published about "memorization" in neural networks. For example: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, also Understanding deep learning requires rethinking generalization. It seems that large memory...

Future of texts

July 18, 2016

It seems that people will soon lose the ability to read, comprehend and remember long texts. The question now is: is it possible to deliver very complex messages without...

The case against probabilistic models in metric spaces

April 19, 2016

A recent discussion on the kaldi group about OOV words reminded me of this old problem. One of the things that makes modern recognizers so unnatural is the probabilistic models behind them. It's...

IWSLT 2015

January 12, 2016

The IWSLT 2015 proceedings recently appeared. This is an important competition in ASR focused on TED talks translation (and, more interesting for us, transcription). The best system from MITLL-AFRL had a nice WER...

Harmonic Noise Model in Speech Recognition

January 11, 2016

Recently I came across a nice demo about generation of natural sounds from physical models. This is a really exciting topic because while Hollywood can now draw almost everything like...

On SANE 2015 Videos on Signal Separation

November 24, 2015

Recently a great collection of videos from Speech and Audio in the Northeast (SANE) 2015 workshop has been shared. The main topic of the workshop was sound signal separation which I consider...

Should we listen to our models

July 05, 2015

I've recently come across an interesting paper worth consideration: Rethinking Algorithm Design and Development in Speech Processing by Thilo Stadelmann et al. This is not mainstream research, but that is exactly what makes it...

Very simple but very important thing to properly model the language

May 04, 2014

If I were a scientific advisor I would give my student the following problem: Take a text, take an LM, compute perplexity: file test.txt: 107247 sentences, 1.7608e+06 words, 21302 OOVs 0...

System Combination WER

December 14, 2013

There is one thing I usually wonder about while reading the next conference paper on speech recognition. The usual paper limit is 4 pages and the authors usually want to...