Learning with huge memory

Recently a set of papers was published about "memorization" in neural networks. For example: "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer", also "Understanding deep learning requires rethinking generalization". It seems that large memory...

Future of texts

It seems that people will soon lose the ability to read, comprehend and remember long texts. The question now is: is it possible to deliver very complex messages without...

The case against probabilistic models in metric spaces

A recent discussion in the Kaldi group about OOV words reminded me of this old problem. One of the things that makes modern recognizers so unnatural is the probabilistic models behind them. It's...

IWSLT 2015

The IWSLT 2015 proceedings recently appeared. This is an important competition in ASR focused on translation of TED talks (and, more interestingly for us, transcription). The best system, from MITLL-AFRL, had a nice WER...

Harmonic Noise Model in Speech Recognition

Recently I came across a nice demo about generation of natural sounds from physical models. This is really an exciting topic because while Hollywood can now draw almost everything like...

On SANE 2015 Videos on Signal Separation

Recently a great collection of videos from the Speech and Audio in the Northeast (SANE) 2015 workshop has been shared. The main topic of the workshop was sound signal separation, which I consider...

Should we listen to our models

I've recently come across an interesting paper worth consideration: "Rethinking Algorithm Design and Development in Speech Processing" by Thilo Stadelmann et al. This is not mainstream research, but that is exactly what makes it...

Very simple but very important thing to properly model the language

If I were a scientific advisor, I would give my student the following problem: take a text, take an LM, compute perplexity: file test.txt: 107247 sentences, 1.7608e+06 words, 21302 OOVs 0...

System Combination WER

There is one thing I usually wonder about while reading yet another conference paper on speech recognition. The usual paper limit is 4 pages and the authors usually want to...

Mixer 6 database release by LDC & Librivox

LDC has recently announced the availability of a very large speech database for acoustic model training. The database, named Mixer 6, contains an incredible 15000 hours of transcribed speech data by...