The case against probabilistic models in metric spaces

April 19, 2016

A recent discussion on the Kaldi group about OOV words reminded me of this old problem. One of the things that makes modern recognizers so unnatural is the probabilistic models behind them. It's...

IWSLT 2015

January 12, 2016

The IWSLT 2015 proceedings recently appeared. This is an important competition in ASR focused on translation of TED talks (and, more interesting for us, transcription). The best system, from MITLL-AFRL, had a nice WER...

Harmonic Noise Model in Speech Recognition

January 11, 2016

Recently I came across a nice demo of generating natural sounds from physical models. This is a really exciting topic because while Hollywood can now draw almost everything like...

On SANE 2015 Videos on Signal Separation

November 24, 2015

Recently a great collection of videos from the Speech and Audio in the Northeast (SANE) 2015 workshop was shared. The main topic of the workshop was sound signal separation, which I consider...

Should we listen to our models

July 05, 2015

I recently came across an interesting paper worth considering: "Rethinking Algorithm Design and Development in Speech Processing" by Thilo Stadelmann et al. This is not mainstream research, but that is exactly what makes it...

Very simple but very important thing to properly model the language

May 04, 2014

If I were a scientific advisor, I would give my student the following problem: take a text, take an LM, compute perplexity: file test.txt: 107247 sentences, 1.7608e+06 words, 21302 OOVs 0...

System Combination WER

December 14, 2013

There is one thing I usually wonder about while reading yet another conference paper on speech recognition. The usual paper limit is 4 pages and the authors usually want to...

Mixer 6 database release by LDC & Librivox

August 21, 2013

LDC has recently announced the availability of a very large speech database for acoustic model training. The database, named Mixer 6, contains an incredible 15000 hours of transcribed speech data by...

Around noise-robust PNCC features

June 25, 2013

Last week I worked on PNCC features, the well-known speech recognition features by Chanwoo Kim and Richard Stern. I ran quite a few experiments with the parameters and did some research around PNCC. Here...

Building a Generic Language Model

January 02, 2013

I recently spent some time building a language model from the open Gutenberg texts; it was released today: http://cmusphinx.sourceforge.net/2013/01/a-new-english-language-model-release/ Unfortunately, it turned out that it's very hard to build a model which...
