Written by Nickolay Shmyrev
Testing ASR with Voxforge Database
In development and research, the critical issue is proper testing. There was some buzz about this recently, for example on the MLoss blog, where the pros of using open data are considered. One interesting resource that started some time ago is http://mlcomp.org/, which combines open data and open algorithms, automatically selecting the best method for a common data set. I don't think the idea is that easy to implement, because "best" often means different things: sometimes you need speed, sometimes generalization.
In our case, using open data lets you easily solve the following problems:
- Test the changes you've made in the speech decoder and trainer on a practical large-vocabulary database.
- Estimate how the recognition engine performs. This is not just about accuracy but also about other critical parameters such as confidence score quality, decoding speed, lattice variability, noise robustness and so on.
- Share the bugs you've found. We can certainly fix minor problems that are easy to reproduce, but any serious problem ultimately requires a reproducible test example.
I actually wanted to describe how this works in practice right now. The solution we propose for CMUSphinx developers is the Voxforge database. It's not the only open data source out there, but I think it's the most permissive one. The old an4 database is good for quick tests, but it definitely doesn't satisfy our needs, because anything other than large-vocabulary recognition makes little sense nowadays.
The database itself is about 75 hours of read speech taken from various sources. The speech was collected through a web collection application, from public audiobooks and so on. There are a number of accents, and sound quality also varies. The information about the source is not very reliable, but that's something we have to live with. The speech is segmented into utterances, and a transcription is provided for each utterance. The Voxforge DB has some disadvantages, such as a very limited vocabulary (around 5000 words) and a rather narrow focus on read texts, but we need to work on them. I believe these issues will be fixed soon.
The Voxforge model for CMUSphinx is trained periodically; the most recent one can be downloaded here:
http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Sphinx/
The model was trained with SphinxTrain; the config can be found inside, along with the other scripts used to train it. Test results are in the README file inside the model as well. The corresponding performance test is provided in sphinx4.
The steps to get started are simple:
- Download the model from the link above and unpack it.
- Check the scripts in the scripts subfolder to set up training:
  - Download the audio with wget
  - Unpack it
  - Convert flac to wav
  - Make the transcription file from PROMPTS
  - Extract features
- Run the training with ./scripts_pl/RunAll.pl
- Test the recognition performance, and set up any other testing environment for the various numbers you need.
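The "make transcription file from PROMPTS" step can be sketched in a few lines of Python. This is only an illustration, not the script shipped with the model: it assumes each PROMPTS line has the form "path/to/utterance WORD WORD ...", and that the trainer expects transcription lines of the form "<s> words </s> (utterance_id)". The function name prompts_to_transcription is hypothetical.

```python
from pathlib import PurePosixPath


def prompts_to_transcription(prompt_lines):
    """Turn 'path WORD WORD ...' lines into '<s> WORD WORD </s> (utt_id)' lines.

    Assumption: the utterance id is the last component of the path.
    Word case is left untouched so that it keeps matching the dictionary.
    """
    entries = []
    for line in prompt_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and prompts without any words
        utt_id = PurePosixPath(parts[0]).name
        words = " ".join(parts[1:])
        entries.append(f"<s> {words} </s> ({utt_id})")
    return entries


# Hypothetical sample input, made up for illustration.
sample = ["speaker-1/mfc/a0001 HELLO WORLD", ""]
transcription = prompts_to_transcription(sample)
```

Writing the returned lines to a file, one per line, gives a transcription file; the real scripts in the model archive remain the reference for exact paths and file names.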
Training on Voxforge requires about 10 GB of disk space. On a dual-core machine, training should take about a day.
Here are the results of the sphinx4 performance test of the acoustic model voxforge-en-r0.1.3:
[java] # --------------- Summary statistics ---------
[java] Accuracy: 90,613% Errors: 4679 (Sub: 2904 Ins: 559 Del: 1216)
[java] Words: 43889 Matches: 39769 WER: 10,661%
[java] Sentences: 4682 Matches: 3050 SentenceAcc: 65,143%
[java] Total Time Audio: 22090,44s Proc: 116616,04s Speed: 5,28 X real time
[java] Mem Total: 1993,75 Mb Free: 806,02 Mb
[java] Used: This: 1187,73 Mb Avg: 1112,13 Mb Max: 1962,46 Mb
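As a sanity check, the headline figures in this summary can be recomputed from the raw counts. The short Python sketch below assumes the usual definitions: WER counts substitutions, insertions and deletions against the number of reference words, accuracy ignores insertions, and speed is processing time divided by audio time. The function name summarize is made up for this example.

```python
def summarize(words, sub, ins, dele, audio_s, proc_s):
    """Recompute summary figures from raw counts and timings."""
    errors = sub + ins + dele
    return {
        "errors": errors,
        # WER: all three error types relative to the reference word count
        "wer": 100.0 * errors / words,
        # Accuracy: correctly matched words, insertions not penalized
        "accuracy": 100.0 * (words - sub - dele) / words,
        # Speed as a multiple of real time
        "speed_x_rt": proc_s / audio_s,
        "proc_hours": proc_s / 3600.0,
    }


# Numbers copied from the summary above.
stats = summarize(43889, 2904, 559, 1216, 22090.44, 116616.04)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the WER comes out at about 10.661%, the accuracy at about 90.613% and the speed at about 5.28 times real time, matching the report; the processing time works out to roughly 32.4 hours.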
To run this test, do the following:
- Check out the latest sphinx4
- Go to sphinx4/tests/performance/voxforge_en
- Download and unpack the audio files into the wav folder using the script build.sh from voxforge-en-r0_1_3/scripts
- Download the acoustic model and unpack it, creating the etc folder with the lm and the voxforge_en_sphinx.cd_cont_3000 folder with the model files
- Start the test by simply typing ant on the command line
As you can see, decoding takes about 32 hours. It can be faster on a multicore machine, but you need to change the configuration of ThreadedScorer to explicitly start multiple scoring threads. Unfortunately, automatic detection of the number of cores doesn't work in Sun's JVM.
Speaking of Voxforge, we definitely need to thank Ken McLean (great work, Ken!), who has been running Voxforge for several years already. It's also worth mentioning that without the contributors who submitted their speech, this project would not be thriving the way it is. So start using Voxforge for your development, report bugs and send us comments. That would be appreciated.