Written by Nickolay Shmyrev
on June 09, 2020

Opus and MP3 for speech recognition

I recently had a discussion about which is worse for compressing telephony audio: Opus or MP3. I was under the impression that Opus is unconditionally better than MP3, but that doesn’t actually seem to be the case. At least at 32 kbps, MP3 is even better than Opus in terms of SI-SDR and, I believe, in recognition accuracy. I believe this is due to the non-streaming nature of MP3.
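For reference, here is a minimal sketch of how SI-SDR (scale-invariant signal-to-distortion ratio) can be computed between the original audio and its decoded version. The function name `si_sdr` is mine, not from any particular toolkit, and removing the mean before comparing is an assumption that many implementations make. It expects two equal-length, time-aligned sequences of samples.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("signals must be non-empty and of equal length")

    # Assumption: remove the DC offset from both signals first.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [x - ref_mean for x in reference]
    est = [x - est_mean for x in estimate]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference signal is constant")

    # Project the estimate onto the reference; this makes the metric
    # insensitive to the overall gain of the estimate.
    scale = sum(r * e for r, e in zip(ref, est)) / ref_energy
    target = [scale * r for r in ref]

    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(est, target))

    if noise_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(target_energy / noise_energy)
```

Before comparing codecs this way, both decoded files need to be resampled to the same rate and aligned with the original, since MP3 encoders add a delay at the start of the stream that would otherwise dominate the score.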

The problem with MP3 is spectral masking, but it doesn’t happen at 32 kbps.

At the same time, 24 kbps is harmful.

Publications seem to confirm this. The relevant ones:

MP3 and AAC Explained

Spectrally Selective Dithering for Distorted Speech Recognition

Dithering Techniques in Automatic Recognition of Speech Corrupted by MP3 Compression: Analysis, Solutions and Experiments

Adding controlled amount of noise to improve recognition of compressed and spectrally distorted speech.
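The basic idea shared by the dithering papers above, adding a controlled amount of noise to the decoded audio, can be sketched roughly as follows. This is only plain white Gaussian noise at a chosen signal-to-noise ratio, not the spectrally selective schemes the papers describe, and the name `add_dither` and its parameters are hypothetical.

```python
import math
import random


def add_dither(samples, snr_db, seed=0):
    """Return a copy of samples with white Gaussian noise added at snr_db."""
    if not samples:
        return []

    # A fixed seed keeps the sketch reproducible; it is an illustration choice.
    rng = random.Random(seed)

    # Average power of the signal, used to scale the noise.
    signal_power = sum(x * x for x in samples) / len(samples)

    # Noise power that gives the requested signal-to-noise ratio in dB.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)

    return [x + rng.gauss(0.0, sigma) for x in samples]
```

The noise level that actually helps recognition has to be found experimentally for a given recognizer and bitrate.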

It looks like the major causes of speech corruption are not really related to MP3. Most likely they are background noise, dropped frames, and bad denoisers that people apply to the sound to “improve” recognition.
