Written by
Nickolay Shmyrev
on
Opus and MP3 for speech recognition
Recently there was a discussion about which is worse for telephony audio compression: Opus or MP3. I was under
the impression that Opus is unconditionally better than MP3, but that doesn’t actually seem to be the case.
At least at 32 kbps, MP3 is even better than Opus in terms of SI-SDR and, I believe, in recognition accuracy.
I believe this is due to its non-streaming nature.
The problem with MP3 is spectral masking, but it doesn’t happen at 32 kbps.
At the same time, 24 kbps is harmful.
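For readers who want to reproduce this kind of comparison, here is a minimal sketch of SI-SDR (scale-invariant signal-to-distortion ratio) in plain Python. It assumes the original and decoded signals are already time-aligned, have equal length, and are given as lists of float samples. The function name `si_sdr` and the `eps` parameter are illustrative choices, not taken from any particular toolkit.

```python
import math

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB between a reference and a decoded signal.

    Assumes both sequences are time-aligned and of equal length.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)

    # Remove the DC offset from both signals (a common convention for SI-SDR).
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    # Project the estimate onto the reference to find the optimal scale.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled-reference part and a residual.
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

To compare codecs, one would decode the MP3 and Opus files back to PCM at the same sample rate, compensate for any codec delay, and call `si_sdr(original, decoded)` for each; a higher value means less distortion.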
Publications seem to confirm this. Relevant ones:
MP3 and AAC Explained
Spectrally Selective Dithering for Distorted Speech Recognition
Dithering Techniques in Automatic Recognition of Speech Corrupted by MP3 Compression: Analysis, Solutions and Experiments
Adding controlled amount of noise to improve recognition of compressed and spectrally distorted speech.
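The idea of adding a controlled amount of noise can be sketched roughly as follows. This is plain white-noise dithering at a chosen signal-to-noise ratio, not the spectrally selective method from the papers above. The function name `add_dither`, the 40 dB default and the fixed seed are assumptions made for illustration only.

```python
import math
import random

def add_dither(samples, snr_db=40.0, seed=0):
    """Add white Gaussian noise to decoded audio at roughly the given SNR.

    `samples` is a list of float PCM samples. The 40 dB default is an
    arbitrary illustrative value, not a recommendation from the papers.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    # Average signal power, then the noise power needed for the target SNR.
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)

    # Add independent Gaussian noise to every sample.
    return [s + rng.gauss(0.0, sigma) for s in samples]
```

The dithered signal would then be passed to feature extraction in place of the raw decoded audio; the appropriate noise level has to be tuned on the actual recognizer and data.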
It looks like the major causes of speech corruption are not really related to MP3.
Most likely they are background noise, frame drops and bad denoisers,
which people apply to the sound to “improve” recognition.