Testing Facebook MMS and SeamlessM4T Word Error Rate

Recently Facebook released the MMS and SeamlessM4T models for multilingual ASR and translation:

https://ai.meta.com/blog/multilingual-model-speech-recognition

https://ai.meta.com/blog/seamless-m4t

The releases got some media coverage, and the models are frequently mentioned as cool models on GitHub.

The papers claim they are very accurate, more accurate than Whisper V2 on the FLEURS dataset.

Testing on standard datasets shows that the models are somewhat worse than other public models. Moreover, the Seamless model specifically not only has a worse WER, it also replaces words with synonyms, for example:

reference  : experience pre-meal kit
hypothesis : experience before meal

That is acceptable for translation, but for an ASR task it doesn't seem right. Because of this, the Seamless model is not very practical outside translation.
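To see why such substitutions hurt, here is a minimal WER computation (word-level Levenshtein distance divided by reference length, tokenized by whitespace — a simplification, since real evaluations also normalize case and punctuation) applied to the example above:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] = edit distance between ref[:i] and hyp[:j], updated row by row
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,              # deletion
                       d[j - 1] + 1,          # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))  # substitution / match
            prev = cur
    return d[len(hyp)] / len(ref)

print(wer("experience pre-meal kit", "experience before meal"))  # 2 substitutions / 3 words
```

Even though "before meal" preserves the rough meaning of "pre-meal kit", it counts as two substitutions out of three reference words, i.e. a WER of about 0.67 on this utterance — semantic paraphrasing is penalized just like an outright misrecognition.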

Still, the accuracy is not bad, and the models are impressive given that they are multilingual and translate well.

WER (%, lower is better):

| Dataset                | MMS 1B + LM | Seamless Medium | Nemo RNN | Whisper V2 | Sherpa Multi EN 05.2023 |
|------------------------|-------------|-----------------|----------|------------|-------------------------|
| Tedlium                | 5.9         | 20.1            | 5.2      | 6.5        | 4.0                     |
| Gigaspeech             | 14.7        | 28.7            | 14.1     | 16.5       | 11.9                    |
| CMUKids                | 5.8         | 7.9             | 4.1      | 4.5        | 4.6                     |
| Librispeech Test Clean | 2.4         | 3.9             | 1.6      | 4.0        | 2.0                     |
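To summarize the table, a small script averaging each model's WER across the four datasets (an unweighted mean — a rough summary only, since the test sets differ in size and difficulty):

```python
# WER (%) per dataset, copied from the table above
results = {
    "MMS 1B + LM":             {"Tedlium": 5.9,  "Gigaspeech": 14.7, "CMUKids": 5.8, "Librispeech Test Clean": 2.4},
    "Seamless Medium":         {"Tedlium": 20.1, "Gigaspeech": 28.7, "CMUKids": 7.9, "Librispeech Test Clean": 3.9},
    "Nemo RNN":                {"Tedlium": 5.2,  "Gigaspeech": 14.1, "CMUKids": 4.1, "Librispeech Test Clean": 1.6},
    "Whisper V2":              {"Tedlium": 6.5,  "Gigaspeech": 16.5, "CMUKids": 4.5, "Librispeech Test Clean": 4.0},
    "Sherpa Multi EN 05.2023": {"Tedlium": 4.0,  "Gigaspeech": 11.9, "CMUKids": 4.6, "Librispeech Test Clean": 2.0},
}

# Unweighted mean WER per model, best first
avg = {model: sum(wers.values()) / len(wers) for model, wers in results.items()}
for model, mean_wer in sorted(avg.items(), key=lambda kv: kv[1]):
    print(f"{model:26s} {mean_wer:5.2f}")
```

By this crude average, Sherpa and Nemo lead, MMS lands between them and Whisper V2, and Seamless Medium trails well behind, which matches the per-dataset picture above.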