Testing Facebook MMS and SeamlessM4T Word Error Rate

Written by Nickolay Shmyrev
Recently Facebook released the MMS and Seamless models for multilingual ASR and translation:

https://ai.meta.com/blog/multilingual-model-speech-recognition
https://ai.meta.com/blog/seamless-m4t

The releases got some coverage in the media, and the models are frequently mentioned on GitHub as cool models. The papers claim they are very accurate, more accurate than Whisper V2 on the FLEURS dataset.
Testing on standard datasets shows that the models are somewhat worse than other public models. Moreover, the Seamless model specifically not only has a worse WER, it also replaces words with synonyms, for example:

reference : experience pre-meal kit
hypothesis : experience before meal

That is fine for translation, but for an ASR task it doesn't seem right. It makes the Seamless model impractical outside translation.
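Such synonym substitutions are penalized fully by WER, which is just word-level edit distance divided by reference length. A minimal sketch (standard Levenshtein dynamic programming; real benchmarks typically normalize case and punctuation first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# The synonym replacements above count as two substitutions out of
# three reference words, so WER is 2/3 even though the meaning is close.
print(wer("experience pre-meal kit", "experience before meal"))  # 0.666...
```

This is why a translation-oriented model scores poorly on a verbatim-transcription metric even when its output is semantically reasonable.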
Still, the accuracy is not bad, and the models are impressive given that they are multilingual and translate well.
WER (%) by dataset:

| Dataset | MMS 1B + LM | Seamless Medium | Nemo RNN | Whisper V2 | Sherpa Multi EN 05.2023 |
|---|---|---|---|---|---|
| Tedlium | 5.9 | 20.1 | 5.2 | 6.5 | 4.0 |
| Gigaspeech | 14.7 | 28.7 | 14.1 | 16.5 | 11.9 |
| CMUKids | 5.8 | 7.9 | 4.1 | 4.5 | 4.6 |
| Librispeech Test Clean | 2.4 | 3.9 | 1.6 | 4.0 | 2.0 |