NVIDIA NeMo Conformer-CTC model test results

Written by Nickolay Shmyrev
Not long after Citrinet, NVIDIA NeMo released a Conformer-CTC model. As usual, you can forget about Citrinet now; Conformer-CTC is way better.
The model is available for download here, and the latest NeMo repo supports it.
We tested the model on the same datasets we tried before; the results are in the table below. The model is very good.
Note that it is very important to build an LM and rescore with it; ideally, rescore with a Transformer LM too. For details, see the previous Citrinet post.
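The rescoring step mentioned above can be sketched as a log-linear combination of the acoustic score and an external LM score over an n-best list. This is a minimal illustration of the idea, not NeMo's actual API; the function name and weight value are assumptions:

```python
def rescore_nbest(nbest, lm_scores, lm_weight=0.5):
    """Pick the hypothesis that maximizes acoustic score + lm_weight * LM score.

    nbest:     list of (text, acoustic_log_prob) pairs from the decoder
    lm_scores: dict mapping hypothesis text -> LM log probability
    lm_weight: interpolation weight (illustrative value, tuned on a dev set in practice)
    """
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_scores[h[0]])[0]


# Toy example: the LM prefers the second hypothesis strongly enough to flip the ranking.
nbest = [("recognize speech", -1.0), ("wreck a nice beach", -1.2)]
lm = {"recognize speech": -3.0, "wreck a nice beach": -1.0}
best = rescore_nbest(nbest, lm)
```

In a real system the LM scores would come from an n-gram or Transformer LM, and the weight is tuned on held-out data.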
I believe that if NVIDIA implements an AED rescorer instead of CTC, the results will be even better. See the GigaSpeech overview for details.
| Dataset (WER, %) | Vosk Aspire | Vosk Daanzu | Facebook RASR | Facebook Wav2Vec 2.0 | Nvidia Citrinet | Nvidia Conformer-CTC |
|---|---|---|---|---|---|---|
| Librispeech test-clean | 11.72 | 7.08 | 3.30 | 2.6 | 2.78 | 2.26 |
| Tedlium test | 11.23 | 8.25 | 5.96 | 6.3 | 5.61 | 4.89 |
| Google commands | 46.76 | 11.64 | 20.06 | 24.1 | 28.15 | 19.77 |
| Non-native speech | 57.92 | 33.31 | 26.99 | 29.6 | 28.78 | 24.22 |
| Children speech | 20.29 | 9.90 | 6.17 | 5.5 | 6.85 | 5.52 |
| Podcasts | 19.85 | 21.21 | 15.06 | 17.0 | 14.82 | 13.79 |
| Callcenter bot | 17.20 | 19.22 | 14.55 | 22.5 | 12.85 | 13.96 |
| Callcenter 1 | 53.98 | 52.97 | 42.82 | 46.7 | 36.05 | 32.57 |
| Callcenter 2 | 33.82 | 43.02 | 30.41 | 36.9 | 29.40 | 27.82 |
| Callcenter 3 | 35.86 | 52.80 | 32.98 | 40.9 | 29.78 | 27.44 |
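The numbers above are word error rates: the word-level edit distance between reference and hypothesis, divided by the reference length. A minimal sketch of the metric (standard single-row dynamic programming, not the exact scoring script used for these tests):

```python
def wer(reference, hypothesis):
    """Word error rate in percent: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Single-row edit-distance DP over words.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev = d[0]          # d[i-1][j-1] for the inner loop
        d[0] = i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                              # deletion
                       d[j - 1] + 1,                          # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))     # substitution / match
            prev = cur
    return 100.0 * d[len(hyp)] / len(ref)
```

For example, dropping one word from a six-word reference gives a WER of about 16.7%.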