Written by Nickolay Shmyrev
Bad prompts issue
After quite a lot of training the model on a small part of the database to test things, I came to the conclusion that the main issue is bad prompts. Indeed, the accuracy on the training set for 4 hours of data, decoded with a language model trained on those same training prompts, is only 85%; usually it should be around 93%. The problem is that the real testing prompts are also bad, and they should stay that way, otherwise we'll be bound to high-quality speech only. I remember I tried forced alignment with the communicator model before, but it didn't improve much, precisely because of this issue in the test set. Another attempt was to use a skip state, and that was not fruitful either.
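To make the check above concrete, here is a minimal sketch (not my actual scripts) of how the training-set accuracy can be measured: decode the training set with a language model built from the training prompts, then score the hypotheses against those prompts. The file names and the Sphinx-style transcription format ("WORD WORD ... (utt_id)") are assumptions for illustration.

```python
def read_transcriptions(path):
    """Read a Sphinx-style transcription file: 'WORD WORD ... (utt_id)'."""
    utts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            words, _, utt_id = line.rpartition("(")
            utts[utt_id.rstrip(")")] = words.split()
    return utts

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
            prev, d[j] = d[j], cur
    return d[-1]

def word_accuracy(refs, hyps):
    """Accuracy = 100 * (1 - word errors / reference words)."""
    errors = total = 0
    for utt_id, ref in refs.items():
        hyp = hyps.get(utt_id, [])
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return 100.0 * (1.0 - errors / total)

if __name__ == "__main__":
    prompts = read_transcriptions("train.transcription")  # the training prompts
    hyps = read_transcriptions("train.hyp")               # decoder output on the training set
    print("Training-set word accuracy: %.1f%%" % word_accuracy(prompts, hyps))
```

If the prompts matched the audio well, this number should sit around 93%; the 85% figure is what points at bad prompts rather than at the acoustic model.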
So the plan for now is to select a subset with forced alignment again and train the model on it, to check whether the hypothesis is true and bad prompts in the acoustic database are indeed the main issue. It looks like we are going around in circles.
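A minimal sketch of that selection step, assuming forced alignment has already been run and produced a per-utterance score file of the form "utt_id frames total_log_likelihood". The file names, the score format, and the threshold value are all assumptions for illustration, not the actual SphinxTrain output.

```python
THRESHOLD = -8.0  # assumed minimum average per-frame log-likelihood to keep an utterance

def read_alignment_scores(path):
    """Read 'utt_id frames total_log_likelihood' lines into per-frame scores."""
    scores = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            utt_id, frames, logprob = line.split()
            scores[utt_id] = float(logprob) / max(int(frames), 1)
    return scores

def select_subset(scores, fileids_in, trans_in, fileids_out, trans_out):
    """Copy only the utterances whose alignment score passes the threshold."""
    keep = {u for u, s in scores.items() if s >= THRESHOLD}
    with open(fileids_in) as fi, open(fileids_out, "w") as fo:
        for line in fi:
            if line.strip().split("/")[-1] in keep:
                fo.write(line)
    with open(trans_in) as ti, open(trans_out, "w") as to:
        for line in ti:
            utt_id = line.rsplit("(", 1)[-1].rstrip(")\n")
            if utt_id in keep:
                to.write(line)

if __name__ == "__main__":
    scores = read_alignment_scores("align.scores")
    select_subset(scores, "train.fileids", "train.transcription",
                  "train_clean.fileids", "train_clean.transcription")
    kept = sum(1 for s in scores.values() if s >= THRESHOLD)
    print("kept %d of %d utterances" % (kept, len(scores)))
```

Training on the filtered files and comparing the resulting accuracy against the full-database model is what should confirm or refute the bad-prompts hypothesis.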
I ended up reading the article titled "Lightly supervised model training".