OpenFst troubleshooting

A bit of OpenFst troubleshooting for when you build a WFST with Juicer. Say you are running


fstcompose ${OUTLEXBFSM} ${OUTGRAMBFSM} | \
fstepsnormalize | \
fstdeterminize | \
fstencode --encode_labels - $CODEX | \
fstminimize - | \
fstencode --decode - $CODEX | \
fstpush --push_weights | \
fstarcsort

and get this


FATAL: StringWeight::Plus: unequal arguments (non-functional FST?)


Huh? Which arguments are not equal? What caused this? How do you fix it? The message should definitely be more self-explanatory. This is quite a common issue: you get just a short message that nobody, including the author, could understand. Go figure out how to fix it.
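
Before touching any sources, it can help to find out which stage of the pipeline prints the message. The following is only a sketch: it runs the same commands as the pipeline above one at a time, keeping intermediate files. The stage helper, the build_wfst wrapper, the WORK scratch directory and the file names are made up for illustration; OUTLEXBFSM, OUTGRAMBFSM and CODEX are the variables from the original command.

```shell
# stage NAME INPUT OUTPUT COMMAND [ARGS...]
# Runs COMMAND with INPUT on stdin and OUTPUT on stdout,
# and names the stage on stderr if the command fails.
stage() {
    stage_name=$1
    stage_in=$2
    stage_out=$3
    shift 3
    if "$@" < "$stage_in" > "$stage_out"; then
        return 0
    fi
    echo "stage '$stage_name' failed (its input is $stage_in)" >&2
    return 1
}

# The same commands as the one-line pipeline, run step by step.
# WORK is a hypothetical scratch directory for the intermediate FSTs.
build_wfst() {
    WORK=${WORK:-./fst-debug}
    mkdir -p "$WORK" &&
    stage compose      /dev/null     "$WORK/1.fst" fstcompose "$OUTLEXBFSM" "$OUTGRAMBFSM" &&
    stage epsnormalize "$WORK/1.fst" "$WORK/2.fst" fstepsnormalize &&
    stage determinize  "$WORK/2.fst" "$WORK/3.fst" fstdeterminize &&
    stage encode       "$WORK/3.fst" "$WORK/4.fst" fstencode --encode_labels - "$CODEX" &&
    stage minimize     "$WORK/4.fst" "$WORK/5.fst" fstminimize - &&
    stage decode       "$WORK/5.fst" "$WORK/6.fst" fstencode --decode - "$CODEX" &&
    stage push         "$WORK/6.fst" "$WORK/7.fst" fstpush --push_weights &&
    stage arcsort      "$WORK/7.fst" "$WORK/final.fst" fstarcsort
}
```

Calling build_wfst stops at the first failing stage and leaves that stage's input on disk, so you can inspect it or rerun just that one command.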





In this particular case you go to the OpenFst sources and change the following check so that it also prints both arguments:


  if (w1 != w2)
    LOG(FATAL) << "StringWeight::Plus: unequal arguments "
               << "(non-functional FST?) " << w1 << " " << w2;

Wait another half an hour for it to compile (who decided to build it entirely from templates!). Now it at least prints the arguments. You run it again and get

FATAL: StringWeight::Plus: unequal arguments (non-functional FST?) 833_9 832_9

Heh, still not very descriptive, but at least it is a hint. Looking at the output states 833 and 832, you see that they have identical pronunciations. That's it. Your dictionary shouldn't contain identical pronunciations. Moreover, it shouldn't contain identically pronounced trigrams: things pronounced like "a b cd" vs "ab c d" make the WFST non-deterministic. Why didn't it warn about the issue when it converted the dictionary? Who knows. Anyway, now you can read about lexgen and find the option for dealing with identical pronunciations:

  -outputAuxPhones           -> indicates that auxiliary phones should be added to pronunciations in the lexicon in order to disambiguate distinct words with identical pronunciations

This option should make things better.

I must admit CMUSphinx is also full of this: bad error messages that neither describe the problem nor hint at the solution. Compare that with the output of a recent Maven:

[ERROR] No goals have been specified for this build. You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/NoGoalSpecifiedException


Maybe it's too verbose, but I think it's the right way to do it. So if you see something that is not clear in CMUSphinx, please report it. We'll happily fix it.

Coming up next: what to do when OpenFst hangs or takes all your memory.