Written by Nickolay Shmyrev
Initial value problem for MLLT
I've recently discovered, with the help of mchammer2007, a problem with the estimation of the initial matrix for MLLT training. MLLT, or Maximum Likelihood Linear Transform, was suggested by R. A. Gopinath in "Maximum Likelihood Modeling with Gaussian Distributions for Classification", Proceedings of ICASSP 1998, and is implemented in Sphinxtrain.
The idea is to train a matrix that transforms the feature space so that the Gaussian covariance matrices become as close to diagonal as possible. The optimization is a fairly simple gradient descent (the objective involved is written out after the code below), but unfortunately it suffers from an initial value problem: if you choose a proper initial value, you get much better results. So right now a random matrix is used:
from numpy import eye
from numpy.random import random
from numpy.linalg import det

if A is None:
    # Initialize it with a random positive-definite matrix of
    # the same shape as the covariances
    s = self.cov[0].shape
    d = -1
    while d < 0:
        A = eye(s[0]) + 0.1 * random(s)
        d = det(A)
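For reference, the quantity this gradient search optimizes (following Gopinath's paper; the notation here is mine: $\Sigma_j$ are the full covariance matrices, $N_j$ the corresponding occupancy counts, and $N = \sum_j N_j$) is, up to an additive constant,

$$F(A) \;=\; N \log\lvert\det A\rvert \;-\; \frac{1}{2}\sum_j N_j \sum_i \log\big(A \Sigma_j A^{\top}\big)_{ii}$$

and the training maximizes $F$ (equivalently, descends on its negative), starting from the random matrix above.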
And depending on your luck you can get better or worse recognition results, sometimes even worse than the usual training without LDA/MLLT:
SENTENCE ERROR: 55.4% (72/130) WORD ERROR RATE: 17.5% (135/773)
SENTENCE ERROR: 51.5% (66/130) WORD ERROR RATE: 16.6% (128/773)
SENTENCE ERROR: 50.0% (65/130) WORD ERROR RATE: 15.5% (119/773)
SENTENCE ERROR: 56.2% (73/130) WORD ERROR RATE: 16.9% (130/773)
SENTENCE ERROR: 62.3% (80/130) WORD ERROR RATE: 22.3% (172/773)
So the recipe for the training is the following: train several times, track the accuracy, choose the best MLLT matrix and use it in the final training. If you have a large database, find the best MLLT matrix on a subset of it and use that as the initial value for the full MLLT estimation (a sketch of this restart loop is given below). There is no easier way until we find a better method for initial value estimation; a quick look through the articles didn't give any.
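To illustrate, here is a minimal sketch of that restart recipe. None of this is the actual Sphinxtrain code: estimate_mllt is a toy gradient-ascent estimator of the objective F above, and best_mllt simply keeps the transform with the best final objective value (in practice you would rather compare decoding accuracy, as in the error rates listed above).

import numpy as np
from numpy.linalg import inv, det

def mllt_objective(A, covs, counts):
    # F(A) = N log|det A| - 1/2 sum_j N_j sum_i log (A S_j A^T)_ii
    N = sum(counts)
    F = N * np.log(abs(det(A)))
    for S, n in zip(covs, counts):
        F -= 0.5 * n * np.sum(np.log(np.diag(A @ S @ A.T)))
    return F

def estimate_mllt(covs, counts, A=None, lr=0.01, n_iter=1000):
    # Toy gradient ascent; the step size will need tuning on real data.
    dim = covs[0].shape[0]
    if A is None:
        # The same random initialization as in the snippet above.
        A = np.eye(dim) + 0.1 * np.random.random((dim, dim))
    N = sum(counts)
    for _ in range(n_iter):
        # dF/dA = N A^{-T} - sum_j N_j diag(1/(A S_j A^T)_ii) A S_j
        G = N * inv(A).T
        for S, n in zip(covs, counts):
            d = 1.0 / np.diag(A @ S @ A.T)
            G -= n * (d[:, None] * (A @ S))
        A = A + lr * G / N  # per-frame step for stability
    return A, mllt_objective(A, covs, counts)

def best_mllt(covs, counts, n_restarts=5):
    # Run the estimation from several random starts, keep the best.
    runs = [estimate_mllt(covs, counts) for _ in range(n_restarts)]
    return max(runs, key=lambda r: r[1])[0]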
From recent articles I have also gathered quite a collection of LDA derivatives: discriminative variants, HLDA and so on. Some of them seem to be free from this initial value problem. It would be nice to write a proper review of this large topic.
By the way, you can see in the chunk of code above that the comment is not quite correct. Positive-definiteness of the matrix should be checked differently, for example with Sylvester's criterion. Though I think that since the condition det(A) > 0 seems to be enough for the feature space transform, the comment should simply be removed. But it's possible that a positive-definite matrix is required for the optimization.
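For completeness, here is a minimal sketch of such a check (my own illustration, not code from Sphinxtrain). Sylvester's criterion applies to symmetric matrices, so for the non-symmetric A generated above it has to be applied to the symmetric part; in practice, attempting a Cholesky factorization with np.linalg.cholesky is a cheaper and more robust test.

import numpy as np

def is_positive_definite(A):
    # Sylvester's criterion: a symmetric matrix is positive-definite
    # iff all leading principal minors (determinants of the top-left
    # k-by-k submatrices) are positive.  For a non-symmetric A we
    # check its symmetric part (A + A^T) / 2.
    S = 0.5 * (A + A.T)
    n = S.shape[0]
    return all(np.linalg.det(S[:k, :k]) > 0 for k in range(1, n + 1))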