Sunday, April 12, 2015

[Troubleshooting] HTK WARNING [-2637] HeaviestMix: mix 1 in sp has v.small gConst

Problem
HHEd -A -H mix_moreA/hmmdefs -M mix_moreA 10.hedscript tiedlist
WARNING [-2637]  HeaviestMix: mix 1 in sp has v.small gConst [52.646065] in HHEd


Solution
Reference
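From the HTK Book, which describes the gConst check made when the MU command picks the heaviest mixture component to split: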

10.6 Mixture Incrementing
...
In HTK therefore, the conversion from single Gaussian HMMs to multiple mixture component HMMs is usually one of the final steps in building a system. The mechanism provided to do this is the HHEd MU command which will increase the number of components in a mixture by a process called mixture splitting. This approach to building a multiple mixture component system is extremely flexible since it allows the number of mixture components to be repeatedly increased until the desired level of performance is achieved.
The MU command has the form

MU n itemList

where n gives the new number of mixture components required and itemList defines the actual mixture distributions to modify. This command works by repeatedly splitting the 'heaviest' mixture component until the required number of components is obtained. The 'heaviness' score of a mixture component is defined as the mixture weight minus the number of splits involving that component that have already been carried out by the current MU command. Subtracting the number of splits discourages repeated splitting of the same mixture component. If the GCONST value of a component is more than four standard deviations smaller than the average gConst, a further adjustment is made to the 'heaviness' score of the component in order to make it very unlikely that the component will be selected for splitting. The actual split is performed by copying the mixture component, dividing the weights of both copies by 2, and finally perturbing the means by plus or minus 0.2 standard deviations.
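
To make the splitting procedure above concrete, here is a rough Python sketch of it. It is not HTK's actual code: the dict layout, the function names, and the size of the gConst penalty (`penalty`) are my own assumptions, and gConst is computed as log((2*pi)^n * |Sigma|) for a diagonal covariance, which is how I understand the HTK definition.

```python
import math
import statistics


def gconst(var):
    """Assumed gConst of a diagonal Gaussian: log((2*pi)^n * |Sigma|)."""
    n = len(var)
    return n * math.log(2 * math.pi) + sum(math.log(v) for v in var)


def split_heaviest(mixes, target, perturb=0.2, penalty=1000.0):
    """Grow a mixture to `target` components by splitting the heaviest one.

    Each mix is a dict with 'weight', 'mean' and 'var' (diagonal variances).
    `penalty` is a hypothetical stand-in for HTK's gConst adjustment.
    """
    mixes = [dict(m, splits=0) for m in mixes]  # work on copies
    while len(mixes) < target:
        g = [gconst(m["var"]) for m in mixes]
        # Components more than four standard deviations below the mean gConst
        # are made very unlikely to be chosen.
        limit = statistics.fmean(g) - 4.0 * statistics.pstdev(g)

        def heaviness(i):
            # Weight minus the number of splits already involving this mix.
            score = mixes[i]["weight"] - mixes[i]["splits"]
            if g[i] < limit:
                score -= penalty
            return score

        heavy = mixes[max(range(len(mixes)), key=heaviness)]
        heavy["weight"] /= 2.0
        heavy["splits"] += 1
        sd = [math.sqrt(v) for v in heavy["var"]]
        # Copy the component, then move the two means apart by +/- 0.2 sd.
        clone = dict(heavy, mean=[m - perturb * s for m, s in zip(heavy["mean"], sd)])
        heavy["mean"] = [m + perturb * s for m, s in zip(heavy["mean"], sd)]
        mixes.append(clone)
    return [{k: m[k] for k in ("weight", "mean", "var")} for m in mixes]
```

Calling `split_heaviest([{"weight": 1.0, "mean": [0.0], "var": [4.0]}], 2)` gives two components of weight 0.5 whose means sit 0.4 either side of the original, since the standard deviation is 2.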
...
Normally, however, the number of components in all mixture distributions will be increased at the same time. Hence, a command of the form is more usual

MU 3 {*.state[2-4].mix}

It is usually a good idea to increment mixture components in stages, for example, by incrementing by 1 or 2 then re-estimating, then incrementing by 1 or 2 again and re-estimating, and so on until the required number of components are obtained. This also allows recognition performance to be monitored to find the optimum.
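
The staged approach could be scripted along these lines. This is only a sketch: the helper name `staged_mu_scripts`, the step size and the default item list are assumptions for illustration, and it only produces the text of each HHEd edit script. Running HHEd and re-estimating between stages is left to your own build scripts.

```python
def staged_mu_scripts(target, step=2, item_list="{*.state[2-4].mix}", start=1):
    """Return the text of one HHEd edit script per mixture-incrementing stage.

    Starts from `start` components and raises the count by `step` per stage
    until `target` is reached. Re-estimate the models between stages.
    """
    scripts = []
    n = start
    while n < target:
        n = min(n + step, target)  # never overshoot the requested total
        scripts.append(f"MU {n} {item_list}\n")
    return scripts


if __name__ == "__main__":
    for stage, text in enumerate(staged_mu_scripts(8), start=1):
        print(f"stage {stage}: {text.strip()}")
```

For a target of 8 components this yields four stages: MU 3, MU 5, MU 7 and MU 8, each applied to the same item list.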
One final point with regard to multiple mixture component distributions is that all HTK tools ignore mixture components whose weights fall below a threshold value called MINMIX (defined in HModel.h). Such mixture components are called defunct. Defunct mixture components can be prevented by setting the -w option in HERest so that all mixture weights are floored to some level above MINMIX. If mixture weights are allowed to fall below MINMIX then the corresponding Gaussian parameters will not be written out when the model containing that component is saved. It is possible to recover from this, however, since the MU command will replace defunct mixtures before performing any requested mixture component increment.
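
As an illustration of what flooring the weights means, here is a small Python sketch. Several things in it are assumptions rather than facts from the quote: the value of `MINMIX` (check HModel.h in your HTK version), the idea that the floor is given as a multiple of MINMIX, and the renormalisation step. It is not how HERest is implemented, just the arithmetic of the idea.

```python
MINMIX = 1.0e-5  # assumed value; check HModel.h in your HTK version


def is_defunct(weight, minmix=MINMIX):
    """A component whose weight is below MINMIX is ignored by the tools."""
    return weight < minmix


def floor_weights(weights, factor, minmix=MINMIX):
    """Floor each weight at factor * minmix, then renormalise to sum to 1.

    The renormalisation is an assumption made so the result stays a valid
    set of mixture weights.
    """
    floor = factor * minmix
    floored = [max(w, floor) for w in weights]
    total = sum(floored)
    return [w / total for w in floored]
```

With weights `[0.999999, 0.000001]` the second component is defunct, while `floor_weights([0.999999, 0.000001], 2.0)` lifts it to roughly 2e-5, so its Gaussian parameters would be kept.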
