An Empirical Comparison of EM Initialization Methods and Model Choice Criteria for Mixtures of Skew-Normal Distributions

Una comparación empírica de algunos métodos de inicialización EM y criterios de selección de modelos para mezclas de distribuciones normales asimétricas

JOSÉ R. PEREIRA1, LEYNE A. MARQUES2, JOSÉ M. DA COSTA3

1Universidade Federal do Amazonas, Instituto de Ciências Exatas, Departamento de Estatística, Manaus, Brasil. Associate professor. Email: jrpereira@ufam.edu.br
2Universidade Federal do Amazonas, Instituto de Ciências Exatas, Departamento de Estatística, Manaus, Brasil. Assistant professor. Email: leyneabuim@gmail.com
3Universidade Federal do Amazonas, Instituto de Ciências Exatas, Departamento de Estatística, Manaus, Brasil. Assistant professor. Email: zemirufam@gmail.com


Abstract

We investigate, via a simulation study, the performance of the EM algorithm for maximum likelihood estimation in finite mixtures of skew-normal distributions with component-specific parameters. The study considers the initialization method, the number of iterations needed to satisfy a fixed stopping rule, and the ability of some classical model choice criteria to estimate the correct number of mixture components. The results show that the algorithm produces quite reasonable estimates when the starting points are obtained by the method of moments, and that combining this initialization with the AIC, BIC, ICL or EDC criteria is a good way to estimate the number of mixture components. Exceptions occur in the estimation of the skewness parameters, notably when the sample size is relatively small, and in some classically problematic cases, such as when the mixture components are poorly separated.

Key words: EM algorithm, Mixture of distributions, Skewed distributions.
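
As a rough illustration of how the penalized-likelihood criteria named in the abstract can be used to pick the number of components, the sketch below computes AIC, BIC and EDC from a maximized log-likelihood. It assumes a univariate mixture in which each component has its own location, scale and skewness parameter, and it assumes the EDC penalty c_n = 0.2·sqrt(n); both are assumptions made for illustration and are not taken from the abstract. The function names are hypothetical. ICL is omitted because it also requires the estimated posterior membership probabilities.

```python
import math


def n_free_params(g: int) -> int:
    """Free parameters of a g-component univariate skew-normal mixture
    (assumed layout): g locations + g scales + g skewness parameters
    + (g - 1) mixing weights."""
    return 4 * g - 1


def information_criteria(loglik: float, g: int, n: int) -> dict:
    """Return AIC, BIC and EDC for a fitted g-component mixture.

    loglik is the maximized log-likelihood and n the sample size.
    Smaller values indicate a preferred model.
    """
    k = n_free_params(g)
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "BIC": -2.0 * loglik + k * math.log(n),
        # Assumption: EDC penalty c_n = 0.2 * sqrt(n); other choices exist.
        "EDC": -2.0 * loglik + k * 0.2 * math.sqrt(n),
    }


def choose_g(logliks: dict, n: int, criterion: str = "BIC") -> int:
    """Pick the number of components minimizing the chosen criterion.

    logliks maps a candidate g to the maximized log-likelihood obtained
    by running EM with g components.
    """
    return min(
        logliks,
        key=lambda g: information_criteria(logliks[g], g, n)[criterion],
    )
```

In use, one would run EM once per candidate number of components, collect the maximized log-likelihoods in a dictionary such as `{1: ll1, 2: ll2, 3: ll3}`, and pass it to `choose_g` together with the sample size.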


Resumen

El presente artículo muestra un estudio de simulación que evalúa el desempeño del algoritmo EM utilizado para determinar estimaciones por máxima verosimilitud de los parámetros de la mezcla finita de distribuciones normales asimétricas. Se consideran diferentes métodos de inicialización, el número de iteraciones necesarias para satisfacer una regla de parada especificada y algunos criterios de selección de modelos que permiten estimar el número apropiado de componentes de la mezcla. Los resultados indican que el algoritmo genera estimaciones razonables cuando los valores iniciales son obtenidos mediante el método de momentos, que junto con los criterios AIC, BIC, ICL o EDC constituyen una alternativa eficaz para estimar el número de componentes de la mezcla. Se obtuvieron resultados insatisfactorios al estimar los parámetros de asimetría, principalmente con tamaños de muestra pequeños, y en los casos conocidamente problemáticos en los cuales los componentes de la mezcla están poco separados.

Palabras clave: algoritmo EM, distribuciones asimétricas, mezcla de distribuciones.





[Received August 2011. Accepted October 2012]

This article can be cited in LaTeX using the following BibTeX reference:

@ARTICLE{RCEv35n3a08,
    AUTHOR  = {Pereira, José R. and Marques, Leyne A. and da Costa, José M.},
    TITLE   = {{An Empirical Comparison of EM Initialization Methods and Model Choice Criteria for Mixtures of Skew-Normal Distributions}},
    JOURNAL = {Revista Colombiana de Estadística},
    YEAR    = {2012},
    volume  = {35},
    number  = {3},
    pages   = {455-476}
}