Comparison of Three Classification Techniques

Freddy Hernández Barajas¹, Juan Carlos Correa Morales²

¹University of São Paulo, Institute of Mathematics and Statistics, Department of Statistics, São Paulo, Brazil. PhD student. Email: fhernanb@ime.usp.br
²Universidad Nacional de Colombia, Faculty of Sciences, Department of Statistics, Medellín, Colombia. Associate Professor. Email: jccorrea@unalmed.edu.co


Abstract

This paper reports the results of a simulation study comparing three classification techniques: multinomial logistic regression (MLR), nonmetric discriminant analysis (NDA) and linear discriminant analysis (LDA). Performance was measured by the misclassification rate. When the multivariate distribution of the populations is normal or logit-normal, MLR and LDA performed similarly, and both performed much better than NDA. For log-normal and Sinh⁻¹-normal multivariate distributions, MLR performed best.

Key words: Multinomial logistic regression, Nonmetric discriminant analysis, Linear discriminant analysis, Classification.
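The kind of simulation comparison summarized above can be illustrated with a minimal sketch in Python, assuming NumPy is available. It draws grouped data whose underlying variables Z are multivariate normal, applies Johnson-type transformations X = g(Z) (exp for log-normal, the logistic function for logit-normal, sinh for Sinh⁻¹-normal), fits LDA and a multinomial logistic regression on a training sample, and reports the misclassification rate on an independent test sample. Everything specific here is a hypothetical choice for illustration, not the article's design: the three groups, their means, the covariance matrix, the sample sizes, the seed, the learning rate, and the function names `simulate`, `fit_lda`, `fit_mlr`, `error_rate` and `run`. The MLR fit uses plain gradient ascent as a simple stand-in for a full maximum-likelihood routine, and NDA is omitted because its fitting algorithm is not described here.

```python
import numpy as np

# Hypothetical simulation settings (illustrative, not the article's design).
MEANS = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # one row per group
COV = np.array([[1.0, 0.5], [0.5, 1.0]])                 # common covariance of Z

# Johnson-type transformations X = g(Z) of a multivariate normal Z.
TRANSFORMS = {
    "normal": lambda z: z,
    "log-normal": np.exp,                                # log(X) is normal
    "logit-normal": lambda z: 1.0 / (1.0 + np.exp(-z)),  # logit(X) is normal
    "sinh^-1-normal": np.sinh,                           # asinh(X) is normal
}


def simulate(n_per_group, transform, rng):
    """Draw Z ~ N(mean_k, COV) for each group k; return (transform(Z), labels)."""
    xs, ys = [], []
    for k, mu in enumerate(MEANS):
        z = rng.multivariate_normal(mu, COV, size=n_per_group)
        xs.append(transform(z))
        ys.append(np.full(n_per_group, k))
    return np.vstack(xs), np.concatenate(ys)


def fit_lda(x, y):
    """LDA with pooled covariance and sample priors; labels must be 0..K-1."""
    classes = np.unique(y)
    means = np.array([x[y == k].mean(axis=0) for k in classes])
    resid = x - means[y]
    pooled = resid.T @ resid / (len(y) - len(classes))
    coef = means @ np.linalg.inv(pooled)                # rows: mu_k' S^-1
    priors = np.array([np.mean(y == k) for k in classes])
    const = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    # Linear discriminant score: x' S^-1 mu_k - mu_k' S^-1 mu_k / 2 + log(pi_k).
    return lambda xn: np.argmax(xn @ coef.T + const, axis=1)


def fit_mlr(x, y, n_iter=2000, lr=0.1):
    """Multinomial logistic regression by gradient ascent on the log-likelihood."""
    k = int(y.max()) + 1
    center, scale = x.mean(axis=0), x.std(axis=0)

    def design(xn):
        # Intercept column plus standardized predictors.
        return np.column_stack([np.ones(len(xn)), (xn - center) / scale])

    d = design(x)
    onehot = np.eye(k)[y]
    w = np.zeros((d.shape[1], k))
    for _ in range(n_iter):
        scores = d @ w
        scores -= scores.max(axis=1, keepdims=True)     # numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)       # softmax probabilities
        w += lr * d.T @ (onehot - probs) / len(y)       # mean log-likelihood gradient
    return lambda xn: np.argmax(design(xn) @ w, axis=1)


def error_rate(predict, x, y):
    """Misclassification rate: share of cases assigned to the wrong group."""
    return float(np.mean(predict(x) != y))


def run(seed=2009, n_train=50, n_test=500):
    """One replication: train on n_train and test on n_test cases per group."""
    rng = np.random.default_rng(seed)
    results = {}
    for name, transform in TRANSFORMS.items():
        x_tr, y_tr = simulate(n_train, transform, rng)
        x_te, y_te = simulate(n_test, transform, rng)
        results[name] = {
            "LDA": error_rate(fit_lda(x_tr, y_tr), x_te, y_te),
            "MLR": error_rate(fit_mlr(x_tr, y_tr), x_te, y_te),
        }
    return results


if __name__ == "__main__":
    for name, rates in run().items():
        print(f"{name:>15}: LDA={rates['LDA']:.3f}  MLR={rates['MLR']:.3f}")
```

A single call to `run` gives only one noisy estimate per scenario; a comparison in the spirit of the study would repeat it over many seeds and average the misclassification rates before ranking the techniques.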





[Received July 2008. Accepted November 2009]

This article can be cited in LaTeX using the following BibTeX entry:

@ARTICLE{RCEv32n2a05,
    AUTHOR  = {Hernández Barajas, Freddy and Correa Morales, Juan Carlos},
    TITLE   = {{Comparación entre tres técnicas de clasificación}},
    JOURNAL = {Revista Colombiana de Estadística},
    YEAR    = {2009},
    VOLUME  = {32},
    NUMBER  = {2},
    PAGES   = {247--265}
}