Feature selection using a genetic algorithm-based hybrid approach
Selección de características usando modelo hibrido basado en algoritmos genéticos
DOI:
https://doi.org/10.15446/ing.investig.v26n3.14759Keywords:
feature selection, genetic algorithm, decision tree, the k nearest neighbor rule, relevancy (en)selección de características, algoritmos genéticos, árboles de decisión, k-vecinos más cercanos, relevancia (es)
Downloads
The present work proposes a hybrid feature selection model aimed at reducing training time whilst maintaining classification accuracy. The model includes adjusting a decision tree for producing feature subsets. Such subsets’ statistical relevance was evaluated from their resulting classification error. Evaluation involved using the k-nearest neighbors’ rule. Dimension reduction techniques usually assume an element of error; however, the hybrid selection model was tuned by means of genetic algorithms in this work. They simultaneously minimise the number of features and training error. Contrasting with conventional methods, this model also led to quantifying the relevance of each training set’s features. The model was tested on speech signals (hypernasality classification) and ECG identification (ischemic cardiopathy). In the case of speech signals, the database consisted of 90 children (45 recordings per sample); the ECG database had 100 electrocardiograph records (50 recordings per sample). Results showed average reduction rates of up to 88%, classification error being less than 6%.
En el artículo se propone un modelo híbrido de selección de características con el objeto de reducir la dimensión del espacio de entrenamiento, sin comprometer la precisión de clasificación. El modelo incluye la inducción de un árbol de decisión que genera subconjuntos de características, para las cuales seguidamente se evalúa su relevancia mediante el criterio del mínimo error de clasificación. El procedimiento de evaluación se desarrolla empleando la regla de los k-vecinos más cercanos. Usualmente, la reducción de espacios supone una cota de error de clasificación; sin embargo, en este trabajo la sintonización del modelo híbrido de selección se realiza usando algoritmos genéticos, con lo cual se obtiene de forma simultánea la minimización tanto del número de características de entrenamiento, como del error de clasificación. De manera adicional, a diferencia de las técnicas convencionales de selección, el modelo propuesto permite cuantificar el nivel de relevancia de cada característica perteneciente al conjunto reducido de entrenamiento. Las pruebas del modelo se realizan para la identificación de hipernasalidad, en el caso de voz, y cardiopatía isquémica, en el caso de registros de electrocardiografía. Las bases de datos corresponden a una población de 90 niños (45 registros por clase) y a 100 registros electrocardiográficos (50 por clase). Los resultados obtenidos muestran una efectividad promedio para la reducción del espacio de entrenamiento inicial hasta de un 88%, con una tasa promedio de error de clasificación inferior al 6%.
References
Back, T. y Shutz, M., Intelligent mutation rate control in canonical genetic algorithms: Lecture notes in artificial intelligence., 1996. DOI: https://doi.org/10.1007/3-540-61286-6_141
Bast, H., Dimension reduction: A powerful principle for automatically finding concepts in unstructured data., In proceedings of the international Workshop on Self-Properties in Complex Information Systems (SELF-STAR’04), 2004, pp 113-116.
Duda, R. O., Hart, P E. and Store, D. G., Pattern Classification., John Wiley & Sons, 2000.
De Jong, K., The Analysis of the Behavior of a Class of Genetic Adaptative Systems., Tesis presentada a la Universidad de Michigan, Ann Arbor, para optar por el título de Doctor of Philosophy, 1975.
Eiben, R., Hinterding, and Michalewichz, Z., Parameter control in evolutionary algorithms., IEEE Transactions on Evolutionary Computation, Vol. 3, No. 2, 1999, pp. 124-141. DOI: https://doi.org/10.1109/4235.771166
Grefenstette, J. J., Optimization of control parameters for genetic algorithms., IEEE Transactions on Systems, Man and Cybernetics, Vol. 16, No. 1, 1986, pp. 122-128. DOI: https://doi.org/10.1109/TSMC.1986.289288
Hong, J. H. and Cho, S. B., Efficient huge-scale feature selection with speciated genetic algorithm., PRL(27), No. 2, 15 January, 2006, pp. 143-150. DOI: https://doi.org/10.1016/j.patrec.2005.07.009
Jain, A. K., Duin, R. P W. and Mao, J., Statistical pattern recognition: a review., IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, 2000, pp. 4-37. DOI: https://doi.org/10.1109/34.824819
Kim, K. M., Park, J. J., Song, M. H., Kim, I. C., and Suen, C. Y., Binary decision tree using genetic algorithm for recognizing defect patterns of cold mill strip., En Canadian Al 2004, LNAI 3060, A.Y. Tawfik, S. D. Goodwin, editores. Springer-Verlag, Berlin Heidelberg, 2004, pp. 461-466. DOI: https://doi.org/10.1007/978-3-540-24840-8_38
Lee, C. S., Neural fuzzy systems: a neuro-fuzzy synergism to intelligent systems., Prentice- Hall, 1996.
Peña, D., Análisis de datos multivariantes., Mc Graw Hill, 2002.
Quinlan, J., Induction of decision trees., Machine Learning, Vol. 1, No. 1, 1986, pp. 81 - 106. DOI: https://doi.org/10.1007/BF00116251
Raymer, M. L., Punch, W. F, Goodman, E. D., Kuhn, L.A. and Jain, A. K., Dimensionality reduction using genetic algorithms., IEEE Transactions on Evolutionary Computation, Vol. 4, No.2, 2000, pp. 164-171. DOI: https://doi.org/10.1109/4235.850656
Yu, L. and Liu, H., Efficient feature selection via analysis of relevance and redundancy., Journal of Machine Learning Research, 5, 2004, pp. 1205-1224.
How to Cite
APA
ACM
ACS
ABNT
Chicago
Harvard
IEEE
MLA
Turabian
Vancouver
Download Citation
CrossRef Cited-by
Dimensions
PlumX
Article abstract page views
Downloads
License
Copyright (c) 2006 Luis Felipe Giraldo, Edilson Delgado Trejos, Juan Carlos Riaño, Germán Castellanos Domínguez

This work is licensed under a Creative Commons Attribution 4.0 International License.
The authors or holders of the copyright for each article hereby confer exclusive, limited and free authorization on the Universidad Nacional de Colombia's journal Ingeniería e Investigación concerning the aforementioned article which, once it has been evaluated and approved, will be submitted for publication, in line with the following items:
1. The version which has been corrected according to the evaluators' suggestions will be remitted and it will be made clear whether the aforementioned article is an unedited document regarding which the rights to be authorized are held and total responsibility will be assumed by the authors for the content of the work being submitted to Ingeniería e Investigación, the Universidad Nacional de Colombia and third-parties;
2. The authorization conferred on the journal will come into force from the date on which it is included in the respective volume and issue of Ingeniería e Investigación in the Open Journal Systems and on the journal's main page (https://revistas.unal.edu.co/index.php/ingeinv), as well as in different databases and indices in which the publication is indexed;
3. The authors authorize the Universidad Nacional de Colombia's journal Ingeniería e Investigación to publish the document in whatever required format (printed, digital, electronic or whatsoever known or yet to be discovered form) and authorize Ingeniería e Investigación to include the work in any indices and/or search engines deemed necessary for promoting its diffusion;
4. The authors accept that such authorization is given free of charge and they, therefore, waive any right to receive remuneration from the publication, distribution, public communication and any use whatsoever referred to in the terms of this authorization.