Feature selection using a genetic algorithm-based hybrid approach

Luis Felipe Giraldo; Edilson Delgado Trejos; Juan Carlos Riaño; Germán Castellanos Domínguez

doi:10.15446/ing.investig.v26n3.14759

Published

2006-09-01

Selección de características usando modelo hibrido basado en algoritmos genéticos

Keywords:

feature selection, genetic algorithm, decision tree, k-nearest neighbor rule, relevance

Authors

Luis Felipe Giraldo, Universidad de los Andes
Edilson Delgado Trejos, Universidad Nacional de Colombia
Juan Carlos Riaño, Universidad Nacional de Colombia
Germán Castellanos Domínguez, Universidad Nacional de Colombia

Abstract

This work proposes a hybrid feature selection model aimed at reducing the dimension of the training space without compromising classification accuracy. The model induces a decision tree that generates feature subsets, whose relevance is then evaluated using the minimum classification error criterion. The evaluation uses the k-nearest neighbors rule. Dimension reduction usually assumes a bound on the classification error; in this work, however, the hybrid selection model is tuned by means of genetic algorithms, which simultaneously minimise the number of training features and the classification error. In addition, unlike conventional selection techniques, the proposed model quantifies the relevance of each feature in the reduced training set. The model was tested on hypernasality identification in speech signals and on ischemic cardiopathy identification in electrocardiography (ECG) records. The speech database covers a population of 90 children (45 records per class); the ECG database has 100 records (50 per class). Results show an average reduction of the initial training space of up to 88%, with an average classification error rate below 6%.
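The idea of minimising the number of features and the classification error at the same time can be illustrated with a minimal sketch. It is not the authors' implementation: the decision-tree stage that generates candidate subsets is omitted, and a plain genetic algorithm searches binary feature masks directly. The fitness weight `alpha`, the toy data generator `make_toy_data`, the leave-one-out error estimate, and all parameter values are assumptions made for illustration only.

```python
import random

def knn_loo_error(X, y, mask, k=3):
    """Leave-one-out error of a k-NN classifier using only the features set in mask."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty subset cannot classify anything
    errors = 0
    for i, xi in enumerate(X):
        # squared Euclidean distance to every other sample, on selected columns only
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        errors += predicted != y[i]
    return errors / len(X)

def fitness(X, y, mask, alpha=0.1):
    """Lower is better: k-NN error plus an (assumed) penalty on subset size."""
    return knn_loo_error(X, y, mask) + alpha * sum(mask) / len(mask)

def select_features(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search binary feature masks with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(X[0])
    # random masks plus the full feature set as a baseline individual
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m))
        parents = pop[: pop_size // 2]       # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per gene
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda m: fitness(X, y, m))

def make_toy_data(n_samples=40, n_features=8, seed=1):
    """Hypothetical two-class data: only features 0 and 1 carry class information."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * label
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
best = select_features(X, y)
print("selected features:", [j for j, bit in enumerate(best) if bit])
print("leave-one-out error:", knn_loo_error(X, y, best))
```

Because the best individuals survive each generation and the initial population contains the full feature set, the returned mask never scores worse than using all features under this fitness. A single weighted sum is only one way to combine the two objectives; the article itself does not state how its genetic algorithm combines them.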

How to Cite

APA

Giraldo, L. F., Trejos, E. D., Riaño, J. C. & Castellanos Domínguez, G. (2006). Feature selection using a genetic algorithm-based hybrid approach. Ingeniería e Investigación, 26(3), 113–119. https://doi.org/10.15446/ing.investig.v26n3.14759

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
