Published
INTRODUCCIÓN A KERNEL ACP Y OTROS MÉTODOS ESPECTRALES APLICADOS AL APRENDIZAJE NO SUPERVISADO
INTRODUCTION TO KERNEL PCA AND OTHER SPECTRAL METHODS APPLIED TO UNSUPERVISED LEARNING
Keywords:
método kernel, análisis de conglomerados, teoría de grafos, selección de modelo (es)Kernel method, Cluster analysis, Model selection, Graph theory (en)
Downloads
1Universidad Nacional de Colombia, Facultad de Ciencias Exactas y Naturales, Departamento de Matemáticas y Estadística, Manizales, Colombia. Estudiante de maestría. Email: lgsanchezg@unal.edu.co
2Universidad Nacional de Colombia, Facultad de Ciencias Exactas y Naturales, Departamento de Matemáticas y Estadística, Manizales, Colombia. Estudiante de maestría. Email: gaosorioz@unal.edu.co
3Universidad Nacional de Colombia, Facultad de Ciencias Exactas y Naturales, Departamento de Matemáticas y Estadística, Manizales, Colombia. Profesor asociado. Email: jfsuarezc@unal.edu.co
En el presente trabajo, se introducen las técnicas de kernel ACP (KACP) y conglomeramiento espectral con algunos ejemplos ilustrativos. Se pretende estudiar los efectos de aplicar ACP como preproceso sobre las observaciones que se desean agrupar, para lo cual se hacen experimentos con datos reales. Entre las tareas adicionales que requieren estos procedimientos está la sintonización de parámetros (ajuste de valores); el alineamiento del kernel se presenta como alternativa de solución. La técnica de alineamiento del kernel presenta buenos resultados al contrastar las curvas de alineamiento con los índices de Rand obtenidos para los datos evaluados. Finalmente, el estudio muestra que el éxito de ACP depende del problema y que no se tiene un criterio general para decidir.
Palabras clave: método kernel, análisis de conglomerados, teoría de grafos, selección de modelo.
In this work, the techniques of Kernel Principal Component Analysis (Kernel PCA or KPCA) and Spectral Clustering are introduced along with some illustrative examples. This work focuses on studying the effects of applying PCA as a preprocessing stage for clustering data. Several tests are carried out on real data to establish the pertinence of including PCA. The use of these methods requires of additional procedures such as parameter tuning; the kernel alignment is presented as an alternative for it. The results of kernel alignment expose a high level of agreement between the tuning curves their respective Rand indexes. Finally, the study shows that the success of PCA is problem-dependent and no general criteria can be established.
Key words: Kernel method, Cluster analysis, Model selection, Graph theory.
Texto completo disponible en PDF
Referencias
1. Alzate, C. & Suykens, J. A. K. (2006), A Weighted Kernel PCA Formulation with Out-of-Sample Extensions for Spectral Clustering Methods, `International Joint Conference on Neural Networks´, p. 138-144.
2. Bengio, Y., Delalleau, O., Roux, N. L., Paiement, J. F., Vincent, P. & Ouimet, M. (2004), `Learning eigenfunctions links spectral embedding and kernel PCA´, Neural Computation 16(10), 2197-2219.
3. Bengio, Y., Vincent, P., Paiement, J. F., Delalleau, O., Ouimet, M. & Le Roux, N. (2003), Spectral Clustering and Kernel PCA are Learning Eigenfunctions, Département d'Informatique et Recherche Opérationnelle Centre de Recherches Mathématiques, Université de Montréal.
4. Chung, F. R. K. (1997), Spectral Graph Theory (CBMS Regional Conference Series in Mathematics, No. 92) (Cbms Regional Conference Series in Mathematics), American Mathematical Society.
5. Cristianini, N., Elisseeff, A., Shawe-Taylor, J. & Kandola, J. (2001), On kernel-target alignment, `Proceedings Neural Information Processing Systems´.
6. Cristianini, N., Shawe-Taylor, J. & Kandola, J. (2002), Spectral kernel methods for clustering, `Proceedings Neural Information Processing Systems´.
7. Jolliffe, I. T. (2002), Principal Component Analysis, Springer Series in Statistics, Second edn, Springer.
8. LeCun, Y. & Cortés, C. (2008), `The MNIST Database´. Tomado en octubre de 2007 de la página web. *http://yann.lecun.com/exdb/mnist/
9. Luxburg, U. V. (2006), A Tutorial on Spectral Clustering, Max-Planck-Institut für biologische Kybernetik.
10. Peña, D. (2002), Análisis de datos multivariantes, McGraw-Hill.
11. Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K. R., Ratsch, G. & Smola, A. J. (1999), `Input Space vs. Feature Space in Kernel-Based Methods´, IEEE Transactions on Neural Networks 10(5), 1000-1017.
12. Schölkopf, B. & Smola, A. (2002), Learning with Kernels Support Vector Machines, Regularization, Optimization and Beyond, MIT Press, Cambridge, United States.
13. Schölkopf, B., Smola, A. & Müller, K. R. (1996), Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 44, Max-Planck-Institut für biologische Kybernetik.
14. Yeung, K. Y. & Ruzzo, W. L. (2000), An empirical study on Principal Component Analysis for Clustering Gene Expression Data, Department of Computer Science and Engineering, University of Washington.
Este artículo se puede citar en LaTeX utilizando la siguiente referencia bibliográfica de BibTeX:
@ARTICLE{RCEv31n1a02,
AUTHOR = {Sánchez, Luis Gonzalo and Osorio, Germán Augusto and Suárez, Julio Fernando},
TITLE = {{Introducción a kernel ACP y otros métodos espectrales aplicados al aprendizaje no supervisado}},
JOURNAL = {Revista Colombiana de Estadística},
YEAR = {2008},
volume = {31},
number = {1},
pages = {19-40}
}
How to Cite
APA
ACM
ACS
ABNT
Chicago
Harvard
IEEE
MLA
Turabian
Vancouver
Download Citation
Article abstract page views
Downloads
License
Copyright (c) 2008 Revista Colombiana de Estadística
This work is licensed under a Creative Commons Attribution 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).