CMIN: a CRISP-DM-based CASE tool for supporting data mining projects
DOI: https://doi.org/10.15446/ing.investig.v30n3.18177
Keywords: data mining, CRISP-DM, CASE tools, workflow, reflection.
This paper introduces CMIN, an integrated computer-aided software engineering (CASE) tool based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) 1.0 and designed to support the execution of data mining projects. It is “integrated” in the sense that it supports all the phases of a process. A general overview of how CMIN works is presented first, covering the management of processes, templates and projects. Two capabilities are highlighted: CMIN lets users monitor their projects easily and intuitively, and the help and information it presents at each step allow users to deepen their knowledge of CRISP-DM or of any other process defined in the tool. Next, the paper shows how CMIN can bind new data mining algorithms at runtime (without recompiling the tool) to support workflow-based modelling tasks in data mining projects. Finally, the results of two evaluations of the tool are presented, together with conclusions and suggestions for future work.
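The abstract describes processes defined as templates, projects created from them, and easy monitoring of project progress. A minimal sketch of that idea follows, assuming that a process template is simply an ordered mapping from the six CRISP-DM 1.0 phases to their generic tasks and that progress is the fraction of completed tasks. `Project`, `complete`, `phase_progress`, `overall_progress` and `report` are hypothetical names chosen for illustration; they do not describe CMIN's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Process template: CRISP-DM 1.0 phases (in order) mapped to their generic tasks.
CRISP_DM_1_0: dict[str, list[str]] = {
    "Business Understanding": [
        "Determine business objectives", "Assess situation",
        "Determine data mining goals", "Produce project plan",
    ],
    "Data Understanding": [
        "Collect initial data", "Describe data",
        "Explore data", "Verify data quality",
    ],
    "Data Preparation": [
        "Select data", "Clean data", "Construct data",
        "Integrate data", "Format data",
    ],
    "Modeling": [
        "Select modeling technique", "Generate test design",
        "Build model", "Assess model",
    ],
    "Evaluation": [
        "Evaluate results", "Review process", "Determine next steps",
    ],
    "Deployment": [
        "Plan deployment", "Plan monitoring and maintenance",
        "Produce final report", "Review project",
    ],
}


@dataclass
class Project:
    """A project instantiated from a process template (hypothetical model)."""

    name: str
    template: dict[str, list[str]]
    done: set[tuple[str, str]] = field(default_factory=set)

    def complete(self, phase: str, task: str) -> None:
        """Mark a task as finished, rejecting tasks the template does not define."""
        if task not in self.template.get(phase, []):
            raise KeyError(f"{task!r} is not a task of phase {phase!r}")
        self.done.add((phase, task))

    def phase_progress(self, phase: str) -> float:
        """Fraction of the phase's tasks that are finished (0.0 to 1.0)."""
        tasks = self.template[phase]
        return sum((phase, t) in self.done for t in tasks) / len(tasks)

    def overall_progress(self) -> float:
        """Fraction of all tasks in the template that are finished."""
        total = sum(len(tasks) for tasks in self.template.values())
        return len(self.done) / total

    def report(self) -> list[str]:
        """One line per phase, e.g. for a simple monitoring view."""
        return [f"{phase}: {self.phase_progress(phase):.0%}" for phase in self.template]


if __name__ == "__main__":
    project = Project("customer-churn", CRISP_DM_1_0)
    project.complete("Business Understanding", "Determine business objectives")
    project.complete("Business Understanding", "Assess situation")
    print("\n".join(project.report()))
```

Because the template is plain data, swapping CRISP-DM for another process model only means passing a different mapping to `Project`, which mirrors the abstract's point that CMIN supports any process defined in the tool.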
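The abstract's central technical claim is that new algorithms can be bound at runtime without recompiling the tool, and the keywords name reflection as the mechanism. The following is a minimal Python analogue, not CMIN's implementation. It assumes a plug-in contract expressed as an abstract base class: the host loads a source file at runtime with `importlib`, then uses introspection (`inspect`) to find the concrete classes that implement the contract. `MiningAlgorithm`, `load_module`, `discover_algorithms`, `demo` and the majority-class plug-in are hypothetical names used only for illustration.

```python
from __future__ import annotations

import importlib.util
import inspect
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from types import ModuleType


class MiningAlgorithm(ABC):
    """Hypothetical plug-in contract, playing the role of an interface."""

    name: str = "unnamed"

    @abstractmethod
    def run(self, rows: list[dict]) -> object:
        """Train or apply the algorithm on a list of records."""


def load_module(path: Path) -> ModuleType:
    """Load a plug-in source file at runtime; the host is not restarted or rebuilt."""
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load plug-in from {path}")
    module = importlib.util.module_from_spec(spec)
    # Self-contained demo only: expose the contract in the plug-in's namespace.
    # A real host would ship MiningAlgorithm in an importable package instead.
    module.MiningAlgorithm = MiningAlgorithm
    spec.loader.exec_module(module)
    return module


def discover_algorithms(module: ModuleType) -> dict[str, type[MiningAlgorithm]]:
    """Introspect the module and keep concrete classes implementing the contract."""
    found: dict[str, type[MiningAlgorithm]] = {}
    for _, cls in inspect.getmembers(module, inspect.isclass):
        # The abstract contract itself is present in the namespace; skip it.
        if issubclass(cls, MiningAlgorithm) and not inspect.isabstract(cls):
            found[cls.name] = cls
    return found


def demo() -> object:
    """Write a plug-in file while running, then bind and execute it."""
    plugin_src = (
        "class MajorityClass(MiningAlgorithm):\n"
        "    name = 'majority-class'\n"
        "    def run(self, rows):\n"
        "        labels = [r['label'] for r in rows]\n"
        "        return max(set(labels), key=labels.count)\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "majority_plugin.py"
        path.write_text(plugin_src)
        algorithms = discover_algorithms(load_module(path))
    rows = [{"label": "yes"}, {"label": "no"}, {"label": "yes"}]
    return algorithms["majority-class"]().run(rows)


if __name__ == "__main__":
    print(demo())  # prints: yes
```

The host matches plug-ins by the contract they implement rather than by a configured class name, so a new algorithm can appear without any change to the host itself. Note that `exec_module` runs the plug-in's code with the host's privileges, so this pattern only suits trusted plug-ins.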
License
Copyright (c) 2010 Carlos Cobos, Jhon Zuñiga, Juan Guarin, Elizabeth León, Martha Mendoza

This work is licensed under a Creative Commons Attribution 4.0 International License.
The authors or holders of the copyright of each article hereby grant the Universidad Nacional de Colombia's journal Ingeniería e Investigación an exclusive, limited and free-of-charge authorization concerning the article which, once evaluated and approved, will be submitted for publication, under the following terms:
1. The authors will submit the version corrected according to the reviewers' suggestions and will state that the article is an unpublished work whose rights they hold and are authorizing; the authors assume full responsibility for the content of the submitted work before Ingeniería e Investigación, the Universidad Nacional de Colombia and third parties;
2. The authorization granted to the journal takes effect from the date on which the article is included in the respective volume and issue of Ingeniería e Investigación in the Open Journal Systems and on the journal's main page (https://revistas.unal.edu.co/index.php/ingeinv), as well as in the various databases and indices in which the publication is indexed;
3. The authors authorize the Universidad Nacional de Colombia's journal Ingeniería e Investigación to publish the document in any required format (printed, digital, electronic or any other form, known or yet to be developed) and to include the work in any indices and/or search engines it deems necessary to promote its dissemination;
4. The authors accept that this authorization is granted free of charge and therefore waive any right to remuneration from the publication, distribution, public communication or any other use covered by the terms of this authorization.