Published

2021-07-06

Full Model Selection Problem and Pipelines for Time-Series Databases: Contrasting Population-Based and Single-point Search Metaheuristics

Problema de selección de modelo completo y tuberías para bases de datos de series de tiempo: contrastando metaheurísticas basadas en población y de un solo punto de búsqueda

DOI:

https://doi.org/10.15446/ing.investig.v41n3.79308

Keywords:

Full Model Selection, Temporal Databases, Time-Series (en)
Selección del Modelo Completo, Bases de Datos Temporales, Series de Tiempo (es)

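Authors

Nancy Pérez-Castro, Héctor Gabriel Acosta-Mesa, Efrén Mezura-Montes, Nicandro Cruz-Ramírez

Abstract
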
The increasing production of temporal data, especially time series, has motivated the analytical extraction of valuable knowledge to understand phenomena or to support decision-making. As more algorithms for processing data become available, the problem of choosing the most suitable one becomes more prevalent. This problem is known as Full Model Selection (FMS), which consists of finding an appropriate set of methods and optimizing their hyperparameters to perform a set of tasks structured as a pipeline. Multiple metaheuristic-based approaches have been proposed to address this problem, in which automated pipelines are built to perform multiple tasks without much dependence on user knowledge. Most of these approaches propose pipelines to process non-temporal data. Motivated by this, this paper proposes an architecture for finding optimized pipelines for time-series tasks. A micro-differential evolution algorithm (µ-DE, a population-based metaheuristic) with different variants and continuous encoding is compared against a local search (LS, a single-point search) with binary and mixed encoding. Multiple experiments are carried out to analyze the performance of each approach on ten time-series databases. The final results suggest that the µ-DE approach with the rand/1/bin variant is useful for finding competitive pipelines without sacrificing performance, whereas the local search with binary encoding achieves the lowest misclassification error rates but has the highest computational cost during the training stage.
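
To make the contrast in the abstract concrete, the following is a minimal Python sketch of the two search styles it names: a small-population differential evolution with the rand/1/bin variant over a continuous encoding, and a single-point local search over a binary encoding. It is not the authors' implementation. The function names, the population size, the values of `f` and `cr`, the bit-flip neighborhood, and the `decode_choice` mapping are all assumptions made for illustration, and the restart mechanisms usually associated with micro-populations are omitted. In a real FMS setting, `fitness` would decode a vector into a pipeline and return its misclassification error.

```python
import random


def decode_choice(gene, n_options):
    """Map a continuous gene in [0, 1] to a discrete option index (hypothetical decoding)."""
    return min(int(gene * n_options), n_options - 1)


def de_rand_1_bin(fitness, dim, pop_size=6, f=0.9, cr=0.1, generations=50, seed=0):
    """Minimize `fitness` over [0, 1]^dim with DE/rand/1/bin and a small population.

    pop_size, f and cr are illustrative defaults, not values taken from the paper.
    """
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # rand/1: three distinct individuals, all different from the target i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:  # binomial crossover
                    v = pop[r1][j] + f * (pop[r2][j] - pop[r3][j])
                    v = min(1.0, max(0.0, v))  # clip back into the search bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            s = fitness(trial)
            if s <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def local_search_binary(fitness, n_bits, iterations=200, seed=0):
    """Single-point search: flip one random bit, keep the neighbor if it is not worse."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    score = fitness(current)
    for _ in range(iterations):
        neighbor = current[:]
        neighbor[rng.randrange(n_bits)] ^= 1
        s = fitness(neighbor)
        if s <= score:
            current, score = neighbor, s
    return current, score


if __name__ == "__main__":
    # Toy stand-in for misclassification error: squared distance to a fixed target.
    target = [0.2, 0.8, 0.5]
    toy_error = lambda x: sum((a - b) ** 2 for a, b in zip(x, target))
    vector, error = de_rand_1_bin(toy_error, dim=3)
    print("decoded method index:", decode_choice(vector[0], 3), "error:", error)
    bits, ones = local_search_binary(sum, n_bits=8)
    print("binary solution:", bits, "score:", ones)
```

Both routines accept a candidate only when it is not worse than the one it replaces, so the best score never degrades. The difference lies in how candidates are produced: the population-based search recombines several individuals, while the single-point search perturbs one solution at a time.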

How to Cite

APA

Pérez-Castro, N., Acosta-Mesa, H. G., Mezura-Montes, E. & Cruz-Ramírez, N. (2021). Full Model Selection Problem and Pipelines for Time-Series Databases: Contrasting Population-Based and Single-point Search Metaheuristics. Ingeniería e Investigación, 41(3), e79308. https://doi.org/10.15446/ing.investig.v41n3.79308
