Evaluation of the Known Sub-Sequence Algorithm (KSSA) for Optimal Use in Time Series of Tuna Fishing in the Pacific Ocean
Evaluación del algoritmo de subsecuencias conocidas (KSSA) para su uso óptimo en series de tiempo de pesca de atún en el Océano Pacífico
DOI:
https://doi.org/10.15446/ing.investig.115627
Keywords:
missing data, machine learning, imputation, data (en)
aprendizaje automático, imputación, datos faltantes, datos (es)
An important limitation in fisheries research using time series is the presence of missing data. Inadequate handling of these data can negatively impact the results of statistical analyses, leading to erroneous decision-making. A potential solution to this problem is the estimation of missing values through imputation methods, but none of them can be universally applied to all time series. In fact, their effectiveness largely depends on the structure of the data and the distribution of the missing values. Recently, a solution to this problem was proposed which employs the known sub-sequence algorithm (KSSA), a machine learning technique designed to compare the performance of different imputation methods and validate them within the time series that contains the missing data. However, because the algorithm was developed only recently, there is no published evidence regarding its efficiency and reliability. This research aimed to assess the efficiency of the KSSA on tuna fisheries data for the Pacific Ocean by imputing simulated missing data for seven time series with distinct structures. The results demonstrated that the algorithm robustly and accurately validates imputation methods across a wide combination of time series properties, such as length, seasonality, trend, autocorrelation structure, and percentage of missing data. Additionally, the algorithm's hyperparameters can be easily adjusted to achieve optimal results for each time series.
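The validation idea described in the abstract (simulate missing data where the true values are known, impute them with several candidate methods, and compare the errors) can be illustrated with a minimal Python sketch. This is not the implementation of the kssa R package or the procedure used in the study: the function names, the three simple imputers, the random point-wise masking, and the parameters `missing_fraction` and `n_simulations` are all assumptions chosen for illustration, and the series is synthetic.

```python
import math
import random


def impute_mean(values):
    """Replace every gap (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def impute_locf(values):
    """Last observation carried forward; leading gaps take the first known value."""
    last = next(v for v in values if v is not None)
    out = []
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_linear(values):
    """Linear interpolation between the nearest known neighbours.
    Gaps at either edge copy the nearest known value."""
    known_idx = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known_idx if k < i), default=None)
        right = min((k for k in known_idx if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


# Hypothetical candidate set; any function mapping a gappy list to a full list fits.
METHODS = {"mean": impute_mean, "locf": impute_locf, "linear": impute_linear}


def rank_imputation_methods(series, missing_fraction=0.2, n_simulations=50, seed=0):
    """Hide known values at random, impute them, and rank methods by pooled RMSE."""
    rng = random.Random(seed)
    known_idx = [i for i, v in enumerate(series) if v is not None]
    if len(known_idx) < 2:
        raise ValueError("need at least two known values")
    # Always leave at least one known value for the imputers to work from.
    n_hide = min(max(1, round(missing_fraction * len(known_idx))), len(known_idx) - 1)
    squared_errors = {name: [] for name in METHODS}
    for _ in range(n_simulations):
        hidden = rng.sample(known_idx, n_hide)
        masked = list(series)
        for i in hidden:
            masked[i] = None
        for name, method in METHODS.items():
            imputed = method(masked)
            # Score only the positions whose true value is known.
            squared_errors[name].extend((imputed[i] - series[i]) ** 2 for i in hidden)
    rmse = {name: math.sqrt(sum(e) / len(e)) for name, e in squared_errors.items()}
    return sorted(rmse.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Synthetic monthly series with trend and seasonality, plus a few real gaps.
    series = [10 + 0.1 * t + 5 * math.sin(2 * math.pi * t / 12) for t in range(120)]
    for t in (7, 8, 40, 41, 42, 90):
        series[t] = None
    for name, rmse in rank_imputation_methods(series):
        print(f"{name}: RMSE = {rmse:.3f}")
```

The method ranked first has the lowest simulated error on this particular series, which is why the ranking has to be recomputed for every series rather than assumed in advance. Changing `missing_fraction` or `n_simulations` shows how such settings affect the comparison, loosely mirroring the hyperparameter adjustment mentioned in the abstract.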
License
Copyright (c) 2025 Julian Gomez, Ivan Benavides, John Selvaraj

This work is licensed under a Creative Commons Attribution 4.0 International License.
The authors or holders of the copyright for each article hereby confer exclusive, limited and free authorization on the Universidad Nacional de Colombia's journal Ingeniería e Investigación concerning the aforementioned article which, once it has been evaluated and approved, will be submitted for publication, in line with the following items:
1. The version which has been corrected according to the evaluators' suggestions will be remitted and it will be made clear whether the aforementioned article is an unedited document regarding which the rights to be authorized are held and total responsibility will be assumed by the authors for the content of the work being submitted to Ingeniería e Investigación, the Universidad Nacional de Colombia and third-parties;
2. The authorization conferred on the journal will come into force from the date on which it is included in the respective volume and issue of Ingeniería e Investigación in the Open Journal Systems and on the journal's main page (https://revistas.unal.edu.co/index.php/ingeinv), as well as in different databases and indices in which the publication is indexed;
3. The authors authorize the Universidad Nacional de Colombia's journal Ingeniería e Investigación to publish the document in whatever required format (printed, digital, electronic or whatsoever known or yet to be discovered form) and authorize Ingeniería e Investigación to include the work in any indices and/or search engines deemed necessary for promoting its diffusion;
4. The authors accept that such authorization is given free of charge and they, therefore, waive any right to receive remuneration from the publication, distribution, public communication and any use whatsoever referred to in the terms of this authorization.










