Published

2025-08-22

Evaluation of the Known Sub-Sequence Algorithm (KSSA) for Optimal Use in Time Series of Tuna Fishing in the Pacific Ocean

DOI:

https://doi.org/10.15446/ing.investig.115627

Keywords:

missing data, machine learning, imputation, data (en)

Missing data are a major limitation in fisheries research based on time series. Inadequate handling of these data can bias statistical analyses and lead to erroneous decision-making. One potential solution is to estimate the missing values with imputation methods, but no single method can be applied universally to all time series: effectiveness depends largely on the structure of the data and the distribution of the missing values. A recently proposed solution is the known sub-sequence algorithm (KSSA), a machine learning technique designed to compare the performance of different imputation methods and validate them within the time series that contains the missing data. However, because the algorithm is recent, there is no published evidence regarding its efficiency and reliability. This research aimed to assess the efficiency of the KSSA on tuna fisheries data for the Pacific Ocean by imputing simulated missing data in seven time series with distinct structures. The results showed that the algorithm is robust and validates imputation accurately across a wide combination of time series properties, such as length, seasonality, trend, autocorrelation structure, and percentage of missing data. Additionally, the algorithm’s hyperparameters can be easily adjusted to achieve optimal results for each time series.
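The general idea of validating imputation methods inside the series that has the gaps can be illustrated with a minimal Python sketch. It is not the KSSA implementation or the API of the kssa R package: the function names, the three candidate methods, the synthetic series, and the parameters `frac`, `repeats`, and `seed` are all assumptions chosen for illustration. The sketch hides a random share of the known values, imputes them with each candidate method, and ranks the methods by their mean RMSE on the hidden points.

```python
import math
import random


def impute_mean(series):
    """Replace each gap (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def impute_locf(series):
    """Last observation carried forward; leading gaps take the first observed value."""
    last = next(v for v in series if v is not None)
    out = []
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_linear(series):
    """Linear interpolation between observed neighbours; edge gaps are held flat."""
    idx = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in idx if j < i), default=None)
        right = min((j for j in idx if j > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def compare_methods(series, methods, frac=0.1, repeats=50, seed=0):
    """Rank imputation methods by mean RMSE on artificially hidden known values.

    Hypothetical helper: frac is the share of known values hidden per run,
    repeats is the number of random masks, seed fixes the random masks.
    """
    rng = random.Random(seed)
    known = [i for i, v in enumerate(series) if v is not None]
    n_hide = max(1, int(frac * len(known)))
    scores = {name: [] for name in methods}
    for _ in range(repeats):
        hidden = rng.sample(known, n_hide)
        masked = list(series)
        for i in hidden:
            masked[i] = None  # simulate a gap where the true value is known
        for name, fn in methods.items():
            filled = fn(masked)
            sq_err = sum((filled[i] - series[i]) ** 2 for i in hidden)
            scores[name].append(math.sqrt(sq_err / n_hide))
    # Lowest mean RMSE first
    return sorted((sum(s) / len(s), name) for name, s in scores.items())


# Synthetic monthly series with trend and yearly seasonality (illustrative only)
series = [10 + 0.1 * t + 5 * math.sin(2 * math.pi * t / 12) for t in range(120)]
for gap in (5, 40, 41):
    series[gap] = None  # real gaps whose true values are unknown

methods = {"mean": impute_mean, "locf": impute_locf, "linear": impute_linear}
ranking = compare_methods(series, methods)
for score, name in ranking:
    print(f"{name}: mean RMSE = {score:.3f}")
```

The real gaps stay in the series throughout, so each method is scored under the same missingness conditions it would face in practice. Which method ranks first depends on the structure of the series, which is why the comparison is run per series rather than once for all.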

How to Cite

APA

Gomez, J., Benavides, I. & Selvaraj, J. (2025). Evaluation of the Known Sub-Sequence Algorithm (KSSA) for Optimal Use in Time Series of Tuna Fishing in the Pacific Ocean. Ingeniería e Investigación, 45(1), e115627. https://doi.org/10.15446/ing.investig.115627

Chicago

Gomez, Julian, Ivan Benavides, and John Selvaraj. 2025. “Evaluation of the Known Sub-Sequence Algorithm (KSSA) for Optimal Use in Time Series of Tuna Fishing in the Pacific Ocean”. Ingeniería E Investigación 45 (1):e115627. https://doi.org/10.15446/ing.investig.115627.
