DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA

Freddy López

Published: 2011-01-01

WHEREIN ARE SHOWN SOME RESULTS OF AUTHORSHIP ATTRIBUTION TO CERVANTES’ WORK

Keywords:

Bayes rule, Classification tree, Cross validation, Discriminant analysis, Logistic regression, Machine learning


Authors

Freddy López, Instituto Venezolano de Investigaciones Científicas


Freddy López, Instituto Venezolano de Investigaciones Científicas, Department of Mathematics, Miranda State, Venezuela. Graduate student. Email: freddy.vate01@gmail.com


Abstract

In this paper, several classification methods are applied to a set of texts in order to study the probability that the novel Novela de la tía fingida was written by Miguel de Cervantes. The novel has historically been attributed to him, but scholars hold conflicting views on this attribution. The methods used are logistic regression, additive logistic regression, linear, quadratic, regularized, mixture and flexible discriminant analysis, classification trees, k-nearest neighbours, naive Bayes and support vector machines.
The methods were trained and applied using a corpus of authors contemporary with Cervantes (Lope de Vega, Jerónimo de Pasamonte, Alonso Fernández de Avellaneda, Mateo Alemán and Francisco de Quevedo), together with more than forty variables, mainly words and punctuation marks, measured on samples of the texts written by these authors.
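The variables described above are relative frequencies of common words and punctuation marks, measured on samples of each text. They can be sketched as follows. This is a minimal illustration, not the author's code. The word list `FEATURE_WORDS`, the punctuation set `FEATURE_PUNCT`, the helpers `tokenize`, `split_samples` and `feature_vector`, the tokenizer and the sample size are all hypothetical choices. The paper's actual set of more than forty variables is given in the full text.

```python
import re
from collections import Counter

# Minimal sketch, not the paper's code: all names and choices here are hypothetical.
# A few common Spanish function words and punctuation marks used as style markers.
FEATURE_WORDS = ["de", "que", "y", "la", "en", "el", "a", "no", "se", "con"]
FEATURE_PUNCT = [",", ";", ":", ".", "?", "!"]

# A token is either a run of letters (accented letters included) or one punctuation mark.
TOKEN_RE = re.compile(r"[^\W\d_]+|[,;:.?!]")


def tokenize(text):
    """Lower-case the text and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(text.lower())


def split_samples(tokens, sample_size):
    """Cut a token stream into consecutive, non-overlapping samples of equal size.
    A trailing remainder shorter than sample_size is discarded."""
    return [tokens[i:i + sample_size]
            for i in range(0, len(tokens) - sample_size + 1, sample_size)]


def feature_vector(sample):
    """Relative frequency of each feature word and punctuation mark in one sample."""
    counts = Counter(sample)
    n = len(sample)
    return [counts[f] / n for f in FEATURE_WORDS + FEATURE_PUNCT]


if __name__ == "__main__":
    text = ("En un lugar de la Mancha, de cuyo nombre no quiero acordarme, "
            "no ha mucho tiempo que vivía un hidalgo.")
    for sample in split_samples(tokenize(text), 10):
        print([round(x, 2) for x in feature_vector(sample)])
```

Equal-size samples make the frequencies comparable across texts of very different lengths. Each sample then becomes one observation for the classifiers.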
Most of these methods classify the novel as a work by Cervantes. However, we recommend enlarging the corpus with more texts by these authors and including more authors for comparison.
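Once each text sample is represented as a vector of relative frequencies, any of the listed classifiers can be calibrated on samples of known authorship. It can then be applied to the samples of the disputed novel. The sketch below uses one of those methods: k-nearest neighbours with Euclidean distance, with leave-one-out cross-validation to estimate the error rate. It is a hedged illustration, not the paper's procedure. The toy vectors, the author labels, k = 3, the tie-breaking rule and the per-sample vote share reported by the hypothetical `attribute` helper are all assumptions.

```python
import math
from collections import Counter

# Minimal sketch, not the paper's code: k-NN with leave-one-out cross-validation.


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples.
    Vote ties go to the tied label that appears first in distance order."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], x))
    nearest = [train_y[i] for i in order[:k]]
    votes = Counter(nearest)
    best = max(votes.values())
    for label in nearest:  # nearest-first scan for a label with the top vote count
        if votes[label] == best:
            return label


def loocv_error(xs, ys, k=3):
    """Leave-one-out cross-validation: classify each sample with a model trained
    on all the other samples and return the fraction misclassified."""
    errors = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) != ys[i]:
            errors += 1
    return errors / len(xs)


def attribute(train_x, train_y, disputed, k=3):
    """Hypothetical helper: classify every sample of the disputed text and
    return the share of samples assigned to each author."""
    labels = [knn_predict(train_x, train_y, x, k) for x in disputed]
    return {a: c / len(labels) for a, c in Counter(labels).items()}


if __name__ == "__main__":
    # Toy two-dimensional feature vectors (hypothetical, not data from the paper).
    xs = [[0.10, 0.02], [0.11, 0.03], [0.09, 0.02],
          [0.05, 0.06], [0.04, 0.07], [0.06, 0.05]]
    ys = ["Cervantes"] * 3 + ["Quevedo"] * 3
    print("LOOCV error:", loocv_error(xs, ys))
    print(attribute(xs, ys, [[0.10, 0.025], [0.105, 0.03]]))
```

Variables with larger typical frequencies dominate Euclidean distances. For that reason, each variable is commonly standardized before a distance-based method such as k-NN is applied.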


Full text available in PDF.


[Received April 2010. Accepted January 2011]

This article can be cited in LaTeX using the following BibTeX entry:

@ARTICLE{RCEv34n1a02,
    AUTHOR = {López, Freddy},
    TITLE = {{Donde se muestran algunos resultados de atribución de autor en torno a la obra cervantina}},
    JOURNAL = {Revista Colombiana de Estadística},
    YEAR    = {2011},
    volume = {34},
    number = {1},
    pages = {15--37}
}

How to Cite

APA

López, F. (2011). DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA. Revista Colombiana de Estadística, 34(1), 15–37. https://revistas.unal.edu.co/index.php/estad/article/view/29882

ACM

[1]

López, F. 2011. DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA. Revista Colombiana de Estadística. 34, 1 (Jan. 2011), 15–37.

ACS

(1)

López, F. DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA. Rev. colomb. estad. 2011, 34, 15-37.

ABNT

LÓPEZ, F. DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA. Revista Colombiana de Estadística, [S. l.], v. 34, n. 1, p. 15–37, 2011. Disponível em: https://revistas.unal.edu.co/index.php/estad/article/view/29882. Acesso em: 12 may. 2026.

Chicago

López, Freddy. 2011. “DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA”. Revista Colombiana De Estadística 34 (1):15-37. https://revistas.unal.edu.co/index.php/estad/article/view/29882.

Harvard

López, F. (2011) “DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA”, Revista Colombiana de Estadística, 34(1), pp. 15–37. Available at: https://revistas.unal.edu.co/index.php/estad/article/view/29882 (Accessed: 12 May 2026).

IEEE

[1]

F. López, “DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA”, Rev. colomb. estad., vol. 34, no. 1, pp. 15–37, Jan. 2011.

MLA

López, F. “DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA”. Revista Colombiana de Estadística, vol. 34, no. 1, Jan. 2011, pp. 15-37, https://revistas.unal.edu.co/index.php/estad/article/view/29882.

Turabian

López, Freddy. “DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA”. Revista Colombiana de Estadística 34, no. 1 (January 1, 2011): 15–37. Accessed May 12, 2026. https://revistas.unal.edu.co/index.php/estad/article/view/29882.

Vancouver

1.

López F. DONDE SE MUESTRAN ALGUNOS RESULTADOS DE ATRIBUCIÓN DE AUTOR EN TORNO A LA OBRA CERVANTINA. Rev. colomb. estad. [Internet]. 2011 Jan. 1 [cited 2026 May 12];34(1):15-37. Available from: https://revistas.unal.edu.co/index.php/estad/article/view/29882


License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

