Received: 15 January 2023; Accepted: 12 April 2023; Published: 1 July 2023
Rhythm-Based Authorship Recognition in Syllabic and Accentual-Syllabic Verse
Abstract
This contribution explores the extent to which rhythm-based features of poetic texts can contribute meaningfully to authorship recognition. We show that, although a binary categorization of languages as syllabic vs. accentual-syllabic fails to fully explain the differences, once we formalize accentual regularity as a continuum, authorship attribution results improve as we move from the most to the least accentually regular languages. This result supports our hypothesis that accentual regularity is an inhibiting factor in authorship attribution, provided that accentual regularity is understood as a continuous property.
Keywords:
poetry, authorship recognition, poetic rhythm, versification.
Introduction
In recent years, the use of rhythm-based features for authorship recognition in poetic texts has been studied thoroughly in several languages, including Czech, German, Spanish (Plecháč; Plecháč & Birnbaum), Portuguese (Mittmann), Latin (Nagy), Old English (Neidorf et al.), and Russian (Šeļa; Orekhov). In this contribution we extend the analysis of the relationship between versification type and authorship recognition accuracy presented in Plecháč & Birnbaum with an experiment on poetic texts in six languages: Czech (CS), German (DE), English (EN), Spanish (ES), Italian (IT), and Russian (RU).
Method
From each corpus we extract one or more subcorpora, each containing 11-syllable lines (10-syllable iambic pentameter for EN) written by five authors born within a specific time span (see Table 1). For each author, ten 100-line samples are drawn at random, and each sample is represented by the bitstrings that encode the stressed (1) and unstressed (0) syllables of its lines (e.g., The curfew tolls the knell of parting day ~ 0101010101). Leave-one-out cross-validation is used to evaluate Support Vector Machine models with a linear kernel for bitstring-based authorship recognition. The entire procedure is repeated 10 times, yielding 10 accuracy estimates for each subcorpus.
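The sampling and representation steps can be sketched in Python. This is a minimal sketch under stated assumptions: lines are already encoded as stress bitstrings (stress detection itself is not shown), the ten samples per author are drawn without overlap, and each sample becomes a vector of relative frequencies of the distinct bitstring types found in the subcorpus. The exact feature vector used in the study is not spelled out here, so that choice, like the function names `draw_samples` and `to_features`, is hypothetical.

```python
import random
from collections import Counter


def draw_samples(lines_by_author, n_samples=10, sample_size=100, seed=None):
    """Draw n_samples non-overlapping samples of sample_size lines per author.

    lines_by_author maps an author to a list of stress bitstrings such as
    "0101010101" (1 = stressed, 0 = unstressed syllable).
    Returns a list of (author, sample) pairs.
    """
    rng = random.Random(seed)
    samples = []
    for author, lines in lines_by_author.items():
        needed = n_samples * sample_size
        if len(lines) < needed:
            raise ValueError(f"{author}: only {len(lines)} lines, need {needed}")
        # Distinct line positions, so the samples cut from this pool never overlap.
        pool = rng.sample(lines, needed)
        for i in range(n_samples):
            samples.append((author, pool[i * sample_size:(i + 1) * sample_size]))
    return samples


def to_features(samples):
    """Map each sample to the relative frequencies of its bitstring types.

    The vocabulary is the sorted set of all bitstrings seen in any sample, so
    every vector has the same length and column order.
    """
    vocabulary = sorted({line for _, sample in samples for line in sample})
    X, y = [], []
    for author, sample in samples:
        counts = Counter(sample)
        X.append([counts[b] / len(sample) for b in vocabulary])
        y.append(author)
    return vocabulary, X, y


# Toy input (invented): two authors with 1,000 encoded lines each.
demo = {
    "A": ["0101010101"] * 600 + ["1001010101"] * 400,
    "B": ["0101010101"] * 300 + ["0100110101"] * 700,
}
vocabulary, X, y = to_features(draw_samples(demo, seed=1))
```

Here every toy line is used, so the result has 20 feature vectors (2 authors × 10 samples) over 3 bitstring types, and each vector sums to 1.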
Table 1: List of subcorpora¹

| Subcorpus | Time span (authors' birth years) | Authors |
|---|---|---|
| cs1 | 1833-1838 | V. Hálek, A. Heyduk, J. Neruda, G. Pfleger Moravský, V. Šolc |
| cs2 | 1841-1853 | S. Čech, E. Krásnohorská, J. V. Sládek, J. Vrchlický, J. Zeyer |
| cs3 | 1854-1861 | B. Kaminský, K. Kučera, F. Kvapil, E. A. Mužík, A. Škampa |
| de1 | 1772-1802 | A. von Chamisso, F. Grillparzer, N. Lenau, F. Schlegel, L. Tieck |
| de2 | 1806-1830 | E. Geibel, A. Grün, P. Heyse, G. Keller, L. Otto |
| en1 | 1840-1865 | A. Bierce, T. Hardy, A. Lang, O. Wilde, W. B. Yeats |
| es1 | 1490-1591 | H. de Acunya, F. de Borja, J. Boscan, G. de Cetina, F. de La Torre |
| es2 | 1534-1562 | B. Argensola, M. de Cervantes y Saavedra, L. de Góngora, F. de Herrera, L. de Vega |
| es3 | 1580-1603 | G. Bocangel y Unzueta, F. de Quevedo, P. Soto de Rojas, J. de Tassis y Peralta, L. de Ulloa y Pereira |
| it1 | 1775-1814 | G. Giusti, A. Guadagnoli, G. Leopardi, C. Porta, G. Prati |
| ru1 | 1783-1821 | M. Y. Lermontov, A. N. Majkov, A. S. Pushkin, A. K. Tolstoj, V. A. Zhukovskij |
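The evaluation step described in the Method section, leave-one-out cross-validation of a linear classifier, can be sketched in plain Python. This is a stand-in, not the study's implementation: one-vs-rest linear SVMs trained by full-batch subgradient descent on the L2-regularized hinge loss (a deterministic variant of the Pegasos update), with the bias folded into the weights and regularized as well, which is a simplification. In practice a library class such as scikit-learn's `SVC(kernel="linear")` with `LeaveOneOut` would normally be used; note that `SVC` handles several classes one-vs-one rather than one-vs-rest. The names `train_linear_svm`, `predict`, `loo_accuracy` and the parameters `lam` and `epochs` are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """One-vs-rest linear SVMs trained by full-batch subgradient descent on
    the L2-regularized hinge loss (a deterministic Pegasos-style update).

    X is a list of feature lists, y a list of author labels. Returns a dict
    mapping each label to a weight vector whose last entry is the bias.
    """
    Xb = [list(row) + [1.0] for row in X]  # constant feature acts as the bias
    n, d = len(Xb), len(Xb[0])
    models = {}
    for label in sorted(set(y)):
        targets = [1.0 if lab == label else -1.0 for lab in y]
        w = [0.0] * d
        for t in range(1, epochs + 1):
            eta = 1.0 / (lam * t)  # Pegasos step size
            # Accumulate y_i * x_i over samples that violate the margin.
            push = [0.0] * d
            for xi, yi in zip(Xb, targets):
                if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1.0:
                    for j in range(d):
                        push[j] += yi * xi[j]
            # Subgradient step: w <- w - eta * (lam * w - push / n)
            w = [(1.0 - eta * lam) * w[j] + eta * push[j] / n for j in range(d)]
        models[label] = w
    return models


def predict(models, x):
    """Return the label whose one-vs-rest score is highest."""
    xb = list(x) + [1.0]
    return max(models, key=lambda lab: sum(w * v for w, v in zip(models[lab], xb)))


def loo_accuracy(X, y, **svm_params):
    """Leave-one-out: train on all samples but one, test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        models = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **svm_params)
        correct += predict(models, X[i]) == y[i]
    return correct / len(X)


# Toy, clearly separable features (invented for illustration).
X_toy = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0], [0.85, 0.15], [0.95, 0.05],
         [0.1, 0.9], [0.2, 0.8], [0.0, 1.0], [0.15, 0.85], [0.05, 0.95]]
y_toy = ["A"] * 5 + ["B"] * 5
accuracy = loo_accuracy(X_toy, y_toy)
```

With five authors and ten samples each, `loo_accuracy` trains 50 models per run; repeating the whole sampling-plus-evaluation procedure gives the set of accuracy estimates per subcorpus.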
Results
Figure 1 shows the results. All subcorpora significantly outperform the random baseline (0.2, i.e., the accuracy expected by chance with five authors), yet there are substantial differences across languages.² We hypothesize that one of the key factors may be the versification type: in syllabic versification only the number of syllables in a line is constrained and the distribution of stress is left to the author’s preferences, whereas in accentual-syllabic versification both syllable count and stress placement are constrained, which leaves considerably less room for authors to individualize their rhythm.³
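The baseline of 0.2 is the accuracy of guessing among five equally represented authors (1/5). The significance test below is an illustrative assumption, not taken from the study: a one-sided exact binomial test applied to the 50 leave-one-out predictions of a single run (5 authors × 10 samples), treating them as independent trials, which they are only approximately. `binomial_p_value` is a hypothetical helper.

```python
from math import comb


def binomial_p_value(correct, trials, p_chance):
    """One-sided exact binomial test: P(X >= correct) for X ~ Bin(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


n_authors = 5
trials = n_authors * 10   # 5 authors x 10 samples = 50 leave-one-out predictions
chance = 1 / n_authors    # the random baseline, 0.2

# Hypothetical run with 20 of 50 samples attributed correctly (accuracy 0.4).
p = binomial_p_value(20, trials, chance)
```

A small `p` means that so many correct attributions would be unlikely if the classifier were guessing at the 0.2 baseline.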
Figure 1: Rhythm-based authorship recognition accuracy; 30 random samplings per subcorpus; leave-one-out cross-validation; linear SVM. Boxplots are constructed as follows: the box spans the interquartile range (IQR) of the 30 samplings; the horizontal line marks the median; the whiskers extend up to 1.5 × IQR beyond the box; data beyond the whiskers are plotted as individual points.
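The boxplot quantities named in the caption can be computed as sketched below. The sketch assumes quartiles obtained by linear interpolation (`statistics.quantiles` with `method="inclusive"`, which agrees with NumPy's default percentile rule) and the common Tukey convention in which each whisker reaches the most extreme observation within 1.5 × IQR of the box; the software used to draw Figure 1 may differ in these details. `boxplot_stats` is a hypothetical helper.

```python
from statistics import quantiles


def boxplot_stats(values, whis=1.5):
    """Quantities drawn in a Tukey-style boxplot (needs at least two values)."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),   # most extreme value within the fences
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Toy accuracies (invented): ten runs, one of them far above the rest.
stats = boxplot_stats([0.50, 0.52, 0.54, 0.55, 0.56, 0.58, 0.60, 0.61, 0.62, 0.90])
```

In this toy case the run at 0.90 lies beyond the upper fence and would be drawn as an individual point, while the upper whisker stops at 0.62.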
Interpretation
At first glance, this hypothesis fails to explain the differences: accentual-syllabic CS ranks among the best-scoring languages, and accentual-syllabic EN outperforms some of the syllabic ES subcorpora. We need, however, to keep in mind that the traditional categorical notion of versification types is misleading; versification type is, rather, a continuous scale (cf. Gasparov). To formalize the degree of accentual regularity, we measure the entropy of bitstrings in particular subcorpora:
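A minimal sketch of such an entropy measure, assuming it is the Shannon entropy in bits of the relative frequencies $p(b)$ of the distinct bitstring types $b$ in a subcorpus, $H = -\sum_b p(b)\log_2 p(b)$. The authors' exact formulation (logarithm base, normalization, or the unit over which it is computed) may differ. Lower values mean that fewer stress patterns dominate, i.e., higher accentual regularity. `bitstring_entropy` is a hypothetical helper.

```python
from collections import Counter
from math import log2


def bitstring_entropy(lines):
    """Shannon entropy, in bits, of the distribution of bitstring types.

    Written as sum of p * log2(1/p), which equals -sum of p * log2(p).
    """
    counts = Counter(lines)
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values())


# Toy samples (invented): fully regular iambic lines versus four equally
# frequent stress patterns.
regular = ["0101010101"] * 100
varied = ["0101010101", "1001010101", "0100110101", "0101011001"] * 25
# bitstring_entropy(regular) == 0.0 and bitstring_entropy(varied) == 2.0
```

Under this reading, a strictly organized tradition in which nearly every line realizes the same pattern scores close to 0, and looser traditions score higher.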
As Figure 2 shows, accentual regularity is associated with the accuracy of authorship recognition: as we proceed from the strictly organized RU and DE, across the semi-organized CS, EN, and ES, to the loose IT, accuracy tends to increase (linear regression, $R^2 = 0.5$).
Figure 2: Relationship between rhythm-based authorship recognition accuracy and bitstring entropy; linear regression ($R^2 = 0.5$)
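The fit reported for Figure 2 is an ordinary least-squares regression, and $R^2$ is its coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below computes both in plain Python; whether the regression uses one summary point per subcorpus (for instance, median accuracy against entropy) or every individual run is an assumption left open here. `ols_fit` is a hypothetical helper and the points in the usage line are toy values, not the study's data.

```python
def ols_fit(xs, ys):
    """Fit y = a + b * x by ordinary least squares; return (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope
    a = mean_y - b * mean_x              # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot   # coefficient of determination


# Toy points (invented, not the study's data).
a, b, r2 = ols_fit([0, 1, 2], [0, 2, 1])  # b == 0.5, r2 == 0.25
```

A positive slope with entropy on the x-axis corresponds to the reported trend: less regular (higher-entropy) traditions tend to yield higher attribution accuracy.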
Conclusions
The experiment described here yields two types of results. The first, which extends our earlier work in Plecháč & Birnbaum, is a demonstration that rhythmic organization (and, specifically, accentual regularity) can function as a meaningful feature for authorship recognition. The second is the observation that the categorical identification of verse traditions as either syllabic or accentual-syllabic masks the continuous nature of this feature; we propose a formula that remedies this limitation by quantifying the accentual regularity of verse corpora as a continuous property.
Acknowledgements
The study was supported by the Czech Science Foundation, project GA23-07727S (European Poetry: Distant Reading).
Data and code are available at https://github.com/versotym/Versification-Type-Authorship.
Data sources:
CS: Corpus of Czech Verse (https://github.com/versotym/corpusczechverse)
DE: Metricalizer (https://metricalizer.de/en/)
EN: Gutenberg English Poetry Corpus (https://doi.org/10.3389/fdigh.2018.00005)
ES: Corpus of Spanish Golden-Age Sonnets (https://github.com/bncolorado/CorpusSonetosSigloDeOro)
IT: Biblioteca Italiana (http://www.bibliotecaitaliana.it/)
RU: Russian Poetry Corpus (https://ruscorpora.ru/en/page/corpus-poetic/)
Cited works
- Gasparov, Mikhail, and Marina Tarlinskaja. “A Probability Model of Verse (English, Latin, French, Italian, Spanish, Portuguese)”. Style, vol. 21, no. 3, 1987, pages 322-358.
- Mittmann, Adiel, et al. “What rhythmic signature says about poetic corpora”. Quantitative Approaches to Versification. Edited by Petr Plecháč et al. Prague, ICL, 2019, pages 153-172.
- Orekhov, Boris. “Mikrodiakhroniia stikhovedcheskikh parametrov u russkikh poetov”. VAProsy Iazykoznaniia. Edited by A. A. Kibrik et al. Moscow, Buki Vedi, 2020, pages 161-164.
- Plecháč, Petr, and David J. Birnbaum. “Assessing the reliability of stress as a feature of authorship attribution in syllabic and accentual syllabic verse”. Quantitative Approaches to Versification. Edited by Petr Plecháč et al. Prague, ICL, 2019, pages 201-210.
- Šeļa, Artjoms, et al. “Fenomen Batenkova i problema verifikacii avtorstva”. Acta Slavica Estonica, vol. 12, 2020, pages 131-165.