Identificação de classes rítmicas de língua: modelagem de cadeias categorizadas da sonoridade usando árvores probabilísticas

Identifying Rhythmic Classes of Languages: Modeling Symbolic Chains of Sonority Using Probabilistic Trees

JUVÊNCIO NOBRE¹

¹ Universidade Federal do Ceará, Departamento de Estatística e Matemática Aplicada, Fortaleza, Brazil. Adjunct Professor I. Email: juvencio@ufc.br


Resumo

Recentemente, vários autores sugerem métodos para discriminar classes rítmicas de língua (Ramus et al. 1999, Duarte et al. 2001, Galves et al. 2002). Baseado no conceito de sonoridade, definido em Galves et al. (2002) e Cassandro et al. (2007), é proposto um modelo paramétrico para a família de processos estocásticos dos tempos de evolução da sonoridade para diferentes línguas, denotada por família de cadeias categorizadas ligadas. O objetivo do presente trabalho é modelar, para as diferentes línguas, as correspondentes cadeias categorizadas via cadeias de Markov de alcance variável (VLMC) e avaliar a conjectura de que estas resumem toda informação relevante dada pela sonoridade.

Palavras chave: sonoridade, cadeias categorizadas ligadas, cadeias de Markov de alcance variável.


Abstract

Several authors have recently suggested methods to discriminate rhythmic classes of languages (Ramus et al. 1999, Duarte et al. 2001, Galves et al. 2002). Based on the concept of sonority defined in Galves et al. (2002) and Cassandro et al. (2007), a parametric model, called the family of tied quantized chains, is proposed for the family of stochastic processes describing the time evolution of sonority in different languages. The objective of this paper is to model the corresponding quantized chains for the different languages using variable length Markov chains (VLMC) and to evaluate the conjecture that these chains summarize all the relevant information given by the sonority.

Key words: sonority, tied quantized chains, variable length Markov chains.
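
To make the variable length Markov chain idea concrete, the following is a minimal Python sketch. It is not the method used in the paper, whose references point to the R package VLMC. It assumes a hypothetical two-symbol alphabet standing in for a quantized sonority chain; the function names `count_contexts`, `prune_gain` and `fit_contexts`, the `max_depth` value and the `cutoff` value are all illustrative assumptions. A context is kept only when the next-symbol distribution it induces differs enough from that of its parent (the same context without its oldest symbol), in the spirit of the context algorithm (Rissanen 1983, Bühlmann & Wyner 1999).

```python
from collections import Counter, defaultdict
from math import log


def count_contexts(seq, max_depth):
    """Count next-symbol occurrences after every context of length 0..max_depth.

    A context is a tuple of past symbols in time order (oldest first).
    """
    counts = defaultdict(Counter)
    for t in range(len(seq)):
        for d in range(max_depth + 1):
            if t - d < 0:
                break
            counts[tuple(seq[t - d:t])][seq[t]] += 1
    return counts


def prune_gain(counts, ctx):
    """N(ctx) * KL(P(.|ctx) || P(.|parent)), where parent drops the oldest symbol."""
    child, parent = counts[ctx], counts[ctx[1:]]
    n_child, n_parent = sum(child.values()), sum(parent.values())
    return sum(
        n * log((n / n_child) / (parent[a] / n_parent))
        for a, n in child.items()
    )


def fit_contexts(seq, max_depth=3, cutoff=2.0):
    """Return the retained contexts (a suffix-closed set) and the raw counts.

    The cutoff is an arbitrary illustrative threshold, not a tuned value.
    """
    counts = count_contexts(seq, max_depth)
    kept = {()}  # the empty context (root) is always kept
    for ctx in list(counts):
        if ctx and prune_gain(counts, ctx) > cutoff:
            # Keep the context and all of its suffixes so the set forms a tree.
            for k in range(len(ctx)):
                kept.add(ctx[k:])
    return kept, counts


# Toy chain: strictly alternating symbols, so memory of length one suffices.
kept, counts = fit_contexts(list("01" * 50), max_depth=3, cutoff=2.0)
print(sorted(kept, key=len))
```

In this toy sequence the next symbol is determined by the previous one alone, so only the empty context and the two contexts of length one survive pruning; longer contexts add no information and are dropped.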


Full text available in PDF


References

1. Agresti, A. (2002), Categorical Data Analysis, second edn, John Wiley & Sons, New York, United States.

2. Bühlmann, P. (2000), `Model Selection for Variable Length Markov Chains and Tuning the Context Algorithm´, Ann. Inst. Statist. Math. 52, 287-315.

3. Bühlmann, P. & Wyner, A. J. (1999), `Variable Length Markov Chains´, Annals of Statistics 27, 480-513.

4. Cassandro, M., Collet, P., Duarte, D., Galves, A. & Garcia, J. (2007), `A Stochastic Model for the Speech Sonority: Tied Quantized Chains and Cross-Linguistic Estimation of the Cut-Points´, Math. & Sci. hum. (Mathematics and Social Sciences, 45e année) 180, 43-55.

5. Chao, W. H. & Kosorok, M. R. (1995), Asymptotic Properties of Markov Regression Models for Longitudinal Categorical Data in Continuous Time, Biostatistics technical report, Department of Statistics, University of Wisconsin.

6. Cuesta-Albertos, J., Fraiman, R., Galves, A. & Garcia, J. (2007), `Identifying Rhythmic Classes of Languages Using their Sonority: A Kolmogorov-Smirnov Approach´, Journal of Applied Statistics 34, 749-761.

7. Duarte, D., Galves, A., Lopes, N. & Maronna, R. (2001), Robust Test for Equality of Variances the Statistical Analysis of Acoustic Correlates of Speech Rhythm, `Parameter setting and language change´, Workshop on rhythmic patterns, University of Bielefeld. http://www.physik.uni-bielefeld.de/complexity/duarte.pdf

8. Ferrari, F. & Wyner, A. (2003), `Estimation of General Stationary Processes by Variable Length Markov Chains´, Scandinavian Journal of Statistics 30, 459-480.

9. Galves, A., Garcia, J., Duarte, D. & Galves, C. (2002), Sonority as a Basis for Rhythmic Class Discrimination, `Speech Prosody´. www.lpl.uinv-aix.fr/sp2002/pdf/galves-et-al.pdf

10. Lindsey, J. K. (1999), Models for Repeated Measurements, second edn, Oxford Statistical series, New York, United States.

11. Mehler, J., Dupoux, E., Nazzi, T. & Dehaene-Lambertz, G. (1996), Coping with Linguistic Diversity: The Infant's Viewpoint, `Signal to syntax: bootstrapping from speech to grammar in early acquisition´.

12. Mächler, M. (2006), VLMC: VLMC-Variable Length Markov Chains. R package version 1.3-10.

13. Mächler, M. & Bühlmann, P. (2004), `Variable Length Markov Chains: Methodology, Computing and Software´, Journal of Computational & Graphical Statistics 13, 435-455.

14. R Development Core Team (2007), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org

15. Ramus, F., Nespor, M. & Mehler, J. (1999), `Correlates of Linguistic Rhythm in the Speech Signal´, Cognition 73, 265-292.

16. Reboussin, D. V. (1990), Discovering Markov Structure in Group Sequential Methods for Longitudinal Studies, Biostatistics technical report 61, Department of Statistics, University of Wisconsin.

17. Rissanen, J. (1983), `A Universal Data Compression System´, IEEE Trans. Inform. Theory 29, 656-664.

18. Ware, J., Lipsitz, S. & Speizer, F. (1988), `Issues in the Analysis of Repeated Categorical Outcomes´, Statistics in Medicine 7, 95-107.


[Received February 2008. Accepted October 2008]

This article can be cited in LaTeX using the following BibTeX reference:

@ARTICLE{RCEv31n2a07,
    AUTHOR  = {Nobre, Juvêncio},
    TITLE   = {{Identificação de classes rítmicas de língua: modelagem de cadeias categorizadas da sonoridade usando árvores probabilísticas}},
    JOURNAL = {Revista Colombiana de Estadística},
    YEAR    = {2008},
    volume  = {31},
    number  = {2},
    pages   = {229-240}
}