Speaker verification system based on articulatory information from ultrasound recordings
Sistema de verificación de hablantes utilizando información articulatoria de grabaciones de ultrasonido
DOI: https://doi.org/10.15446/dyna.v87n213.81772
Keywords: speech processing, speaker verification, articulatory parameters, ultrasound, i-vectors, GMMs
License
Copyright 2020 DYNA

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
The author or authors of an article accepted for publication in any of the journals published by the Facultad de Minas will transfer all economic rights to the Universidad Nacional de Colombia free of charge, including the right to edit, publish, reproduce, and distribute the work in both print and digital media, as well as to include the article in international indexes and/or databases. The publisher is likewise authorized to use the images, tables, and/or any graphic material presented in the article for the design of covers or posters of the same journal.




