Published

2017-04-01

Blind speaker identification for audio forensic purposes


Keywords:

Speaker identification, cochleagram, fuzzy logic, true acceptance, false acceptance

Authors

This paper presents a blind method for speaker identification for audio forensic purposes. It is based on a decision system with fuzzy rules and works with the correlation between the cochleagram of the questioned audio (the proof) and the cochleagrams of the suspects' audio recordings. The proposed system can return a null output, a single selected suspect, or a group of identified suspects. In our tests, the Overall Accuracy (OA) is 0.97, with an agreement (kappa index, κ) of 0.75. Additionally, unlike typical systems in which a low false acceptance (FP) implies a high false rejection (FN), our system can operate with FN and FP simultaneously equal to zero (i.e., OA = 1; κ = 1). Finally, the system performs blind identification, that is, it requires no prior knowledge of the audio recordings and no training step, an essential characteristic for audio forensics.
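The pipeline the abstract describes, correlating a time-frequency representation of the questioned recording against each suspect's recording and then deciding among a null output, a single suspect, or a group, can be sketched as follows. This is a toy illustration only, not the paper's method: it substitutes a plain STFT magnitude spectrogram for the gammatone-based cochleagram, and fixed thresholds (`accept` and `margin`, both invented here) for the authors' fuzzy rules.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude spectrogram as a crude stand-in for a cochleagram
    (simplification; the paper uses a gammatone-based cochleagram)."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T  # (freq, time)

def correlation(a, b):
    """Pearson correlation between two time-frequency maps,
    truncated to the shorter duration."""
    t = min(a.shape[1], b.shape[1])
    av, bv = a[:, :t].ravel(), b[:, :t].ravel()
    av, bv = av - av.mean(), bv - bv.mean()
    return float(av @ bv / (np.linalg.norm(av) * np.linalg.norm(bv) + 1e-12))

def identify(proof, suspects, accept=0.6, margin=0.05):
    """Toy decision stage: returns [] (null output), one id, or several ids.
    `accept` and `margin` are illustrative thresholds, not the fuzzy rules."""
    ref = spectrogram(proof)
    scores = {k: correlation(ref, spectrogram(v)) for k, v in suspects.items()}
    picked = [k for k, s in scores.items() if s >= accept]
    if not picked:
        return []
    best = max(scores[k] for k in picked)
    return sorted(k for k in picked if best - scores[k] <= margin)
```

Returning a list naturally covers the three cases the paper's decision system distinguishes: an empty list (no suspect accepted), one element (unique identification), or several near-tied elements (a group of identified suspects).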


References

Almaadeed, N., Aggoun, A. and Amira, A., Speaker identification using multimodal neural networks and wavelet analysis. IET Biometrics, 4, pp. 18-28, 2015. DOI: 10.1049/iet-bmt.2014.0011

Avci, E. and Avci, D., The speaker identification by using genetic wavelet adaptive network based fuzzy inference system. Expert Systems with Applications, 36, pp. 9928-9940, 2009. DOI: 10.1016/j.eswa.2009.01.081

Daqroug, K. and Tutunji, T.A., Speaker identification using vowels features through a combined method of formants, wavelet, and neural network classifiers. Applied Soft Computing, 27(2), pp. 231-239, 2015. DOI: 10.1016/j.asoc.2014.11.016

Pham, T., Genetic learning of multi-attribute interactions in speaker verification, Proceedings of the 2000 Congress on Evolutionary Computation, 2000, pp. 379-383. DOI: 10.1109/CEC.2000.870320

Morrison, G., Sahito, F., Jardine, G., Djokic, D., Clavet, S., Berghs, S., et al., INTERPOL survey of the use of speaker identification by law enforcement agencies. Forensic Science International, 263(6), pp. 92-100, 2016. DOI: 10.1016/j.forsciint.2016.03.044

Kober, V., Diaz-Ramirez, V.H. and Sandoval-Ibarra, Y., Speech enhancement with local adaptive rank-order filtering. Computación y Sistemas, 18(1), pp. 123-136, 2014. DOI: 10.13053/CyS-18-1-2014-023

Ajmera, P.K., Jadhav, D.V. and Holambe, R.S., Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram. Pattern Recognition, 44(10), pp. 2749-2759, 2011. DOI: 10.1016/j.patcog.2011.04.009

Maher, R.C., Audio forensic examination. IEEE Signal Processing Magazine, 26, pp. 84-94, 2009. DOI: 10.1109/MSP.2008.931080

Wu, Z., Evans, N., Kinnunen, T., Yamagishi, J., Alegre, F. and Li, H., Spoofing and countermeasures for speaker verification: A survey. Speech Communication, 66(2), pp. 130-153, 2015. DOI: 10.1016/j.specom.2014.10.005

Gao, B., Woo, W. and Khor, L., Cochleagram-based audio pattern separation using two-dimensional non-negative matrix factorization with automatic sparsity adaptation. The Journal of the Acoustical Society of America, 135, pp. 1171-1185, 2014. DOI: 10.1121/1.4864294

Zhao, X., Shao, Y. and Wang, D., CASA-based robust speaker identification. IEEE Transactions on Audio, Speech, and Language Processing, 20, pp. 1608-1616, 2012.

Shao, Y. and Wang, D., Robust speaker identification using auditory features and computational auditory scene analysis, IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 1589-1592. DOI: 10.1109/ICASSP.2008.4517928

Patterson, R.D., Holdsworth, J. and Allerhand, M., Auditory models as preprocessors for speech recognition. In: Schouten, M.E., Ed., The Auditory Processing of Speech, 1992, pp. 67-84. DOI: 10.1515/9783110879018.67

Beigi, H., Fundamentals of speaker recognition. Springer Science and Business Media, 2011. DOI: 10.1007/978-1-4419-5906-5_747

Mazaira-Fernandez, L.M., Álvarez-Marquina, A. and Gómez-Vilda, P., Improving speaker recognition by biometric voice deconstruction. Frontiers in Bioengineering and Biotechnology, 3, 2015. DOI: 10.3389/fbioe.2015.00126