Published

2021-12-17

Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings

DOI:

https://doi.org/10.15446/ing.investig.93803

Keywords:

Personality, Word Embeddings, YouTube, Regression, Classification (en)

The study of automatic personality recognition has gained attention over the last decade thanks to the variety of applications derived from this field. The Big Five model (also known as OCEAN) is a well-known framework for labeling personality traits. This work considers transliterations of video recordings collected from YouTube (originally provided by the Idiap Research Institute) together with automatically generated scores for the five personality traits, which were also provided in the database. The transliterations are modeled with two word embedding approaches, Word2Vec and GloVe, and three levels of analysis are included: regression to predict the score of each personality trait, binary classification between strong and weak presence of each trait, and tri-class classification according to three levels of manifestation of each trait (low, medium, and high). According to our findings, the proposed approach yields results similar to others reported in the state of the art. We think that further research is required to find better results. Our results, as well as others reported in the literature, suggest that there is a big gap in the study of personality traits based on linguistic patterns, which makes it necessary to collect and label data considering the knowledge of expert psychologists and psycholinguists.
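The three levels of analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the averaging of word vectors into one document vector, the median split for the binary task, and the rank-based tercile split for the tri-class task are all assumptions made for illustration, since the abstract does not specify these details. For the regression level, a regressor would be fit on the document vectors against the raw trait scores; that step is omitted here.

```python
from statistics import median

# Toy 3-dimensional "embeddings"; real Word2Vec or GloVe vectors have far
# more dimensions. All values are made up for illustration.
TOY_VECTORS = {
    "love":   [0.9, 0.1, 0.0],
    "travel": [0.7, 0.3, 0.1],
    "quiet":  [0.1, 0.8, 0.2],
    "alone":  [0.0, 0.9, 0.3],
}


def document_vector(tokens, vectors, dim=3):
    """Average the vectors of known tokens (assumed pooling strategy)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def binary_labels(scores):
    """Strong (1) vs. weak (0) presence, split at the median (assumption)."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]


def three_level_labels(scores):
    """Low (0), medium (1), high (2) via rank-based terciles (assumption)."""
    ranked = sorted(scores)
    n = len(ranked)
    low_cut, high_cut = ranked[n // 3], ranked[(2 * n) // 3]
    return [0 if s < low_cut else 2 if s >= high_cut else 1 for s in scores]


# Hypothetical transliteration and hypothetical scores for one trait.
doc = document_vector("i love to travel".split(), TOY_VECTORS)
scores = [2.1, 3.4, 4.0, 4.8, 5.5, 6.2]

print(doc)
print(binary_labels(scores))
print(three_level_labels(scores))
```

Unknown tokens ("i", "to") are simply skipped here; a real pipeline would need an explicit policy for out-of-vocabulary words and for how the class thresholds are chosen.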

How to Cite

APA

López Pabón, F. O. & Orozco Arroyave, J. R. (2022). Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings. Ingeniería e Investigación, 42(2), e93803. https://doi.org/10.15446/ing.investig.93803

ACM

[1]
López Pabón, F.O. and Orozco Arroyave, J.R. 2022. Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings. Ingeniería e Investigación. 42, 2 (Apr. 2022), e93803. DOI:https://doi.org/10.15446/ing.investig.93803.

ACS

(1)
López Pabón, F. O.; Orozco Arroyave, J. R. Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings. Ing. Inv. 2022, 42, e93803.

ABNT

LÓPEZ PABÓN, F. O.; OROZCO ARROYAVE, J. R. Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings. Ingeniería e Investigación, [S. l.], v. 42, n. 2, p. e93803, 2022. DOI: 10.15446/ing.investig.93803. Disponível em: https://revistas.unal.edu.co/index.php/ingeinv/article/view/93803. Acesso em: 16 apr. 2026.

Chicago

López Pabón, Felipe Orlando, and Juan Rafael Orozco Arroyave. 2022. “Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings”. Ingeniería E Investigación 42 (2):e93803. https://doi.org/10.15446/ing.investig.93803.

Harvard

López Pabón, F. O. and Orozco Arroyave, J. R. (2022) “Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings”, Ingeniería e Investigación, 42(2), p. e93803. doi: 10.15446/ing.investig.93803.

IEEE

[1]
F. O. López Pabón and J. R. Orozco Arroyave, “Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings”, Ing. Inv., vol. 42, no. 2, p. e93803, Apr. 2022.

MLA

López Pabón, F. O., and J. R. Orozco Arroyave. “Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings”. Ingeniería e Investigación, vol. 42, no. 2, Apr. 2022, p. e93803, doi:10.15446/ing.investig.93803.

Turabian

López Pabón, Felipe Orlando, and Juan Rafael Orozco Arroyave. “Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings”. Ingeniería e Investigación 42, no. 2 (April 1, 2022): e93803. Accessed April 16, 2026. https://revistas.unal.edu.co/index.php/ingeinv/article/view/93803.

Vancouver

1.
López Pabón FO, Orozco Arroyave JR. Automatic Personality Evaluation from Transliterations of YouTube Vlogs Using Classical and State of the art Word Embeddings. Ing. Inv. [Internet]. 2022 Apr. 1 [cited 2026 Apr. 16];42(2):e93803. Available from: https://revistas.unal.edu.co/index.php/ingeinv/article/view/93803

CrossRef Cited-by

CrossRef citations: 7

1. Hao Lin, Xiaolei Li, Shijie Jia, Huajun Dong. (2023). Big five personality prediction based on pre-training language model and sentiment knowledge base. Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023), p.172. https://doi.org/10.1117/12.3004082.

2. Sreekantha Desai Karanam, R.S. Kamath, Sampath Kini K. (2025). Exploring the Impact of Generative AI on Education: A Twitter Sentiment Study with Transformer Models. 2025 International Conference on Intelligent Systems and Pioneering Innovations in Robotics and Electric Mobility (INSPIRE), p.732. https://doi.org/10.1109/INSPIRE67328.2025.11300605.

3. Mohmad Azhar Teli, Manzoor Ahmad Chachoo. (2022). Lingual markers for automating personality profiling: background and road ahead. Journal of Computational Social Science, 5(2), p.1663. https://doi.org/10.1007/s42001-022-00184-6.

4. 星瑶 郭. (2024). AI-Powered Personality Recognition Based on Social Media Text. Artificial Intelligence and Robotics Research, 13(04), p.788. https://doi.org/10.12677/airr.2024.134081.

5. Mohmad Azhar Teli, Manzoor Ahmad Chachoo. (2023). Pre-trained Word Embeddings In Deep Multi-label Personality Classification Of YouTube Transliterations. 2023 International Conference on Intelligent Systems, Advanced Computing and Communication (ISACC), p.1. https://doi.org/10.1109/ISACC56298.2023.10084047.

6. Fatima Habib, Zeeshan Ali, Akbar Azam, Komal Kamran, Fahad Mansoor Pasha. (2024). Navigating pathways to automated personality prediction: a comparative study of small and medium language models. Frontiers in Big Data, 7. https://doi.org/10.3389/fdata.2024.1387325.

7. Kunal Biswas, Shivakumara Palaiahnakote, Umapada Pal, Ram Sarkar. (2025). Personality Traits Prediction Methods: A Survey. International Journal of Pattern Recognition and Artificial Intelligence, 39(12). https://doi.org/10.1142/S0218001425300024.
