A Unified Approach to Link Prediction in Collaboration Networks
Un enfoque unificado para la predicción de enlaces en redes de colaboración
DOI:
https://doi.org/10.15446/rce.v48n2.117558
Keywords:
Collaboration networks, Exponential random graph model, Graph convolutional network, Word2Vec, Social networks analysis. (en)
Redes de colaboración, Modelo exponencial de grafos aleatorios, Red de convolución sobre grafos, Word2Vec, Análisis de redes sociales. (es)
This article investigates and compares three approaches to link prediction in collaboration networks: an ERGM (Exponential Random Graph Model; Robins et al. 2007), a GCN (Graph Convolutional Network; Kipf & Welling 2017), and a Word2Vec+MLP model (a Word2Vec model combined with a multilayer neural network; Mikolov, Chen, Corrado & Dean 2013; Goodfellow et al. 2016). The ERGM, grounded in statistical methods, is used to capture general structural patterns within the network, while the GCN and Word2Vec+MLP models use deep learning techniques to learn adaptive structural representations of nodes and their relationships. The predictive performance of the models is assessed through extensive simulation exercises using cross-validation, with metrics based on the receiver operating characteristic (ROC) curve. The results show that the machine learning approaches outperform the ERGM in link prediction, particularly in large networks, where traditional models such as the ERGM are limited in scalability and in their ability to capture inherent complexities. These findings highlight the potential benefits of integrating statistical modeling techniques with deep learning methods to analyze complex networks, providing a more robust and effective framework for future research in this field.
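The evaluation protocol named in the abstract (cross-validation scored with an ROC-based metric) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy graph, the function names, and the common-neighbors score, which is only a simple stand-in for a fitted model and is not one of the three models compared in the article.

```python
import random
from itertools import combinations


def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a random
    positive outscores a random negative (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def build_adjacency(nodes, edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Stand-in link score (hypothetical choice, not the article's models)."""
    return len(adj[u] & adj[v])


def cross_validated_auc(nodes, edges, k=5, seed=0):
    """Hold out each fold of edges, score held-out edges against all
    non-edges using only the training edges, and return one AUC per fold."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    non_edges = [p for p in combinations(sorted(nodes), 2) if p not in edge_set]
    shuffled = edges[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    results = []
    for held_out in folds:
        held = set(held_out)
        adj = build_adjacency(nodes, [e for e in edges if e not in held])
        pos = [common_neighbors(adj, u, v) for u, v in held_out]
        neg = [common_neighbors(adj, u, v) for u, v in non_edges]
        results.append(auc(pos, neg))
    return results


# Toy collaboration network (assumed data): two groups of five fully
# connected authors, joined by a single cross-group collaboration.
nodes = list(range(10))
edges = list(combinations(range(5), 2)) + list(combinations(range(5, 10), 2)) + [(0, 5)]
fold_aucs = cross_validated_auc(nodes, edges, k=5, seed=0)
print("mean AUC over folds:", sum(fold_aucs) / len(fold_aucs))
```

Replacing `common_neighbors` with scores from a fitted model would leave the rest of the protocol unchanged, so this kind of loop lets different models be compared on the same folds.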
References
Amarasinghe, S. et al. (2024), Explainable Artificial Intelligence: Second World Conference, xAI 2024, Springer. https://www.springer.com/
Chiang, W.-L., Liu, X., Si, S., Li, Y., Bengio, S. & Hsieh, C.-J. (2019), Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks, in 'Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining', KDD '19, Association for Computing Machinery, New York, NY, USA, pp. 257-266. DOI: https://doi.org/10.1145/3292500.3330925
Davis, J. & Goadrich, M. (2006), The relationship between precision-recall and ROC curves, in 'Proceedings of the 23rd International Conference on Machine Learning', ACM, pp. 233-240. DOI: https://doi.org/10.1145/1143844.1143874
Duvenaud, D., Maclaurin, D., Aguilera-Iparraguirre, J., Gómez-Bombarelli, R., Hirzel, T., Aspuru-Guzik, A. & Adams, R. P. (2015), 'Convolutional Networks on Graphs for Learning Molecular Fingerprints'.
Erdős, P. & Rényi, A. (1960), 'On the evolution of random graphs', Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17-61.
Fawcett, T. (2006), 'An introduction to ROC analysis', Pattern Recognition Letters 27(8), 861-874. DOI: https://doi.org/10.1016/j.patrec.2005.10.010
Gamerman, D. & Lopes, H. F. (2006), Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2 edn, Chapman and Hall/CRC. DOI: https://doi.org/10.1201/9781482296426
Goodfellow, I., Bengio, Y. & Courville, A. (2016), Deep Learning, MIT Press. http://www.deeplearningbook.org
Grover, A. & Leskovec, J. (2016), node2vec: Scalable feature learning for networks, in 'Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 855-864. DOI: https://doi.org/10.1145/2939672.2939754
Hamilton, W. L., Ying, R. & Leskovec, J. (2017a), Inductive representation learning on large graphs, in 'Advances in Neural Information Processing Systems (NeurIPS)'.
Hamilton, W. L., Ying, R. & Leskovec, J. (2017b), 'Representation learning on graphs: Methods and applications', IEEE Data Engineering Bulletin 40(3), 52-74.
Handcock, M., Hunter, D., Butts, C., Goodreau, S. & Morris, M. (2008), 'Statnet: Software tools for the representation, visualization, analysis and simulation of network data', Journal of Statistical Software 24(1), 1-11. DOI: https://doi.org/10.18637/jss.v024.i01
Hoff, P. (2007), 'Modeling homophily and stochastic equivalence in symmetric relational data', Advances in Neural Information Processing Systems 20.
Hoff, P. D., Raftery, A. E. & Handcock, M. S. (2002), 'Latent space approaches to social network analysis', Journal of the American Statistical Association 97(460), 1090-1098. DOI: https://doi.org/10.1198/016214502388618906
Kipf, T. N. & Welling, M. (2017), Semi-supervised classification with graph convolutional networks, in 'Proceedings of the International Conference on Learning Representations (ICLR)'.
Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y. & Porter, M. A. (2014), 'Multilayer networks', Journal of Complex Networks 2(3), 203-271. DOI: https://doi.org/10.1093/comnet/cnu016
Kolaczyk, E. D. & Csárdi, G. (2020), Statistical Analysis of Network Data with R, Use R!, 2 edn, Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-44129-6
Lee, Y., Lee, I. W. & Feiock, R. C. (2012), 'Interorganizational collaboration networks in economic development policy: An exponential random graph model analysis', Policy Studies Journal 40(3), 547-573. DOI: https://doi.org/10.1111/j.1541-0072.2012.00464.x
Lu, L. & Zhou, T. (2011), 'Link prediction in complex networks: A survey', Physica A: Statistical Mechanics and Its Applications 390(6), 1150-1170. DOI: https://doi.org/10.1016/j.physa.2010.11.027
Luke, D. (2015), A User's Guide to Network Analysis in R, Use R!, Springer International Publishing, Cham. DOI: https://doi.org/10.1007/978-3-319-23883-8
Lusher, D., Koskinen, J. & Robins, G. (2013), Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications, Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511894701
Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013), Efficient estimation of word representations in vector space, in 'Proceedings of the International Conference on Learning Representations (ICLR)'.
Mikolov, T., Yih, W.-t. & Zweig, G. (2013), 'Linguistic regularities in continuous space word representations', Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies pp. 746-751.
Newman, M. E. J. (2001), 'The structure of scientific collaboration networks', Proceedings of the National Academy of Sciences 98(2), 404-409. DOI: https://doi.org/10.1073/pnas.021544898
Newman, M. E. J., Strogatz, S. H. & Watts, D. J. (2001), 'Random graphs with arbitrary degree distributions and their applications', Physical Review E 64(2), 026118. DOI: https://doi.org/10.1103/PhysRevE.64.026118
Pareja, A., Domeniconi, G., Chen, J., Ma, T., Suzumura, T., Kanezashi, H., Kaler, T., Schardl, T. B. & Leiserson, C. E. (2020), EvolveGCN: Evolving graph convolutional networks for dynamic graphs, in 'Proceedings of the AAAI Conference on Artificial Intelligence'. DOI: https://doi.org/10.1609/aaai.v34i04.5984
Perozzi, B., Al-Rfou, R. & Skiena, S. (2014), DeepWalk: Online learning of social representations, in 'Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 701-710. DOI: https://doi.org/10.1145/2623330.2623732
Robins, G., Pattison, P., Kalish, Y. & Lusher, D. (2007), 'An introduction to exponential random graph (p*) models for social networks', Social Networks 29(2), 173-191. DOI: https://doi.org/10.1016/j.socnet.2006.08.002
Rossi, E., Kenlay, H., Gorinova, M., Bronstein, M. & Chamberlain, B. (2020), 'Temporal graph networks for deep learning on dynamic graphs', arXiv preprint arXiv:2006.10637.
Rumelhart, D. E. & McClelland, J. L. (1986), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/5236.001.0001
Salter-Townshend, M. & Murphy, T. B. (2013), 'Variational bayesian inference for the latent position cluster model for network data', Computational Statistics & Data Analysis 57(1), 661-671. DOI: https://doi.org/10.1016/j.csda.2012.08.004
Silge, J. & Robinson, D. (2017), Text Mining with R: A Tidy Approach, O'Reilly Media, Inc., Sebastopol, CA. https://www.oreilly.com/library/view/textminingwith/9781491981658/
Skiena, S. S. (2008), The Algorithm Design Manual, Springer London, London. DOI: https://doi.org/10.1007/978-1-84800-070-4
Skiena, S. S. (2017), The Data Science Design Manual, Texts in Computer Science, Springer International Publishing, Cham. DOI: https://doi.org/10.1007/978-3-319-55444-0
Skvoretz, J. (1990), 'Biased net theory: Approximations, simulations and observations', Social Networks 12(3), 217-238. DOI: https://doi.org/10.1016/0378-8733(90)90006-U
Snijders, T. A. B. (2002), 'Markov chain Monte Carlo estimation of exponential random graph models', Journal of Social Structure 3(2), 1-40.
Sokolova, M. & Lapalme, G. (2009), 'A systematic analysis of performance measures for classification tasks', Information Processing & Management 45(4), 427-437. DOI: https://doi.org/10.1016/j.ipm.2009.03.002
Sosa, J. & Buitrago, L. (2021), 'A review of latent space models for social networks', Revista Colombiana de Estadística 44(1), 171-200. DOI: https://doi.org/10.15446/rce.v44n1.89369
Strauss, D. & Ikeda, M. (1990), 'Pseudolikelihood estimation for social networks', Journal of the American Statistical Association 85(409), 204-212. DOI: https://doi.org/10.1080/01621459.1990.10475327
van der Maaten, L. & Hinton, G. (2008), 'Visualizing data using t-SNE', Journal of Machine Learning Research 9(Nov), 2579-2605.
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C. & Yu, P. S. (2021), 'A comprehensive survey on graph neural networks', IEEE Transactions on Neural Networks and Learning Systems 32(1), 4-24. DOI: https://doi.org/10.1109/TNNLS.2020.2978386
Xu, M. (2021), 'Understanding graph embedding methods and their applications', SIAM Review 63(4), 825-853. DOI: https://doi.org/10.1137/20M1386062
Yang, Z., Algesheimer, R. & Tessone, C. J. (2015), 'Evaluating link prediction methods', Knowledge-Based Systems 74, 87-96.
Yao, L., Mao, C. & Luo, Y. (2019), Graph convolutional networks for text classification, in 'Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence'. DOI: https://doi.org/10.1609/aaai.v33i01.33017370
Ying, Z., Bourgeois, D., You, J., Zitnik, M. & Leskovec, J. (2019), GNNExplainer: Generating explanations for graph neural networks, in 'Advances in Neural Information Processing Systems (NeurIPS)'.
You, Y., Chen, T., Sui, Y., Chen, T., Wang, Z. & Shen, Y. (2020), Graph contrastive learning with augmentations, in 'Advances in Neural Information Processing Systems (NeurIPS)'.
Zhang, Z., Cui, P. & Zhu, W. (2020), 'Deep learning on graphs: A survey', IEEE Transactions on Knowledge and Data Engineering 34(1), 249-270. DOI: https://doi.org/10.1109/TKDE.2020.2981333
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).






