Published

2025-07-01

A Unified Approach to Link Prediction in Collaboration Networks

Un enfoque unificado para la predicción de enlaces en redes de colaboración

DOI:

https://doi.org/10.15446/rce.v48n2.117558

Keywords:

Collaboration networks, Exponential random graph model, Graph convolutional network, Word2Vec, Social network analysis.

Authors

  • Juan Sosa, Universidad Nacional de Colombia
  • Diego Martínez, Universidad Nacional de Colombia
  • Nicolás Guerrero, Universidad Nacional de Colombia

This article investigates and compares three approaches to link prediction in collaboration networks: an ERGM (Exponential Random Graph Model; Robins et al. 2007), a GCN (Graph Convolutional Network; Kipf & Welling 2017), and a Word2Vec+MLP model (a Word2Vec model combined with a multilayer neural network; Mikolov, Chen, Corrado & Dean 2013 and Goodfellow et al. 2016). The ERGM, grounded in statistical methods, is used to capture general structural patterns within the network, while the GCN and Word2Vec+MLP models use deep learning techniques to learn adaptive structural representations of nodes and their relationships. The predictive performance of the models is assessed through extensive simulation exercises using cross-validation, with metrics based on the receiver operating characteristic (ROC) curve. The results show that the machine learning approaches outperform the ERGM in link prediction, particularly in large networks, where traditional models such as the ERGM scale poorly and have limited ability to capture inherent complexities. These findings highlight the potential benefits of integrating statistical modeling techniques with deep learning methods to analyze complex networks, providing a more robust and effective framework for future research in this field.
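The evaluation protocol mentioned in the abstract (cross-validation with ROC-based metrics) can be illustrated with a minimal sketch. This is not the authors' code: the common-neighbors score is a deliberately simple stand-in for the ERGM, GCN, and Word2Vec+MLP models, the two-clique toy graph is invented for illustration, and the names `auc`, `fold_auc`, and `cross_validated_auc` are hypothetical.

```python
import random
from itertools import combinations


def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a
    held-out edge outscores a non-edge (ties count as 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def fold_auc(nodes, edges, held_out):
    """Hide `held_out` edges, score pairs on the remaining graph, return AUC.

    Assumption: the scorer is the common-neighbors count, a placeholder
    for whichever link-prediction model is being evaluated.
    """
    edges = {tuple(sorted(e)) for e in edges}
    held = {tuple(sorted(e)) for e in held_out}
    # Build the training adjacency from the edges that remain visible.
    adj = {n: set() for n in nodes}
    for u, v in edges - held:
        adj[u].add(v)
        adj[v].add(u)
    # Negatives are all pairs that are not edges in the full graph.
    non_edges = [p for p in combinations(sorted(nodes), 2) if p not in edges]
    pos = [len(adj[u] & adj[v]) for u, v in held]
    neg = [len(adj[u] & adj[v]) for u, v in non_edges]
    return auc(pos, neg)


def cross_validated_auc(nodes, edges, k=5, seed=0):
    """Split the edges into k folds and return one AUC per held-out fold."""
    shuffled = sorted(tuple(sorted(e)) for e in edges)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    return [fold_auc(nodes, edges, fold) for fold in folds]


# Toy graph (invented): two 5-node cliques joined by the bridge edge (4, 5).
nodes = list(range(10))
edges = (list(combinations(range(5), 2))
         + list(combinations(range(5, 10), 2))
         + [(4, 5)])
scores = cross_validated_auc(nodes, edges, k=5, seed=0)
print([round(s, 3) for s in scores])
```

In this toy graph, holding out only the within-clique edge (0, 1) gives an AUC of 1.0, whereas holding out only the bridge (4, 5) gives 0.5, because the common-neighbors score cannot separate the bridge from the non-edges. A learned model would replace the scoring line while the fold and AUC logic stays the same.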


References

Amarasinghe, S. et al. (2024), Explainable Artificial Intelligence: Second World Conference, xAI 2024, Springer. https://www.springer.com/

Chiang, W.-L., Liu, X., Si, S., Li, Y., Bengio, S. & Hsieh, C.-J. (2019), Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks, in 'Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining', KDD '19, Association for Computing Machinery, New York, NY, USA, pp. 257-266. DOI: https://doi.org/10.1145/3292500.3330925

Davis, J. & Goadrich, M. (2006), The relationship between precision-recall and ROC curves, in 'Proceedings of the 23rd International Conference on Machine Learning', ACM, pp. 233-240. DOI: https://doi.org/10.1145/1143844.1143874

Duvenaud, D., Maclaurin, D., Aguilera-Iparraguirre, J., Gómez-Bombarelli, R., Hirzel, T., Aspuru-Guzik, A. & Adams, R. P. (2015), 'Convolutional Networks on Graphs for Learning Molecular Fingerprints'.

Erdős, P. & Rényi, A. (1960), 'On the evolution of random graphs', Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17-61.

Fawcett, T. (2006), 'An introduction to ROC analysis', Pattern Recognition Letters 27(8), 861-874. DOI: https://doi.org/10.1016/j.patrec.2005.10.010

Gamerman, D. & Lopes, H. F. (2006), Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd edn, Chapman and Hall/CRC. DOI: https://doi.org/10.1201/9781482296426

Goodfellow, I., Bengio, Y. & Courville, A. (2016), Deep Learning, MIT Press. http://www.deeplearningbook.org

Grover, A. & Leskovec, J. (2016), node2vec: Scalable feature learning for networks, in 'Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 855-864. DOI: https://doi.org/10.1145/2939672.2939754

Hamilton, W. L., Ying, R. & Leskovec, J. (2017a), Inductive representation learning on large graphs, in 'Advances in Neural Information Processing Systems (NeurIPS)'.

Hamilton, W. L., Ying, R. & Leskovec, J. (2017b), 'Representation learning on graphs: Methods and applications', IEEE Data Engineering Bulletin 40(3), 52-74.

Handcock, M., Hunter, D., Butts, C., Goodreau, S. & Morris, M. (2008), 'statnet: Software tools for the representation, visualization, analysis and simulation of network data', Journal of Statistical Software 24(1), 1-11. DOI: https://doi.org/10.18637/jss.v024.i01

Hoff, P. (2007), 'Modeling homophily and stochastic equivalence in symmetric relational data', Advances in Neural Information Processing Systems 20.

Hoff, P. D., Raftery, A. E. & Handcock, M. S. (2002), 'Latent space approaches to social network analysis', Journal of the American Statistical Association 97(460), 1090-1098. DOI: https://doi.org/10.1198/016214502388618906

Kipf, T. N. & Welling, M. (2017), Semi-supervised classification with graph convolutional networks, in 'Proceedings of the International Conference on Learning Representations (ICLR)'.

Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y. & Porter, M. A. (2014), 'Multilayer networks', Journal of Complex Networks 2(3), 203-271. DOI: https://doi.org/10.1093/comnet/cnu016

Kolaczyk, E. D. & Csárdi, G. (2020), Statistical Analysis of Network Data with R, Use R!, 2nd edn, Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-44129-6

Lee, Y., Lee, I. W. & Feiock, R. C. (2012), 'Interorganizational Collaboration Networks in Economic Development Policy: An Exponential Random Graph Model Analysis', Policy Studies Journal 40(3), 547-573. DOI: https://doi.org/10.1111/j.1541-0072.2012.00464.x

Lü, L. & Zhou, T. (2011), 'Link prediction in complex networks: A survey', Physica A: Statistical Mechanics and Its Applications 390(6), 1150-1170. DOI: https://doi.org/10.1016/j.physa.2010.11.027

Luke, D. (2015), A User's Guide to Network Analysis in R, Use R!, Springer International Publishing, Cham. DOI: https://doi.org/10.1007/978-3-319-23883-8

Lusher, D., Koskinen, J. & Robins, G. (2013), Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications, Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511894701

Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013), Efficient estimation of word representations in vector space, in 'Proceedings of the International Conference on Learning Representations (ICLR)'.

Mikolov, T., Yih, W.-t. & Zweig, G. (2013), 'Linguistic regularities in continuous space word representations', Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies pp. 746-751.

Newman, M. E. J. (2001), 'The structure of scientific collaboration networks', Proceedings of the National Academy of Sciences 98(2), 404-409. DOI: https://doi.org/10.1073/pnas.021544898

Newman, M. E. J., Strogatz, S. H. & Watts, D. J. (2001), 'Random graphs with arbitrary degree distributions and their applications', Physical Review E 64(2), 026118. DOI: https://doi.org/10.1103/PhysRevE.64.026118

Pareja, A., Domeniconi, G., Chen, J., Ma, T., Suzumura, T., Kanezashi, H., Kaler, T., Schardl, T. B. & Leiserson, C. E. (2020), EvolveGCN: Evolving graph convolutional networks for dynamic graphs, in 'Proceedings of the AAAI Conference on Artificial Intelligence'. DOI: https://doi.org/10.1609/aaai.v34i04.5984

Perozzi, B., Al-Rfou, R. & Skiena, S. (2014), DeepWalk: Online learning of social representations, in 'Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 701-710. DOI: https://doi.org/10.1145/2623330.2623732

Robins, G., Pattison, P., Kalish, Y. & Lusher, D. (2007), 'An introduction to exponential random graph (p*) models for social networks', Social Networks 29(2), 173-191. DOI: https://doi.org/10.1016/j.socnet.2006.08.002

Rossi, E., Kenlay, H., Gorinova, M., Bronstein, M. & Chamberlain, B. (2020), 'Temporal graph networks for deep learning on dynamic graphs', arXiv preprint arXiv:2006.10637.

Rumelhart, D. E. & McClelland, J. L. (1986), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/5236.001.0001

Salter-Townshend, M. & Murphy, T. B. (2013), 'Variational Bayesian inference for the latent position cluster model for network data', Computational Statistics & Data Analysis 57(1), 661-671. DOI: https://doi.org/10.1016/j.csda.2012.08.004

Silge, J. & Robinson, D. (2017), Text Mining with R: A Tidy Approach, O'Reilly Media, Inc., Sebastopol, CA. https://www.oreilly.com/library/view/textminingwith/9781491981658/

Skiena, S. S. (2008), The Algorithm Design Manual, Springer London, London. DOI: https://doi.org/10.1007/978-1-84800-070-4

Skiena, S. S. (2017), The Data Science Design Manual, Texts in Computer Science, Springer International Publishing, Cham. DOI: https://doi.org/10.1007/978-3-319-55444-0

Skvoretz, J. (1990), 'Biased net theory: Approximations, simulations and observations', Social Networks 12(3), 217-238. DOI: https://doi.org/10.1016/0378-8733(90)90006-U

Snijders, T. A. B. (2002), 'Markov chain Monte Carlo estimation of exponential random graph models', Journal of Social Structure 3(2), 1-40.

Sokolova, M. & Lapalme, G. (2009), 'A systematic analysis of performance measures for classification tasks', Information Processing & Management 45(4), 427-437. DOI: https://doi.org/10.1016/j.ipm.2009.03.002

Sosa, J. & Buitrago, L. (2021), 'A review of latent space models for social networks', Revista Colombiana de Estadística 44(1), 171-200. DOI: https://doi.org/10.15446/rce.v44n1.89369

Strauss, D. & Ikeda, M. (1990), 'Pseudolikelihood estimation for social networks', Journal of the American Statistical Association 85(409), 204-212. DOI: https://doi.org/10.1080/01621459.1990.10475327

van der Maaten, L. & Hinton, G. (2008), 'Visualizing data using t-SNE', Journal of Machine Learning Research 9(Nov), 2579-2605.

Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C. & Yu, P. S. (2021), 'A comprehensive survey on graph neural networks', IEEE Transactions on Neural Networks and Learning Systems 32(1), 4-24. DOI: https://doi.org/10.1109/TNNLS.2020.2978386

Xu, M. (2021), 'Understanding graph embedding methods and their applications', SIAM Review 63(4), 825-853. DOI: https://doi.org/10.1137/20M1386062

Yang, Z., Algesheimer, R. & Tessone, C. J. (2015), 'Evaluating link prediction methods', Knowledge-Based Systems 74, 87-96.

Yao, L., Mao, C. & Luo, Y. (2019), Graph convolutional networks for text classification, in 'Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence'. DOI: https://doi.org/10.1609/aaai.v33i01.33017370

Ying, Z., Bourgeois, D., You, J., Zitnik, M. & Leskovec, J. (2019), GNNExplainer: Generating explanations for graph neural networks, in 'Advances in Neural Information Processing Systems (NeurIPS)'.

You, Y., Chen, T., Sui, Y., Chen, T., Wang, Z. & Shen, Y. (2020), Graph contrastive learning with augmentations, in 'Advances in Neural Information Processing Systems (NeurIPS)'.

Zhang, Z., Cui, P. & Zhu, W. (2020), 'Deep learning on graphs: A survey', IEEE Transactions on Knowledge and Data Engineering 34(1), 249-270. DOI: https://doi.org/10.1109/TKDE.2020.2981333

How to Cite

APA

Sosa, J., Martínez, D. & Guerrero, N. (2025). A Unified Approach to Link Prediction in Collaboration Networks. Revista Colombiana de Estadística, 48(2), 115–137. https://doi.org/10.15446/rce.v48n2.117558

ACM

[1] Sosa, J., Martínez, D. and Guerrero, N. 2025. A Unified Approach to Link Prediction in Collaboration Networks. Revista Colombiana de Estadística. 48, 2 (Jul. 2025), 115–137. DOI: https://doi.org/10.15446/rce.v48n2.117558.

ACS

(1) Sosa, J.; Martínez, D.; Guerrero, N. A Unified Approach to Link Prediction in Collaboration Networks. Rev. colomb. estad. 2025, 48, 115-137.

ABNT

SOSA, J.; MARTÍNEZ, D.; GUERRERO, N. A Unified Approach to Link Prediction in Collaboration Networks. Revista Colombiana de Estadística, [S. l.], v. 48, n. 2, p. 115–137, 2025. DOI: 10.15446/rce.v48n2.117558. Disponível em: https://revistas.unal.edu.co/index.php/estad/article/view/117558. Acesso em: 15 nov. 2025.

Chicago

Sosa, Juan, Diego Martínez, and Nicolás Guerrero. 2025. “A Unified Approach to Link Prediction in Collaboration Networks”. Revista Colombiana De Estadística 48 (2):115-37. https://doi.org/10.15446/rce.v48n2.117558.

Harvard

Sosa, J., Martínez, D. and Guerrero, N. (2025) “A Unified Approach to Link Prediction in Collaboration Networks”, Revista Colombiana de Estadística, 48(2), pp. 115–137. doi: 10.15446/rce.v48n2.117558.

IEEE

[1] J. Sosa, D. Martínez, and N. Guerrero, “A Unified Approach to Link Prediction in Collaboration Networks”, Rev. colomb. estad., vol. 48, no. 2, pp. 115–137, Jul. 2025.

MLA

Sosa, J., D. Martínez, and N. Guerrero. “A Unified Approach to Link Prediction in Collaboration Networks”. Revista Colombiana de Estadística, vol. 48, no. 2, July 2025, pp. 115-37, doi:10.15446/rce.v48n2.117558.

Turabian

Sosa, Juan, Diego Martínez, and Nicolás Guerrero. “A Unified Approach to Link Prediction in Collaboration Networks”. Revista Colombiana de Estadística 48, no. 2 (July 8, 2025): 115–137. Accessed November 15, 2025. https://revistas.unal.edu.co/index.php/estad/article/view/117558.

Vancouver

1. Sosa J, Martínez D, Guerrero N. A Unified Approach to Link Prediction in Collaboration Networks. Rev. colomb. estad. [Internet]. 2025 Jul. 8 [cited 2025 Nov. 15];48(2):115-37. Available from: https://revistas.unal.edu.co/index.php/estad/article/view/117558
