A New Method for Detecting Significant p-values with Applications to Genetic Data

Jorge Iván Vélez; Juan Carlos Correa; Mauricio Arcos-Burgos

doi:10.15446/rce.v37n1.44358

Published

2014-01-01

Un nuevo método para la detección de valores p significativos y su aplicación a datos genéticos

Keywords:

Extreme values theory, p-value, Type I error probability, Multiple testing, Genetic data (en)
teoría de valores extremos, valor-p, probabilidad de error tipo I, comparaciones múltiples, datos genéticos (es)

Authors

Jorge Iván Vélez, Australian National University / Universidad Nacional de Colombia - Sede Medellín
Juan Carlos Correa, Universidad Nacional de Colombia - Sede Medellín
Mauricio Arcos-Burgos, Australian National University / Universidad de Antioquia

Abstract (en)

A new method for detecting significant p-values is described in this paper. This method, based on the distribution of the m-th order statistic of a U(0, 1) distribution, is shown to be suitable in applications where m → ∞ independent hypotheses are tested and it is of interest, for a fixed type I error probability, to determine which of them are significant while controlling the false positives. Equivalences and comparisons between our method and other methods based on p-values are also established, and a graphical representation of the distribution of the test statistic is depicted for different values of m. Finally, our proposal is illustrated with two microarray data sets.

Abstract (es)

Se describe un nuevo método para la detección de valores p significativos. Este método, basado en el m-ésimo estadístico de orden de la distribución U(0, 1), es adecuado en casos en los que se realizan m → ∞ pruebas de hipótesis independientes y es de interés determinar aquellas que son significativas, controlando los falsos positivos, para una probabilidad de error tipo I predeterminada. Adicionalmente, se realiza una comparación con algunas pruebas clásicas y se grafica la distribución del estadístico de prueba para diferentes valores de m. Finalmente, se ilustra el uso de la metodología con dos conjuntos de datos provenientes de estudios con microarreglos.
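As a rough illustration of the idea summarized in the abstract (not the authors' procedure, whose details are in the full text), the sketch below uses a standard fact about order statistics. If m p-values are independent and U(0, 1) under their null hypotheses, the largest of m such uniforms has distribution function x^m, so the smallest p-value falls at or below 1 - (1 - alpha)^(1/m) with probability exactly alpha. The function names, the use of Python, and the simulation settings are assumptions made only for this example.

```python
import random


def max_uniform_cdf(x: float, m: int) -> float:
    """CDF of the m-th (largest) order statistic of m iid U(0, 1) draws."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x ** m


def min_pvalue_threshold(m: int, alpha: float) -> float:
    """Cut-off t such that P(smallest of m null p-values <= t) = alpha.

    Since 1 - p is also U(0, 1), min(p) <= t iff max(1 - p) >= 1 - t,
    and P(max >= 1 - t) = 1 - (1 - t)**m. Solving for t gives the result.
    """
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def bonferroni_threshold(m: int, alpha: float) -> float:
    """Classical Bonferroni per-test cut-off, shown only for comparison."""
    return alpha / m


def simulated_error_rate(m: int, alpha: float, n_sim: int = 4000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(at least one null p-value <= cut-off).

    All m hypotheses are simulated as true nulls (hypothetical setting).
    """
    rng = random.Random(seed)
    t = min_pvalue_threshold(m, alpha)
    hits = sum(
        1
        for _ in range(n_sim)
        if min(rng.random() for _ in range(m)) <= t
    )
    return hits / n_sim


if __name__ == "__main__":
    for m in (10, 100, 1000):
        print(m, min_pvalue_threshold(m, 0.05), bonferroni_threshold(m, 0.05))
    print(simulated_error_rate(50, 0.05))
```

For m = 1 the cut-off reduces to alpha, and for any m it is at least as large as the Bonferroni cut-off alpha/m. The simulated rate should sit close to alpha when all hypotheses are true nulls.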

JORGE IVÁN VÉLEZ¹, JUAN CARLOS CORREA², MAURICIO ARCOS-BURGOS³

¹The Australian National University, Genomics and Predictive Medicine Group, Genome Biology Department, John Curtin School of Medical Research, Canberra, ACT, Australia. University of Antioquia, Group of Neurosciences, Medellín, Colombia. National University of Colombia, Research Group in Statistics, Medellín, Colombia. Ph.D. Scholar. Email: jorge.velez@anu.edu.au
²National University of Colombia, Research Group in Statistics, Medellín, Colombia. National University of Colombia, Department of Statistics, Medellín, Colombia. Associate professor. Email: jccorrea@unal.edu.co
³The Australian National University, Genomics and Predictive Medicine Group, Genome Biology Department, John Curtin School of Medical Research, Canberra, ACT, Australia. University of Antioquia, Group of Neurosciences, Medellín, Colombia. Associate professor. Email: mauricio.arcos-burgos@anu.edu.au

Full text available in PDF

References

1. Benjamini, Y. & Hochberg, Y. (1995), 'Controlling the false discovery rate: a practical and powerful approach to multiple testing', Journal of the Royal Statistical Society, Series B (Methodological) 57(1), 289-300.

2. Benjamini, Y. & Yekutieli, D. (2001), 'The control of the false discovery rate in multiple testing under dependency', Annals of Statistics 29(4), 1165-1188.

3. Bonferroni, C. E. (1935), 'Il calcolo delle assicurazioni su gruppi di teste', Studi in Onore del Professore Salvatore Ortu Carboni, 13-60.

4. Casella, G. & Berger, R. (2001), Statistical Inference, 2 edn, Duxbury Press, United States of America.

5. Devroye, L. (1986), Non-Uniform Random Variate Generation, Springer-Verlag, New York.

6. Gentleman, R., Carey, V., Huber, W. & Hahne, F. (2011), genefilter: Methods for filtering genes from microarray experiments. R package version 1.34.0.

7. Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C. & Lander, E. (1999), 'Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring', Science 286, 531-537.

8. Liu, J. Z., McRae, A. F., Nyholt, D. R., Medland, S. E., Wray, N. R., Brown, K. M., Hayward, N. K., Montgomery, G. W., Visscher, P. M., Martin, N. G. & Macgregor, S. (2010), 'A versatile gene-based test for genome-wide association studies', The American Journal of Human Genetics 87(1), 139-145.

9. Manolio, T. A. (2010), 'Genomewide association studies and assessment of the risk of disease', New England Journal of Medicine 363(2), 166-176.

10. Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstråle, M., Laurila, E., Houstis, N., Daly, M. J., Patterson, N., Mesirov, J. P., Golub, T. R., Tamayo, P., Spiegelman, B., Lander, E. S., Hirschhorn, J. N., Altshuler, D. & Groop, L. C. (2003), 'PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes', Nature Genetics 34(3), 267-273.

11. Murdoch, D., Tsai, Y. & Adcock, J. (2008), 'P-values are random variables', The American Statistician 62(3), 242-245.

12. Nyholt, D. R. (2004), 'A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other', The American Journal of Human Genetics 74(4), 765-769.

13. Pollard, K. S., Gilbert, H. N., Ge, Y., Taylor, S. & Dudoit, S. (2011), multtest: Resampling-based multiple hypothesis testing. R package version 2.8.0.

14. R Core Team (2013), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

15. Sackrowitz, H. & Samuel-Cahn, E. (1999), 'P Values as Random Variables-Expected P Values', The American Statistician 53(4), 326-331.

16. Serfling, R. (1980), Approximation Theorems of Mathematical Statistics, John Wiley & Sons, United States of America.

17. Shaffer, J. P. (1995), 'Multiple hypothesis testing', Annual Review of Psychology 46, 561-584.

[Received November 2012. Accepted January 2014]

This article can be cited in LaTeX using the following BibTeX entry:

@ARTICLE{RCEv37n1a05,
    AUTHOR  = {Vélez, Jorge Iván and Correa, Juan Carlos and Arcos-Burgos, Mauricio},
    TITLE   = {{A New Method for Detecting Significant p-values with Applications to Genetic Data}},
    JOURNAL = {Revista Colombiana de Estadística},
    YEAR    = {2014},
    VOLUME  = {37},
    NUMBER  = {1},
    PAGES   = {69--78}
}

How to Cite

APA

Vélez, J. I., Correa, J. C. & Arcos-Burgos, M. (2014). A New Method for Detecting Significant p-values with Applications to Genetic Data. Revista Colombiana de Estadística, 37(1), 69–78. https://doi.org/10.15446/rce.v37n1.44358

ACM

[1]

Vélez, J.I., Correa, J.C. and Arcos-Burgos, M. 2014. A New Method for Detecting Significant p-values with Applications to Genetic Data. Revista Colombiana de Estadística. 37, 1 (Jan. 2014), 69–78. DOI:https://doi.org/10.15446/rce.v37n1.44358.

ACS

(1)

Vélez, J. I.; Correa, J. C.; Arcos-Burgos, M. A New Method for Detecting Significant p-values with Applications to Genetic Data. Rev. colomb. estad. 2014, 37, 69-78.

ABNT

VÉLEZ, J. I.; CORREA, J. C.; ARCOS-BURGOS, M. A New Method for Detecting Significant p-values with Applications to Genetic Data. Revista Colombiana de Estadística, [S. l.], v. 37, n. 1, p. 69–78, 2014. DOI: 10.15446/rce.v37n1.44358. Disponível em: https://revistas.unal.edu.co/index.php/estad/article/view/44358. Acesso em: 13 may. 2026.

Chicago

Vélez, Jorge Iván, Juan Carlos Correa, and Mauricio Arcos-Burgos. 2014. “A New Method for Detecting Significant p-values with Applications to Genetic Data”. Revista Colombiana De Estadística 37 (1):69-78. https://doi.org/10.15446/rce.v37n1.44358.

Harvard

Vélez, J. I., Correa, J. C. and Arcos-Burgos, M. (2014) “A New Method for Detecting Significant p-values with Applications to Genetic Data”, Revista Colombiana de Estadística, 37(1), pp. 69–78. doi: 10.15446/rce.v37n1.44358.

IEEE

[1]

J. I. Vélez, J. C. Correa, and M. Arcos-Burgos, “A New Method for Detecting Significant p-values with Applications to Genetic Data”, Rev. colomb. estad., vol. 37, no. 1, pp. 69–78, Jan. 2014.

MLA

Vélez, J. I., J. C. Correa, and M. Arcos-Burgos. “A New Method for Detecting Significant p-values with Applications to Genetic Data”. Revista Colombiana de Estadística, vol. 37, no. 1, Jan. 2014, pp. 69-78, doi:10.15446/rce.v37n1.44358.

Turabian

Vélez, Jorge Iván, Juan Carlos Correa, and Mauricio Arcos-Burgos. “A New Method for Detecting Significant p-values with Applications to Genetic Data”. Revista Colombiana de Estadística 37, no. 1 (January 1, 2014): 69–78. Accessed May 13, 2026. https://revistas.unal.edu.co/index.php/estad/article/view/44358.

Vancouver

1.

Vélez JI, Correa JC, Arcos-Burgos M. A New Method for Detecting Significant p-values with Applications to Genetic Data. Rev. colomb. estad. [Internet]. 2014 Jan. 1 [cited 2026 May 13];37(1):69-78. Available from: https://revistas.unal.edu.co/index.php/estad/article/view/44358

CrossRef Cited-by

10

1. J I Vélez, F Lopera, D Sepulveda-Falla, H R Patel, A S Johar, A Chuah, C Tobón, D Rivera, A Villegas, Y Cai, K Peng, R Arkell, F X Castellanos, S J Andrews, M F Silva Lara, P K Creagh, S Easteal, J de Leon, M L Wong, J Licinio, C A Mastronardi, M Arcos-Burgos. (2016). APOE*E2 allele delays age of onset in PSEN1 E280A Alzheimer’s disease. Molecular Psychiatry, 21(7), p.916. https://doi.org/10.1038/mp.2015.177.

2. Juan Pablo Acosta, Silvia Restrepo, Juan David Henao, Liliana López-Kleine. (2019). Multivariate Method for Inferential Identification of Differentially Expressed Genes in Gene Expression Experiments. Journal of Computational Biology, 26(8), p.866. https://doi.org/10.1089/cmb.2018.0013.

3. Diego Sepulveda‐Falla, Jorge I. Vélez, Natalia Acosta‐Baena, Ana Baena, Sonia Moreno, Susanne Krasemann, Francisco Lopera, Claudio A. Mastronardi, Mauricio Arcos‐Burgos. (2024). Genetic modifiers of cognitive decline in PSEN1 E280A Alzheimer's disease. Alzheimer's & Dementia, 20(4), p.2873. https://doi.org/10.1002/alz.13754.

4. Jorge I. Vélez, Luiggi A. Samper, Mauricio Arcos-Holzinger, Lady G. Espinosa, Mario A. Isaza-Ruget, Francisco Lopera, Mauricio Arcos-Burgos. (2021). A Comprehensive Machine Learning Framework for the Exact Prediction of the Age of Onset in Familial and Sporadic Alzheimer’s Disease. Diagnostics, 11(5), p.887. https://doi.org/10.3390/diagnostics11050887.

5. Marcela Henriquez-Henriquez, Maria T. Acosta, Ariel F. Martinez, Jorge I. Vélez, Francisco Lopera, David Pineda, Juan D. Palacio, Teresa Quiroga, Tilla S. Worgall, Richard J. Deckelbaum, Claudio Mastronardi, Brooke S. G. Molina, Benedetto Vitiello, Joanne B. Severe, Peter S. Jensen, L. Eugene Arnold, Kimberly Hoagwood, John Richters, Donald R. Vereen, Stephen P. Hinshaw, Glen R. Elliott, Karen C. Wells, Jeffery N. Epstein, Desiree W. Murray, C. Keith Conners, John March, James Swanson, Timothy Wigal, Dennis P. Cantwell, Howard B. Abikoff, Lily Hechtman, Laurence L. Greenhill, Jeffrey H. Newcorn, Brooke S. G. Molina, Betsy Hoza, William E. Pelham, Robert D. Gibbons, Sue Marcus, Kwan Hur, Helena C. Kraemer, Thomas Hanley, Karen Stern, Mauricio Arcos-Burgos, Maximilian Muenke. (2020). Mutations in sphingolipid metabolism genes are associated with ADHD. Translational Psychiatry, 10(1) https://doi.org/10.1038/s41398-020-00881-8.

6. Jorge I Vélez, Cameron A Jack, Aaron Chuah, Bob Buckley, Juan C Correa, Simon Easteal, Mauricio Arcos-Burgos. (2015). Cross validation of pooling/resampling GWAS using the WTCCC data. Molecular Biology and Genetic Engineering, 3(1), p.1. https://doi.org/10.7243/2053-5767-3-1.

7. Jorge I. Vélez, Francisco Lopera, Claudia T. Silva, Andrés Villegas, Lady G. Espinosa, Oscar M. Vidal, Claudio A. Mastronardi, Mauricio Arcos-Burgos. (2020). Familial Alzheimer’s Disease and Recessive Modifiers. Molecular Neurobiology, 57(2), p.1035. https://doi.org/10.1007/s12035-019-01798-0.

8. Martha L. Cervantes-Henriquez, Johan E. Acosta-López, Mostapha Ahmad, Manuel Sánchez-Rojas, Giomar Jiménez-Figueroa, Wilmar Pineda-Alhucema, Martha L. Martinez-Banfi, Luz M. Noguera-Machacón, Elsy Mejía-Segura, Moisés De La Hoz, Mauricio Arcos-Holzinger, David A. Pineda, Pedro J. Puentes-Rozo, Mauricio Arcos-Burgos, Jorge I. Vélez. (2021). ADGRL3, FGF1 and DRD4: Linkage and Association with Working Memory and Perceptual Organization Candidate Endophenotypes in ADHD. Brain Sciences, 11(7), p.854. https://doi.org/10.3390/brainsci11070854.

9. Jorge I. Vélez, Dora Rivera, Claudio A. Mastronardi, Hardip R. Patel, Carlos Tobón, Andrés Villegas, Yeping Cai, Simon Easteal, Francisco Lopera, Mauricio Arcos-Burgos. (2016). A Mutation in DAOA Modifies the Age of Onset in PSEN1 E280A Alzheimer’s Disease. Neural Plasticity, 2016, p.1. https://doi.org/10.1155/2016/9760314.

10. Raydonal Ospina, Fernando Marmolejo-Ramos. (2019). Performance of Some Estimators of Relative Variability. Frontiers in Applied Mathematics and Statistics, 5 https://doi.org/10.3389/fams.2019.00043.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
