New criteria for the choice of training sample size for model selection and prediction: the cubic root rule

Israel Almodovar; Luis Pericchi

Published

2012-01-01

Title (Spanish): Un nuevo criterio para la elección del tamaño de la muestra de entrenamiento para la selección de modelos y de predicción: la regla de la raíz cúbica

Keywords:

5% cubic root rule, intrinsic priors, objective Bayesian hypothesis testing, training sample size

Authors

Israel Almodovar, Department of Statistics, Iowa State University, Ames, IA 50011
Luis Pericchi, Department of Mathematics, University of Puerto Rico, Río Piedras Campus, PR 00936-8377

Abstract

The size of the training sample in Objective Bayesian Testing and Model Selection is a central problem in both theory and practice. We concentrate here on simulated training samples and on simple hypotheses. The striking result is that, even in the simplest of situations, the optimal training sample size M can be minimal (for identification of the sampling model) or maximal (for optimal prediction of future data). We suggest a compromise that seems to work well whatever the purpose of the analysis, the 5% cubic root rule: M = min[0.05*n, n^{1/3}]. We then define a comprehensive loss function that combines identification errors and prediction errors, appropriately standardized. We find that the very simple cubic root rule is extremely close to an overall optimum for a wide selection of sample sizes and of cutting points that define the decision rules. The cubic root rule was first proposed in Pericchi (2010). This article proposes to generalize the rule and to take full statistical advantage of it in realistic situations. Another way to look at the rule is as a synthesis of the rationales that justify both AIC and BIC.
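As a rough illustration of the rule stated in the abstract, the following Python sketch evaluates M = min[0.05*n, n^{1/3}] for a few sample sizes. The function name `cubic_root_rule` is hypothetical, and the value is returned unrounded because the abstract does not say how M should be rounded to an integer; treat this as a sketch under those assumptions rather than the authors' implementation.

```python
def cubic_root_rule(n: int) -> float:
    """Training sample size M = min(0.05 * n, n ** (1/3)) for sample size n.

    Hypothetical helper name. The result is left unrounded because the
    abstract does not specify a rounding convention for M.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return min(0.05 * n, n ** (1.0 / 3.0))


if __name__ == "__main__":
    # Print the rule for a few illustrative sample sizes.
    for n in (20, 80, 100, 1000, 10000):
        print(n, round(cubic_root_rule(n), 3))
```

The two terms coincide where 0.05*n = n^{1/3}, that is at n = 20^{3/2}, roughly 89.4. For smaller n the 5% term is the smaller of the two, and for larger n the cubic root term takes over.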

References

Abramowitz, M.; Stegun, I. A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover Publications, Inc.

Berger J.O. and Pericchi L.R. (1996a). The Intrinsic Bayes Factor for Model Selection and Prediction. Jour. Amer. Statist. Ass., 91, 109-122.

Berger J.O. and Pericchi L.R. (1996b). The Intrinsic Bayes Factor for Linear Models. Bayesian Statistics 5, Bernardo et al. eds, Oxford University Press, 23-42.

Casella G. and Moreno E. (2009). Assessing Robustness of Intrinsic Test of Independence in Two-way Contingency Tables. Tech Report.

Chakrabarti A. and Ghosh J. (2007). Some Aspects of Bayesian Model Selection for Prediction. Bayesian Statistics, 8, 51-90.

Kass R.E. and Wasserman L. (1995). A Reference Bayesian Test for Nested Hypothesis and its Relationship with Schwarz Criterion. Jour. Amer. Statist. Ass., 90, 928-934.

Pericchi, L.R. (2010). How large should be the training sample? Invited Chapter in the book: ''Frontiers of Decision Making and Bayesian Analysis. In Honor of James O. Berger'', Chen MH et al editors. Springer. (In press).

Spiegelhalter D.J., Best N.G., Carlin B.P. and Van der Linde A. (2002). Bayesian Measures of Model Complexity and Fit (with Discussion). Journal of the Royal Statistical Society, Series B, 64(4), 583-616, and in The BUGS Project DIC www.mrc-bsu.cam.ac.uk/bugs/winbugs/dicpage.shtml.

How to cite

APA

Almodovar, I. & Pericchi, L. (2012). New criteria for the choice of training sample size for model selection and prediction: the cubic root rule. Revista de la Facultad de Ciencias, 1(1), 7–22. https://revistas.unal.edu.co/index.php/rfc/article/view/48975


License

The authors or copyright holders of each article grant the Revista de la Facultad de Ciencias of the Universidad Nacional de Colombia a non-exclusive, limited, and royalty-free authorization over the article, which, once reviewed and approved, is submitted for publication under the following terms:

1. The authors submit the version corrected according to the reviewers' suggestions and state that the article is an unpublished document over which they hold the rights being authorized, and they assume full responsibility for the content of their work before the Revista de la Facultad de Ciencias, the Universidad Nacional de Colombia, and third parties.

2. The authorization granted to the journal takes effect on the date the article is included in the corresponding volume and issue of the Revista de la Facultad de Ciencias in the Open Journal Systems platform and on the journal's home page (https://revistas.unal.edu.co/index.php/rfc/index), as well as in the databases and indexes in which the publication is indexed.

3. The authors authorize the Revista de la Facultad de Ciencias of the Universidad Nacional de Colombia to publish the document in whatever format is required (print, digital, electronic, or any other known or yet to be known) and authorize the Revista de la Facultad de Ciencias to include the work in the indexes and search engines it deems necessary to promote its dissemination.

4. The authors accept that the authorization is granted free of charge and therefore waive any payment for the publication, distribution, public communication, and any other use made under the terms of this authorization.

5. All content of the Revista de la Facultad de Ciencias is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license.