Published

2012-01-01

New criteria for the choice of training sample size for model selection and prediction: the cubic root rule

Un nuevo criterio para la elección del tamaño de la muestra de entrenamiento para la selección de modelos y de predicción: la regla de la raíz cúbica

Keywords:

5% cubic root rule, intrinsic priors, objective Bayesian hypothesis testing, training sample size (en)
Regla de la raíz cúbica 5%, a priori intrínseca, pruebas de hipótesis bayesianas objetivas, tamaño de muestra de entrenamiento (es)

Authors

  • Israel Almodovar Department of Statistics, Iowa State University, Ames, IA 50011
  • Luis Pericchi Department of Mathematics, University of Puerto Rico, Río Piedras Campus, PR 00936-8377

The size of the training sample in objective Bayesian testing and model selection is a central problem in both theory and practice. We concentrate here on simulated training samples and on simple hypotheses. The striking result is that, even in the simplest of situations, the optimal training sample size M can be minimal (for identification of the sampling model) or maximal (for optimal prediction of future data). We suggest a compromise that seems to work well whatever the purpose of the analysis, the 5% cubic root rule: M = min[0.05*n, n^{1/3}]. We then define a comprehensive loss function that combines identification errors and prediction errors, appropriately standardized. We find that the very simple cubic root rule is extremely close to an overall optimum for a wide selection of sample sizes and of cutting points that define the decision rules. The cubic root rule was first proposed in Pericchi (2010). This article proposes to generalize the rule and to take full statistical advantage of it in realistic situations. Another way to look at the rule is as a synthesis of the rationales that justify both AIC and BIC.

El tamaño de una muestra de entrenamiento en la selección y prueba en Bayesiana objetiva es un problema central en la teoría y en la práctica. Nos concentraremos en muestras de entrenamiento simuladas y en pruebas de hipótesis simples. El resultado impactante es que, aun en las situaciones más simples, la muestra de entrenamiento M óptima  puede ser minimal (para la identificación del modelo muestral) o maximal (para la predicción óptima de datos futuros). Se sugiere un compromiso que parece funcionar bien para cualquier propósito del análisis: la regla de la raíz cúbica del 5%: M=min[0.05*n, n^{1/3}]. Se procede a definir una función de pérdida comprehensiva que combina los errores de identificación y los errores de predicción, estandarizados apropiadamente. Se halla que la regla de la raíz cúbica simple es cercana en extremo a un óptimo general para una amplia selección de tamaños muestrales y puntos de corte que definen las reglas de decisión. La primera vez que se ha propuesto la raíz cúbica fue en Pericchi(2010). Este artículo propone generalizar la regla y tomar una ventaja estadística completa para situaciones reales. Otra forma de ver la regla es como una síntesis de la racionalidad que justifica tanto el AIC como el BIC.
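
As a minimal sketch of the rule stated in the abstract, M = min[0.05*n, n^{1/3}], the following Python function evaluates it for a given sample size n. The function name `cubic_root_rule` and the `fraction` parameter are illustrative assumptions, not taken from the article, and the abstract does not say whether M should be rounded to an integer, so the real-valued result is returned as is.

```python
def cubic_root_rule(n: int, fraction: float = 0.05) -> float:
    """Return M = min(fraction * n, n ** (1/3)) for a sample of size n.

    `fraction` defaults to the 5% of the abstract; exposing it as a
    parameter is an assumption made here for illustration only.
    """
    if n <= 0:
        raise ValueError("sample size n must be positive")
    # The smaller of the two terms is the training sample size.
    return min(fraction * n, n ** (1.0 / 3.0))


if __name__ == "__main__":
    # Print M for a few sample sizes, rounded for display only.
    for n in (20, 100, 1000, 10000):
        print(n, round(cubic_root_rule(n), 3))
```

Since 0.05*n < n^{1/3} exactly when n^{2/3} < 20, the 5% term determines M only for n below roughly 89; for larger samples the cubic root term is the smaller one.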

References

Abramowitz M. and Stegun I.A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover Publications, Inc.

Berger J.O. and Pericchi L.R. (1996a). The Intrinsic Bayes Factor for Model Selection and Prediction. Jour. Amer. Statist. Ass., 91, 109-122.

Berger J.O. and Pericchi L.R. (1996b). The Intrinsic Bayes Factor for Linear Models. Bayesian Statistics 5, Bernardo et al. eds, Oxford University Press, 23-42.

Casella G. and Moreno E. (2009). Assessing Robustness of Intrinsic Test of Independence in Two-way Contingency Tables. Tech Report.

Chakrabarti A. and Ghosh J.(2007). Some Aspects of Bayesian Model Selection for Prediction. Bayesian Statistics, 8, 51-90.

Kass R.E. and Wasserman L. (1995). A Reference Bayesian Test for Nested Hypotheses and its Relationship with Schwarz Criterion. Jour. Amer. Statist. Ass., 90, 928-934.

Pericchi, L.R. (2010). How large should be the training sample? Invited Chapter in the book: ''Frontiers of Decision Making and Bayesian Analysis. In Honor of James O. Berger'', Chen MH et al editors. Springer. (In press).

Spiegelhalter D.J., Best N.G., Carlin B.P. and Van der Linde A. (2002). Bayesian Measures of Model Complexity and Fit (with Discussion). Journal of the Royal Statistical Society, Series B, 64(4), 583-616, and in The BUGS Project DIC www.mrc-bsu.cam.ac.uk/bugs/winbugs/dicpage.shtml.

How to Cite

APA

Almodovar, I. and Pericchi, L. (2012). New criteria for the choice of training sample size for model selection and prediction: the cubic root rule. Revista de la Facultad de Ciencias, 1(1), 7–22. https://revistas.unal.edu.co/index.php/rfc/article/view/48975

