New criteria for the choice of training sample size for model selection and prediction: the cubic root rule
Keywords:
5% cubic root rule, intrinsic priors, objective Bayesian hypothesis testing, training sample size
The choice of training sample size is a central problem in both the theory and the practice of objective Bayesian testing and model selection. We concentrate here on simulated training samples and on simple hypotheses. The striking result is that, even in the simplest situations, the optimal training sample size M can be minimal (for identifying the sampling model) or maximal (for optimal prediction of future data). We suggest a compromise that seems to work well whatever the purpose of the analysis, the 5% cubic root rule: M = min[0.05n, n^{1/3}]. We then define a comprehensive loss function that combines identification errors and prediction errors, appropriately standardized. We find that this very simple rule is extremely close to an overall optimum for a wide range of sample sizes and of the cut-off points that define the decision rules. The cubic root rule was first proposed in Pericchi (2010); this article generalizes it and exploits it fully in realistic situations. Another way to view the rule is as a synthesis of the rationales that justify both AIC and BIC.
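As a quick numerical illustration, the rule M = min[0.05n, n^{1/3}] can be evaluated directly. The Python sketch below is not the authors' code: the function names are hypothetical, and the integer version (rounding M up and never going below a minimal training sample size `m_min`, default 1) is an assumption of this sketch, since the abstract states the rule only as a real-valued formula.

```python
import math

# Crossover point where the two terms of the rule coincide:
# 0.05 * n = n ** (1/3)  <=>  n ** (2/3) = 20  <=>  n = 20 ** 1.5 (about 89.44).
CROSSOVER_N = 20 ** 1.5


def cubic_root_rule(n: int) -> float:
    """Real-valued training sample size M = min(0.05 * n, n ** (1/3))."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return min(0.05 * n, n ** (1.0 / 3.0))


def training_sample_size(n: int, m_min: int = 1) -> int:
    """Hypothetical integer version of the rule (an assumption of this sketch).

    Rounds M up to the next integer and never returns less than m_min,
    a hypothetical minimal training sample size.
    """
    m = cubic_root_rule(n)
    # Subtract a tiny tolerance so that floating-point error in the cube root
    # of a perfect cube (which may land a hair above the exact integer)
    # cannot push ceil() up by one.
    return max(m_min, math.ceil(m - 1e-9))


if __name__ == "__main__":
    for n in (20, 50, 89, 90, 100, 1000):
        print(f"n={n:5d}  M={cubic_root_rule(n):.3f}  integer M={training_sample_size(n)}")
```

Solving 0.05n = n^{1/3} gives n^{2/3} = 20, i.e. n = 20^{3/2} ≈ 89.4: for smaller samples the 5% term is the binding one (for example, M = 2.5 when n = 50), while for larger samples M grows only like the cube root of n (for example, M ≈ 10 when n = 1000).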
References
Abramowitz M. and Stegun I.A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover Publications.
Berger J.O. and Pericchi L.R. (1996a). The Intrinsic Bayes Factor for Model Selection and Prediction. Journal of the American Statistical Association, 91, 109-122.
Berger J.O. and Pericchi L.R. (1996b). The Intrinsic Bayes Factor for Linear Models. In Bayesian Statistics 5, Bernardo et al. (eds.), Oxford University Press, 23-42.
Casella G. and Moreno E. (2009). Assessing Robustness of Intrinsic Test of Independence in Two-way Contingency Tables. Technical Report.
Chakrabarti A. and Ghosh J. (2007). Some Aspects of Bayesian Model Selection for Prediction. Bayesian Statistics 8, 51-90.
Kass R.E. and Wasserman L. (1995). A Reference Bayesian Test for Nested Hypotheses and its Relationship to the Schwarz Criterion. Journal of the American Statistical Association, 90, 928-934.
Pericchi L.R. (2010). How Large Should Be the Training Sample? Invited chapter in: Frontiers of Decision Making and Bayesian Analysis. In Honor of James O. Berger, Chen M.H. et al. (eds.). Springer. (In press).
Spiegelhalter D.J., Best N.G., Carlin B.P. and Van der Linde A. (2002). Bayesian Measures of Model Complexity and Fit (with Discussion). Journal of the Royal Statistical Society, Series B, 64(4), 583-616. Also in The BUGS Project DIC page: www.mrc-bsu.cam.ac.uk/bugs/winbugs/dicpage.shtml.
License
The authors or copyright holders of each paper grant the Journal of the Faculty of Sciences of Universidad Nacional de Colombia a non-exclusive, limited and free authorization over the paper which, once evaluated and approved, is submitted for subsequent publication under the following terms:
- The corrected version is submitted following the reviewers' suggestions, and the authors state that the paper is an unpublished document for which the rights are authorized, and they assume full responsibility for the content of the work before both the Journal of the Faculty of Sciences, Universidad Nacional de Colombia and third parties.
- The authorization granted to the Journal will be in force from the date it is included in the respective volume and number of the Journal of the Faculty of Sciences in the Open Journal Systems and on the Journal’s home page (https://revistas.unal.edu.co/index.php/rfc/index), as well as in the different databases and data indexes in which the publication is indexed.
- The authors authorize the Journal of the Faculty of Sciences of Universidad Nacional de Colombia to publish the document in the format in which it is required (printed, digital, electronic or any other known or to be known) and authorize the Journal of the Faculty of Sciences to include the work in the indexes and search engines deemed necessary to promote its diffusion.
- The authors accept that the authorization is given free of charge, and therefore they waive any right to receive any emolument for the publication, distribution, public communication, and any other use made under the terms of this authorization.
- All the contents of the Journal of the Faculty of Sciences are published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.