The lack of reproducibility in research: How statistics can endorse results

Scott Goddard, Valen Johnson


Scientific research is validated by reproducing its results, but efforts to reproduce spurious claims drain resources. We focus on one cause of failed reproduction: false-positive statistical test results caused by random variability. Classical statistical methods rely on p-values to measure the evidence against null hypotheses, whereas Bayesian hypothesis tests produce more easily understood results, provided one can specify prior distributions under the alternative hypothesis. We describe uniformly most powerful Bayesian tests (UMPBTs), a new class of Bayesian tests that provide a default specification of the alternative prior, and show that these tests also maximize statistical power.
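
To make the idea of a default alternative concrete, the following is a minimal sketch for the simplest case only: a one-sided z-test of a normal mean with known standard deviation. It assumes the closed-form result usually quoted for this case, in which the Bayes factor in favor of the alternative exceeds an evidence threshold gamma exactly when the z-statistic exceeds sqrt(2 log gamma). The function names and the evidence threshold `gamma` are illustrative choices, not notation taken from the abstract.

```python
import math
from statistics import NormalDist


def umpbt_z_threshold(gamma: float) -> float:
    """z-statistic above which the Bayes factor exceeds gamma.

    Assumes a one-sided z-test with known sigma (illustrative case only).
    """
    return math.sqrt(2.0 * math.log(gamma))


def umpbt_alternative_mean(mu0: float, sigma: float, n: int, gamma: float) -> float:
    """Point alternative mu1 that serves as the default alternative.

    Assumed closed form for the one-sided z-test: mu0 + sigma * sqrt(2 log(gamma) / n).
    """
    return mu0 + sigma * math.sqrt(2.0 * math.log(gamma) / n)


def bayes_factor_10(xbar: float, mu0: float, mu1: float, sigma: float, n: int) -> float:
    """Likelihood ratio of the point alternative mu1 against the point null mu0,
    given the mean xbar of n normal observations with known sigma."""
    return math.exp(n * (mu1 - mu0) * (xbar - (mu1 + mu0) / 2.0) / sigma**2)


def gamma_matching_alpha(alpha: float) -> float:
    """Evidence threshold whose rejection region matches a one-sided
    classical z-test of size alpha: gamma = exp(z_alpha^2 / 2)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return math.exp(z_alpha**2 / 2.0)


if __name__ == "__main__":
    # Hypothetical study: n = 25 observations, sigma = 1, null mean 0.
    mu0, sigma, n = 0.0, 1.0, 25
    gamma = gamma_matching_alpha(0.05)
    mu1 = umpbt_alternative_mean(mu0, sigma, n, gamma)
    xbar = 0.40  # hypothetical observed sample mean
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    bf = bayes_factor_10(xbar, mu0, mu1, sigma, n)
    print(f"gamma = {gamma:.2f}, z = {z:.2f}, Bayes factor = {bf:.2f}")
    print("reject null" if z > umpbt_z_threshold(gamma) else "do not reject null")
```

Under these assumptions, a classical one-sided test at the 0.05 level corresponds to an evidence threshold of only about 3.87, which illustrates why a result that is "significant" by p-value can amount to modest evidence when read as a Bayes factor.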


statistical evidence; hypothesis test; Bayesian analysis; uniformly most powerful Bayesian tests





