Over the last decade, there’s been a lot of talk about reproducibility problems in science — about published results that turn out to be false alarms. In fields like psychology, neuroscience, and cell biology, these errors can send scientists down unproductive paths, waste time and money, and pollute headlines with misleading claims. "But I get much more exercised about reproducibility problems in clinical genetics, because those have massive and real-time consequences for thousands of families," says MacArthur.
Even when results are published in a top scientific journal, other researchers who repeat the same experiments under the same conditions often get contradictory results. As one measure of the problem, a recent study attempted to replicate 100 published psychology studies and succeeded for only 39. Leaving sex out of experimental design as a variable may have contributed to these reproducibility issues. Sex is also a biological variable in its own right: it can shape what studies find and influence how well early biological research translates into advances in human medicine.
Experimental results that don’t hold up to replication have caused consternation among scientists for years, especially in the life and social sciences (SN: 1/24/15, p. 20). In 2015 several research groups examining the issue reported on the magnitude of the irreproducibility problem. The news was not good.
Project on Reproducibility and Robustness of the Empirical Instrumental Variables Literature in Medicine.
The finding that acute and chronic manipulations of the same neural circuit can produce different behavioural outcomes poses new questions about how best to analyse these circuits.
Stanford Center for Reproducible Neuroscience: A new preprint has been posted to arXiv that has very important implications and should be required reading for all fMRI researchers. Anders Eklund, Tom Nichols, and Hans Knutsson applied task fMRI analyses to a large number of resting fMRI datasets, in order to identify the empirical corrected “familywise” Type I error rates observed under the null hypothesis for both voxel-wise and cluster-wise inference. What they found is shocking: While voxel-wise error rates were valid, nearly all cluster-based parametric methods (except for FSL’s FLAME 1) have greatly inflated familywise Type I error rates. This inflation was worst for analyses using lower cluster-forming thresholds (e.g. p=0.01) compared to higher thresholds, but even with higher thresholds there was serious inflation. This should be a sobering wake-up call for fMRI researchers, as it suggests that the methods used in a large number of previous publications suffer from exceedingly high false positive rates (sometimes greater than 50%).
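To make the idea of an empirical familywise error rate concrete, here is a minimal toy sketch. It is not the preprint's method: it uses simulated noise rather than resting fMRI data, treats voxels as independent (real fMRI data are spatially correlated), and covers only voxel-wise thresholds, not cluster-wise inference. The function name `empirical_fwe` and all constants are hypothetical choices for illustration. The logic is the same in spirit, though: run many analyses on data where the null hypothesis is true, and count the fraction that report at least one false positive anywhere.

```python
import random
from statistics import NormalDist


def empirical_fwe(n_analyses, n_voxels, z_threshold, seed=0):
    """Fraction of null analyses with at least one voxel above z_threshold.

    Hypothetical helper: each "analysis" is n_voxels independent
    standard-normal statistics, i.e. pure noise with no true effect.
    """
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_analyses):
        # Any supra-threshold voxel makes the whole analysis a false alarm.
        if any(rng.gauss(0.0, 1.0) > z_threshold for _ in range(n_voxels)):
            false_alarms += 1
    return false_alarms / n_analyses


ALPHA = 0.05        # nominal familywise error rate (assumed for illustration)
N_VOXELS = 100      # toy image size (assumed)
N_ANALYSES = 2000   # number of simulated null analyses (assumed)

# One-sided thresholds: per-voxel alpha with no correction, and Bonferroni.
z_uncorrected = NormalDist().inv_cdf(1 - ALPHA)
z_bonferroni = NormalDist().inv_cdf(1 - ALPHA / N_VOXELS)

fwe_uncorrected = empirical_fwe(N_ANALYSES, N_VOXELS, z_uncorrected)
fwe_bonferroni = empirical_fwe(N_ANALYSES, N_VOXELS, z_bonferroni)

print(f"uncorrected threshold: empirical FWE = {fwe_uncorrected:.3f}")
print(f"Bonferroni threshold:  empirical FWE = {fwe_bonferroni:.3f}")
```

With independent voxels, the uncorrected rate should land near 1 - 0.95^100 (about 0.99), while the Bonferroni rate should stay close to the nominal 0.05. A method is "valid" in the sense used above when its empirical rate on null data stays at or below the nominal level; the cluster-based methods in the preprint are reported as failing that check.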