In a recent Opinion article, Parker et al.  highlight a range of important issues and provide tangible solutions to improve transparency in ecology and evolution (E&E). We agree wholeheartedly with their points and encourage the E&E community to heed their advice. However, a key issue remains conspicuously unaddressed: Parker et al. assume that ‘deliberate dishonesty’ is rare in E&E, yet evidence suggests that occurrences of scientific misconduct (i.e., data fabrication, falsification, and/or plagiarism) are disturbingly common in the life sciences .
Early in my Ph.D. studies, my supervisor assigned me the task of running computer code written by a previous student who was graduated and gone. It was hell. I had to sort through many different versions of the code, saved in folders with a mysterious numbering scheme. There was no documentation and scarcely an explanatory comment in the code itself. It took me at least a year to run the code reliably, and more to get results that reproduced those in my predecessor's thesis. Now that I run my own lab, I make sure that my students don't have to go through that.
A scientific result is not truly established until it is independently confirmed. This is one of the tenets of experimental science. Yet, we have seen a rash of recent headlines about experimental results that could not be reproduced. In the biomedical field, efforts to reproduce results of academic research by drug companies have had less than a 50% success rate,a resulting in billions of dollars in wasted effort. In most cases the cause is not intentional fraud, but rather sloppy research protocols and faulty statistical analysis. Nevertheless, this has led to both a loss in public confidence in the scientific enterprise and some serious soul searching within certain fields. Publishers have begun to take the lead in insisting on more careful reporting and review, as well as facilitating government open science initiatives mandating sharing of research data and code. To support efforts of this type, the ACM Publications Board recently approved a new policy on Result and Artifact Review and Badging. This policy defines two badges ACM will use to highlight papers that have undergone independent verification. Results Replicated is applied when the paper's main results have been replicated using artifacts provided by the author, or Results Reproduced if done completely independently.
A comic illustrating the complexities and history of research reproducibility.
The scientific community is increasingly concerned with cases of published "discoveries" that are not replicated in further studies. The field of mouse phenotyping was one of the first to raise this concern, and to relate it to other complicated methodological issues: the complex interaction between genotype and environment; the definitions of behavioral constructs; and the use of the mouse as a model animal for human health and disease mechanisms. In January 2015, researchers from various disciplines including genetics, behavior genetics, neuroscience, ethology, statistics and bioinformatics gathered in Tel Aviv University to discuss these issues. The general consent presented here was that the issue is prevalent and of concern, and should be addressed at the statistical, methodological and policy levels, but is not so severe as to call into question the validity and the usefulness of the field as a whole. Well-organized community efforts, coupled with improved data and metadata sharing were agreed by all to have a key role to play in view of identifying specific problems, as well as promoting effective solutions. As replicability is related to validity and may also affect generalizability and translation of findings, the implications of the present discussion reach far beyond the issue of replicability of mouse phenotypes but may be highly relevant throughout biomedical research.
When graduate student Alyssa Ward took a science-policy internship, she expected to learn about policy — not to unearth gaps in her biomedical training. She was compiling a bibliography about the reproducibility of experiments, and one of the papers, a meta-analysis, found that scientists routinely fail to explain how they choose the number of samples to use in a study. "My surprise was not about the omission — it was because I had no clue how, or when, to calculate sample size," Ward says. Nor had she ever been taught about major categories of experimental design, or the limitations of P values. (Although they can help to judge the strength of scientific evidence, P values do not — as many think — estimate the likelihood that a hypothesis is true.)