The week at Retraction Watch featured a refreshingly honest retraction, and a big win for PubPeer. Here’s what was happening elsewhere.
Accumulating evidence indicates high risk of bias in preclinical animal research, questioning the scientific validity and reproducibility of published research findings. Systematic reviews found low rates of reporting of measures against risks of bias in the published literature (e.g., randomization, blinding, sample size calculation) and a correlation between low reporting rates and inflated treatment effects. That most animal research undergoes peer review or ethical review would offer the possibility to detect risks of bias at an earlier stage, before the research has been conducted.
It’s not a new story, although "the reproducibility crisis" may seem to be. For life sciences, I think it started in the late 1950s. Problems caused in clinical research burst into the open in a very public way then. But before we get to that, what is "research reproducibility"? It’s a euphemism for unreliable research or research reporting. Steve Goodman and colleagues (2016) say 3 dimensions of science that affect reliability are at play: Methods reproducibility – enough detail available to enable a study to be repeated; Results reproducibility – the findings are replicated by others; Inferential reproducibility – similar conclusions are drawn about results, which brings statistics and interpretation squarely into the mix. There is a lot of history behind each of those. Here are some of the milestones in awareness and proposed solutions that stick out for me.
The US National Institutes of Health (NIH) is now assessing all research grant submissions based on the rigor and transparency of the proposed research plans. Previously, efforts to strengthen scientific practices had been undertaken by individual institutes, beginning in 2011 with the National Institute on Aging, which partnered with APS and the NIH Office of Behavioral and Social Science Research to begin a conversation about improving reproducibility across science. These early efforts were noted and encouraged by Congress. Now, the entire agency has committed to this important goal: NIH's 2016–2020 strategic plan announces, "NIH will take the lead in promoting new approaches toward enhancing the rigor of experimental design, analysis, and reporting."
ReproZip (Rampin et al. 2014) is a tool aimed at simplifying the process of creating reproducible experiments. After finishing an experiment, writing a website, constructing a database, or creating an interactive environment, users can run ReproZip to create reproducible packages, archival snapshots, and an easy way for reviewers to validate their work.
Reproducibility in animal research is alarmingly low, and a lack of scientific rigor has been proposed as a major cause. Systematic reviews found low reporting rates of measures against risks of bias (e.g., randomization, blinding), and a correlation between low reporting rates and overstated treatment effects. Reporting rates of measures against bias are thus used as a proxy measure for scientific rigor, and reporting guidelines (e.g., ARRIVE) have become a major weapon in the fight against risks of bias in animal research. Surprisingly, animal scientists have never been asked about their use of measures against risks of bias and how they report these in publications. Whether poor reporting reflects poor use of such measures, and whether reporting guidelines may effectively reduce risks of bias has therefore remained elusive. To address these questions, we asked in vivo researchers about their use and reporting of measures against risks of bias and examined how self-reports relate to reporting rates obtained through systematic reviews. An online survey was sent out to all registered in vivo researchers in Switzerland (N = 1891) and was complemented by personal interviews with five representative in vivo researchers to facilitate interpretation of the survey results. Return rate was 28% (N = 530), of which 302 participants (16%) returned fully completed questionnaires that were used for further analysis.