Studies indicate, however, that more than half of the experiments involving clinical trials of new drugs and treatments are irreproducible. John Ioannidis at Stanford University, US, goes further, saying that most published research findings are actually false. Ioannidis is the author of a mathematical model predicting that the smaller the sample and the less stringent the experimental methodology, definitions, outcomes and statistical analysis, the greater the probability of error. Furthermore, studies that involve financial or other interests, or that are expected to have great impact, are also more prone to false results.
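The relationship described above can be sketched numerically. The snippet below is a minimal illustration, assuming the positive predictive value (PPV) expression from Ioannidis's 2005 paper, PPV = ((1 - β)R + uβR) / (R + α - βR + u - uα + uβR), where R is the pre-study odds that a tested relationship is true, α the type I error rate, β the type II error rate and u the proportion of analyses affected by bias. The function and parameter names are our own, and statistical power (1 - β) is used here as a stand-in for sample size, since smaller samples generally mean lower power.

```python
def ppv(power, alpha=0.05, prior_odds=1.0, bias=0.0):
    """Probability that a claimed (statistically significant) finding is true.

    power      -- 1 - beta, the chance of detecting a true relationship
    alpha      -- type I error rate (false positive rate)
    prior_odds -- R, pre-study odds of true to false relationships
    bias       -- u, fraction of would-be negative analyses reported as positive
    """
    beta = 1.0 - power
    r, u = prior_odds, bias
    # Claimed findings that reflect a true relationship.
    true_positives = (1 - beta) * r + u * beta * r
    # All claimed findings, true or false.
    all_positives = r + alpha - beta * r + u - u * alpha + u * beta * r
    return true_positives / all_positives


if __name__ == "__main__":
    # Hypothetical scenarios chosen only for illustration.
    for label, power, bias in [
        ("well powered, no bias", 0.8, 0.0),
        ("small sample, no bias", 0.2, 0.0),
        ("well powered, biased", 0.8, 0.3),
    ]:
        value = ppv(power, prior_odds=0.1, bias=bias)
        print(f"{label}: PPV = {value:.3f}")
```

Under these assumed inputs, lowering the power or raising the bias term lowers the PPV, which is the qualitative pattern the paragraph attributes to the model. The specific numbers depend entirely on the chosen R, α and u and should not be read as estimates for any real field.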
Reproducible Science Promoting Open Science
A paper from investigators at the University of Alabama at Birmingham recently published in Obesity identifies several key statistical errors commonly seen in obesity research, with discussions on how to identify and avoid making these mistakes. “Our goal is to provide researchers and reviewers with a tutorial to improve the rigor of the science in future obesity studies,” said Brandon George, Ph.D., statistician in the University of Alabama at Birmingham Office of Energetics. “Investigators who conduct primary research may find the paper useful to read or share with statistical collaborators to obtain a deeper understanding of statistical issues, avoid making the discussed errors, and increase the reproducibility and rigor of the field. Editors, reviewers and consumers will find valuable information allowing them to properly identify these common errors while critically reading the work of others.”
The scrutiny of the scientific community has also turned to research involving computer programs, finding that reproducibility depends more strongly on implementation than commonly thought. These problems are especially relevant for property predictions of crystals and molecules, which hinge on precise computer implementations of the governing equations of quantum physics. We devised a procedure to assess the precision of density functional theory (DFT) methods and used this to demonstrate reproducibility among many of the most widely used DFT codes.
Benchmarking has proven to be crucial for investigating the behavior and performance of a system. However, the choice of relevant benchmarks remains a challenge. To help the process of comparing and choosing among benchmarks, we propose a solution for automatic benchmark profiling. It computes unified benchmark profiles reflecting benchmarks’ duration, function repartition, stability, CPU efficiency, parallelization and memory usage. It identifies the system information needed for profile computation, collects it from execution traces and produces profiles through efficient and reproducible trace analysis treatments. The paper presents the design, implementation and evaluation of the approach. The analysis of the kernel trace follows a workflow implemented using the VisTrails tool.
As science grapples with what some have called a reproducibility crisis, replication studies, which aim to reproduce the results of previous studies, have been held up as a way to make science more reliable. It seems like common sense: Take a study and do it again — if you get the same result, that’s evidence that the findings are true, and if the result doesn’t turn up again, they’re false. Yet in practice, it’s nowhere near this simple.
Once again, reproducibility is in the news. Most recently we hear that irreproducibility is irreproducible and thus everything is actually fine. The most recent round was kicked off by a criticism of the Reproducibility Project followed by claim and counter claim on whether one analysis makes more sense than the other. I’m not going to comment on that but I want to tease apart what the disagreement is about, because it shows that the problem with reproducibility goes much deeper than whether or not a particular experiment replicates.
When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. With a broad scientific audience in mind, we describe strengths and limitations of each approach, as well as circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.
The scientific community is bustling with projects to make published results more reliable. Efforts are under way to establish checklists, to revamp training in experimental design, and even to fund disinterested scientists to replicate others' experiments. A more efficient strategy would be to rework current incentives to put less emphasis on high-impact publications, but those systems are entrenched, and public funders and universities are ill-prepared for that scale of change. To catalyse change, industry must step up to the plate. I have learned this first hand, as head of the Structural Genomics Consortium (SGC), a research charity funded by business, government and other charities. If more companies contributed funds and expertise to efforts such as ours, I believe it would create a system that rewards science that is both cutting-edge and reproducible.
A satirical piece detailing the replication and reproducibility crisis in Psychology.
In 2005, John Ioannidis, a professor of medicine at Stanford University, published a paper, “Why most published research findings are false,” mathematically showing that a huge number of published papers must be incorrect. He also looked at a number of well-regarded medical research findings, and found that, of 34 that had been retested, 41% had been contradicted or found to be significantly exaggerated. Since then, researchers in several scientific areas have consistently struggled to reproduce major results of prominent studies. By some estimates, at least 51%, and as many as 89%, of published papers are based on studies and experiments showing results that cannot be reproduced.