Scientific advance relies on transparency, rigour and reproducibility. At PLOS ONE we have always supported the publication of rigorous research, in all its forms, positive or negative, as showcased in our earlier Missing Pieces Collection. In this 10th Anniversary Collection, A Decade of Missing Pieces, Senior Editor Alejandra Clark revisits this important theme and highlights a decade of null and negative results, replication studies and studies refuting previously published work.
In January, Bruce Beutler, an immunologist at University of Texas Southwestern Medical Center and winner of the 2011 Nobel Prize in Physiology or Medicine, emailed Science editor-in-chief Jeremy Berg to report that attempts to replicate the findings in "MAVS, cGAS, and endogenous retroviruses in T-independent B cell responses" had weakened his confidence in the original results. The paper had found that virus-like elements in the human genome play an important role in the immune system’s response to pathogens. Although Beutler and several co-authors requested retraction right off the bat, the journal discovered that two co-authors disagreed, which, Berg told us, drew out the retraction process. In an attempt to resolve the situation, the journal waited for Beutler’s lab to perform another replication attempt. Those findings were inconclusive, and the dissenting authors continued to push back against retraction.
In order for research methods to be consistent, accessible and reproducible, we need universal, widely understood standards for research that all scientists adhere to. NPL has been responsible for maintaining fundamental standards and units for more than 100 years and is now engaged in pioneering work to create a set of “gold standards” for all scientific methodologies, materials, analyses and protocols, based on exhaustive testing at a large number of laboratories, in tandem with both industry and national and international standardisation organisations.
Provenance refers to any information describing the production process of an end product, which can be anything from a piece of digital data to a physical object. While this survey focuses on the former type of end product, this definition still leaves room for many different interpretations of and approaches to provenance. These are typically motivated by different application domains for provenance (e.g., accountability, reproducibility, process debugging) and varying technical requirements such as runtime, scalability, or privacy. As a result, we observe a wide variety of provenance types and provenance-generating methods. This survey provides an overview of the research field of provenance, focusing on what provenance is used for (what for?), what types of provenance have been defined and captured for the different applications (what form?), and which resources and system requirements impact the choice of deploying a particular provenance solution (what from?). For each of these three key questions, we provide a classification and review the state of the art for each class. We conclude with a summary and possible future research challenges.
As computational pipelines become a bigger part of science, it is important to ensure that the results are reproducible, a concern which has come to the fore in recent years. All developed software should be runnable automatically, without any user intervention. In addition to being valuable to the wider community, which may wish to reproduce or extend a published analysis, reproducible research practices allow for better control over the project by the original authors themselves. For example, keeping a non-executable record of parameters and command line arguments leads to error-prone analysis and opens up the possibility that, when the results are to be written up for publication, the researcher will no longer be able to completely describe the process that led to them. For large projects, the use of multiple computational cores (either in a multi-core machine or distributed across a compute cluster) is necessary to obtain results in a useful time frame. Furthermore, it is often the case that, as the project evolves, it becomes necessary to save intermediate results while downstream analyses are designed (or redesigned) and implemented. Under many frameworks, this makes it increasingly difficult to keep a single point of entry for the computation. Jug is a software framework which addresses these issues by caching intermediate results and distributing the computational work as tasks across a network. Jug is written in Python without the use of compiled modules, is completely cross-platform, and is available as free software under the liberal MIT license.
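The idea of caching intermediate results behind a single point of entry can be illustrated with a minimal sketch. It uses only the Python standard library and is not Jug's API: the names `cached_task`, `CACHE_DIR`, `square`, `total` and `pipeline` are hypothetical, and the sketch assumes that task arguments can be pickled. Distribution of tasks across cores or machines is not shown.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; a real project would use a persistent directory.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="task_cache_"))


def cached_task(func):
    """Store each result on disk, keyed by the function name and its arguments."""
    @functools.wraps(func)
    def wrapper(*args):
        key_bytes = pickle.dumps((func.__name__, args))
        key = hashlib.sha256(key_bytes).hexdigest()
        path = CACHE_DIR / f"{key}.pkl"
        if path.exists():
            # An earlier run already produced this intermediate result: reuse it.
            return pickle.loads(path.read_bytes())
        result = func(*args)
        path.write_bytes(pickle.dumps(result))
        return result
    return wrapper


calls = []  # records which tasks actually executed (for illustration only)


@cached_task
def square(x):
    calls.append(("square", x))
    return x * x


@cached_task
def total(values):
    calls.append(("total", values))
    return sum(values)


def pipeline(inputs):
    """Single point of entry: the whole analysis is re-run by calling this."""
    return total(tuple(square(x) for x in inputs))
```

Calling `pipeline` a second time with the same inputs reads every result from the cache, and extending the inputs executes only the new tasks, so the script itself remains the executable record of the analysis.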
Reproducibility in experiments is necessary to verify claims and to reuse prior work in experiments that advance research. However, the traditional model of publication validates research claims through peer review without taking reproducibility into account. Workflows encapsulate experiment descriptions and components and are suitable for representing reproducibility. Additionally, they can be published alongside traditional patterns as a form of documentation for the experiment which can be combined with linked open data. For reproducibility utilising published datasets, it is necessary to declare the conditions or restrictions for permissible reuse. In this paper, we take a look at the state of workflow reproducibility through a browser-based tool and a corresponding study to identify how workflows might be combined with traditional forms of documentation and publication. We also discuss the licensing aspects of data in workflows and how they can be annotated using linked open data ontologies.