Reproducibility of modeling is a problem faced by any machine learning practitioner, whether in industry or academia. An irreproducible model can carry significant financial costs, lost time, and even damage to personal reputation if results cannot be replicated. This paper first discusses the problems we have encountered while building a variety of machine learning models, and then describes the framework we built to tackle model reproducibility. The framework consists of four main components (data, feature, scoring, and evaluation layers), each composed of well-defined transformations. This enables us not only to replicate a model exactly, but also to reuse the transformations across different models. As a result, the platform has dramatically increased the speed of both offline and online experimentation while also ensuring model reproducibility.
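A minimal sketch of the idea behind such a framework: a model is expressed as an ordered composition of named, reusable transformations grouped into data, feature, scoring, and evaluation layers, so that rerunning the same pipeline on the same inputs deterministically reproduces the same outputs. The `Pipeline` and `Transform` names and the toy transformations below are our own illustration, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

# A transformation maps a list of records to a new list of records.
Transform = Callable[[list], list]

@dataclass
class Pipeline:
    """A model as four layers of well-defined, reusable transformations."""
    data: List[Transform]
    feature: List[Transform]
    scoring: List[Transform]
    evaluation: List[Transform]

    def run(self, records: list) -> list:
        out = records
        # Apply each layer in order; each step is a pure function,
        # so identical inputs always yield identical outputs.
        for layer in (self.data, self.feature, self.scoring, self.evaluation):
            for transform in layer:
                out = transform(out)
        return out

# Illustrative transformations, reusable across pipelines.
drop_nulls = lambda rows: [r for r in rows if r is not None]   # data layer
scale = lambda rows: [r / 10.0 for r in rows]                  # feature layer
score = lambda rows: [round(r, 2) for r in rows]               # scoring layer
top_k = lambda rows: sorted(rows, reverse=True)[:2]            # evaluation layer

pipe = Pipeline(data=[drop_nulls], feature=[scale],
                scoring=[score], evaluation=[top_k])
print(pipe.run([30, None, 10, 20]))  # → [3.0, 2.0]
```

Because each layer is a list of pure transformations, two pipelines can share a transformation (e.g. the same `drop_nulls` step) without duplicating logic, which is what makes both exact replication and cross-model reuse possible.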
An increasing number of studies, surveys, and editorials highlight experimental and computational reproducibility and replication issues that appear to pervade most areas of modern science. This perspective examines some of the multiple and complex causes of what has been called a "reproducibility crisis," which can impact materials, interface/(bio)interphase, and vacuum sciences. Reproducibility issues are not new to science, but they are now appearing in new forms requiring innovative solutions. Drivers include the increasingly multidisciplinary, multimethod nature of much advanced science, the increasing complexity of the problems and systems being addressed, and the large amounts and multiple types of experimental and computational data being collected and analyzed in many studies. Sustained efforts are needed to address the causes of reproducibility problems, which can hinder the rate of scientific progress and lower public and political regard for science. The initial efforts of the American Vacuum Society to raise awareness of a new generation of reproducibility challenges and provide tools to help address them serve as examples of mitigating actions that can be undertaken.
The drive for reproducibility in the computational sciences has provoked discussion and effort across a broad range of perspectives: technological, legislative/policy, educational, and publishing. Discussion on these topics is not new, but the need to adopt standards for the reproducibility of claims based on computational results is now clear to researchers, publishers, and policymakers alike. Many technologies exist to support and promote the reproduction of computational results: containerisation tools such as Docker, literate-programming approaches such as Sweave, knitr, and IPython, and cloud environments such as Amazon Web Services. But these technologies are tied to specific programming languages (e.g. Sweave/knitr to R; IPython to Python) or to platforms (e.g. Docker to 64-bit Linux environments only). To date, no single approach is able to span the broad range of technologies and platforms represented in computational biology and biotechnology. To enable reproducibility across computational biology, we demonstrate an approach and provide a set of tools that is suitable for all computational work and is not tied to a particular programming language or platform. We present published examples from a series of papers in different areas of computational biology, spanning the major languages and technologies in the field (Python/R/MATLAB/Fortran/C/Java). Our approach produces a transparent and flexible process for replication and recomputation of results. Ultimately, its most valuable aspect is the decoupling of methods in computational biology from their implementation. Separating the 'how' (method) of a publication from the 'where' (implementation) promotes genuinely open science and benefits the scientific community as a whole.
With concerns over research reproducibility on the rise, systematic replications of published science have become an important tool to estimate the replicability of findings in specific areas. Nevertheless, such initiatives are still uncommon in biomedical science, and have never been performed at a national level. The Brazilian Reproducibility Initiative is a multicenter, systematic effort to assess the reproducibility of the country's biomedical research by replicating between 50 and 100 experiments from Brazilian life sciences articles. The project will focus on a set of common laboratory methods, performing each experiment in multiple institutions across the country, with the reproducibility of published findings analyzed in the light of interlaboratory variability. The results, due in 2021, will allow us not only to estimate the reproducibility of Brazilian biomedical science, but also to investigate whether aspects of the published literature can be used to predict it.
This paper discusses results and insights from the 1st ReQuEST workshop, a collective effort to promote reusability, portability, and reproducibility of deep learning research artifacts within the Architecture/PL/Systems communities. ReQuEST (Reproducible Quality-Efficient Systems Tournament) exploits the open-source Collective Knowledge framework (CK) to unify benchmarking, optimization, and co-design of deep learning systems implementations and to exchange results via a live multi-objective scoreboard. Systems evaluated under ReQuEST are diverse and include an FPGA-based accelerator, optimized deep learning libraries for x86 and ARM systems, and distributed inference in the Amazon Cloud and over a cluster of Raspberry Pis. We finally discuss limitations of our approach, and how we plan to improve upon them for the upcoming SysML artifact evaluation effort.
Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose 6 different strategies to build the training set, which we evaluate on 2 datasets: a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, "Random File Numbers (Uniform)", is able to predict computational reproducibility with good accuracy. We also analyze the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method can speed up reproducibility evaluations substantially, with only a modest loss of accuracy.
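To make the collaborative-filtering formulation concrete, here is an illustrative sketch (not the paper's implementation): reproducibility outcomes form a sparse subjects × files matrix of 0/1 entries, and a matrix-factorization model with subject and file bias terms fills in the entries that were never evaluated. All names, the toy data, and the hyperparameters below are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_predict(R, mask, k=2, lr=0.05, reg=0.01, epochs=200):
    """Fit a biased matrix factorization on observed entries of R
    (where mask is True) and predict the full matrix."""
    n_sub, n_file = R.shape
    P = 0.1 * rng.standard_normal((n_sub, k))   # latent subject factors
    Q = 0.1 * rng.standard_normal((n_file, k))  # latent file factors
    bs = np.zeros(n_sub)                        # subject bias
    bf = np.zeros(n_file)                       # file bias
    mu = R[mask].mean()                         # global mean
    for _ in range(epochs):
        for i, j in zip(*np.nonzero(mask)):
            # Error on one observed (subject, file) outcome.
            e = R[i, j] - (mu + bs[i] + bf[j] + P[i] @ Q[j])
            bs[i] += lr * (e - reg * bs[i])
            bf[j] += lr * (e - reg * bf[j])
            P[i], Q[j] = (P[i] + lr * (e * Q[j] - reg * P[i]),
                          Q[j] + lr * (e * P[i] - reg * Q[j]))
    return mu + bs[:, None] + bf[None, :] + P @ Q.T

# Toy data: file 2 is non-reproducible for every subject; others reproduce.
R = np.array([[1, 1, 0, 1],
              [1, 1, 0, 1],
              [1, 1, 0, 1]], dtype=float)
mask = np.ones_like(R, dtype=bool)
mask[2, 2] = False  # pretend this costly evaluation was never run
pred = fit_predict(R, mask)
print(pred[2, 2] < 0.5)  # the missing entry is predicted non-reproducible
```

The file bias term captures files that fail to reproduce across all subjects (as in the toy data above), while the subject bias captures subjects whose data is systematically harder to process; this is why the paper analyzes the relevance of including both biases in the model.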