Who We Are

The moderators of this site are dedicated to disseminating information and resources related to computational reproducibility. This project is part of the Moore/Sloan Data Science Environment. New York University, the University of California, Berkeley, and the University of Washington have launched a 5-year, $37.8 million, cross-institutional effort with support from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation to harness the potential of data scientists and big data for basic research and scientific discovery. Reproducibility is a key component of this effort.

A hallmark of the scientific method has been that experiments should be described in enough detail that they can be repeated and perhaps generalized. This implies the ability to redo experiments in nominally equal settings and also to test the generalizability of a claimed conclusion by trying similar experiments in different settings. In principle, this should be easier for computational experiments than for natural science experiments, because not only can computational processes be automated but also computational systems do not suffer from the ‘biological variation’ that plagues the life sciences.

Unfortunately, the state of the art falls far short of this goal. Most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available; and configuration parameters change results in unforeseen ways. Several studies paint an alarming picture of computational reproducibility:

  • Vandewalle et al. found that only 9% of the 134 papers published in IEEE Transactions on Image Processing in 2004 had code available online, while 33% had data available.
  • Of more than 268 papers accepted at seven different conferences, 43% submitted their code and data to the Artifact Evaluation Process, and only 30% could be correctly reproduced.
  • Collberg et al. showed that, out of 402 Computer Systems papers, the authors of only 56% were able to share their code.

Because important scientific discoveries are often the result of sequences of smaller, less significant steps, the ability to publish results that are fully documented and reproducible is necessary for advancing science. Over the years, to increase the adoption of reproducibility, we have been involved in several activities, including developing infrastructure, tools, and best practices; working with conferences and journals to establish reproducibility evaluation for articles; and giving presentations and tutorials.

We hope this site will become a community resource where scientists from a wide range of disciplines can obtain and share information to improve research practices and, ultimately, the quality of their results.

 
