Who We Are

The moderators of this site are dedicated to disseminating information and resources related to research reproducibility. This project is a part of the Moore/Sloan Data Science Environment. New York University, the University of California, Berkeley and the University of Washington have launched a 5-year, $37.8 million, cross-institutional effort with support from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation to harness the potential of data scientists and big data for basic research and scientific discovery. Reproducibility is a key component of this effort.

On this site, the moderators have curated a directory of sources of various types that discuss reproducibility. You can find academic papers, blog posts, popular media articles, talks, tools, and more. We aim to include only open access resources in our directory. If you find something closed (e.g., a resource that requires a login or payment to view), please let us know in a GitHub issue that includes the title of and link to the resource in question. If you want to contribute to the directory, check out our how to contribute page on this site or the CONTRIBUTING.md on GitHub.

Why did we make this site?

We know that research should be described in enough detail that it can be repeated and perhaps generalized. This implies the ability to redo research in nominally equal settings and also to test the generalizability of a claimed conclusion by trying similar approaches in different settings. Unfortunately, the state of the art falls far short of this goal. Methods are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available; and configuration parameters change results in unforeseen ways. Several studies have revealed an alarming reality regarding reproducibility:

Vandewalle et al. found that only 9% of 134 papers published in IEEE Transactions on Image Processing in 2004 had code available online, while 33% had data available.
Of more than 268 papers accepted at seven different conferences, 43% submitted their code and data to the Artifact Evaluation Process, and only 30% could be correctly reproduced.
Collberg et al. showed that, out of 402 Computer Systems papers, only 56% were able to share their code.

Because important discoveries are often the result of sequences of smaller, less significant steps, the ability to publish results that are fully documented and reproducible is necessary for advancing science. Over the years, to increase the adoption of reproducible practices, we have been involved in several activities, including developing infrastructure, tools, and best practices; working with conferences and journals to establish reproducibility evaluation for articles; and giving presentations and tutorials.

We hope this site will become a community resource where researchers from a wide range of disciplines can obtain and share information to improve research practices and, ultimately, the quality of their results.