Reproducibility is an essential requirement for computational studies including those based on machine learning techniques. However, many machine learning studies are either not reproducible or are difficult to reproduce. In this paper, we consider what information about text mining studies is crucial to successful reproduction of such studies. We identify a set of factors that affect reproducibility based on our experience of attempting to reproduce six studies proposing text mining techniques for the automation of the citation screening stage in the systematic review process. Subsequently, the reproducibility of 30 studies was evaluated based on the presence or otherwise of information relating to the factors. While the studies provide useful reports of their results, they lack information on access to the dataset in the form and order as used in the original study (as against raw data), the software environment used, randomization control and the implementation of proposed techniques. In order to increase the chances of being reproduced, researchers should ensure that details about and/or access to information about these factors are provided in their reports.
Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an opensource implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language.
A webinar on the challenges of reproducibility in data scarce fields.
This work makes its contribution by demonstrating the importance of execution environments for the reproducibility of scientific applications and differentiating execution environment specifications, which should be lightweight, persistent and deployable, from various tools used to create execution environments, which may experience frequent changes due to technological evolution. It proposes two preservation approaches and prototypes for the purposes of both result verification and research extension, and provides recommendations on how to build reproducible scientific applications from the start.
Join our panelists for a discussion on challenges and opportunities related to sharing and using open data in research, including meeting funder and journal guidelines.
In this RCE Podcast, Brock Palen and Jeff Squyres discuss Reproducible Neuroscience with RCE Podcast Chris Gorgolewski from Stanford. "In recent years there has been increasing concern about the reproducibility of scientific results. Because scientific research represents a major public investment and is the basis for many decisions that we make in medicine and society, it is essential that we can trust the results. Our goal is to provide researchers with tools to do better science. Our starting point is in the field of neuroimaging, because that’s the domain where our expertise lies."
The University of Minnesota Libraries addressed this issue head-on this year by launching the reproducibility portal in an effort to help faculty and others on campus improve their research practices. The portal is a collaboration that includes Liberal Arts Technology and Information Services (LATIS) and the Minnesota Supercomputing Institute (MSI).
Introduce the python environment wrapper and packing tools; virtualenv & pip. Show you how you can stay up to date by using in requires.io egg security and update checking. Cover Fabric a python deployment tool and wider systems and workflow replication with Vagrant and Reprozip.If time allowing touch upon test driven development and adding Travis to your project.
A preview of ReproZip by Vicky Steeves at the PresQT Workshop May 1, 2017 at the University of Notre Dame.
Amidst the recent flood of concerns about transparency and reproducibility in the behavioral and clinical sciences, we suggest a simple, inexpensive, easy-to-implement, and uniquely powerful tool to improve the reproducibility of scientific research and accelerate progress—video recordings of experimental procedures. Widespread use of video for documenting procedures could make moot disagreements about whether empirical replications truly reproduced the original experimental conditions. We call on researchers, funders, and journals to make commonplace the collection and open sharing of video-recorded procedures.