Posts about reproducibility infrastructure (old posts, page 4)

PRUNE: A Preserving Run Environment for Reproducible Scientific Computing

Computing as a whole suffers from a crisis of reproducibility. Programs executed in one context are astonishingly hard to reproduce in another context, resulting in wasted effort by people and general distrust of results produced by computer. The root of the problem lies in the fact that every program has implicit dependencies on data and execution environment which are rarely understood by the end user. To address this problem, we present PRUNE, the Preserving Run Environment. In PRUNE, every task to be executed is wrapped in a functional interface and coupled with a strictly defined environment. The task is then executed by PRUNE rather than the user to ensure reproducibility. As a scientific workflow evolves in PRUNE, a growing but immutable tree of derived data is created. The provenance of every item in the system can be precisely described, facilitating sharing and modification between collaborating researchers, along with efficient management of limited storage space. We present the user interface and the initial prototype of PRUNE, and demonstrate its application in matching records and comparing surnames in U.S. Censuses.
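The idea of wrapping each task in a functional interface, pairing it with a declared environment, and recording provenance in an immutable store can be illustrated with a small Python sketch. This is not PRUNE's interface: the `Store` class, its methods, the `normalize` task, and the environment label are all hypothetical names chosen for illustration, and the "environment" here is only a string that is hashed, not an actual isolated runtime.

```python
import hashlib
import json


def _digest(obj):
    """Stable SHA-256 of a JSON-serializable description."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


class Store:
    """Toy immutable store (hypothetical): items are keyed by how they were made."""

    def __init__(self):
        self._items = {}       # item id -> value
        self._provenance = {}  # item id -> record describing the derivation

    def put_source(self, value):
        """Register raw input data; its id depends only on its content."""
        item_id = _digest({"source": value})
        self._items.setdefault(item_id, value)
        self._provenance.setdefault(item_id, {"kind": "source"})
        return item_id

    def run(self, func, input_ids, environment):
        """Execute func on stored inputs under a declared environment label."""
        record = {
            "kind": "derived",
            "function": func.__name__,
            "inputs": list(input_ids),
            "environment": environment,
        }
        item_id = _digest(record)
        if item_id not in self._items:  # existing results are never overwritten
            args = [self._items[i] for i in input_ids]
            self._items[item_id] = func(*args)
            self._provenance[item_id] = record
        return item_id

    def get(self, item_id):
        return self._items[item_id]

    def lineage(self, item_id):
        """Provenance records from an item back to its sources (depth-first)."""
        record = self._provenance[item_id]
        chain = [record]
        for parent in record.get("inputs", []):
            chain.extend(self.lineage(parent))
        return chain


def normalize(names):
    """Example task: a pure function of its inputs."""
    return [n.strip().lower() for n in names]


store = Store()
raw = store.put_source(["Smith ", "SMYTHE"])
clean = store.run(normalize, [raw], environment="toy-env-1")
print(store.get(clean))
print(store.lineage(clean))
```

Because an item's id is derived from the function name, the input ids, and the environment label, repeating the same task returns the existing item instead of recomputing it, while changing the environment produces a new, separate item. A real system would also need to capture the function's code and the actual environment contents, which this sketch does not attempt.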

Project package libraries and reproducibility

If you are an R user, you have probably upgraded an R package in your R installation only to find that one of your R scripts or applications suddenly stopped working. One strategy is to create a new package library for each new project. A package library is just a directory that holds all installed R packages (in addition to the ones that are installed with R itself). To make such a project library easy to record and recreate, we created the pkgsnap tool. It is a very simple package with two exported functions: 1) snap takes a snapshot of your project library. It writes out the names and versions of the currently installed packages into a text file. You can put this text file into the version control repository of the project to make sure it is not lost. 2) restore uses the snapshot file to recreate the project's package library from scratch. It installs the recorded versions of the recorded packages, in the right order.
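The snapshot-and-restore idea can be sketched outside R as well. The following is a Python analogue, not pkgsnap itself and not its file format: the function names `snap`, `read_snapshot`, and `restore_command`, the `packages.txt` file name, and the `name==version` line format are assumptions made for illustration. The restore step only builds a pip command rather than running it, and it leaves install ordering to pip.

```python
import sys
from importlib import metadata
from pathlib import Path


def snap(path="packages.txt"):
    """Write 'name==version' for every installed distribution, sorted by name."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs with no recorded name
            pins.add(f"{name}=={dist.version}")
    lines = sorted(pins, key=str.lower)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def read_snapshot(path="packages.txt"):
    """Parse a snapshot file back into (name, version) pairs."""
    pairs = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:
            name, _, version = line.partition("==")
            pairs.append((name, version))
    return pairs


def restore_command(path="packages.txt"):
    """Build (but do not run) a pip command that installs the recorded versions."""
    pins = [f"{name}=={version}" for name, version in read_snapshot(path)]
    return [sys.executable, "-m", "pip", "install", *pins]
```

As with the snapshot file described above, the text file produced by `snap` is small and line-oriented, so it can be committed to the project's version control repository and diffed like any other source file.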

Tools and techniques for computational reproducibility

When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. With a broad scientific audience in mind, we describe strengths and limitations of each approach, as well as circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

Janiform Papers Demo (pdbf: portable database files)

PDBF documents are a hybrid format. They are a valid PDF and a valid HTML page at the same time. You can now optionally add a VirtualBox OVA file with a complete operating system to the PDBF document. Yes, this means that the resulting file is a valid PDF, HTML, and OVA file at the same time. If you change the file extension to PDF and open it with a PDF viewer, you can see the static part of the document.