Facilitating Reproducibility and Collaboration with Literate Programming

A fundamental challenge for open science is how best to create and share documents containing computational results. Traditional methods involve maintaining the code, the generated tables and figures, and the text as separate files and manually assembling them into a finished document. As projects grow in complexity, this approach becomes error-prone and hard to replicate. Fortunately, new tools are emerging to address this problem, and librarians who provide data services are ideally positioned to offer training. In this workshop we'll use RStudio to demonstrate how to create a "compilable" document containing all the text elements (including the bibliography) as well as the code required to generate the embedded graphs and tables. We'll show how this process simplifies revision when, for example, a reviewer requests a change or the underlying data are updated. We'll also demonstrate the convenience of integrating version control into the workflow using RStudio's built-in support for git.
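As a rough illustration of the idea (a sketch, not the workshop's actual materials), the source of such a compilable document might look like the following R Markdown file. The file names, chunk contents, and citation key are hypothetical; the Python chunk assumes RStudio's reticulate support, with pandas and matplotlib installed:

    ---
    title: "A Compilable Analysis Document"
    output: pdf_document
    bibliography: references.bib
    ---

    The figure below is regenerated from the raw data every time
    the document is knit, so it always reflects the current data.

    ```{python summary-figure}
    # Hypothetical analysis chunk; knitr runs this code at compile
    # time and embeds the resulting figure in the output document.
    import pandas as pd
    import matplotlib.pyplot as plt

    data = pd.read_csv("measurements.csv")  # assumed input file
    data.groupby("group")["value"].mean().plot(kind="bar")
    plt.ylabel("mean value")
    plt.show()
    ```

    Citations such as [@smith2020] are resolved against
    references.bib when the document is compiled.

Knitting this file reruns the analysis and reassembles text, figure, and bibliography in a single step; committing the source file to git then versions the entire paper, code included.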

Evaluating Reproducibility in Computational Biology Research

For my Honors Senior Project, I read five research papers in the field of computational biology and attempted to reproduce their results. For the most part, this proved challenging, because many details vital to using the relevant software and data had been omitted. Using Geir Kjetil Sandve's paper "Ten Simple Rules for Reproducible Computational Research" as a guide, I discuss how the authors of these five papers did and did not follow these rules of reproducibility, and how that affected my ability to reproduce their results.

Scientific Research: Reproducibility and Bias in Chemistry

When scientists are able to recreate research results previously published by other scientists, the research is considered reproducible. But what happens when the results don't match? Then the initial research is non-reproducible. Reproducibility seems straightforward: it implies that an experimental result is either valid or invalid. Yet researchers affiliated with Stanford University, Tufts University, and the University of Ioannina in Greece concluded in 2005 that most published research findings are false. How do those invalid results end up in scientific papers? A group of Stanford researchers concluded that, in many cases, bias is to blame.

Systematic Reviews and Evidence Synthesis

While comprehensive and expert searching is a traditional part of academic librarianship, systematic reviews also require transparency and reproducibility in the search methodology. This work is supported by reporting guidelines and related librarian expertise. This guide provides resources useful to librarians assisting with systematic reviews across a broad range of disciplines beyond the biomedical sciences. Because the bulk of the published literature on systematic reviews is concentrated in the health sciences, some resources are subject-specific in title but have broader applications.

DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments

We introduce DeepDIVA: an infrastructure designed to enable the quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA, a researcher can either reproduce a given experiment from a very limited amount of information or share their own experiments with others. Moreover, the framework offers a wide range of functions, such as boilerplate code, experiment tracking, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of the framework, this paper presents case studies in handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source and accessible as a Web Service through DIVAServices.
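To make concrete the kind of bookkeeping such a framework automates, here is a minimal sketch in plain Python and PyTorch of two of the ingredients the abstract lists: reproducible experiment setup and recording of hyper-parameters. The function and file names are illustrative and are not DeepDIVA's actual API:

    import argparse
    import json
    import random

    import numpy as np
    import torch


    def set_reproducible_seed(seed):
        # Fix all relevant RNGs so a rerun with the same seed and
        # the same code version yields comparable results.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade some speed for deterministic cuDNN behavior.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


    def main():
        parser = argparse.ArgumentParser(
            description="Hypothetical reproducible experiment runner")
        parser.add_argument("--lr", type=float, default=0.01)
        parser.add_argument("--epochs", type=int, default=10)
        parser.add_argument("--seed", type=int, default=42)
        args = parser.parse_args()

        set_reproducible_seed(args.seed)

        # Record every hyper-parameter so the run can be repeated
        # later from this file alone.
        with open("experiment_config.json", "w") as f:
            json.dump(vars(args), f, indent=2)

        # ... model definition and training loop would go here ...


    if __name__ == "__main__":
        main()

Rerunning the script with the saved configuration and the same seed should then reproduce the run on the same software and hardware stack; a framework like DeepDIVA bundles this kind of setup together with experiment tracking and visualization so individual researchers do not have to reimplement it.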