DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments

We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source, and accessible as Web Service through DIVAServices.

Use Cases of Computational Reproducibility for Scientific Workflows at Exascale

We propose an approach for improved reproducibility that includes capturing and relating provenance characteristics and performance metrics, in a hybrid queriable system, the ProvEn server. The system capabilities are illustrated on two use cases: scientific reproducibility of results in the ACME climate simulations and performance reproducibility in molecular dynamics workflows on HPC computing platforms.

Open access to data at Yale University

Open access to research data increases knowledge, advances science, and benefits society. Many researchers are now required to share data. Two research centers at Yale have launched projects that support this mission. Both centers have developed technology, policies, and workflows to facilitate open access to data in their respective fields. The Yale University Open Data Access (YODA) Project at the Center for Outcomes Research and Evaluation advocates for the responsible sharing of clinical research data. The Project, which began in 2014, is committed to open science and data transparency, and supports research attempting to produce concrete benefits to patients, the medical community, and society as a whole. Early experience sharing data, made available by Johnson & Johnson (J&J) through the YODA Project, has demonstrated a demand for shared clinical research data as a resource for investigators. To date, the YODA Project has facilitated the sharing of data for over 65 research projects. The Institution for Social and Policy Studies (ISPS) Data Archive is a digital repository that shares and preserves the research produced by scholars affiliated with ISPS. Since its launch in 2011, the Archive holds data and code underlying almost 90 studies. The Archive is committed to the ideals of scientific reproducibility and transparency: It provides free and public access to research materials and accepts content for distribution under a Creative Commons license. The Archive has pioneered a workflow, “curating for reproducibility,” that ensures long term usability and data quality.

Reflections on the Future of Research Curation and Research Reproducibility

In the years since the launch of the World Wide Web in 1993, there have been profoundly transformative changes to the entire concept of publishing—exceeding all the previous combined technical advances of the centuries following the introduction of movable type in medieval Asia around the year 10001 and the subsequent large-scale commercialization of printing several centuries later by J. Gutenberg (circa 1440). Periodicals in print—from daily newspapers to scholarly journals—are now quickly disappearing, never to return, and while no publishing sector has been unaffected, many scholarly journals are almost unrecognizable in comparison with their counterparts of two decades ago. To say that digital delivery of the written word is fundamentally different is a huge understatement. Online publishing permits inclusion of multimedia and interactive content that add new dimensions to what had been available in print-only renderings. As of this writing, the IEEE portfolio of journal titles comprises 59 online only2 (31%) and 132 that are published in both print and online. The migration from print to online is more stark than these numbers indicate because of the 132 periodicals that are both print and online, the print runs are now quite small and continue to decline. In short, most readers prefer to have their subscriptions fulfilled by digital renderings only.