Scientific Research: Reproducibility and Bias in Chemistry

When scientists are able to recreate earlier research results published by other scientists, the research is considered reproducible. But what happens when the results don’t match? Then the initial research is non-reproducible. Reproducibility, or non-reproducibility, seems straightforward: it implies that an experimental result is either valid or invalid. In fact, researchers affiliated with Stanford University, Tufts University, and the University of Ioannina in Greece concluded in 2005 that a majority of published research findings are false. How do those invalid results end up in scientific papers? A group of Stanford researchers concluded that, in many cases, bias is to blame.

Systematic reviews and evidence synthesis

While comprehensive and expert searching are traditional aspects of academic librarianship, systematic reviews also require transparency and reproducibility of search methodology. This work is supported by reporting guidelines and related librarian expertise. This guide provides resources useful to librarians assisting with systematic reviews across a broad range of disciplines outside the biomedical sciences. Because the bulk of the published literature on systematic reviews is concentrated in the health sciences, some resources are subject-specific in title but have broader applications.

DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments

We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a wide range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA, a researcher can either reproduce a given experiment from a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, experiment tracking, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source and accessible as a Web Service through DIVAServices.
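The kind of bookkeeping such frameworks automate can be illustrated with a minimal sketch: fix random seeds so a run is deterministic, and fingerprint the full configuration so a run can be identified and repeated later. This is a generic illustration in plain Python, not DeepDIVA's actual API; the function and parameter names are hypothetical.

```python
import hashlib
import json
import random


def run_experiment(config: dict) -> dict:
    """Run a toy 'experiment' deterministically from a seeded config.

    A stand-in for a training run: with the seed fixed, the pseudo-random
    'scores' (and therefore the result) are identical on every re-run.
    """
    random.seed(config["seed"])
    scores = [random.random() for _ in range(config["n_trials"])]
    return {"mean_score": sum(scores) / len(scores)}


def experiment_fingerprint(config: dict) -> str:
    """Hash a canonical serialization of the config to identify the run."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]


config = {"seed": 42, "n_trials": 100}
first = run_experiment(config)
second = run_experiment(config)  # same config -> identical result
assert first == second
print(experiment_fingerprint(config), first["mean_score"])
```

Sharing the fingerprint and config alongside the result is what lets another researcher reproduce an experiment "with a very limited amount of information": the config alone is sufficient to regenerate the run.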

Use Cases of Computational Reproducibility for Scientific Workflows at Exascale

We propose an approach to improved reproducibility that captures and relates provenance characteristics and performance metrics in a hybrid queryable system, the ProvEn server. The system's capabilities are illustrated on two use cases: scientific reproducibility of results in the ACME climate simulations, and performance reproducibility in molecular dynamics workflows on HPC platforms.
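The core idea of relating provenance to performance can be sketched as a store of records, each pairing a workflow's provenance characteristics (code versions, parameters) with its performance metrics, queryable by arbitrary predicates. This is a minimal illustration of the concept, not ProvEn's actual data model or query interface; all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProvenanceRecord:
    workflow: str   # e.g. a climate simulation or MD workflow name
    inputs: dict    # provenance characteristics: versions, parameters
    metrics: dict   # performance metrics: wall time, throughput


class ProvenanceStore:
    """A toy hybrid store: capture records, then query across both kinds of data."""

    def __init__(self):
        self.records = []

    def capture(self, record: ProvenanceRecord) -> None:
        self.records.append(record)

    def query(self, predicate) -> list:
        return [r for r in self.records if predicate(r)]


store = ProvenanceStore()
store.capture(ProvenanceRecord("climate_sim", {"code_version": "1.2"}, {"wall_time_s": 3600}))
store.capture(ProvenanceRecord("md_run", {"code_version": "2.0"}, {"wall_time_s": 120}))

# Relate performance back to provenance: which runs were slow, and on what version?
slow = store.query(lambda r: r.metrics["wall_time_s"] > 1000)
print([(r.workflow, r.inputs["code_version"]) for r in slow])
```

Querying provenance and performance together is what supports both use cases: the same predicate mechanism can ask "did this result come from the same inputs?" (scientific reproducibility) or "did the same workflow run at the same speed?" (performance reproducibility).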

Open access to data at Yale University

Open access to research data increases knowledge, advances science, and benefits society. Many researchers are now required to share data, and two research centers at Yale have launched projects that support this mission. Both centers have developed technology, policies, and workflows to facilitate open access to data in their respective fields.

The Yale University Open Data Access (YODA) Project at the Center for Outcomes Research and Evaluation advocates for the responsible sharing of clinical research data. The Project, which began in 2014, is committed to open science and data transparency, and supports research attempting to produce concrete benefits to patients, the medical community, and society as a whole. Early experience sharing data made available by Johnson & Johnson (J&J) through the YODA Project has demonstrated a demand for shared clinical research data as a resource for investigators. To date, the YODA Project has facilitated the sharing of data for over 65 research projects.

The Institution for Social and Policy Studies (ISPS) Data Archive is a digital repository that shares and preserves the research produced by scholars affiliated with ISPS. Since its launch in 2011, the Archive has accumulated data and code underlying almost 90 studies. The Archive is committed to the ideals of scientific reproducibility and transparency: it provides free and public access to research materials and accepts content for distribution under a Creative Commons license. The Archive has pioneered a workflow, “curating for reproducibility,” that ensures long-term usability and data quality.