Report on the First IEEE Workshop on the Future of Research Curation and Research Reproducibility

This report describes perspectives from the Workshop on the Future of Research Curation and Research Reproducibility that was collaboratively sponsored by the U.S. National Science Foundation (NSF) and IEEE (Institute of Electrical and Electronics Engineers) in November 2016. The workshop brought together stakeholders including researchers, funders, and notably, leading science, technology, engineering, and mathematics (STEM) publishers. The overarching objective was a deep dive into new kinds of research products and how the costs of creation and curation of these products can be sustainably borne by the agencies, publishers, and researcher communities that were represented by workshop participants. The purpose of this document is to describe the ideas that participants exchanged on approaches to increasing the value of all research by encouraging the archiving of reusable data sets, curating reusable software, and encouraging broader dialogue within and across disciplinary boundaries. How should the review and publication processes change to promote reproducibility? What kinds of objects should the curatorial process expand to embrace? What infrastructure is required to preserve the necessary range of objects associated with an experiment? Who will undertake this work? And who will pay for it? These are the questions the workshop was convened to address in presentations, panels, small working groups, and general discussion.

Dugong: a Docker image, based on Ubuntu Linux, focused on reproducibility and replicability for bioinformatics analyses

Summary: This manuscript introduces and describes Dugong, a Docker image based on Ubuntu 16.04 that automates the installation of more than 3,500 bioinformatics tools (along with their libraries and dependencies) in alternative computational environments. The software provides a user-friendly XFCE4 graphical interface that lets users who are not fully familiar with the Linux command line manage and install software. It also includes Jupyter Notebook to help laboratories exchange consistent, reproducible protocols and results, supporting the development of open science projects.

Enabling reproducible real-time quantitative PCR research: the RDML package

Reproducibility, a cornerstone of research, requires defined data formats that cover both the set-up and the output of experiments. The Real-time PCR Data Markup Language (RDML) is a standard recommended by the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines. Despite the popularity of RDML for the analysis of qPCR data, handling of RDML files is not yet widely supported across PCR curve analysis software. Results: This study describes the open-source RDML package for the statistical computing language R. The package is compatible with RDML versions ≤ 1.2 and provides functionality to (i) import RDML data; (ii) extract sample information (e.g., targets, concentration); (iii) transform data into various formats of the R environment; (iv) generate human-readable run summaries; and (v) create RDML files from user data. In addition, the package offers a graphical user interface to read, edit, and create RDML files.
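To make steps (i) to (iii) concrete, here is a minimal sketch of reading an RDML-like XML document. It is not the R package's API. It uses Python's standard-library `xml.etree.ElementTree` on a hypothetical, heavily simplified snippet. The element names (`sample`, `target`, `run`, `react`, `data`, `tar`, `adp`, `cyc`, `fluor`) are assumed to loosely follow the RDML layout, and the XML namespace and most required elements of the real schema are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified RDML-like document (assumption: element names loosely
# follow RDML; the real schema's namespace and required elements are omitted).
RDML_SNIPPET = """
<rdml version="1.2">
  <sample id="S1"><type>unkn</type></sample>
  <target id="GAPDH"><type>ref</type></target>
  <experiment id="exp1">
    <run id="run1">
      <react id="1">
        <sample id="S1"/>
        <data>
          <tar id="GAPDH"/>
          <adp><cyc>1</cyc><fluor>0.10</fluor></adp>
          <adp><cyc>2</cyc><fluor>0.12</fluor></adp>
          <adp><cyc>3</cyc><fluor>0.20</fluor></adp>
        </data>
      </react>
    </run>
  </experiment>
</rdml>
""".strip()


def sample_ids(root):
    """Return the ids of the top-level <sample> elements (sample information)."""
    return [s.get("id") for s in root.findall("sample")]


def amplification_curves(root):
    """Map (sample id, target id) to a list of (cycle, fluorescence) pairs."""
    curves = {}
    for react in root.iter("react"):
        # Each reaction references a sample defined at the top level.
        sample = react.find("sample").get("id")
        for data in react.findall("data"):
            target = data.find("tar").get("id")
            # Amplification data points: cycle number and fluorescence reading.
            points = [
                (int(adp.findtext("cyc")), float(adp.findtext("fluor")))
                for adp in data.findall("adp")
            ]
            curves[(sample, target)] = points
    return curves


root = ET.fromstring(RDML_SNIPPET)
print(root.get("version"))
print(sample_ids(root))
print(amplification_curves(root))
```

Keying the curves by (sample, target) gives a tidy structure that maps directly onto a table or data frame. This is roughly the kind of transformation step (iii) describes for the R environment.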

American Geophysical Union Coalition Receives Grant to Advance Open and FAIR Data Standards in the Earth and Space Sciences

The Laura and John Arnold Foundation has awarded a grant to a coalition of groups representing the international Earth and space science community, convened by the American Geophysical Union (AGU), to develop standards that connect researchers, publishers, and data repositories in the Earth and space sciences and enable FAIR (findable, accessible, interoperable, and reusable) data, a concept first developed by Force11.org, on a large scale. This will accelerate scientific discovery and enhance the integrity, transparency, and reproducibility of these data. The resulting set of best practices will include: metadata and identifier standards; data services; common taxonomies; landing pages at repositories to expose the metadata and standard repository information; standard data citation; and standard integration into editorial peer review workflows.
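Two of the listed practices, metadata standards and standard data citation, can be illustrated together with a minimal sketch. It assumes a hypothetical metadata record and checks that the required fields are present before building a citation string in a common "Creator (Year): Title. Publisher. Identifier" pattern. The field names, the required-field set, and the citation layout are illustrative assumptions, not the coalition's actual standards.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical minimal metadata record; field names are illustrative only.
@dataclass
class DatasetRecord:
    creators: List[str]
    year: int
    title: str
    publisher: str   # e.g. the repository that holds the data
    identifier: str  # persistent identifier, assumed here to be a DOI


def missing_fields(record: DatasetRecord) -> List[str]:
    """Return the names of required fields that are empty (assumed required set)."""
    required = {
        "creators": record.creators,
        "title": record.title,
        "publisher": record.publisher,
        "identifier": record.identifier,
    }
    return [name for name, value in required.items() if not value]


def format_citation(record: DatasetRecord) -> str:
    """Build a 'Creator (Year): Title. Publisher. Identifier' citation string."""
    problems = missing_fields(record)
    if problems:
        raise ValueError(f"incomplete metadata: {problems}")
    creators = "; ".join(record.creators)
    return (f"{creators} ({record.year}): {record.title}. "
            f"{record.publisher}. https://doi.org/{record.identifier}")


# Usage with a hypothetical record (placeholder DOI, not a real dataset).
record = DatasetRecord(["Doe, J."], 2017, "Example ocean temperature profiles",
                       "Example Data Repository", "10.0000/example")
print(format_citation(record))
```

Refusing to cite a record with missing metadata reflects the general idea that citation and discoverability depend on complete, standardized metadata being exposed by the repository.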

The Practice of Reproducible Research

This book contains a collection of 31 case studies of reproducible research workflows, written by academic researchers in the data-intensive sciences. Each case study describes how the author combined specific tools, ideas, and practices in order to complete a real-world research project. Emphasis is placed on the practical aspects of how the author organized his or her research to make it as reproducible as possible.