NIH Request for Information on Strategies for NIH Data Management, Sharing, and Citation

This Request for Information (RFI) seeks public comments on data management and sharing strategies and priorities in order to consider: (1) how digital scientific data generated from NIH-funded research should be managed and, to the fullest extent possible, made publicly available; and (2) how to set standards for citing shared data and software. Response to this RFI is voluntary. Respondents are free to address any or all of the items in Sections I and II, delineated below, or any other relevant topics they recognize as important for NIH to consider. Respondents should not feel compelled to address all items. Instructions on how to respond to this RFI are provided in "Concluding Comments."

Student teams take on synbio reproducibility problem

Over the last two years, iGEM teams around the world have been working to find out just how reproducible fluorescent protein measurements are. They distributed testing plasmids and compared results across labs, measurement instruments, genetic parts, and E. coli strains. It's a thorough two-year study of interlab variability, and the results are out in PLOS ONE: "Reproducibility of Fluorescent Expression from Engineered Biological Constructs in E. coli".

Linux Foundation Backs Reproducible Builds Effort for Secure Software

In an effort to help open-source software developers build more secure software, the Linux Foundation is doubling down on its efforts to support the Reproducible Builds project. Among the most basic and often most difficult aspects of software development is making sure that the software end users get is the same software that developers actually built. "Reproducible builds are a set of software development practices that create a verifiable path from human readable source code to the binary code used by computers," the Reproducible Builds project explains.
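
In practice, verification comes down to checking that two independent builds of the same source yield bit-for-bit identical binaries. A minimal sketch of that check in Python, assuming two build artifacts to compare (the file paths are hypothetical, supplied on the command line):

```python
# Minimal sketch: verify that two independently produced builds of the
# same source are bit-for-bit identical by comparing SHA-256 digests.
# Real projects compare artifacts built on separate machines, at
# different times, or by different parties.
import hashlib
import sys

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    build_a, build_b = sys.argv[1], sys.argv[2]
    if sha256_of(build_a) == sha256_of(build_b):
        print("Builds match: the binary is reproducible from this source.")
    else:
        print("Builds differ: some input (timestamp, path, toolchain) varies.")
        sys.exit(1)
```

A mismatch usually points to nondeterministic build inputs such as embedded timestamps, absolute paths, or toolchain differences, which is exactly what reproducible-build practices aim to eliminate.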

Capturing the "Whole Tale" of Computational Research: Reproducibility in Computing Environments

We present an overview of the recently funded "Merging Science and Cyberinfrastructure Pathways: The Whole Tale" project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete narrative of the research process, including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research publications to their associated digital scholarly objects such as data, code, and workflows. To enable this, Whole Tale will create an environment where researchers can collaborate on data, workspaces, and workflows and then publish them for future adoption or modification. Published data and applications will be consumed either directly by users of the Whole Tale environment or integrated into existing or future domain Science Gateways.
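
To make the idea of persistent linking concrete, here is an illustrative sketch of a metadata record tying a publication to its digital scholarly objects. This is not the Whole Tale schema; every field name and identifier below is hypothetical:

```python
# Illustrative sketch only: one possible shape for a record that links a
# publication to its data, code, workflow, and computing environment.
# All identifiers and field names are hypothetical, not Whole Tale's.
import json

tale = {
    "publication_doi": "10.1234/example.2016.001",    # hypothetical DOI
    "data": ["doi:10.5061/dryad.example"],            # hypothetical dataset
    "code": ["https://github.com/example/analysis"],  # hypothetical repo
    "workflow": ["analysis-workflow.cwl"],
    "environment": {"image": "example/analysis:1.0"},
}

print(json.dumps(tale, indent=2))
```

The point of such a record is that a reader (or a machine) starting from the publication can resolve every object needed to rerun or extend the work.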

Reward, reproducibility and recognition in research – the case for going Open

The advent of the internet has meant that scholarly communication has changed immeasurably over the past two decades, yet in some ways it has hardly changed at all. The coin of the realm in research remains the publication of novel results in a high-impact journal, despite known issues with the Journal Impact Factor. This elusive goal has led to many problems in the research process: from hyperauthorship to high levels of retractions, reproducibility problems, and 'cherry picking' of results. The veracity of the academic record is increasingly being brought into question. An additional problem is that this static reward system binds us to the current publishing regime, preventing any real progress in terms of widespread open access or even the adoption of novel publishing opportunities. But there is a possible solution. Increased calls to open research up and provide a greater level of transparency have started to yield real, practical solutions. This talk will cover the problems we currently face and describe some of the innovations that might offer a way forward.

Scientific Data Science and the Case for Open Access

"Open access" has become a central theme of journal reform inacademic publishing. In this article, Iexamine the consequences of an important technological loophole in which publishers can claim to be adhering to the principles of open access by releasing articles in proprietary or “locked” formats that cannot be processed by automated tools, whereby even simple copy and pasting of text is disabled. These restrictions will prevent the development of an important infrastructural element of a modern research enterprise, namely,scientific data science, or the use of data analytic techniques to conduct meta-analyses and investigations into the scientific corpus. I give a brief history of the open access movement, discuss novel journalistic practices, and an overview of data-driven investigation of the scientific corpus. I arguethat particularly in an era where the veracity of many research studies has been called into question, scientific data science should be oneof the key motivations for open access publishing. The enormous benefits of unrestricted access to the research literature should prompt scholars from all disciplines to reject publishing models whereby articles are released in proprietary formats or are otherwise restricted from being processed by automated tools as part of a data science pipeline.