Over the past few years, research reproducibility has been increasingly highlighted as a multifaceted challenge across many disciplines. There are socio-cultural obstacles as well as a constantly changing technical landscape that make replicating and reproducing research extremely difficult. Researchers face challenges in reproducing research across different operating systems and different versions of software, to name just a few of the many technical barriers. The prioritization of citation counts and journal prestige has undermined incentives to make research reproducible. While libraries have been building support around research data management and digital scholarship, reproducibility is an emerging area that has yet to be systematically addressed. To respond to this, New York University (NYU) created the position of Librarian for Research Data Management and Reproducibility (RDM & R), a dual appointment between the Center for Data Science (CDS) and the Division of Libraries. This report will outline the role of the RDM & R librarian, paying close attention to the collaboration between the CDS and Libraries to bring reproducible research practices into the norm.
Reproducible Science Promoting Open Science
We wish to answer this question: If you observe a "significant" P value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by P values between 0.01 and 0.05 is explored by exact calculations of false positive rates. When you observe P = 0.05, the odds in favour of there being a real effect (given by the likelihood ratio) are about 3:1. This is far weaker evidence than the odds of 19 to 1 that might, wrongly, be inferred from the P value. And if you want to limit the false positive rate to 5%, you would have to assume that you were 87% sure that there was a real effect before the experiment was done. If you observe P = 0.001 in a well-powered experiment, it gives a likelihood ratio of almost 100:1 odds on there being a real effect. That would usually be regarded as conclusive, but the false positive rate would still be 8% if the prior probability of a real effect was only 0.1. And, in this case, if you wanted to achieve a false positive rate of 5% you would need to observe P = 0.00045. It is recommended that P values should be supplemented by specifying the prior probability that would be needed to produce a specified (e.g. 5%) false positive rate. It may also be helpful to specify the minimum false positive rate associated with the observed P value. It is also recommended that the terms "significant" and "non-significant" should never be used. Despite decades of warnings, many areas of science still insist on labelling a result of P < 0.05 as "significant". This practice must account for a substantial part of the lack of reproducibility in some areas of science. And this is before you get to the many other well-known problems, like multiple comparisons, lack of randomisation and P-hacking. Science is endangered by statistical misunderstanding, and by university presidents and research funders who impose perverse incentives on scientists.
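The arithmetic linking the likelihood ratio, the prior probability and the false positive rate in the abstract above can be sketched with Bayes' rule in odds form. This is a minimal illustration, not the paper's exact calculation: the function names `false_positive_risk` and `prior_needed` are hypothetical, and the likelihood ratios are the rounded values quoted in the abstract (about 3:1 and almost 100:1) rather than values derived from a P value.

```python
def false_positive_risk(likelihood_ratio, prior):
    """Probability that an observed effect is a false positive.

    likelihood_ratio: odds in favour of a real effect given the data
                      (assumed input, e.g. the rounded 3 or 100 from the text).
    prior:            prior probability that a real effect exists (0 < prior < 1).
    """
    prior_odds = prior / (1.0 - prior)
    # Bayes' rule in odds form: posterior odds = likelihood ratio * prior odds.
    posterior_odds = likelihood_ratio * prior_odds
    # The false positive risk is the posterior probability of no real effect.
    return 1.0 / (1.0 + posterior_odds)


def prior_needed(likelihood_ratio, target_fpr):
    """Prior probability required to reach a target false positive risk."""
    # Posterior odds of a real effect that correspond to the target risk.
    required_posterior_odds = (1.0 - target_fpr) / target_fpr
    prior_odds = required_posterior_odds / likelihood_ratio
    return prior_odds / (1.0 + prior_odds)


if __name__ == "__main__":
    # Likelihood ratio of about 100:1 with a prior of 0.1.
    print(round(false_positive_risk(100, 0.1), 3))
    # Prior needed for a 5% false positive risk when the ratio is about 3:1.
    print(round(prior_needed(3, 0.05), 3))
```

With a ratio of 100 and a prior of 0.1 the sketch gives a risk of roughly 8%, in line with the abstract. With the rounded ratio of 3 the required prior comes out near 0.86, close to the quoted 87%; the small gap is expected because the abstract gives the ratio only as "about 3:1".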
Presentation on analysis preservation and reusability at #C4RR in Cambridge.
A high-profile project aiming to test reproducibility in cancer biology has released a second batch of results, and this time the news is good: Most of the experiments from two key cancer papers could be repeated. The latest replication studies, which appear today in eLife, come on top of five published in January that delivered a mixed message about whether high-impact cancer research can be reproduced. Taken together, however, results from the completed studies are “encouraging,” says Sean Morrison of the University of Texas Southwestern Medical Center in Dallas, an eLife editor. Overall, he adds, independent labs have now “reproduced substantial aspects” of the original experiments in four of five replication efforts that have produced clear results.
Reproducibility is an essential requirement for computational studies including those based on machine learning techniques. However, many machine learning studies are either not reproducible or are difficult to reproduce. In this paper, we consider what information about text mining studies is crucial to successful reproduction of such studies. We identify a set of factors that affect reproducibility based on our experience of attempting to reproduce six studies proposing text mining techniques for the automation of the citation screening stage in the systematic review process. Subsequently, the reproducibility of 30 studies was evaluated based on the presence or otherwise of information relating to the factors. While the studies provide useful reports of their results, they lack information on access to the dataset in the form and order as used in the original study (as against raw data), the software environment used, randomization control and the implementation of proposed techniques. In order to increase the chances of being reproduced, researchers should ensure that details about and/or access to information about these factors are provided in their reports.
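Several of the factors named above (the dataset in the order actually used, the software environment, and randomization control) can be captured in a small record saved next to the results. The following is only a sketch under assumptions: the function name `run_manifest` and the manifest fields are hypothetical and are not taken from the paper.

```python
import json
import platform
import random


def run_manifest(seed, dataset_ids):
    """Shuffle the dataset with a fixed seed and record what a rerun needs.

    seed:        integer used for every random choice in the experiment.
    dataset_ids: identifiers of the records, in their raw order.
    """
    # A dedicated generator keeps the shuffle independent of global state.
    rng = random.Random(seed)
    order = list(dataset_ids)
    rng.shuffle(order)
    return {
        "seed": seed,                                  # randomization control
        "python_version": platform.python_version(),   # software environment
        "platform": platform.platform(),
        "dataset_order": order,                        # data in the order used
    }


if __name__ == "__main__":
    manifest = run_manifest(42, ["doc1", "doc2", "doc3", "doc4"])
    # Store the manifest alongside the reported results.
    print(json.dumps(manifest, indent=2))
```

Calling `run_manifest` twice with the same seed and identifiers yields the same `dataset_order` on a given Python version, which is the property a reader would need to repeat the data split.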
Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language.
This work makes its contribution by demonstrating the importance of execution environments for the reproducibility of scientific applications and differentiating execution environment specifications, which should be lightweight, persistent and deployable, from various tools used to create execution environments, which may experience frequent changes due to technological evolution. It proposes two preservation approaches and prototypes for the purposes of both result verification and research extension, and provides recommendations on how to build reproducible scientific applications from the start.
Join our panelists for a discussion on challenges and opportunities related to sharing and using open data in research, including meeting funder and journal guidelines.
In this RCE Podcast, Brock Palen and Jeff Squyres discuss Reproducible Neuroscience with Chris Gorgolewski from Stanford. "In recent years there has been increasing concern about the reproducibility of scientific results. Because scientific research represents a major public investment and is the basis for many decisions that we make in medicine and society, it is essential that we can trust the results. Our goal is to provide researchers with tools to do better science. Our starting point is in the field of neuroimaging, because that’s the domain where our expertise lies."
A webinar on the challenges of reproducibility in data-scarce fields.