Here's the three-pronged approach we're using in our own research to tackle the reproducibility issue

A big part of this problem has to do with what’s been called a “reproducibility crisis” in science: many studies, if run a second time, don’t come up with the same results. Scientists are worried about this situation, and high-profile international research journals have raised the alarm, too, calling on researchers to put more effort into ensuring their results can be reproduced, rather than only striving for splashy, one-off outcomes. Concerns about irreproducible results in science resonate outside the ivory tower, as well, because a lot of this research translates into information that affects our everyday lives.

  • news article
  • Promoting transparency and reproducibility in Behavioral Neuroscience: Publishing replications, registered reports, and null results

    The editors of Behavioral Neuroscience have been discussing several recent developments in the landscape of scientific publishing. The discussion was prompted, in part, by reported issues of reproducibility and concerns about the integrity of the scientific literature. Although enhanced rigor and transparency in science are certainly important, a related issue is that increased competition and a focus on novel findings have impeded the extent to which the scientific process is cumulative. We have decided to join the growing number of journals that are adopting new reviewing and publishing practices to address these problems. In addition to our standard research articles, we are pleased to announce 3 new categories of articles: replications, registered reports, and null results. In joining other journals in psychology and related fields to offer these publication types, we hope to promote higher standards of methodological rigor in our science. This will ensure that our discoveries are based on sound evidence and that they provide a durable foundation for future progress.

    Sustainable computational science: the ReScience initiative

    Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results; however, computational science lags behind. In the best case, authors may provide their source code as a compressed archive and they may feel confident their research is reproducible. But this is not exactly true. James Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer review. Existing journals have been slow to adapt: source code is rarely requested and hardly ever executed to check that it produces the results advertised in the article. ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from other traditional scientific journals. ReScience resides on GitHub where each new implementation of a computational study is made available together with comments, explanations, and software tests.

    Reproducibility Librarianship

    Over the past few years, research reproducibility has been increasingly highlighted as a multifaceted challenge across many disciplines. There are socio-cultural obstacles as well as a constantly changing technical landscape that make replicating and reproducing research extremely difficult. Researchers face challenges in reproducing research across different operating systems and different versions of software, to name just a few of the many technical barriers. The prioritization of citation counts and journal prestige has undermined incentives to make research reproducible. While libraries have been building support around research data management and digital scholarship, reproducibility is an emerging area that has yet to be systematically addressed. To respond to this, New York University (NYU) created the position of Librarian for Research Data Management and Reproducibility (RDM & R), a dual appointment between the Center for Data Science (CDS) and the Division of Libraries. This report will outline the role of the RDM & R librarian, paying close attention to the collaboration between the CDS and Libraries to make reproducible research practices the norm.

    The Reproducibility Of Research And The Misinterpretation Of P Values

    We wish to answer this question: If you observe a "significant" P value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by P values between 0.01 and 0.05 is explored by exact calculations of false positive rates. When you observe P = 0.05, the odds in favour of there being a real effect (given by the likelihood ratio) are about 3:1. This is far weaker evidence than the odds of 19 to 1 that might, wrongly, be inferred from the P value. And if you want to limit the false positive rate to 5%, you would have to assume that you were 87% sure that there was a real effect before the experiment was done. If you observe P = 0.001 in a well-powered experiment, it gives a likelihood ratio of almost 100:1 odds on there being a real effect. That would usually be regarded as conclusive. But the false positive rate would still be 8% if the prior probability of a real effect was only 0.1. And, in this case, if you wanted to achieve a false positive rate of 5% you would need to observe P = 0.00045. It is recommended that P values should be supplemented by specifying the prior probability that would be needed to produce a specified (e.g. 5%) false positive rate. It may also be helpful to specify the minimum false positive rate associated with the observed P value. It is also recommended that the terms "significant" and "non-significant" should never be used. Despite decades of warnings, many areas of science still insist on labelling a result of P < 0.05 as "significant". This practice must account for a substantial part of the lack of reproducibility in some areas of science. And this is before you get to the many other well-known problems, like multiple comparisons, lack of randomisation and P-hacking. Science is endangered by statistical misunderstanding, and by university presidents and research funders who impose perverse incentives on scientists.
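The arithmetic linking likelihood ratio, prior probability and false positive rate in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's own calculation: it assumes the false positive rate is one minus the posterior probability of a real effect, obtained by multiplying the prior odds by the likelihood ratio. The function names `false_positive_rate` and `prior_needed` are hypothetical, and the likelihood ratios 3 and 100 are the rounded figures quoted in the abstract, so the outputs are only approximate.

```python
def false_positive_rate(likelihood_ratio, prior):
    """Probability that a result is a false positive.

    Assumes posterior odds of a real effect = likelihood ratio * prior odds,
    and that the false positive rate is 1 minus the posterior probability.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return 1.0 / (1.0 + posterior_odds)


def prior_needed(likelihood_ratio, target_rate):
    """Prior probability of a real effect needed to reach target_rate."""
    required_posterior_odds = (1.0 - target_rate) / target_rate
    prior_odds = required_posterior_odds / likelihood_ratio
    return prior_odds / (1.0 + prior_odds)


if __name__ == "__main__":
    # P = 0.001 case: likelihood ratio of about 100 (rounded), prior of 0.1.
    print(f"{false_positive_rate(100, 0.1):.3f}")  # about 0.083, i.e. roughly 8%

    # P = 0.05 case: likelihood ratio of about 3 (rounded), target rate of 5%.
    print(f"{prior_needed(3, 0.05):.3f}")  # about 0.864
```

With the rounded ratio of 3 the required prior comes out near 86%, slightly below the 87% quoted in the abstract, presumably because the exact likelihood ratio is a little under 3. The 8% figure for P = 0.001 is reproduced directly.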

    Cancer studies pass reproducibility test

    A high-profile project aiming to test reproducibility in cancer biology has released a second batch of results, and this time the news is good: Most of the experiments from two key cancer papers could be repeated. The latest replication studies, which appear today in eLife, come on top of five published in January that delivered a mixed message about whether high-impact cancer research can be reproduced. Taken together, however, results from the completed studies are “encouraging,” says Sean Morrison of the University of Texas Southwestern Medical Center in Dallas, an eLife editor. Overall, he adds, independent labs have now “reproduced substantial aspects” of the original experiments in four of five replication efforts that have produced clear results.

    Reproducibility in Machine Learning-Based Studies: An Example of Text Mining

    Reproducibility is an essential requirement for computational studies including those based on machine learning techniques. However, many machine learning studies are either not reproducible or are difficult to reproduce. In this paper, we consider what information about text mining studies is crucial to successful reproduction of such studies. We identify a set of factors that affect reproducibility based on our experience of attempting to reproduce six studies proposing text mining techniques for the automation of the citation screening stage in the systematic review process. Subsequently, the reproducibility of 30 studies was evaluated based on the presence or otherwise of information relating to the factors. While the studies provide useful reports of their results, they lack information on access to the dataset in the form and order as used in the original study (as against raw data), the software environment used, randomization control and the implementation of proposed techniques. In order to increase the chances of being reproduced, researchers should ensure that details about and/or access to information about these factors are provided in their reports.
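Three of the factors named in the abstract above (the dataset in the form and order used, the software environment, and randomization control) can be recorded with the Python standard library alone. The sketch below is only an illustration under stated assumptions, not the procedure used in the paper: the helper names `order_fingerprint`, `run_manifest` and `shuffled_split` are hypothetical, and the records are assumed to be plain strings such as citation titles.

```python
import hashlib
import platform
import random


def order_fingerprint(records):
    """Hash the records in sequence, so a reordered dataset gives a different digest."""
    digest = hashlib.sha256()
    for record in records:
        digest.update(repr(record).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def run_manifest(records, seed):
    """Collect details a reader would need in order to repeat the run."""
    return {
        "seed": seed,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "dataset_size": len(records),
        "dataset_order_sha256": order_fingerprint(records),
    }


def shuffled_split(records, seed, train_fraction=0.8):
    """Shuffle with a dedicated seeded generator, then split into train and test."""
    rng = random.Random(seed)  # local generator, so other code cannot disturb it
    shuffled = list(records)   # copy, leaving the caller's order untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    citations = ["study A", "study B", "study C", "study D", "study E"]
    train, test = shuffled_split(citations, seed=42)
    print(run_manifest(citations, seed=42))
    print(len(train), len(test))
```

Publishing the manifest alongside the results lets a later reader check that they are starting from the same data order, seed and interpreter version before comparing numbers. Third-party library versions would need to be recorded separately.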

    trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R

    Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language.