A big part of this problem has to do with what’s been called a “reproducibility crisis” in science – many studies, if run a second time, don’t come up with the same results. Scientists are worried about this situation, and high-profile international research journals have raised the alarm, too, calling on researchers to put more effort into ensuring their results can be reproduced, rather than only striving for splashy, one-off outcomes. Concerns about irreproducible results in science resonate outside the ivory tower, as well, because a lot of this research translates into information that affects our everyday lives.
The editors of Behavioral Neuroscience have been discussing several recent developments in the landscape of scientific publishing. The discussion was prompted, in part, by reported issues of reproducibility and concerns about the integrity of the scientific literature. Although enhanced rigor and transparency in science are certainly important, a related issue is that increased competition and a focus on novel findings have impeded the extent to which the scientific process is cumulative. We have decided to join the growing number of journals that are adopting new reviewing and publishing practices to address these problems. In addition to our standard research articles, we are pleased to announce 3 new categories of articles: replications, registered reports, and null results. In joining other journals in psychology and related fields to offer these publication types, we hope to promote higher standards of methodological rigor in our science. This will help ensure that our discoveries are based on sound evidence and that they provide a durable foundation for future progress.
Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results; however, computational science lags behind. In the best case, authors may provide their source code as a compressed archive and feel confident that their research is reproducible. But this is not exactly true. Jonathan Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer review. Existing journals have been slow to adapt: source code is rarely requested and hardly ever executed to check that it produces the results advertised in the article. ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from that of traditional scientific journals. ReScience resides on GitHub, where each new implementation of a computational study is made available together with comments, explanations, and software tests.
Over the past few years, research reproducibility has been increasingly highlighted as a multifaceted challenge across many disciplines. There are socio-cultural obstacles as well as a constantly changing technical landscape that make replicating and reproducing research extremely difficult. Researchers face challenges in reproducing research across different operating systems and different versions of software, to name just a few of the many technical barriers. The prioritization of citation counts and journal prestige has undermined incentives to make research reproducible. While libraries have been building support around research data management and digital scholarship, reproducibility is an emerging area that has yet to be systematically addressed. To respond to this, New York University (NYU) created the position of Librarian for Research Data Management and Reproducibility (RDM & R), a dual appointment between the Center for Data Science (CDS) and the Division of Libraries. This report will outline the role of the RDM & R librarian, paying close attention to the collaboration between the CDS and Libraries to bring reproducible research practices into the norm.
We wish to answer this question: If you observe a "significant" P value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by P values between 0.01 and 0.05 is explored by exact calculations of false positive rates. When you observe P = 0.05, the odds in favour of there being a real effect (given by the likelihood ratio) are about 3:1. This is far weaker evidence than the odds of 19 to 1 that might, wrongly, be inferred from the P value. And if you want to limit the false positive rate to 5%, you would have to assume that you were 87% sure that there was a real effect before the experiment was done. If you observe P = 0.001 in a well-powered experiment, it gives a likelihood ratio of almost 100:1 odds on there being a real effect. That would usually be regarded as conclusive, but the false positive rate would still be 8% if the prior probability of a real effect was only 0.1. And, in this case, if you wanted to achieve a false positive rate of 5% you would need to observe P = 0.00045. It is recommended that P values should be supplemented by specifying the prior probability that would be needed to produce a specified (e.g. 5%) false positive rate. It may also be helpful to specify the minimum false positive rate associated with the observed P value. It is also recommended that the terms "significant" and "non-significant" should never be used. Despite decades of warnings, many areas of science still insist on labelling a result of P < 0.05 as "significant". This practice must account for a substantial part of the lack of reproducibility in some areas of science. And this is before you get to the many other well-known problems, like multiple comparisons, lack of randomisation and P-hacking. Science is endangered by statistical misunderstanding, and by university presidents and research funders who impose perverse incentives on scientists.
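The relationship between likelihood ratio, prior probability and false positive rate quoted in this abstract can be roughly illustrated with Bayes' rule on odds. The sketch below is not the paper's exact calculation: it assumes the rounded likelihood ratios given in the abstract (about 3:1 and about 100:1), and the function names `false_positive_risk` and `prior_needed` are hypothetical, chosen only for this illustration.

```python
def false_positive_risk(likelihood_ratio: float, prior: float) -> float:
    """Probability that an observed 'positive' result is a false positive.

    likelihood_ratio: odds in favour of a real effect supplied by the data.
    prior: probability of a real effect before the experiment (0 < prior < 1).
    """
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    # Bayes' rule in odds form: posterior odds = likelihood ratio * prior odds.
    posterior_odds = likelihood_ratio * prior_odds
    # The false positive risk is the posterior probability of no real effect.
    return 1.0 / (1.0 + posterior_odds)


def prior_needed(likelihood_ratio: float, target_risk: float) -> float:
    """Prior probability of a real effect needed to reach a target false positive risk."""
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    if not 0.0 < target_risk < 1.0:
        raise ValueError("target_risk must lie strictly between 0 and 1")
    required_posterior_odds = (1.0 - target_risk) / target_risk
    prior_odds = required_posterior_odds / likelihood_ratio
    return prior_odds / (1.0 + prior_odds)


if __name__ == "__main__":
    # P = 0.001 case: likelihood ratio taken as 100 (rounded), prior of 0.1.
    print(f"risk at LR=100, prior=0.1: {false_positive_risk(100.0, 0.1):.3f}")
    # P = 0.05 case: likelihood ratio taken as 3 (rounded), target risk of 5%.
    print(f"prior needed at LR=3 for 5% risk: {prior_needed(3.0, 0.05):.3f}")
```

With these rounded inputs the sketch gives a false positive risk of roughly 8% for the P = 0.001 case and a required prior of roughly 0.86 for the P = 0.05 case, close to the 8% and 87% quoted above. The small difference comes from using a likelihood ratio of exactly 3 instead of the exact value behind "about 3:1".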
Presentation on analysis preservation and reusability at #C4RR in Cambridge.