In scientometrics, we have not yet had an intensive debate about the reproducibility of research published in our field, although concerns about a lack of reproducibility have occasionally surfaced (see e.g. Glänzel & Schöpflin 1994 and Van den Besselaar et al. 2017), and the need to improve the reproducibility is used as an important argument for open citation data (see www.issi-society.org/open-citations-letter/). We initiated a first discussion about reproducibility in scientometrics with a workshop at ISSI 2017 in Wuhan. One of the outcomes was the sense that scientific fields differ with regard to the type and pervasiveness of threats to the reproducibility of their published research, last but not least due to their differences in modes of knowledge production, such as confirmatory versus exploratory study designs, and differences in methods and empirical objects.
Methodological developments and software implementations progress in increasingly faster time-frames. The introduction and widespread acceptance of pre-print archived reports and open-source software make state-of-the-art statistical methods readily accessible to researchers. At the same time, researchers more and more emphasize that their results should be reproducible (using the same data obtaining the same results), which is a basic requirement for assessing the replicability (obtaining similar results in new data) of results. While the age of fast-paced methodology greatly facilitates reproducibility, it also undermines it in ways not often realized by researchers. The goal of this paper is to make researchers aware of these caveats. I discuss sources of limited replicability and reproducibility in both the development of novel statistical methods and their implementation in software routines. Novel methodology comes with many researcher degrees of freedom, and new understanding comes with changing standards over time. In software-development, reproducibility may be impacted due to software developing and changing over time, a problem that is greatly magnified by large dependency-trees between software-packages. The paper concludes with a list of recommendations for both developers and users of new methods to improve reproducibility of results.
Scientific reproducibility is key to the advancement of science as researchers can build on sound and validated results to design new research studies. However, recent studies in biomedical research have highlighted key challenges in scientific reproducibility as more than 70% of researchers in a survey of more than 1500 participants were not able to reproduce results from other groups and 50% of researchers were not able to reproduce their own experiments. Provenance metadata is a key component of scientific reproducibility and as part of the Provenance for Clinical and Health Research (ProvCaRe) project, we have: (1) identified and modeled important provenance terms associated with a biomedical research study in the S3 model (formalized in the ProvCaRe ontology); (2) developed a new natural language processing (NLP) workflow to identify and extract provenance metadata from published articles describing biomedical research studies; and (3) developed the ProvCaRe knowledge repository to enable users to query and explore provenance of research studies using the S3 model. However, a key challenge in this project is the automated classification of provenance metadata extracted by the NLP workflow according to the S3 model and its subsequent querying in the ProvCaRe knowledge repository. In this paper, we describe the development and comparative evaluation of deep learning techniques for multi-class classification of structured provenance metadata extracted from biomedical literature using 12 different categories of provenance terms represented in the S3 model. We describe the application of the Long Term Short Memory (LSTM) network, which has the highest classification accuracy of 86% in our evaluation, to classify more than 48 million provenance triples in the ProvCaRe knowledge repository (available at: https://provcare.case.edu/).
In recent years, serious concerns have arisen about reproducibility in science. Estimates of the cost of irreproducible preclinical studies range from 28 billion USD per year in the USA alone (Freedman et al. in PLoS Biol 13(6):e1002165, 2015) to over 200 billion USD per year worldwide (Chalmers and Glasziou in Lancet 374:86–89, 2009). The situation in the social sciences is not very different: Reproducibility in psychological research, for example, has been estimated to be below 50% as well (Open Science Collaboration in Science 349:6251, 2015). Less well studied is the issue of reproducibility of simulation research. A few replication studies of agent-based models, however, suggest the problem for computational modeling may be more severe than for laboratory experiments (Willensky and Rand in JASSS 10(4):2, 2007; Donkin et al. in Environ Model Softw 92:142–151, 2017; Bajracharya and Duboz in: Proceedings of the symposium on theory of modeling and simulation—DEVS integrative M&S symposium, pp 6–11, 2013). In this perspective, we discuss problems of reproducibility in agent-based simulations of life and social science problems, drawing on best practices research in computer science and in wet-lab experiment design and execution to suggest some ways to improve simulation research practice.
The results of a scientific experiment have to be reproduced to be valid. The scientific method is well known in experimental sciences but it is not always the case for computer scientists. Recent publications and studies has shown that there is a significant reproducibility crisis in Biology and Medicine. This problem has also been demonstrated for hundreds of publications in computer science where only a limited set of publication results could be reproduced. In this paper we present the reproducibility challenge and we examine the reproducibility of a Parallel Discrete Event System Specification (PDEVS) model with two different execution frameworks.
We developed a new probabilistic model to assess the impact of recommendations rectifying the reproducibility crisis (by publishing both positive and 'negative' results and increasing statistical power) on competing objectives, such as discovering causal relationships, avoiding publishing false positive results, and reducing resource consumption. In contrast to recent publications our model quantifies the impact of each single suggestion not only for an individual study but especially their relation and consequences for the overall scientific process. We can prove that higher-powered experiments can save resources in the overall research process without generating excess false positives. The better the quality of the pre-study information and its exploitation, the more likely this beneficial effect is to occur. Additionally, we quantify the adverse effects of both neglecting good practices in the design and conduct of hypotheses-based research, and the omission of the publication of 'negative' findings. Our contribution is a plea for adherence to or reinforcement of the good scientific practice and publication of 'negative' findings.