Investigating the Effectiveness of the Open Data Badge Policy at Psychological Science Through Computational Reproducibility

In April 2019, Psychological Science published its first issue in which all research articles received the Open Data badge. We used that issue to investigate the effectiveness of this badge, focusing on adherence to its stated aim at Psychological Science: ensuring the reproducibility of results. Twelve researchers of varying experience levels attempted to reproduce the results of the empirical articles in the target issue (at least three researchers per article). We found that all articles provided at least some data and 6/14 articles provided analysis code or scripts, but only 1/14 articles was rated as exactly reproducible, and 3/14 as essentially reproducible with minor deviations. We recommend that Psychological Science require a check of reproducibility at the peer review stage before awarding badges, and that the Open Data badge be renamed "Open Data and Code" to avoid confusion and encourage researchers to adhere to this higher standard.

Share the code, not just the data: A case study of the reproducibility of JML articles published under the open data policy

In 2019, the Journal of Memory and Language instituted an open data and code policy, which requires that, as a rule, data and code be released at the latest upon publication. How effective is this policy? We compared 59 papers published before and 59 papers published after the policy took effect. After the policy was in place, the rate of data sharing increased by more than 50%. We further examined whether papers published under the open data policy were reproducible, in the sense that the published results can be regenerated from the shared data and, when provided, the shared code. For 8 of the 59 papers, the data sets were inaccessible. The reproducibility rate ranged from 34% to 56%, depending on the reproducibility criteria. The strongest predictor of whether a reproduction attempt would be successful was the presence of analysis code: it increased the probability of reproducing the reported results by almost 40%. We propose two simple steps that can increase the reproducibility of published papers: share the analysis code, and attempt to reproduce one's own analysis using only the shared materials.
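The second step recommended above, reproducing one's own analysis from only the shared materials, can be approximated with a small self-check script. The sketch below is illustrative, not taken from the paper: the data file, column name, and reported statistic are hypothetical, and the tolerance stands in for the paper's distinction between exact and essentially reproducible results.

```python
import csv
import math

def reproduce_mean_rt(data_path):
    """Recompute a summary statistic (here: a hypothetical mean
    reaction time) from the shared data file alone, without any
    access to the original analysis environment."""
    with open(data_path, newline="") as f:
        rows = list(csv.DictReader(f))
    rts = [float(row["rt_ms"]) for row in rows]
    return sum(rts) / len(rts)

def check_reproduction(recomputed, reported, tolerance=0.5):
    """Compare the recomputed value against the value stated in
    the manuscript; the tolerance admits minor deviations from
    rounding or platform differences."""
    return math.isclose(recomputed, reported, abs_tol=tolerance)
```

Running such a check before submission, in a clean environment containing only the files that will actually be shared, surfaces missing data, undocumented preprocessing, or hidden dependencies while the authors can still fix them.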

The Critical Need to Foster Reproducibility in Computational Geoscience

Chains of computer models translate emissions into climate signals and subsequently into impacts such as floods, droughts, heatwaves, and other perils. While the need for computational geoscience is significant, recent publications across the geo- and environmental sciences suggest that the reproducibility of computational geoscience might be limited. So far, discussions of reproducibility have largely focused on the social sciences or medicine; in this talk, we take a peek behind the curtain of everyday geoscientific research and unveil why we need to foster reproducibility in computational geoscience and what is required to do so. A poll among more than 300 geoscientists reveals that geoscientific research is currently not reproducible enough: 61% say that a lack of reproducible research is putting trust in our results at stake, and only 3% strongly agree that computational geoscientific research is reproducible. In contrast to previous polls, the leading causes are not only a lack of resources and of willingness to share code and data, but also a lack of knowledge about state-of-the-art software development methods and licenses in the geoscientific community. To lay a path towards a future where Open Science is the norm, we let the voices of the community speak on what they think is necessary and paint a picture of a future that fosters reproducible geoscience, and thus trust.

Reproducibility in Computing Research: An Empirical Study

In computing, research findings are often anecdotally faulted for not being reproducible, and numerous empirical studies have analyzed the reproducibility of a variety of research. Our objective in this study is to quantify the current state of reproducibility of computing research, building on prior work, using three reproducibility factors (Method, Data, and Experiment) to measure three different degrees of reproducibility. Twenty-five variables traditionally used to document reproducibility are identified and grouped into these three factors; the variables describe the extent to which each factor is documented in a given paper. Approximately 100 randomly selected research papers from the 2019 International Conference on Information Systems are surveyed. Our findings suggest that none of the papers documented all the variables; in fact, relatively few variables for each factor are documented. Some of the variables vary across different categories of papers, and most papers fall short on at least one of the factors. Reproducibility scores decrease as documentation requirements increase. Reproducibility may improve over time, as researchers prioritize it and adopt methods that ensure it. Research documentation in computing is remarkably limited, resulting in a dearth of documented reproducibility factors. Future research may study shifts and trends in reproducibility over time; meanwhile, researchers and publishers must increase their focus on the reproducibility aspects of their papers. This study contributes to our understanding of the status quo of reproducibility in computing research.

The critical need to foster computational reproducibility

The climate crisis illustrates the critical need for earth and environmental models that assess the Earth's past and future by translating emissions into climate signals and subsequently into impacts such as floods, droughts, or heatwaves, as well as future resource availability. While computational models grow in relevance by guiding policies and public discourse, our trust in these models is being put to the test: a recent study estimates that 93% of published hydrology and water resources studies cannot be reproduced. In this perspective, we question whether we are amid a reproducibility crisis in the computational earth sciences and peek behind the curtain of everyday research. Software development has become an integral part of research in most areas, including the earth sciences, where computational models and data-processing algorithms grow increasingly sophisticated to solve the challenges of our time. Paradoxically, this development poses a threat to scientific progress: reproducibility, an essential pillar of science, is increasingly difficult to achieve or even to test. This trend is particularly worrisome because scientific results can have controversial implications for stakeholders and policymakers and may influence public opinion and decisions for a long time. In recent years, progress towards Open Science has led more publishers to demand access to data and source code alongside peer-reviewed manuscripts, yet recent studies still find that less reproducible research may even be cited more frequently. We argue that we insufficiently understand how the earth science community currently attempts to reproduce computational results and what challenges it faces in this effort. To what do scientists attribute this lack of reproducibility in computational earth sciences, and what are possible solutions? In this perspective, we survey the community on what they think is necessary and paint a picture of a future that fosters reproducible computational science, and thus trust.

Beyond the Badge: Reproducibility Engineering as a Lifetime Skill

Ascertaining the reproducibility of scientific experiments is receiving increased attention across disciplines. We argue that the necessary skills are important beyond pure scientific utility and should be taught as part of software engineering (SWE) education. They serve a dual purpose: apart from acquiring the coveted badges assigned to reproducible research, reproducibility engineering is a lifetime skill for a professional industrial career in computer science. SWE curricula seem an ideal fit for conveying such capabilities, yet they require some extensions, especially given that even at flagship conferences like ICSE, only slightly more than one-third of the technical papers (at the 2021 edition) received recognition for artefact reusability. Knowledge and capabilities in setting up engineering environments that allow artefacts and results to be reproduced over decades (a standard requirement in many traditional engineering disciplines), in writing semi-literate commit messages that document crucial steps of a decision-making process and are tightly coupled with the code, or in sustainably taming dynamic, quickly changing software dependencies, to name a few, all contribute to solving the scientific reproducibility crisis and enable software engineers to build sustainable, long-term maintainable, software-intensive industrial systems. We propose to teach these skills at the undergraduate level, on par with traditional SWE topics.
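One of the skills named above, taming quickly changing software dependencies, starts with recording exactly which software produced a result. The sketch below is a minimal illustration of that idea, not anything prescribed by the abstract; the function names and the output file name are my own, and it only captures a Python environment (the same principle applies to system packages, compilers, and containers).

```python
import importlib.metadata
import json
import platform
import sys

def snapshot_environment():
    """Record the interpreter, platform, and every installed package
    version, so a computed result can later be tied to the exact
    software stack that produced it."""
    packages = {
        dist.metadata["Name"]: dist.version
        for dist in importlib.metadata.distributions()
        if dist.metadata["Name"] is not None
    }
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }

def write_snapshot(path="environment.json"):
    """Store the snapshot next to the experimental artefacts, e.g.
    committed alongside results or attached to a paper's archive."""
    snap = snapshot_environment()
    with open(path, "w") as f:
        json.dump(snap, f, indent=2)
    return snap
```

Committing such a snapshot with every result set makes the "reproducible over decades" goal at least testable: a later attempt can diff its own environment against the recorded one before blaming the analysis itself.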