We propose an approach to improved reproducibility that captures and relates provenance characteristics and performance metrics in a hybrid queryable system, the ProvEn server. The system's capabilities are illustrated on two use cases: scientific reproducibility of results in the ACME climate simulations and performance reproducibility in molecular dynamics workflows on HPC platforms.
For scientific theories grounded in empirical data, replicability is a core principle, for at least two reasons. First, unless we accept that scientific theories rest on the authority of a small number of researchers, empirical studies should be replicable in the sense that their methods and procedures are detailed enough for someone else to conduct the same study. Second, for empirical results to provide a solid foundation for scientific theorizing, they should also be replicable in the sense that most attempts at replicating the original study that produced them would yield similar results. The XPhi Replicability Project is primarily concerned with replicability in the second sense, that is, the replicability of results. In the past year, several projects have cast doubt on the replicability of key findings in psychology, most notably in social psychology. Because the methods of experimental philosophy are often close to those used in social psychology, it is only natural to wonder to what extent the results on which experimental philosophers ground their theories are replicable. The aim of the XPhi Replicability Project is precisely to reach a reliable estimate of the replicability of empirical results in experimental philosophy. To this end, several research teams across the world will replicate around 40 studies in experimental philosophy, some among the most cited and others drawn at random. The results of the project will be published in a special issue of the Review of Philosophy and Psychology dedicated to the topic of replicability in cognitive science.
Figures are essential outputs of computational geoscientific research, e.g. maps and time series showing the results of spatiotemporal analyses. They also play a key role in open reproducible research, where public access is provided to the paper, data, and source code to enable reproduction of the reported results. This scientific ideal is rarely practiced, as studies, e.g. in biology, have shown. In this article, we report on a series of studies evaluating open reproducible research in the geosciences from the perspectives of authors and readers. First, we asked geoscientists what they understand by open reproducible research and what hinders its realisation. We found that there is disagreement amongst authors, and that a lack of openness impedes adoption by authors and readers alike. However, reproducible research also includes the ability to achieve the same results, which requires source code that is not only accessible but also executable. Hence, to further examine the reader's perspective, we searched for open access papers from the geosciences with attached code/data (in R) and executed the analyses. We encountered several technical issues while executing the code and found differences between the original and reproduced figures. Based on these findings, we propose guidelines for authors to address these issues.
There are two well-known difficulties in testing and interpreting methodologies for mining developer interaction traces: first, the lack of datasets large enough for mining or machine learning approaches to provide reliable results; and second, the lack of "ground truth" or empirical evidence that can be used to triangulate the results, or to verify their accuracy and correctness. Moreover, relying solely on interaction traces limits our ability to take into account contextual factors that can affect the applicability of mining techniques in other contexts, and hinders our ability to fully understand the mechanics behind observed phenomena. The data presented in this paper attempt to alleviate these challenges by providing 600+ hours of developer interaction traces, of which 26+ hours are backed with video recordings of the IDE screen and the developer's comments. This data set is relevant to researchers interested in investigating program comprehension, and to those developing techniques for interaction trace analysis and mining.
Theoretical work on reproducibility of scientific claims has hitherto focused on hypothesis testing as the desired mode of statistical inference. Focusing on hypothesis testing, however, makes it difficult to identify salient properties of the scientific process related to reproducibility, especially for fields that progress by building, comparing, selecting, and re-building models. We build a model-centric meta-scientific framework in which scientific discovery progresses by confirming models proposed in idealized experiments. In a temporal stochastic process of scientific discovery, we define scientists with diverse research strategies who search for the true model generating the data. When there is no replication in the system, the structure of scientific discovery is a particularly simple Markov chain. We analyze the effect of the diversity of research strategies in the scientific community and the complexity of the true model on the time spent at each model, the mean first hitting time of the true model, the time spent with the true model, and the rate of reproducibility given a true model. Inclusion of replication in the system breaks the Markov property and fundamentally alters the structure of scientific discovery. In this case, we analyze the aforementioned properties of scientific discovery using an agent-based model. In our system, the seeming paradox of scientific progress despite irreproducibility persists even in the absence of questionable research practices and incentive structures, as the rate of reproducibility and scientific discovery of the truth are uncorrelated. We explain this seeming paradox by a combination of the research strategies in the population and the state of the truth. Further, we find that innovation speeds up the discovery of truth by making otherwise inaccessible, possibly true models visible to the scientific population. We also show that epistemic diversity in the scientific population optimizes across a range of desirable properties of scientific discovery.
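The abstract's core idea, a community moving through a space of candidate models either by local modification or by innovative jumps until it hits the true model, can be illustrated with a toy simulation. The following sketch is purely hypothetical: the model space, the `p_innovate` parameter, and the neighbor-tweak move are illustrative assumptions, not the paper's actual framework.

```python
import random

def simulate_discovery(n_models=10, true_model=7, p_innovate=0.2,
                       seed=42, max_steps=10_000):
    """Toy Markov-chain walk over a line of candidate models 0..n_models-1.

    At each step the community either innovates (jumps to a uniformly
    random model, making otherwise inaccessible models visible) or
    tweaks (moves to an adjacent model). Returns the first hitting
    time of the true model, or None if it is never reached.
    All names and parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    current = 0  # start from an arbitrary initial model
    for step in range(1, max_steps + 1):
        if rng.random() < p_innovate:
            current = rng.randrange(n_models)  # innovation: global jump
        else:
            # local tweak: move to a neighboring model, clamped to the range
            current = max(0, min(n_models - 1, current + rng.choice([-1, 1])))
        if current == true_model:
            return step
    return None

# Compare mean first hitting times with and without innovation.
def mean_hitting_time(p_innovate, runs=200):
    times = [simulate_discovery(p_innovate=p_innovate, seed=s)
             for s in range(runs)]
    reached = [t for t in times if t is not None]
    return sum(reached) / len(reached) if reached else float("inf")

print("no innovation:  ", mean_hitting_time(0.0))
print("with innovation:", mean_hitting_time(0.2))
```

Averaged over many runs, the innovative walk typically reaches the true model sooner than the purely local one, echoing the abstract's claim that innovation speeds up the discovery of truth.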
Access to research data is a critical feature of an efficient, progressive, and ultimately self-correcting scientific ecosystem. But the extent to which the in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data ("analytic reproducibility"). To investigate, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data availability statements (104/417, 25% pre-policy to 136/174, 78% post-policy) and in data that were in-principle reusable (23/104, 22% pre-policy to 85/136, 62% post-policy). However, for 35 articles with in-principle reusable data, the analytic reproducibility of target outcomes related to key findings was poor: 11 (31%) cases were reproducible without author assistance, 11 (31%) cases were reproducible only with author assistance, and 13 (37%) cases were not fully reproducible despite author assistance. Importantly, original conclusions did not appear to be seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification, and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.