Truth, Proof, and Reproducibility: There's no counter-attack for the codeless

Current concerns about reproducibility in many research communities can be traced back to a high value placed on empirical reproducibility of the physical details of scientific experiments and observations. For example, the detailed descriptions by 17th century scientist Robert Boyle of his vacuum pump experiments are often held to be the ideal of reproducibility as a cornerstone of scientific practice. Victoria Stodden has claimed that the computer is an analog for Boyle's pump -- another kind of scientific instrument that needs detailed descriptions of how it generates results. In the place of Boyle's hand-written notes, we now expect code in open source programming languages to be available to enable others to reproduce and extend computational experiments. In this paper we show that there is another genealogy for reproducibility, starting at least from Euclid, in the production of proofs in mathematics. Proofs have a distinctive quality of being necessarily reproducible, and are the cornerstone of mathematical science. However, the task of the modern mathematical scientist has drifted from that of blackboard rhetorician, where the craft of proof reigned, to a scientific workflow that now more closely resembles that of an experimental scientist. So, what is proof in modern mathematics? And, if proof is unattainable in other fields, what is due scientific diligence in a computational experimental environment? How do we measure truth in the context of uncertainty? Adopting a manner of Lakatosian conversant conjecture between two mathematicians, we examine how proof informs our practice of computational statistical inquiry. We propose that a reorientation of mathematical science is necessary so that its reproducibility can be readily assessed.

(Re)considering the Concept of Reproducibility of Information Systems Literature Reviews

In this paper, we describe DiOS, a lightweight model operating system which can be used to execute programs that make use of POSIX APIs. Such executions are fully reproducible: running the same program with the same inputs twice will result in two exactly identical instruction traces, even if the program uses threads for parallelism. DiOS is implemented almost entirely in portable C and C++: although its primary platform is DiVM, a verification-oriented virtua machine, it can be configured to also run in KLEE, a symbolic executor. Finally, it can be compiled into machine code to serve as a user-mode kernel. Additionally, DiOS is modular and extensible. Its various components can be combined to match both the capabilities of the underlying platform and to provide services required by a particular program. New components can be added to cover additional system calls or APIs. The experimental evaluation has two parts. DiOS is first evaluated as a component of a program verification platform based on DiVM. In the second part, we consider its portability and modularity by combining it with the symbolic executor KLEE.

Reproducible Execution of POSIX Programs with DiOS

Literature reviews play a key role in information systems (IS) research by describing, understanding, testing, and explaining the constructs and theories within a particular topic area. In recent years, various commentaries, debates, and editorials in the field’s top journals have highlighted the importance of systematicity and transparency in creating trustworthy literature reviews. Although also recognized as being important, the characteristic of reproducibility of IS literature reviews has not received nearly the same level of attention. This paper seeks to contribute to the ongoing discussion on the elements required for high quality IS literature reviews by clarifying the role of reproducibility. In doing so, we find that the concept of reproducibility has been misunderstood in much of the guidance to authors of IS literature reviews. Based on this observation, we make several suggestions for clarifying the terminology and identifying when reproducibility is desirable and feasible within IS literature reviews.

Can topic models be used in research evaluations? Reproducibility, validity, and reliability when compared with semantic maps

We replicate and analyze the topic model which was commissioned to King’s College and Digital Science for the Research Evaluation Framework (REF 2014) in the United Kingdom: 6,638 case descriptions of societal impact were submitted by 154 higher-education institutes. We compare the Latent Dirichlet Allocation (LDA) model with Principal Component Analysis (PCA) of document-term matrices using the same data. Since topic models are almost by definition applied to text corpora which are too large to read, validation of the results of these models is hardly possible; furthermore the models are irreproducible for a number of reasons. However, removing a small fraction of the documents from the sample—a test for reliability—has on average a larger impact in terms of decay on LDA than on PCA-based models. The semantic coherence of LDA models outperforms PCA-based models. In our opinion, results of the topic models are statistical and should not be used for grant selections and micro decision-making about research without follow-up using domain-specific semantic maps.

Novelty in science should not come at the cost of reproducibility

The pressures of a scientific career can end up incentivising an all‐or‐nothing approach to cross the finish line first. While competition can be healthy and drives innovation, the current system fails to encourage scientists to work reproducibility. This sometimes leaves those individuals who come second to correct mistakes in published research without being rewarded. Instead, we need a culture that rewards reproducibility and holds it as important as the novelty of the result. Here, I draw on my own journey in the oestrogen receptor research field to highlight this and suggest ways for the 'first past the post' culture to be challenged.

A Link is not Enough – Reproducibility of Data

Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing access to the raw, original datasets is not enough. Real-world datasets are rarely processed without modification. Instead, the data is adapted to the needs of the experimental evaluation in the data preparation process. We showcase that the details of the data preparation process matter and subtle differences during data conversion can have a large impact on the outcome of runtime results. We introduce a data reproducibility model, identify three levels of data reproducibility, report about our own experience, and exemplify our best practices.