The literature on the reproducibility crisis presents several putative causes for the proliferation of irreproducible results, including HARKing, p-hacking and publication bias. Without a theory of reproducibility, however, it is difficult to determine whether these putative causes can explain most irreproducible results. Drawing from an historically informed conception of science that is open and collaborative, we identify the components of an idealized experiment and analyze these components as a precursor to develop such a theory. Openness, we suggest, has long been intuitively proposed as a solution to irreproducibility. However, this intuition has not been validated in a theoretical framework. Our concern is that the under-theorizing of these concepts can lead to flawed inferences about the (in)validity of experimental results or integrity of individual scientists. We use probabilistic arguments and examine how openness of experimental components relates to reproducibility of results. We show that there are some impediments to obtaining reproducible results that precede many of the causes often cited in literature on the reproducibility crisis. For example, even if erroneous practices such as HARKing, p-hacking, and publication bias were absent at the individual and system level, reproducibility may still not be guaranteed.
Although essential to the development of a robust evidence base for nurse educators, the concepts of replication and reproducibility have received little attention in the nursing education literature. In this Methodology Corner installment, the concepts of study replication and reproducibility are explored in depth. In designing, conducting, and documenting the findings of studies in nursing education, researchers are encouraged to make design choices that improve study replicability and reproducibility of study findings. [J Nurs Educ. 2018;57(11):638–640.] There has been considerable discussion in the professional literature about questionable research practices that raise doubt about the credibility of research findings (Shrout & Rodgers, 2018) and that limit reproducibility of research findings (Shepherd, Peratikos, Rebeiro, Duda, & McCowan, 2017). This discussion has led to what scientists term as a replication crisis (Goodman, Fanelli, & Ioannidis, 2016). Although investigators in various disciplines have provided suggestions to address this crisis (Alvarez, Key, & Núñez, 2018; Goodman et al., 2016; Shrout & Rodgers, 2018), similar discussions or reports of replication within nursing education literature are limited, despite a call for replication studies (Morin, 2016). Consequently, the focus of this article is on replication and reproducibility. The topic is important, given that the hallmark of good science is being able to replicate or reproduce findings (Morin, 2016). Replication serves to provide “stability in our knowledge of nature” (Schmidt, 2009, p. 92).
Cell culture is a vital component of laboratories throughout the scientifi c community, yet the absence of standardized protocols and documentation practice challenges laboratory effi ciency and scientific reproducibility. We examined the effectiveness of a cloud-based software application, CultureTrax ® as a tool for standardizing and transferring a complex cell culture protocol. The software workfl ow and template were used to electronically format a cardiomyocyte differentiation protocol and share a digitally executable copy with a different lab user. While the protocol was unfamiliar to the recipient, they executed the experiment by solely using CultureTrax and successfully derived cardiomyocytes from human induced pluripotent stem cells. This software tool significantly reduced the time and resources required to effectively transfer and implement a novel protocol.
Replicability and reproducibility of computational models has been somewhat understudied by “the replication movement.” In this paper, we draw on methodological studies into the replicability of psychological experiments and on the mechanistic account of explanation to analyze the functions of model replications and model reproductions in computational neuroscience. We contend that model replicability, or independent researchers' ability to obtain the same output using original code and data, and model reproducibility, or independent researchers' ability to recreate a model without original code, serve different functions and fail for different reasons. This means that measures designed to improve model replicability may not enhance (and, in some cases, may actually damage) model reproducibility. We claim that although both are undesirable, low model reproducibility poses more of a threat to long-term scientific progress than low model replicability. In our opinion, low model reproducibility stems mostly from authors' omitting to provide crucial information in scientific papers and we stress that sharing all computer code and data is not a solution. Reports of computational studies should remain selective and include all and only relevant bits of code.
Web document collections such as WT10G, GOV2, and ClueWeb are widely used for text retrieval experiments. Documents in these collections contain a fair amount of non-content-related markup in the form of tags, hyperlinks, and so on. Published articles that use these corpora generally do not provide specific details about how this markup information is handled during indexing. However, this question turns out to be important: Through experiments, we find that including or excluding metadata in the index can produce significantly different results with standard IR models. More importantly, the effect varies across models and collections. For example, metadata filtering is found to be generally beneficial when using BM25, or language modeling with Dirichlet smoothing, but can significantly reduce retrieval effectiveness if language modeling is used with Jelinek-Mercer smoothing. We also observe that, in general, the performance differences become more noticeable as the amount of metadata in the test collections increase. Given this variability, we believe that the details of document preprocessing are significant from the point of view of reproducibility. In a second set of experiments, we also study the effect of preprocessing on query expansion using RM3. In this case, once again, we find that it is generally better to remove markup before using documents for query expansion.
Experimental deception has not been seriously examined in terms of its impact on reproducible science. I demonstrate, using data from the Open Science Collaboration’s Reproducibility Project (2015), that experiments involving deception have a higher probability of not replicating and have smaller effect sizes compared to experiments that do not have deception procedures. This trend is possibly due to missing information about the context and performance of agents in the studies in which the original effects were generated, leading to either compromised internal validity, or an incomplete specification and control of variables in replication studies. Of special interest are the mechanisms by which deceptions are implemented and how these present challenges for the efficient transmission of critical information from experimenter to participant. I rehearse possible frameworks that might form the basis of a future research program on experimental deception and make some recommendations as to how such a program might be initiated.