Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing access to the raw, original datasets is not enough. Real-world datasets are rarely processed without modification. Instead, the data is adapted to the needs of the experimental evaluation in the data preparation process. We showcase that the details of the data preparation process matter and subtle differences during data conversion can have a large impact on the outcome of runtime results. We introduce a data reproducibility model, identify three levels of data reproducibility, report about our own experience, and exemplify our best practices.
Reproducibility plays a crucial role in experimentation. However, the modern research ecosystem and the underlying frameworks are constantly evolving and thereby making it extremely difficult to reliably reproduce scientific artifacts such as data, algorithms, trained models and visual-izations. We therefore aim to design a novel system for assisting data scientists with rigorous end-to-end documentation of data-oriented experiments. Capturing data lineage, metadata, andother artifacts helps reproducing and sharing experimental results. We summarize this challenge as automated documentation of data science experiments. We aim at reducing manualoverhead for experimenting researchers, and intend to create a novel approach in dataflow and metadata tracking based on the analysis of the experiment source code. The envisioned system will accelerate the research process in general, andenable capturing fine-grained meta information by deriving a declarative representation of data science experiments.
Lee et al. (2019) make several practical recommendations for replicable, useful cognitive modeling. They also point out that the ultimate test of the usefulness of a cognitive model is its ability to solve practical problems. In this commentary, we argue that for cognitive modeling to reach applied domains, there is a pressing need to improve the standards of transparency and reproducibility in cognitive modelling research. Solution-oriented modeling requires engaging practitioners who understand the relevant domain. We discuss mechanisms by which reproducible research can foster engagement with applied practitioners. Notably, reproducible materials provide a start point for practitioners to experiment with cognitive models and determine whether those models might be suitable for their domain of expertise. This is essential because solving complex problems requires exploring a range of modeling approaches, and there may not time to implement each possible approach from the ground up. We also note the broader benefits to reproducibility within the field.
The Target Article by Lee et al. (2019) highlights the ways in which ongoing concerns about research reproducibility extend to model-based approaches in cognitive science. Whereas Lee et al. focus primarily on the importance of research practices to improve model robustness, we propose that the transparent sharing of model specifications, including their inputs and outputs, is also essential to improving the reproducibility of model-based analyses. We outline an ongoing effort (within the context of the Brain Imaging Data Structure community) to develop standards for the sharing of the structure of computational models and their outputs.
The main criticism of my piece in ref (2) seems to be that my calculations rely on testing a point null hypothesis, i.e. the hypothesis that the true effect size is zero. He objects to my contention that the true effect size can be zero, "just give the same pill to both groups", on the grounds that two pills can't be exactly identical. He then says "I understand that this criticism may come across as frivolous semantic pedantry of no practical consequence: of course that the author meant to say 'pills with the same contents' as everybody would have understood". Yes, that is precisely how it comes across to me. I shall try to explain in more detail why I think that this criticism has little substance.
Computational Communication Research (CCR) is a new open access journal dedicated to publishing high quality computational research in communication science. This editorial introduction describes the role that we envision for the journal. First, we explain what computational communication science is and why a new journal is needed for this subfield. Then, we elaborate on the type of research this journal seeks to publish, and stress the need for transparent and reproducible science. The relation between theoretical development and computational analysis is discussed, and we argue for the value of null-findings and risky research in additive science. Subsequently, the (experimental) two-phase review process is described. In this process, after the first double-blind review phase, an editor can signal that they intend to publish the article conditional on satisfactory revisions. This starts the second review phase, in which authors and reviewers are no longer required to be anonymous and the authors are encouraged to publish a preprint to their article which will be linked as working paper from the journal. Finally, we introduce the four articles that, together with this Introduction, form the inaugural issue.