Posts about reproducible paper (old posts, page 4)

The Dagstuhl Beginners Guide to Reproducibility for Experimental Networking Research

Reproducibility is one of the key characteristics of good science, but hard to achieve for experimental disciplines like Internet measurements and networked systems. This guide provides advice to researchers, particularly those new to the field, on designing experiments so that their work is more likely to be reproducible and to serve as a foundation for follow-on work by others.

The Reproducibility of Economics Research: A Case Study

Published reproductions or replications of economics research are rare. However, recent years have seen increased recognition of the important role of replication in the scientific endeavor. We describe and present the results of a large reproduction exercise in which we assess the reproducibility of research articles published in the American Economic Journal: Applied Economics over the last decade. 69 of 162 eligible replication attempts (42.6%) successfully replicated the article’s analysis. A further 68 (42%) were at least partially successful. A total of 98 out of 303 (32.3%) relied on confidential or proprietary data, and were thus not reproducible by this project. We also conduct several bibliometric analyses of reproducible vs. non-reproducible articles.
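The reported shares can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the counts quoted in the abstract. It assumes that the 68 partially successful attempts use the same denominator of 162 as the fully successful ones; the helper name `percent` is made up for illustration.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts quoted in the abstract above.
fully_successful = percent(69, 162)      # full replications among eligible attempts
partially_successful = percent(68, 162)  # assumed to share the 162 denominator
confidential_data = percent(98, 303)     # articles relying on restricted data

print(fully_successful, partially_successful, confidential_data)
```

Under that assumption, the three values agree with the percentages stated in the abstract after rounding.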

A Reaction Norm Perspective on Reproducibility

Reproducibility in biomedical research, and more specifically in preclinical animal research, has been seriously questioned. Several cases of spectacular failures to replicate findings published in the primary scientific literature have led to a perceived reproducibility crisis. Diverse threats to reproducibility have been proposed, including lack of scientific rigour, low statistical power, publication bias, analytical flexibility and fraud. An important aspect that is generally overlooked is the lack of external validity caused by rigorous standardization of both the animals and the environment. Here, we argue that a reaction norm approach to phenotypic variation, acknowledging gene-by-environment interactions, can help us see reproducibility of animal experiments in a new light. We illustrate how dominating environmental effects can affect inference and effect size estimates of studies and how elimination of dominant factors through standardization affects the nature of the expected phenotype variation. We do this by introducing a construct that we dubbed the reaction norm of small effects. Finally, we discuss the consequences of a reaction norm of small effects for statistical analysis, specifically for random effect latent variable models and the random lab model.

The Costs of Reproducibility

Improving the reproducibility of neuroscience research is of great concern, especially to early-career researchers (ECRs). Here I outline the potential costs for ECRs in adopting practices to improve reproducibility. I highlight the ways in which ECRs can achieve their career goals while doing better science and the need for established researchers to support them in these efforts.

Towards an Open (Data) Science Analytics-Hub for Reproducible Multi-Model Climate Analysis at Scale

Open Science is key to future scientific research and promotes a deep transformation in the whole scientific research process encouraging the adoption of transparent and collaborative scientific approaches aimed at knowledge sharing. Open Science is increasingly gaining attention in the current and future research agenda worldwide. To effectively address Open Science goals, besides Open Access to results and data, it is also paramount to provide tools or environments to support the whole research process, in particular the design, execution and sharing of transparent and reproducible experiments, including data provenance (or lineage) tracking. This work introduces the Climate Analytics-Hub, a new component on top of the Earth System Grid Federation (ESGF), which joins big data approaches and parallel computing paradigms to provide an Open Science environment for reproducible multi-model climate change data analytics experiments at scale. An operational implementation has been set up at the SuperComputing Centre of the Euro-Mediterranean Center on Climate Change, with the main goal of becoming a reference Open Science hub in the climate community regarding the multi-model analysis based on the Coupled Model Intercomparison Project (CMIP).

A Practical Roadmap for Provenance Capture and Data Analysis in Spark-based Scientific Workflows

Whenever high-performance computing applications meet data-intensive scalable systems, an attractive approach is the use of Apache Spark for the management of scientific workflows. Spark provides several advantages such as being widely supported and granting efficient in-memory data management for large-scale applications. However, Spark still lacks support for data tracking and workflow provenance. Additionally, Spark’s memory management requires accessing all data movements between the workflow activities. Therefore, the running of legacy programs on Spark is interpreted as a "black-box" activity, which prevents the capture and analysis of implicit data movements. Here, we present SAMbA, an Apache Spark extension for the gathering of prospective and retrospective provenance and domain data within distributed scientific workflows. Our approach relies on enveloping both RDD structure and data contents at runtime so that (i) RDD-enclosure consumed and produced data are captured and registered by SAMbA in a structured way, and (ii) provenance data can be queried during and after the execution of scientific workflows. By following the W3C PROV representation, we model the roles of RDD regarding prospective and retrospective provenance data. Our solution provides mechanisms for the capture and storage of provenance data without jeopardizing Spark’s performance. The provenance retrieval capabilities of our proposal are evaluated in a practical case study, in which data analytics are provided by several SAMbA parameterizations.
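The idea of enveloping a dataset so that each activity records what it consumed and produced can be illustrated with a small sketch. This is not SAMbA's API and does not use Spark: a plain Python list stands in for an RDD, and every name here (`TrackedDataset`, `ProvenanceRecord`, `lineage`) is hypothetical, chosen only to show the general pattern of retrospective provenance capture.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ProvenanceRecord:
    """One captured activity: its name, what it consumed, what it produced."""
    activity: str
    consumed: List[Any]
    produced: List[Any]


@dataclass
class TrackedDataset:
    """Hypothetical stand-in for an enveloped RDD (a plain list, no Spark)."""
    data: List[Any]
    log: List[ProvenanceRecord] = field(default_factory=list)

    def _derive(self, activity: str, produced: List[Any]) -> "TrackedDataset":
        # Register the activity and carry the accumulated log forward.
        record = ProvenanceRecord(activity, list(self.data), produced)
        return TrackedDataset(produced, self.log + [record])

    def map(self, activity: str, fn: Callable[[Any], Any]) -> "TrackedDataset":
        return self._derive(activity, [fn(x) for x in self.data])

    def filter(self, activity: str, pred: Callable[[Any], bool]) -> "TrackedDataset":
        return self._derive(activity, [x for x in self.data if pred(x)])

    def lineage(self) -> List[str]:
        """Names of the activities that led to this dataset, oldest first."""
        return [record.activity for record in self.log]


raw = TrackedDataset([1, 2, 3, 4])
result = (
    raw.map("square", lambda x: x * x)
       .filter("keep_even", lambda x: x % 2 == 0)
)
print(result.data)
print(result.lineage())
```

Because the log travels with each derived dataset, it can be inspected both between steps and after the whole chain has finished, which loosely mirrors the "during and after execution" querying the abstract describes.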