LHC Parameter Reproducibility

This document reviews the stability of the main LHC operational parameters, namely orbit, tune, coupling and chromaticity. The analysis is based on the LSA settings, measured parameters and real-time trims. The focus is on ramp and high-energy reproducibility, as these are more difficult to assess and correct on a daily basis for certain parameters such as chromaticity and coupling. The reproducibility of the machine in collision is analysed in detail, in particular the beam offsets at the IPs, since the ever-decreasing beam sizes at the IPs make beam steering there more and more delicate.

A survey on provenance: What for? What form? What from?

Provenance refers to any information describing the production process of an end product, which can be anything from a piece of digital data to a physical object. While this survey focuses on the former type of end product, this definition still leaves room for many different interpretations of and approaches to provenance. These are typically motivated by different application domains for provenance (e.g., accountability, reproducibility, process debugging) and varying technical requirements such as runtime, scalability, or privacy. As a result, we observe a wide variety of provenance types and provenance-generating methods. This survey provides an overview of the research field of provenance, focusing on what provenance is used for (what for?), what types of provenance have been defined and captured for the different applications (what form?), and which resources and system requirements impact the choice of deploying a particular provenance solution (what from?). For each of these three key questions, we provide a classification and review the state of the art for each class. We conclude with a summary and possible future research challenges.

Jug: Software for parallel reproducible computation in Python

As computational pipelines become a bigger part of science, it is important to ensure that the results are reproducible, a concern which has come to the fore in recent years. All developed software should run automatically, without any user intervention. In addition to being valuable to the wider community, which may wish to reproduce or extend a published analysis, reproducible research practices allow for better control over the project by the original authors themselves. For example, keeping a non-executable record of parameters and command line arguments leads to error-prone analysis and opens up the possibility that, when the results are to be written up for publication, the researcher will no longer be able to completely describe the process that led to them. For large projects, the use of multiple computational cores (either in a multi-core machine or distributed across a compute cluster) is necessary to obtain results in a useful time frame. Furthermore, it is often the case that, as the project evolves, it becomes necessary to save intermediate results while downstream analyses are designed (or redesigned) and implemented. Under many frameworks, this makes it increasingly difficult to keep a single point of entry for the computation. Jug is a software framework which addresses these issues by caching intermediate results and distributing the computational work as tasks across a network. Jug is written in Python without the use of compiled modules, is completely cross-platform, and is available as free software under the liberal MIT license.
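
The idea of caching intermediate results behind a single point of entry can be illustrated with a minimal sketch using only the Python standard library. This is not Jug's API: the names `cached_task`, `CACHE_DIR` and `CALLS` are hypothetical, and the on-disk layout is an assumption made for the example. Each task result is stored under a hash of the function name and its arguments, so re-running the whole pipeline reuses earlier results instead of recomputing them.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location for this sketch; Jug manages its own store.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="task_cache_"))
# Counts real computations, to make cache hits visible.
CALLS = {"count": 0}


def cached_task(func):
    """Cache func's result on disk, keyed by its name and arguments."""
    def wrapper(*args):
        key_bytes = pickle.dumps((func.__name__, args))
        path = CACHE_DIR / hashlib.sha256(key_bytes).hexdigest()
        if path.exists():
            # Result already computed in an earlier run: load it.
            return pickle.loads(path.read_bytes())
        result = func(*args)
        path.write_bytes(pickle.dumps(result))
        return result
    return wrapper


@cached_task
def square(x):
    CALLS["count"] += 1
    return x * x


@cached_task
def total(values):
    CALLS["count"] += 1
    return sum(values)


def pipeline():
    """Single point of entry: always describes the full computation."""
    squares = tuple(square(x) for x in range(5))
    return total(squares)


first = pipeline()   # computes five squares and one sum
second = pipeline()  # served entirely from the cache
print(first, second, CALLS["count"])
```

Because the cache lives on disk, several worker processes sharing the same directory could in principle split the tasks between them; handling the locking this requires is left out of the sketch.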

Utilising Semantic Web Ontologies To Publish Experimental Workflows

Reproducibility in experiments is necessary to verify claims and to reuse prior work in experiments that advance research. However, the traditional model of publication validates research claims through peer review without taking reproducibility into account. Workflows encapsulate experiment descriptions and components and are suitable for representing reproducibility. Additionally, they can be published alongside traditional patterns as a form of documentation for the experiment which can be combined with linked open data. For reproducibility utilising published datasets, it is necessary to declare the conditions or restrictions for permissible reuse. In this paper, we take a look at the state of workflow reproducibility through a browser-based tool and a corresponding study to identify how workflows might be combined with traditional forms of documentation and publication. We also discuss the licensing aspects for data in workflows and how it can be annotated using linked open data ontologies.

Enabling reproducible real-time quantitative PCR research: the RDML package

Reproducibility, a cornerstone of research, requires defined data formats, which include the set-up and output of experiments. The Real-time PCR Data Markup Language (RDML) is a recommended standard of the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines. Despite the popularity of the RDML format for analysis of qPCR data, handling of RDML files is not yet widely supported by PCR curve analysis software. This study describes the open source RDML package for the statistical computing language R. RDML is compatible with RDML versions ≤ 1.2 and provides functionality to (i) import RDML data; (ii) extract sample information (e.g., targets, concentration); (iii) transform data to various formats of the R environment; (iv) generate human-readable run summaries; and (v) create RDML files from user data. In addition, RDML offers a graphical user interface to read, edit and create RDML files.

The Sacred Infrastructure for Computational Research

We present a toolchain for computational research consisting of Sacred and two supporting tools. Sacred is an open source Python framework which aims to provide basic infrastructure for running computational experiments independent of the methods and libraries used. Instead, it focuses on solving universal everyday problems, such as managing configurations, reproducing results, and bookkeeping. Moreover, it provides an extensible basis for other tools, two of which we present here: Labwatch helps with tuning hyperparameters, and Sacredboard provides a web-dashboard for organizing and analyzing runs and results.
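
The everyday problems named above (managing configurations, reproducing results, bookkeeping) can be sketched with the Python standard library alone. The following is not Sacred's API: `DEFAULT_CONFIG`, `LOG_DIR`, `experiment` and `run` are hypothetical names, and the JSON record layout is an assumption made for the example. Each run merges configuration updates into the defaults, seeds its own random generator, and writes a record of the configuration and result to disk.

```python
import json
import random
import tempfile
import time
from pathlib import Path

# Hypothetical default parameters and log location for this sketch.
DEFAULT_CONFIG = {"n_samples": 1000, "seed": 42}
LOG_DIR = Path(tempfile.mkdtemp(prefix="runs_"))


def experiment(n_samples, seed):
    """Toy experiment: mean of n_samples pseudo-random draws."""
    rng = random.Random(seed)  # seeded, so the result is repeatable
    return sum(rng.random() for _ in range(n_samples)) / n_samples


def run(updates=None):
    """Merge config updates, run the experiment, and store a run record."""
    config = {**DEFAULT_CONFIG, **(updates or {})}
    record = {
        "config": config,
        "start_time": time.time(),
        "result": experiment(**config),
    }
    # Number the records sequentially within the log directory.
    run_id = len(list(LOG_DIR.glob("run_*.json"))) + 1
    path = LOG_DIR / f"run_{run_id}.json"
    path.write_text(json.dumps(record, indent=2))
    return record, path


record_a, path_a = run()             # default configuration
record_b, path_b = run()             # same config and seed: same result
record_c, path_c = run({"seed": 7})  # one parameter overridden
print(path_a.name, path_b.name, path_c.name)
```

Keeping the seed inside the stored configuration is the design choice that matters here: any recorded run can be repeated by reading its record back and passing the saved configuration to `experiment` again.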