Scripts and Scientific Workflow Management Systems (SWfMSs) are common approaches that have been used to automate the execution flow of processes and data analysis in scientific (computational) experiments. Although widely used in many disciplines, scripts are hard to understand, adapt, reuse, and reproduce. For this reason, several solutions have been proposed to aid experiment reproducibility in script-based environments. However, they neither allow the experiment to be fully documented nor help when third parties want to reuse just part of the code. SWfMSs, on the other hand, help documentation and reuse by supporting scientists in the design and execution of their experiments, which are specified and run as interconnected (reusable) workflow components (a.k.a. building blocks). While workflows are better than scripts for understandability and reuse, they still require additional documentation. During experiment design, scientists frequently create workflow variants, e.g., by changing workflow components. Reuse and reproducibility require understanding and tracking variant provenance, a time-consuming task. This thesis aims to support reproducibility and reuse of computational experiments. To meet these challenges, we address two research problems: (1) understanding a computational experiment, and (2) extending a computational experiment. Our work towards solving these problems led us to adopt workflows and ontologies to address both.
The main contributions of this thesis are thus: (i) to present the requirements for the conversion of scripts into reproducible research; (ii) to propose a methodology that guides scientists through the process of converting script-based experiments into reproducible workflow research objects; (iii) to design and implement features for quality assessment of computational experiments; (iv) to design and implement W2Share, a framework to support the conversion methodology, which exploits tools and standards that have been developed by the scientific community to promote reuse and reproducibility; (v) to design and implement OntoSoft-VFF, a framework for capturing information about software and workflow components to help scientists manage workflow exploration and evolution. Our work is showcased via use cases in Molecular Dynamics, Bioinformatics, and Weather Forecasting.
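As an illustration of what tracking variant provenance involves, the following minimal sketch models each workflow variant as a mapping from workflow steps to the components they use, plus a link to the variant it was derived from. It is a hypothetical Python example: the class WorkflowVariant, its methods, and the step and component names are assumptions made for illustration and do not reflect the actual data model of OntoSoft-VFF or W2Share.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class WorkflowVariant:
    """Hypothetical model of one workflow version.

    components maps a step name (e.g. "align") to the identifier of the
    software component used for that step; parent is the variant this one
    was derived from (None for the original workflow).
    """
    variant_id: str
    components: Dict[str, str]
    parent: Optional["WorkflowVariant"] = None

    def derive(self, variant_id: str, **changes: str) -> "WorkflowVariant":
        """Create a child variant by swapping the components of the given steps."""
        new_components = dict(self.components)  # copy so the parent is unchanged
        new_components.update(changes)
        return WorkflowVariant(variant_id, new_components, parent=self)

    def lineage(self) -> List[str]:
        """Return variant ids from the root workflow down to this variant."""
        chain: List[str] = []
        node: Optional[WorkflowVariant] = self
        while node is not None:
            chain.append(node.variant_id)
            node = node.parent
        return list(reversed(chain))

    def diff_from_parent(self) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Return {step: (old_component, new_component)} for steps changed w.r.t. the parent."""
        if self.parent is None:
            return {}
        changed: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
        for step in sorted(set(self.components) | set(self.parent.components)):
            old = self.parent.components.get(step)
            new = self.components.get(step)
            if old != new:
                changed[step] = (old, new)
        return changed
```

Keeping only the parent link and the per-step component mapping is enough to answer the two questions the abstract raises for reuse: where a variant came from (lineage) and what was changed (the diff against its parent).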
The computational reproducibility of analytic results has been discussed and evaluated in many different scientific disciplines, all of which have one finding in common: analytic results are far too often not reproducible. There are numerous examples of reproducibility guidelines for various applications; however, a comprehensive assessment tool for evaluating the individual components of the research pipeline was unavailable. To address this need, the Center for Open Science (COS) developed the ReproRubric, which defines multiple tiers of reproducibility based on criteria established for each critical stage of the typical research workflow, from the initial design of the experiment through the final reporting of the results.
The contemporary scientific community places a growing emphasis on the reproducibility of research. The R package "projects" is a free, open-source endeavor created to facilitate reproducible research workflows. It adds to existing software tools for reproducible research and introduces several practical features that are helpful for scientists and their collaborative research teams. For each individual project, it supplies an intuitive framework for storing raw and cleaned study data sets, and provides script templates for protocol creation, data cleaning, data analysis, and manuscript development. Internal databases of project and author information are generated and displayed, and manuscript title pages containing author lists and their affiliations are automatically generated from the internal database. File management tools allow teams to organize multiple projects. When used on a shared file system, multiple researchers can contribute to the same project with fewer interruptions, reducing the frequency of misunderstandings and the need for status updates.
Most efforts to estimate the reproducibility of published findings have focused on specific areas of research, even though science is usually assessed and funded on a regional or national basis. Here we describe a project to assess the reproducibility of findings in biomedical science published by researchers based in Brazil. The Brazilian Reproducibility Initiative is a systematic, multi-center effort to repeat between 60 and 100 experiments: the project will focus on a set of common laboratory methods, repeating each experiment in three different laboratories. The results, due in 2021, will allow us to estimate the level of reproducibility of biomedical science in Brazil, and to investigate what the published literature can tell us about the reproducibility of research in a given area.
Docker seems to be an attractive solution for cloud database benchmarking, as it simplifies the setup process through pre-built images that are portable and simple to maintain. However, using Docker for benchmarking is only valid if it has no effect on measurement results. Existing work has so far focused only on the performance overheads that Docker directly induces for specific applications. In this paper, we study the indirect effects of dockerization on the results of database benchmarking. Among other findings, our results clearly show that containerization has a measurable and non-constant influence on measurement results and should, hence, only be used after careful analysis.
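To make the notion of a "non-constant influence" concrete, one way to probe it is to compare repeated measurements of the same workloads with and without Docker. A constant relative overhead could in principle be factored out of the results, whereas an overhead that varies across workloads cannot. The following hypothetical Python sketch works under that assumption: the function names, the use of median latency, and the 2% default tolerance are illustrative choices, not the analysis method used in the study.

```python
from statistics import median
from typing import Dict, List


def relative_overheads(native: Dict[str, List[float]],
                       docker: Dict[str, List[float]]) -> Dict[str, float]:
    """Relative change of the median latency per workload: (docker - native) / native.

    Both arguments map a workload name to the latencies of repeated runs;
    every workload in `native` must also appear in `docker`.
    """
    result: Dict[str, float] = {}
    for workload, native_runs in native.items():
        n = median(native_runs)
        d = median(docker[workload])
        result[workload] = (d - n) / n
    return result


def overhead_is_constant(overheads: Dict[str, float], tolerance: float = 0.02) -> bool:
    """True if all relative overheads lie within `tolerance` of each other."""
    values = list(overheads.values())
    return max(values) - min(values) <= tolerance
```

The median is used here only because it is less sensitive to occasional outlier runs than the mean; a real analysis would also need to account for run-to-run variance before concluding that an overhead is or is not constant.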
These 10 simple rules are not limited to molecular dynamics; they also apply to Monte Carlo simulations, quantum mechanics calculations, molecular docking, and any other computational methods involving computations on biological molecules.