Reproducibility should be a cornerstone of scientific research and is a growing concern among the scientific community and the public. Understanding how to design services and tools that support documentation, preservation and sharing is required to maximize the positive impact of scientific research. We conducted a study of user attitudes towards systems that support data preservation in High Energy Physics, one of science's most data-intensive branches. We report on our interview study with 12 experimental physicists, studying requirements and opportunities in designing for research preservation and reproducibility. Our findings suggest that we need to design for motivation and benefits in order to stimulate contributions and to address the observed scalability challenge. Therefore, researchers' attitudes towards communication, uncertainty, collaboration and automation need to be reflected in design. Based on our findings, we present a systematic view of user needs and constraints that define the design space of systems supporting reproducible practices.
Web technology has transformed our lives, and has led to a paradigm shift in the computational sciences. As the neuroimaging informatics research community amasses large datasets to answer complex neuroscience questions, we find that the web is the best medium to facilitate novel insights by way of improved collaboration and communication. Here, we review the landscape of web technologies used in neuroimaging research, and discuss future applications, areas for improvement, and the limitations of using web technology in research. Fully incorporating web technology in our research lifecycle requires not only technical skill, but a widespread culture change; a shift from the small, focused "wet lab" to a multidisciplinary and largely collaborative "web lab."
Multi-scale computational modeling is a major branch of computational biology, as evidenced by the US federal interagency Multi-Scale Modeling Consortium and major international projects. It invariably involves specific and detailed sequences of data analysis and simulation, often with multiple tools and datasets, and the community recognizes improved modularity, reuse, reproducibility, portability and scalability as critical unmet needs in this area. Scientific workflows are a well-recognized strategy for addressing these needs in scientific computing. While there are good examples of the use of scientific workflows in bioinformatics, medical informatics, biomedical imaging and data analysis, there are fewer examples in multi-scale computational modeling in general and cardiac electrophysiology in particular. Cardiac electrophysiology simulation is a mature area of multi-scale computational biology that serves as an excellent use case for developing and testing new scientific workflows. In this article, we develop, describe and test a computational workflow that serves as a proof of concept of a platform for the robust integration and implementation of a reusable and reproducible multi-scale cardiac cell and tissue model that is expandable, modular and portable. The workflow described leverages Python and the Kepler-Python actor for plotting and pre/post-processing. During all stages of the workflow design, we rely on freely available open-source tools to make our workflow freely usable by scientists.
There is broad interest in improving the reproducibility of published research. We developed a survey tool to assess the availability of digital research artifacts published alongside peer-reviewed journal articles (e.g. data, models, code, directions for use) and the reproducibility of article results. We used the tool to assess 360 of the 1,989 articles published by six hydrology and water resources journals in 2017. As in studies from other fields, we reproduced results for only a small fraction of articles (1.6% of tested articles) using their available artifacts. We estimated, with 95% confidence, that results might be reproduced for only 0.6% to 6.8% of all 1,989 articles. Unlike prior studies, the survey tool identified key bottlenecks to making work more reproducible. Bottlenecks include: only some digital artifacts available (44% of articles), no directions (89%), or all artifacts available but results not reproducible (5%). The tool (or extensions) can help authors, journals, funders, and institutions to self-assess manuscripts, provide feedback to improve reproducibility, and recognize and reward reproducible articles as examples for others.
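As a rough illustration of how a confidence interval for a reproduced fraction can be computed from a sample, the sketch below uses the Wilson score interval for a binomial proportion. This is a generic textbook method swapped in for illustration, not the estimation procedure used in the study, which the abstract does not describe; it should not be expected to reproduce the 0.6% to 6.8% range. The function name `wilson_interval` and the count of 6 reproduced articles in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion.

    Illustrative only: assumes simple random sampling, which may not
    match the sampling design of any particular study.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical usage: 6 reproduced articles out of 360 sampled.
# The count 6 is an assumption for illustration, not a figure from the text.
low, high = wilson_interval(6, 360)
print(f"95% interval for the reproduced fraction: {low:.3%} to {high:.3%}")
```

The Wilson interval is chosen here over the simpler normal approximation because it behaves better when the observed proportion is close to zero, as it is for small reproduced fractions.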
Scripts and Scientific Workflow Management Systems (SWfMSs) are common approaches that have been used to automate the execution flow of processes and data analysis in scientific (computational) experiments. Although widely used in many disciplines, scripts are hard to understand, adapt, reuse, and reproduce. For this reason, several solutions have been proposed to aid experiment reproducibility for script-based environments. However, they neither fully document the experiment nor help when third parties want to reuse just part of the code. SWfMSs, on the other hand, help documentation and reuse by supporting scientists in the design and execution of their experiments, which are specified and run as interconnected (reusable) workflow components (a.k.a. building blocks). While workflows are better than scripts for understandability and reuse, they still require additional documentation. During experiment design, scientists frequently create workflow variants, e.g., by changing workflow components. Reuse and reproducibility require understanding and tracking variant provenance, a time-consuming task. This thesis aims to support reproducibility and reuse of computational experiments. To meet these challenges, we address two research problems: (1) understanding a computational experiment, and (2) extending a computational experiment. Our work towards solving these problems led us to choose workflows and ontologies to address both.
The main contributions of this thesis are thus: (i) to present the requirements for the conversion of scripts to reproducible research; (ii) to propose a methodology that guides scientists through the process of converting script-based experiments into reproducible workflow research objects; (iii) to design and implement features for quality assessment of computational experiments; (iv) to design and implement W2Share, a framework to support the conversion methodology, which exploits tools and standards that have been developed by the scientific community to promote reuse and reproducibility; (v) to design and implement OntoSoft-VFF, a framework for capturing information about software and workflow components to help scientists manage workflow exploration and evolution. Our work is showcased via use cases in Molecular Dynamics, Bioinformatics and Weather Forecasting.
The computational reproducibility of analytic results has been discussed and evaluated in many different scientific disciplines, all of which have one finding in common: analytic results are far too often not reproducible. There are numerous examples of reproducibility guidelines for various applications; however, a comprehensive assessment tool for evaluating the individual components of the research pipeline was unavailable. To address this need, COS developed the ReproRubric, which defines multiple Tiers of reproducibility based on criteria established for each critical stage of the typical research workflow, from initial design of the experiment through final reporting of the results.