Assessing the impact of introductory programming workshops on the computational reproducibility of biomedical workflows

Introduction: As biomedical research becomes more data-intensive, computational reproducibility is increasingly important. Unfortunately, many biomedical researchers have not received formal computational training and often struggle to produce results that can be reproduced using the same data, code, and methods. Programming workshops can teach new computational methods, but it is not always clear whether researchers are able to use their new skills to make their work more computationally reproducible. Methods: This mixed-methods study consisted of in-depth interviews with 14 biomedical researchers before and after participation in an introductory programming workshop. During the interviews, participants described their research workflows and responded to a quantitative checklist measuring reproducible behaviors. The interview data were analyzed using a thematic analysis approach, and the pre- and post-workshop checklist scores were compared to assess the impact of the workshop on the computational reproducibility of the researchers' workflows. Results: Pre- and post-workshop scores on the checklist of reproducible behaviors did not increase in a statistically significant manner. The qualitative interviews revealed that several participants had made small changes to their workflows, including switching to open-source programming languages for their data cleaning, analysis, and visualization. Overall, many of the participants reported higher levels of programming literacy and an interest in further training. Factors that enabled change included supportive environments and an immediate research need, while barriers included collaborators who were resistant to new tools and a lack of time. Conclusion: While none of the participants completely changed their workflows, many incorporated new practices, tools, or methods that made their work more reproducible and transparent to other researchers. This indicates that the programming workshops now offered by libraries and other organizations contribute to computational reproducibility training for researchers.
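
As a concrete illustration of the pre/post comparison described above, the following Python sketch assumes a paired non-parametric test (Wilcoxon signed-rank); the abstract does not name the study's statistical method, and the checklist scores shown are hypothetical, not the study's data.

    # Minimal sketch, assuming a Wilcoxon signed-rank test for paired pre/post scores.
    # The scores below are invented for illustration (14 participants, as in the study).
    from scipy.stats import wilcoxon

    pre_scores  = [4, 5, 3, 6, 4, 5, 2, 7, 5, 4, 6, 3, 5, 4]   # hypothetical pre-workshop checklist totals
    post_scores = [5, 5, 4, 6, 5, 6, 3, 7, 5, 5, 6, 4, 5, 5]   # hypothetical post-workshop checklist totals

    # Paired comparison across participants; zero differences are dropped by default.
    stat, p_value = wilcoxon(pre_scores, post_scores)
    print(f"Wilcoxon statistic = {stat:.2f}, p = {p_value:.3f}")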

A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility

Data makes science possible. Sharing data improves visibility and makes the research process transparent. This increases trust in the work and allows for independent reproduction of results. However, a large proportion of the data from published research is available only to the original authors. Despite the obvious benefits of sharing data, and despite scientists advocating for its importance, most advice on sharing data discusses its broader benefits rather than the practical considerations of sharing. This paper provides practical, actionable advice on how to actually share data alongside research. The key message is that sharing data falls on a continuum, and entry onto that continuum should come with minimal barriers.

Leveraging Container Technologies in a GIScience Project: A Perspective from Open Reproducible Research

Scientific reproducibility is essential for the advancement of science. It allows the results of previous studies to be reproduced, validates their conclusions, and enables new contributions to build on previous research. Nowadays, more and more authors consider that the ultimate product of academic research is the scientific manuscript, together with all the necessary elements (i.e., code and data) so that others can reproduce the results. However, there are numerous reasons why some studies are difficult to reproduce (e.g., biased results, pressure to publish, and proprietary data). In this context, we explain our experience in an attempt to improve the reproducibility of a GIScience project. According to our project needs, we evaluated a list of practices, standards and tools that may facilitate open and reproducible research in the geospatial domain, contextualising them on Peng’s reproducibility spectrum. Among these resources, we focused on containerisation technologies and performed a shallow review to reflect on the level of adoption of these technologies in combination with OSGeo software. Finally, containerisation technologies proved to enhance reproducibility, and we used UML diagrams to describe representative workflows deployed in our GIScience project.
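
For readers unfamiliar with containerisation, the following minimal Python sketch shows the general idea of running an analysis step inside a pinned container image. It assumes the Docker SDK for Python and an OSGeo GDAL image; the image tag, paths, and script name are hypothetical placeholders, not the project's actual setup.

    # Minimal sketch, assuming the Docker SDK for Python (docker-py) is installed
    # and a Docker daemon is available. Image tag, mount path, and script are hypothetical.
    import docker

    client = docker.from_env()
    logs = client.containers.run(
        "ghcr.io/osgeo/gdal:ubuntu-small-3.8.3",        # pinned image version (hypothetical tag)
        command=["python3", "/work/run_analysis.py"],    # hypothetical analysis entry point
        volumes={"/path/to/project": {"bind": "/work", "mode": "rw"}},  # mount code and data
        working_dir="/work",
        remove=True,                                     # clean up the container after the run
    )
    print(logs.decode())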

Publishing computational research -- A review of infrastructures for reproducible and transparent scholarly communication

Funding agencies increasingly ask applicants to include data and software management plans in proposals. In addition, the author guidelines of scientific journals and conferences increasingly include a statement on data availability, and some reviewers reject unreproducible submissions. This trend towards open science increases the pressure on authors to provide access to the source code and data underlying the computational results in their scientific papers. Still, publishing reproducible articles is a demanding task and is not achieved simply by providing access to code scripts and data files. Consequently, several projects are developing solutions to support the publication of executable analyses alongside articles, considering the needs of the aforementioned stakeholders. The key contribution of this paper is a review of applications addressing the issue of publishing executable computational research results. We compare the approaches across properties relevant to the involved stakeholders, e.g., provided features and deployment options, and also critically discuss trends and limitations. The review can help publishers decide which system to integrate into their submission process, help editors recommend tools to researchers, and help authors of scientific papers adhere to reproducibility principles.

Use of study design principles would increase the reproducibility of reviews in conservation biology

Despite the importance of reviews and syntheses in advancing our understanding of the natural world and informing conservation policy, they frequently are not conducted with the same careful methods as primary studies. This discrepancy can lead to controversy over review conclusions because the methods employed to gather evidence supporting the conclusions are not reproducible. To illustrate this problem, we assessed whether the methods of reviews involved in two recent controversies met the common scientific standard of being reported in sufficient detail to be repeated by an independent researcher. We found that none of the reviews were repeatable by this standard. Later stages of the review process, such as quantitative analyses, were generally described well, but the more fundamental, data-gathering stage was not fully described in any of the reviews. To address the irreproducibility of review conclusions, we believe that ecologists and conservation biologists should recognize that literature searches for reviews are a data-gathering exercise and apply the same rigorous study design principles and reporting standards that they would use for primary studies.