Posts about reproducible paper

The reproducibility debate is an opportunity, not a crisis

There are many factors that contribute to the reproducibility and replicability of scientific research. There is a need to understand the research ecosystem, and improvements will require combined efforts across all parts of this ecosystem. National structures can play an important role in coordinating these efforts, working collaboratively with researchers, institutions, funders, publishers, learned societies and other sectoral organisations, and providing a monitoring and reporting function. Whilst many new ways of working and emerging innovations hold a great deal of promise, it will be important to invest in meta-research activity to ensure that these approaches are evidence-based, work as intended, and do not have unintended consequences. Addressing reproducibility will require working collaboratively across the research ecosystem to share best practice and to make the most effective use of resources. The UK Reproducibility Network (UKRN) brings together Local Networks of researchers, Institutions, and External Stakeholders (funders, publishers, learned societies and other sectoral organisations) to coordinate action on reproducibility and to ensure that the UK retains its place as a centre for world-leading research. This activity is coordinated by the UKRN Steering Group. We consider this structure valuable, as it brings together a range of voices at a range of levels to support the combined efforts required to enact change.

End-to-End provenance representation for the understandability and reproducibility of scientific experiments using a semantic approach

We present the "REPRODUCE-ME" data model and ontology to describe the end-to-end provenance of scientific experiments by extending existing standards in the semantic web. The ontology brings together different aspects of the provenance of scientific studies by interlinking non-computational data and steps with computational data and steps to achieve understandability and reproducibility. We explain the important classes and properties of the ontology and how they are mapped to existing ontologies like PROV-O and P-Plan. The ontology is evaluated by answering competency questions over the knowledge base of scientific experiments consisting of computational and non-computational data and steps.
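To make the interlinking concrete, the following is a minimal sketch (not the paper's actual mappings) of how a non-computational step and a computational step could be described with the reused P-Plan and PROV-O vocabularies and then queried with a competency question, using Python and rdflib. The experiment IRIs (ex:samplePreparation, ex:imageAnalysis, etc.) and the specific modelling choices are illustrative assumptions.

```python
from rdflib import Graph, Namespace, RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
PPLAN = Namespace("http://purl.org/net/p-plan#")
EX = Namespace("https://example.org/experiment/")  # hypothetical experiment namespace

g = Graph()
g.bind("prov", PROV)
g.bind("pplan", PPLAN)
g.bind("ex", EX)

# One plan whose non-computational step (sample preparation) precedes
# its computational step (an image-analysis script).
g.add((EX.plan, RDF.type, PPLAN.Plan))
for step in (EX.samplePreparation, EX.imageAnalysis):
    g.add((step, RDF.type, PPLAN.Step))
    g.add((step, PPLAN.isStepOfPlan, EX.plan))
g.add((EX.imageAnalysis, PPLAN.isPrecededBy, EX.samplePreparation))

# Provenance of one execution: the analysis activity used the prepared sample image
# and corresponds to the computational step of the plan.
g.add((EX.sampleImage, RDF.type, PROV.Entity))
g.add((EX.analysisRun, RDF.type, PROV.Activity))
g.add((EX.analysisRun, PROV.used, EX.sampleImage))
g.add((EX.analysisRun, PPLAN.correspondsToStep, EX.imageAnalysis))

# Competency question: which steps precede the computational analysis, and in which plan?
query = """
PREFIX pplan: <http://purl.org/net/p-plan#>
PREFIX ex: <https://example.org/experiment/>
SELECT ?prior ?plan WHERE {
  ex:imageAnalysis pplan:isPrecededBy ?prior ;
                   pplan:isStepOfPlan ?plan .
}
"""
for row in g.query(query):
    print(row.prior, row.plan)
```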

Improving research quality: the view from the UK Reproducibility Network institutional leads for research improvement

The adoption and incentivisation of open and transparent research practices are critical in addressing issues around research reproducibility and research integrity. These practices will require training and funding. Individuals need to be incentivised to adopt open and transparent research practices (e.g., adding these as desirable criteria in hiring, probation, and promotion decisions; recognising that funded research should be conducted openly and transparently; and publishers mandating the publication of research workflows and appropriately curated data associated with each research output). Similarly, institutions need to be incentivised to encourage the adoption of open and transparent practices by researchers. Research quality should be prioritised over research quantity. As research transparency will look different across disciplines, there can be no one-size-fits-all approach. An outward-looking and joined-up UK research strategy is needed that places openness and transparency at the heart of research activity. This should involve key stakeholders (institutions, research organisations, funders, publishers, and Government) and, crucially, should be focused on action. Failure to do this will have negative consequences not just for UK research, but also for our ability to innovate and subsequently commercialise UK-led discovery.

Reproducible and Portable Big Data Analytics in the Cloud

Cloud computing has become a major approach to enabling reproducible computational experiments because of its support for on-demand hardware and software resource provisioning. Yet there are still two main difficulties in reproducing big data applications in the cloud. The first is how to automate end-to-end execution of big data analytics in the cloud, including virtual distributed environment provisioning, network and security group setup, and big data analytics pipeline description and execution. The second is that an application developed for one cloud, such as AWS or Azure, is difficult to reproduce in another cloud, the so-called vendor lock-in problem. To tackle these problems, we leverage serverless computing and containerization techniques for automatic, scalable big data application execution and reproducibility, and utilize the adapter design pattern to enable application portability and reproducibility across different clouds. Based on this approach, we propose and develop an open-source toolkit that supports 1) on-demand distributed hardware and software environment provisioning, 2) automatic data and configuration storage for each execution, 3) flexible client modes based on user preferences, 4) execution history query, and 5) simple reproducibility of existing executions in the same environment or a different one. We conducted extensive experiments on both AWS and Azure using three big data analytics applications that run on a virtual CPU/GPU cluster. Three main behaviors of our toolkit were benchmarked: i) the execution overhead ratio for reproducibility support, ii) differences in reproducing the same application on AWS and Azure in terms of execution time, budgetary cost, and cost-performance ratio, and iii) differences between scale-out and scale-up approaches for the same application on AWS and Azure.
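The toolkit's actual interfaces are not given in the abstract; as a rough sketch of how the adapter design pattern can decouple the analytics driver from any single provider, the Python fragment below defines a hypothetical CloudAdapter interface with a stubbed concrete adapter. The class and method names and placeholder return values are illustrative assumptions, not the toolkit's real API; a real adapter would translate each call into the provider's SDK (e.g., boto3 for AWS or the Azure SDK).

```python
from abc import ABC, abstractmethod

class CloudAdapter(ABC):
    """Provider-neutral interface the reproduction driver programs against."""

    @abstractmethod
    def provision_cluster(self, nodes: int, instance_type: str) -> str:
        """Create a virtual CPU/GPU cluster and return its identifier."""

    @abstractmethod
    def stage_data(self, local_path: str, remote_uri: str) -> None:
        """Copy input data and configuration to the provider's object store."""

    @abstractmethod
    def run_pipeline(self, cluster_id: str, pipeline_image: str) -> str:
        """Submit the containerized analytics pipeline; return an execution id."""

    @abstractmethod
    def teardown(self, cluster_id: str) -> None:
        """Release all provisioned resources."""


class AWSAdapter(CloudAdapter):
    """Stubbed example; a real version would issue AWS SDK calls in each method."""
    def provision_cluster(self, nodes, instance_type):
        return "aws-cluster-001"            # placeholder cluster id
    def stage_data(self, local_path, remote_uri):
        pass                                # would upload to object storage
    def run_pipeline(self, cluster_id, pipeline_image):
        return "aws-exec-001"               # placeholder execution id
    def teardown(self, cluster_id):
        pass                                # would release the cluster


def reproduce(adapter: CloudAdapter, pipeline_image: str, data_path: str) -> str:
    """The same driver reproduces an execution on whichever cloud the adapter targets."""
    cluster = adapter.provision_cluster(nodes=4, instance_type="gpu-small")
    adapter.stage_data(data_path, "remote://inputs/")
    try:
        return adapter.run_pipeline(cluster, pipeline_image)
    finally:
        adapter.teardown(cluster)


print(reproduce(AWSAdapter(), "analytics-pipeline:latest", "./data"))
```

Because the driver only sees the abstract interface, swapping in an AzureAdapter that implements the same four methods reproduces the same execution on the other cloud without changing the pipeline code.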