One of the most valuable talks of the day for me was from Fernando Chirigati of New York University. He introduced us to a useful new tool called ReproZip. He made the point that the computational environment is as important as the data itself for the reproducibility of research data. This could include information about libraries used, environment variables and options. You cannot expect your depositors to find or document all of the dependencies (or your future users to install them). What ReproZip does is package up all the necessary dependencies along with the data itself. This package can then be archived, and ReproZip can later be used to unpack it and re-use the data. I can see a very real use case for this for researchers within our institution.
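To make concrete what "information about libraries used, environment variables and options" might look like, here is a minimal Python sketch that records a snapshot of the environment a script runs in. It is only an illustration of the kind of metadata involved, not ReproZip's own method or API. The function name `capture_environment` and the choice of environment-variable prefixes are assumptions made for this example.

```python
import json
import os
import platform
import sys
from importlib import metadata


def capture_environment(env_prefixes=("PATH", "PYTHON", "LANG")):
    """Return a JSON-serialisable snapshot of the current environment.

    Hypothetical helper for illustration only; the prefixes are an
    arbitrary choice to avoid dumping every (possibly sensitive) variable.
    """
    # Installed Python packages with their versions ("libraries used").
    packages = sorted(
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")
    )
    # Only environment variables whose names start with a chosen prefix.
    env_vars = {
        key: value
        for key, value in os.environ.items()
        if key.startswith(env_prefixes)
    }
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command_line": list(sys.argv),  # the "options" the script was run with
        "packages": packages,
        "environment_variables": env_vars,
    }


if __name__ == "__main__":
    # Print the snapshot so it can be saved alongside the deposited data.
    print(json.dumps(capture_environment(), indent=2, sort_keys=True))
```

A snapshot like this only describes the environment. It does not bundle the dependencies themselves, which is the gap a packaging tool such as ReproZip is meant to close.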
Prof. Lorena Barba has just posted a reading list of ten key papers for understanding reproducible research.
The way science journals present research must be rehabilitated or risk becoming obsolete, causing foreseeable negative consequences to research funding and productivity. Researchers are dealing with ever-increasing complexities, and as techniques and solutions become more involved, so too does the task of describing them. Unfortunately, simply explaining a technique with text does not always paint a clear enough picture. Scientific publishing has followed essentially the same model since the original scientific journal was published in the mid-seventeenth century. Thanks to advances in technology, we have seen some minor improvements such as the addition of color printing and better dissemination and search functionality through online cataloging. But what has actually changed? In truth, not all that much. Articles are still published as text-heavy tomes with the occasional photograph or chart to demonstrate a point.
Workflow is a well-established means by which to capture scientific methods in an abstract graph of interrelated processing tasks. The reproducibility of scientific workflows is therefore fundamental to reproducible e-Science. However, the ability to record all the required details so as to make a workflow fully reproducible is a long-standing problem that is very difficult to solve. In this paper, we introduce an approach that integrates system description, source control, container management and automatic deployment techniques to facilitate workflow reproducibility. We have developed a framework that leverages this integration to support workflow execution, re-execution and reproducibility in the cloud and in a personal computing environment. We demonstrate the effectiveness of our approach by examining various aspects of repeatability and reproducibility on real scientific workflows. The framework allows workflow and task images to be captured automatically, which improves not only repeatability but also runtime performance. It also gives workflows portability across different cloud environments. Finally, the framework can also track changes in the development of tasks and workflows to protect them from unintentional failures.
We know now that much health and medical research which is published in peer-reviewed journals is wrong, and consequently much is unable to be replicated.[2-4] This is due in part to poor research practice, biases in publication, and simply a pressure to publish in order to ‘survive’. Cognitive biases that unreasonably wed us to our hypotheses and results are also to blame. Strongly embedded in our culture of health and medical research is the natural selection of poor science practice, driven by the dependence for survival on high rates of publication in academic life. It is a classic form of cultural evolution along Darwinian lines.[6, 7] Do not think that even publications in the most illustrious medical journals are immune from these problems: the COMPare project reveals that more than 85% of large randomised controlled trials deviate seriously from the plan registered before the trial started. An average of more than five new outcome measures was secretly added to the publication and a similar number of nominated outcomes were silently omitted. It is hardly far-fetched to propose that this drive to publish is contributing to the growth in the number of papers retracted from the literature for dubious conduct, along with the increasing number of cases of research misconduct.
Columbia University and other New York City research institutions, including NYU, are hosting a one-day symposium on December 9, 2016 to showcase a robust discussion of reproducibility and research integrity among leading experts, high-profile journal editors, funders and researchers. This program will reveal the "inside story" of how issues are handled by institutions, journals and federal agencies and offer strategies for responding to challenges in these areas. The stimulating and provocative program is for researchers at all stages of their careers.