KDD 2017 Research Papers: New Reproducibility Policy

Reproducibility: Submitted papers will be assessed based on their novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. Authors are strongly encouraged to make their code and data publicly available whenever possible. Algorithms and resources used in a paper should be described as completely as possible to allow reproducibility. This includes experimental methodology, empirical evaluations, and results. The reproducibility factor will play an important role in the assessment of each submission.

From old York to New York: PASIG 2016

One of the most valuable talks of the day for me was from Fernando Chirigati of New York University, who introduced us to a useful new tool called ReproZip. He made the point that the computational environment is as important as the data itself for the reproducibility of research: this includes information about the libraries used, environment variables and options. You cannot expect your depositors to find or document all of these dependencies (or your future users to install them). ReproZip packages up all the necessary dependencies along with the data itself; the resulting package can then be archived, and ReproZip can later be used to unpack it and re-run the analysis. I can see a very real use case for this for researchers within our institution.
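
The workflow he described maps onto a small number of ReproZip commands. The sketch below drives them from Python purely as an illustration; the script name, archive name and the choice of the Docker unpacker are my own assumptions rather than details from the talk.

    import subprocess

    # 1. Trace the original analysis so ReproZip records the files, libraries
    #    and environment variables it touches.
    subprocess.run(["reprozip", "trace", "python", "analysis.py"], check=True)

    # 2. Pack the traced run, its inputs and all detected dependencies into a
    #    single archive that can be deposited and preserved.
    subprocess.run(["reprozip", "pack", "analysis.rpz"], check=True)

    # 3. Later, or on a different machine, unpack and re-run the experiment,
    #    here using the Docker unpacker.
    subprocess.run(["reprounzip", "docker", "setup", "analysis.rpz", "analysis_env"], check=True)
    subprocess.run(["reprounzip", "docker", "run", "analysis_env"], check=True)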

A University Symposium: Promoting Credibility, Reproducibility and Integrity in Research

Columbia University and other New York City research institutions, including NYU, are hosting a one-day symposium on December 9, 2016 to foster a robust discussion of reproducibility and research integrity among leading experts, high-profile journal editors, funders and researchers. The program will reveal the "inside story" of how these issues are handled by institutions, journals and federal agencies, and offer strategies for responding to challenges in these areas. The stimulating and provocative program is intended for researchers at all stages of their careers.

Alan Turing Institute Symposium on Reproducibility for Data-Intensive Research

The Alan Turing Institute Symposium on Reproducibility for Data-Intensive Research was held on 6th-7th April 2016 at the University of Oxford. It was organised by senior academics, publishers and library professionals representing the Alan Turing Institute (ATI) joint venture partners (the universities of Cambridge, Edinburgh, Oxford and Warwick, and UCL), the University of Manchester, Newcastle University and the British Library. The key aim of the symposium was to address the challenges around reproducibility of data-intensive research in science, social science and the humanities. This report presents an overview of the discussions and makes some recommendations for the ATI to take forward.

Report from the first CRN coding sprint

Two weeks ago (1st-4th of August 2016) we hosted a coding sprint at Stanford aimed at making neuroimaging data processing and analysis tools more portable and accessible. You might have heard about BIDS – it is a new standard for organizing and describing neuroimaging datasets that we have recently proposed. Containers (also known as “operating-system-level virtualization”) behave like very lightweight virtual machines that can encapsulate any piece of code along with all of the libraries necessary to run it. Docker and Singularity are two examples of container technologies. The reason we are so excited about containers for reproducible data analysis is that they provide a way to package a piece of software so that it runs in the same way across many different computing platforms, from a laptop to a supercomputer. Creating containerized and BIDS-aware versions of all of the major neuroimaging analysis packages is critical to our center’s mission: providing data analysis as a free and open service to incentivize researchers to share data.
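
To make the idea concrete, a containerized, BIDS-aware tool can be invoked in the same way on any machine that runs Docker. The Python sketch below shows one such invocation following the BIDS Apps command-line convention; the image name, paths and participant label are illustrative assumptions rather than specifics from the sprint.

    import subprocess

    # Hypothetical invocation of a containerized BIDS App via Docker. BIDS Apps
    # conventionally take a BIDS dataset directory, an output directory and an
    # analysis level as positional arguments.
    subprocess.run([
        "docker", "run", "--rm",
        "-v", "/data/ds001:/bids_dataset:ro",  # mount the BIDS dataset read-only
        "-v", "/data/outputs:/outputs",        # mount a writable output directory
        "bids/example",                        # illustrative image name
        "/bids_dataset", "/outputs", "participant",
        "--participant_label", "01",
    ], check=True)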

1st International Workshop on Reproducible Open Science (RepScience2016)

This workshop aims to become a forum for discussing ideas and advancements towards the revision of current scientific communication practices in order to support Open Science, introduce novel evaluation schemes, and enable reproducibility. As such, it positions itself as an event fostering collaboration between (i) library and information scientists working on the identification of new publication paradigms; (ii) ICT scientists involved in the definition of new technical solutions to these issues; and (iii) scientists/researchers who actually conduct the research and demand tools and practices for Open Science. The expected result is progress towards the definition of the next-generation scientific communication ecosystem, in which scientists can publish research results (including the scientific article, the data, the methods, and any “alternative” products relevant to the research) so as to enable reproducibility (effective reuse and a lower cost of science) and benefit from novel scientific reward practices.