The University of Minnesota Libraries addressed this issue head-on this year by launching the reproducibility portal in an effort to help faculty and others on campus improve their research practices. The portal is a collaboration that includes Liberal Arts Technology and Information Services (LATIS) and the Minnesota Supercomputing Institute (MSI).
Reproducible Science Promoting Open Science
A preview of ReproZip by Vicky Steeves at the PresQT Workshop May 1, 2017 at the University of Notre Dame.
Introduce the Python environment wrapper and packaging tools virtualenv and pip. Show how you can stay up to date by using requires.io for egg security and update checking. Cover Fabric, a Python deployment tool, and wider systems and workflow replication with Vagrant and ReproZip. If time allows, touch upon test-driven development and adding Travis to your project.
Accumulating evidence suggests that many findings in psychological science and cognitive neuroscience may prove difficult to reproduce; statistical power in brain imaging studies is low, and has not improved recently; software errors in widely used analysis tools are common, and can go undetected for many years; and, a few large-scale studies notwithstanding, open sharing of data, code, and materials remains the rare exception. At the same time, there is a renewed focus on reproducibility, transparency, and openness as essential core values in cognitive neuroscience. The emergence and rapid growth of data archives, meta-analytic tools, software pipelines, and research groups devoted to improved methodology reflect this new sensibility. We review evidence that the field has begun to embrace new open research practices, and illustrate how these can begin to address problems of reproducibility, statistical power, and transparency in ways that will ultimately accelerate discovery.
Background: The reproducibility of research is essential to rigorous science, yet significant concerns about the reliability and verifiability of biomedical research have been recently highlighted. Ongoing efforts across several domains of science and policy are working to clarify the fundamental characteristics of reproducibility and to enhance the transparency and accessibility of research. Methods: The aim of this work is to develop an assessment tool operationalizing key concepts of research transparency in the biomedical domain, specifically for secondary biomedical data research using electronic health record data. The tool (RepeAT) was developed through a multi-phase process that involved coding and extracting recommendations and practices for improving reproducibility from publications and reports across the biomedical and statistical sciences, field testing the instrument, and refining variables. Results: RepeAT includes 103 unique variables grouped into five categories (research design and aim, database and data collection methods, data mining and data cleaning, data analysis, data sharing and documentation). Preliminary results in manually processing 40 scientific manuscripts indicate components of the proposed framework with strong inter-rater reliability, as well as directions for further research and refinement of RepeAT. Conclusions: The use of RepeAT may allow the biomedical community to have a better understanding of the current practices of research transparency and accessibility among principal investigators. Common adoption of RepeAT may improve reporting of research practices and the availability of research outputs. Additionally, use of RepeAT will facilitate comparisons of research transparency and accessibility across domains and institutions.
Amidst the recent flood of concerns about transparency and reproducibility in the behavioral and clinical sciences, we suggest a simple, inexpensive, easy-to-implement, and uniquely powerful tool to improve the reproducibility of scientific research and accelerate progress—video recordings of experimental procedures. Widespread use of video for documenting procedures could make moot disagreements about whether empirical replications truly reproduced the original experimental conditions. We call on researchers, funders, and journals to make commonplace the collection and open sharing of video-recorded procedures.
Do researchers need a new "Craigslist?" We were recently alerted to a new online platform called StudySwap by one of its creators, who said it was partially inspired by one of our posts. The platform creates an "online marketplace" that previous researchers have called for, connecting scientists with willing partners – such as a team looking for someone to replicate its results, and vice versa. As co-creators Christopher Chartier at Ashland University and Randy McCarthy at Northern Illinois University tell us, having a place where researchers can find each other more efficiently "is in everyone’s best interest."
Currently, many scientific fields such as psychology or biomedicine face a methodological crisis concerning the reproducibility, replicability and validity of their research. In neuroimaging, similar methodological concerns have taken hold of the field and researchers are working frantically towards finding solutions for the methodological problems specific to neuroimaging. This paper examines some ethical and legal implications of this methodological crisis in neuroimaging. With respect to ethical challenges, the paper discusses the impact of flawed methods in neuroimaging research in cognitive and clinical neuroscience, particularly with respect to faulty brain-based models of human cognition, behavior and personality. Specifically examined is whether such faulty models, when they are applied to neurological or psychiatric diseases, could put patients at risk and whether this places special obligations upon researchers using neuroimaging. In the legal domain, the actual use of neuroimaging as evidence in U.S. courtrooms is surveyed, followed by an examination of ways the methodological problems may create challenges for the criminal justice system. Finally, the paper reviews and promotes some promising ideas and initiatives from within the neuroimaging community for addressing the methodological problems.
While linguists have always relied on language data, they have not always facilitated access to those data. Linguistic publications typically include short excerpts from data sets, ordinarily consisting of fewer than five words, and often without citation. Where citations are provided, the connection to the data set is usually only vaguely identified. An excerpt might be given a citation which refers to the name of the text from which it was extracted, but in practice the reader has no way to access that text. That is, in spite of the potential generated by recent shifts in the field, a great deal of linguistic research created today is not reproducible, either in principle or in practice. The workshops and panel presentation will facilitate development of standards for the curation and citation of linguistics data that are responsive to these changing conditions and shift the field of linguistics toward a more scientific, data-driven model which results in reproducible research.
Data are fundamental to the field of linguistics. Examples drawn from natural languages provide a foundation for claims about the nature of human language, and validation of these linguistic claims relies crucially on these supporting data. Yet, while linguists have always relied on language data, they have not always facilitated access to those data. Publications typically include only short excerpts from data sets, and where citations are provided, the connections to the data sets are usually only vaguely identified. At the same time, the field of linguistics has generally viewed the value of data without accompanying analysis with some degree of skepticism, and thus linguists have murky benchmarks for evaluating the creation, curation, and sharing of data sets in hiring, tenure and promotion decisions. This disconnect between linguistics publications and their supporting data results in much linguistic research being unreproducible, either in principle or in practice. Without reproducibility, linguistic claims cannot be readily validated or tested, rendering their scientific value moot. In order to facilitate the development of reproducible research in linguistics, the Linguistics Data Interest Group plans to develop the discipline-wide adoption of common standards for data citation and attribution. In our parlance, citation refers to the practice of identifying the source of linguistic data, and attribution refers to mechanisms for assessing the intellectual and academic value of data citations.