While linguists have always relied on language data, they have not always facilitated access to those data. Linguistic publications typically include short excerpts from data sets, ordinarily consisting of fewer than five words, and often without citation. Where citations are provided, the connection to the data set is usually only vaguely identified. An excerpt might be given a citation which refers to the name of the text from which it was extracted, but in practice the reader has no way to access that text. That is, in spite of the potential generated by recent shifts in the field, a great deal of linguistic research created today is not reproducible, either in principle or in practice. The workshops and panel presentation will facilitate development of standards for the curation and citation of linguistics data that are responsive to these changing conditions and shift the field of linguistics toward a more scientific, data-driven model which results in reproducible research.
Data are fundamental to the field of linguistics. Examples drawn from natural languages provide a foundation for claims about the nature of human language, and validation of these linguistic claims relies crucially on these supporting data. Yet, while linguists have always relied on language data, they have not always facilitated access to those data. Publications typically include only short excerpts from data sets, and where citations are provided, the connections to the data sets are usually only vaguely identified. At the same time, the field of linguistics has generally viewed the value of data without accompanying analysis with some degree of skepticism, and thus linguists have murky benchmarks for evaluating the creation, curation, and sharing of data sets in hiring, tenure and promotion decisions.This disconnect between linguistics publications and their supporting data results in much linguistic research being unreproducible, either in principle or in practice. Without reproducibility, linguistic claims cannot be readily validated or tested, rendering their scientific value moot. In order to facilitate the development of reproducible research in linguistics, The Linguistics Data Interest Group plans to develop the discipline-wide adoption of common standards for data citation and attribution. In our parlance citation refers to the practice of identifying the source of linguistic data, and attribution refers to mechanisms for assessing the intellectual and academic value of data citations.
Supplements are increasingly important to the scientific record, particularly in genomics. However, they are often underutilized. Optimally, supplements should make results findable, accessible, interoperable, and reusable (i.e., “FAIR”). Moreover, properly off-loading to them the data and detail in a paper could make the main text more readable. We propose a hierarchical organization for supplements, with some parts paralleling and “shadowing” the main text and other elements branching off from it, and we suggest a specific formatting to make this structure explicit. Furthermore, sections of the supplement could be presented in multiple scientific “dialects”, including machine-readable and lay-friendly formats.
In summary, my little experiment has shown that reproducibility of Python scripts requires preserving the original environment, which fortunately is not so difficult over a time span of four years, at least if everything you need is part of the Anaconda distribution. I am not sure I would have had the patience to reinstall everything from source, given an earlier bad experience. The purely computational part of my code was even surprisingly robust under updates in its dependencies. But the plotting code wasn’t, as matplotlib has introduced backwards-incompatible changes in a widely used function. Clearly the matplotlib team prepared this carefully, introducing a deprecation warning before introducing the breaking change. For properly maintained client code, this can probably be dealt with.
The reproducibility of scientific experiments is crucial for corroborating, consolidating and reusing new scientific discoveries. However, the constant pressure for publishing results (Fanelli, 2010) has removed reproducibility from the agenda of many researchers: in a recent survey published in Nature (with more than 1500 scientists) over 70% of the participants recognize to have failed to reproduce the work from another colleague at some point in time (Baker, 2016). Analyses from psychology and cancer biology show reproducibility rates below 40% and 10% respectively (Collaboration, 2015) (Begley & Lee, 2012). As a consequence, retractions of publications have occurred in the last years in several disciplines (Marcus & Oransky, 2014) (Rockoff, 2015), and the general public is now skeptical about scientific studies on topics like pesticides, depression drugs or flu pandemics (American, 2010).
There is a "village" of people impacting research reproducibility, such as funding panels, the IACUC and its support staff, institutional leaders, investigators, veterinarians, animal facilities, and professional journals. IACUCs can contribute to research reproducibility by ensuring that reviews of animal use requests, program self-assessments and post-approval monitoring programs are sufficiently thorough, the animal model is appropriate for testing the hypothesis, animal care and use is conducted in a manner that is compliant with external and institutional requirements, and extraneous variables are minimized. The persons comprising the village also must have a shared vision that guards against reproducibility problems while simultaneously avoids being viewed as a burden to research. This review analyzes and discusses aspects of the IACUC's "must do" and "can do" activities that impact the ability of a study to be reproduced. We believe that the IACUC, with support from and when working synergistically with other entities in the village, can contribute to minimizing unintended research variables and strengthen research reproducibility.