Reproducibility is an essential requirement for computational studies, including those based on machine learning techniques. However, many machine learning studies are either not reproducible or are difficult to reproduce. In this paper, we consider what information about text mining studies is crucial to successful reproduction of such studies. We identify a set of factors that affect reproducibility based on our experience of attempting to reproduce six studies proposing text mining techniques for the automation of the citation screening stage in the systematic review process. Subsequently, the reproducibility of 30 studies was evaluated based on the presence or otherwise of information relating to the factors. While the studies provide useful reports of their results, they lack information on access to the dataset in the form and order used in the original study (as opposed to the raw data), the software environment used, randomization control, and the implementation of proposed techniques. To increase the chances of their studies being reproduced, researchers should ensure that their reports provide details about, and/or access to information about, these factors.
Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language.
Accumulating evidence suggests that many findings in psychological science and cognitive neuroscience may prove difficult to reproduce; statistical power in brain imaging studies is low, and has not improved recently; software errors in widely used analysis tools are common, and can go undetected for many years; and, a few large-scale studies notwithstanding, open sharing of data, code, and materials remains the rare exception. At the same time, there is a renewed focus on reproducibility, transparency, and openness as essential core values in cognitive neuroscience. The emergence and rapid growth of data archives, meta-analytic tools, software pipelines, and research groups devoted to improved methodology reflect this new sensibility. We review evidence that the field has begun to embrace new open research practices, and illustrate how these can begin to address problems of reproducibility, statistical power, and transparency in ways that will ultimately accelerate discovery.
Background: The reproducibility of research is essential to rigorous science, yet significant concerns about the reliability and verifiability of biomedical research have recently been highlighted. Ongoing efforts across several domains of science and policy are working to clarify the fundamental characteristics of reproducibility and to enhance the transparency and accessibility of research. Methods: The aim of the present work is to develop an assessment tool operationalizing key concepts of research transparency in the biomedical domain, specifically for secondary biomedical data research using electronic health record data. The tool (RepeAT) was developed through a multi-phase process that involved coding and extracting recommendations and practices for improving reproducibility from publications and reports across the biomedical and statistical sciences, field testing the instrument, and refining variables. Results: RepeAT includes 103 unique variables grouped into five categories (research design and aim, database and data collection methods, data mining and data cleaning, data analysis, data sharing and documentation). Preliminary results from manually processing 40 scientific manuscripts identify components of the proposed framework with strong inter-rater reliability, as well as directions for further research and refinement of RepeAT. Conclusions: The use of RepeAT may allow the biomedical community to have a better understanding of the current practices of research transparency and accessibility among principal investigators. Common adoption of RepeAT may improve reporting of research practices and the availability of research outputs. Additionally, use of RepeAT will facilitate comparisons of research transparency and accessibility across domains and institutions.
Amidst the recent flood of concerns about transparency and reproducibility in the behavioral and clinical sciences, we suggest a simple, inexpensive, easy-to-implement, and uniquely powerful tool to improve the reproducibility of scientific research and accelerate progress—video recordings of experimental procedures. Widespread use of video for documenting procedures could make moot disagreements about whether empirical replications truly reproduced the original experimental conditions. We call on researchers, funders, and journals to make commonplace the collection and open sharing of video-recorded procedures.
Currently, many scientific fields such as psychology or biomedicine face a methodological crisis concerning the reproducibility, replicability and validity of their research. In neuroimaging, similar methodological concerns have taken hold of the field and researchers are working frantically towards finding solutions for the methodological problems specific to neuroimaging. This paper examines some ethical and legal implications of this methodological crisis in neuroimaging. With respect to ethical challenges, the paper discusses the impact of flawed methods in neuroimaging research in cognitive and clinical neuroscience, particularly with respect to faulty brain-based models of human cognition, behavior and personality. Specifically examined is whether such faulty models, when they are applied to neurological or psychiatric diseases, could put patients at risk and whether this places special obligations upon researchers using neuroimaging. In the legal domain, the actual use of neuroimaging as evidence in U.S. courtrooms is surveyed, followed by an examination of ways the methodological problems may create challenges for the criminal justice system. Finally, the paper reviews and promotes some promising ideas and initiatives from within the neuroimaging community for addressing the methodological problems.