Talk on open science for computational sciences.
The Target Article by Lee et al. (2019) highlights the ways in which ongoing concerns about research reproducibility extend to model-based approaches in cognitive science. Whereas Lee et al. focus primarily on the importance of research practices to improve model robustness, we propose that the transparent sharing of model specifications, including their inputs and outputs, is also essential to improving the reproducibility of model-based analyses. We outline an ongoing effort (within the context of the Brain Imaging Data Structure community) to develop standards for the sharing of the structure of computational models and their outputs.
The main criticism of my piece in ref (2) seems to be that my calculations rely on testing a point null hypothesis, i.e. the hypothesis that the true effect size is zero. He objects to my contention that the true effect size can be zero, "just give the same pill to both groups", on the grounds that two pills can't be exactly identical. He then says "I understand that this criticism may come across as frivolous semantic pedantry of no practical consequence: of course that the author meant to say 'pills with the same contents' as everybody would have understood". Yes, that is precisely how it comes across to me. I shall try to explain in more detail why I think that this criticism has little substance.
Computational Communication Research (CCR) is a new open access journal dedicated to publishing high quality computational research in communication science. This editorial introduction describes the role that we envision for the journal. First, we explain what computational communication science is and why a new journal is needed for this subfield. Then, we elaborate on the type of research this journal seeks to publish, and stress the need for transparent and reproducible science. The relation between theoretical development and computational analysis is discussed, and we argue for the value of null-findings and risky research in additive science. Subsequently, the (experimental) two-phase review process is described. In this process, after the first double-blind review phase, an editor can signal that they intend to publish the article conditional on satisfactory revisions. This starts the second review phase, in which authors and reviewers are no longer required to be anonymous and the authors are encouraged to publish a preprint to their article which will be linked as working paper from the journal. Finally, we introduce the four articles that, together with this Introduction, form the inaugural issue.
Ongoing technological developments have made it easier than ever before for scientists to share their data, materials, and analysis code. Sharing data and analysis code makes it easier for other researchers to re-use or check published research. These benefits will only emerge if researchers can reproduce the analysis reported in published articles, and if data is annotated well enough so that it is clear what all variables mean. Because most researchers have not been trained in computational reproducibility, it is important to evaluate current practices to identify practices that can be improved. We examined data and code sharing, as well as computational reproducibility of the main results without contacting the original authors, for Registered Reports published in the in psychological literature between 2014 and 2018. Of the 62 articles that met our inclusion criteria data was available for 40 articles, and analysis scripts for 43 articles. For the 35 articles that shared both data and code and performed analyses in SPSS, R, or JASP, we could run the scripts for 30 articles, and reproduce the main results for 19 articles. Although the percentage of articles that shared both data and code (61%) and articles that could be computationally reproduced (54%) was relatively high compared to other studies, there is clear room for improvement. We provide practices recommendations based on our observations, and link to examples of good research practices in the papers we reproduced.
Data processing in data intensive scientific fields likebioinformatics is automated to a great extent. Among others,automation is achieved with workflow engines that execute anexplicitly stated sequence of computations. Scientists can usethese workflows through science gateways or they develop themby their own. In both cases they may have to preprocess their raw data and also may want to further process the workflowoutput. The scientist has to take care about provenance of thewhole data processing pipeline. This is not a trivial task dueto the diverse set of computational tools and environments usedduring the transformation of raw data to the final results. Thuswe created a metadata schema to provide provenance for dataprocessing pipelines and implemented a tool that creates this metadata during the execution of typical scientific computations.