This work is a detailed companion reproducibility paper for the methods and experiments proposed in three previous works by Lastra-Díaz and García-Serrano. It introduces a set of reproducible experiments on word similarity, based on HESML and ReproZip, with the aim of exactly reproducing the experimental surveys in the aforementioned works. It also introduces a new representation model for taxonomies called PosetHERep, and a Java software library based on it, the Half-Edge Semantic Measures Library (HESML), which implements most ontology-based semantic similarity measures and Information Content (IC) models based on WordNet reported in the literature.
Unlike most other SAPA datasets available on Dataverse, these data are specifically tied to the reproducible manuscript entitled "The SAPA Personality Inventory: An empirically-derived, hierarchically-organized self-report personality assessment model." Most of these files are images that should be downloaded and organized in the same location as the source .Rnw file. A few files contain data that have already been processed (and could be independently re-created using code in the .Rnw file) - these are included to shorten the processing time needed to reproduce the original document. The raw data files for most of the analyses are stored in three separate locations, one for each of the three samples. These are: Exploratory sample - doi:10.7910/DVN/SD7SVE; Replication sample - doi:10.7910/DVN/3LFNJZ; Confirmatory sample - doi:10.7910/DVN/I8I3D3. If you have any questions about reproducing the file, please first consult the instructions in the Preface of the PDF version. Note that the .Rnw version of the file includes many annotations that are not visible in the PDF version (https://sapa-project.org/research/SPI/SPIdevelopment.pdf) and which may also be useful. If you still have questions, feel free to email me directly. Note that it is unlikely that I will be able to help with technical issues that do not relate to R, Knitr, Sweave, or LaTeX.
Many journal editors are failing to implement their own authors’ instructions, resulting in the publication of many articles that do not meet basic standards of transparency, employ unsuitable data analysis methods, and report overly optimistic conclusions. This problem is particularly acute where quantitative measurements are made; it results in the publication of papers that lack scientific rigor and contributes to concerns about the reproducibility of biomedical research. This hampers research areas such as biomarker identification, as reproducing all but the most striking changes is challenging and translation to patient care is rare.
We are currently in one of the most exciting times for science and engineering, as we witness unprecedented growth in the computational and experimental capabilities used to generate new data and models. To facilitate data and model sharing, and to enhance reproducibility and rigor in biomechanics research, the Journal of Biomechanics has introduced a number of tools for Content Innovation to allow presentation, sharing, and archiving of methods, models, and data in our articles. The tools include an Interactive Plot Viewer, 3D Geometric Shape and Model Viewer, Virtual Microscope, Interactive MATLAB Figure Viewer, and Audioslides. Authors are highly encouraged to make use of these in upcoming journal submissions.
Reproducibility is an important part of scientific research, and studies published in speech and language research usually make some attempt at ensuring that the work reported could be reproduced by other researchers. This paper looks at the current practice in the field relating to the citation and availability of both data and software methods. It is common to use widely available shared datasets in this field, which helps to ensure that studies can be reproduced; however, a brief survey of recent papers shows a wide range of styles of data citation, only some of which clearly identify the exact data used in the study. Similarly, practices in describing and sharing software artefacts vary considerably, from detailed descriptions of algorithms to linked repositories. The Alveo Virtual Laboratory is a web-based platform to support research based on collections of text, speech and video. Alveo provides a central repository for language data and a set of services for discovery and analysis of data. We argue that some of the features of the Alveo platform may make it easier for researchers to share their data more precisely and cite the exact software tools used to develop published results. Alveo makes use of ideas developed in other areas of science, and we discuss these and how they can be applied to speech and language research.
Born-digital news content is increasingly becoming the format of the first draft of history. Archiving and preserving this history is of paramount importance to the future of scholarly research, but many technical, legal, financial, and logistical challenges stand in the way of these efforts. This is especially true for news applications, or custom-built websites that comprise some of the most sophisticated journalism stories today, such as the “Dollars for Docs” project by ProPublica. Many news applications are standalone pieces of software that query a database, and this significant subset of apps cannot be archived in the same way as text-based news stories, or fully captured by web archiving tools such as Archive-It. As such, they are currently disappearing. This paper will outline the various challenges facing the archiving and preservation of born-digital news applications, and offer suggestions for how to approach this important work.