Reproducible Data Analysis in Jupyter

Jupyter notebooks provide a useful environment for interactive exploration of data. A common question I get, though, is how you can progress from this nonlinear, interactive, trial-and-error style of exploration to a more linear and reproducible analysis based on organized, packaged, and tested code. This series of videos presents a case study in how I personally approach reproducible data analysis within the Jupyter notebook.

HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset

This work is a detailed companion reproducibility paper of the methods and experiments proposed in three previous works by Lastra-Díaz and García-Serrano, which introduce a set of reproducible experiments on word similarity based on HESML and ReproZip with the aim of exactly reproducing the experimental surveys in the aforementioned works. This work introduces a new representation model for taxonomies called PosetHERep, and a Java software library called Half-Edge Semantic Measures Library (HESML) based on it, which implements most ontology-based semantic similarity measures and Information Content (IC) models based on WordNet reported in the literature.

Reproducibility Data: The SAPA Personality Inventory: An empirically-derived, hierarchically-organized self-report personality assessment model

Unlike most other SAPA datasets available on Dataverse, these data are specifically tied to the reproducible manuscript entitled "The SAPA Personality Inventory: An empirically-derived, hierarchically-organized self-report personality assessment model." Most of these files are images that should be downloaded and organized in the same location as the source .Rnw file. A few files contain data that have already been processed (and could be independently re-created using code in the .Rnw file) - these are included to shorten the processing time needed to reproduce the original document. The raw data files for most of the analyses are stored in 3 separate locations, 1 for each of the 3 samples. These are: Exploratory sample - doi:10.7910/DVN/SD7SVE Replication sample - doi:10.7910/DVN/3LFNJZ Confirmatory sample - doi:10.7910/DVN/I8I3D3 . If you have any questions about reproducing the file, please first consult the instructions in the Preface of the PDF version. Note that the .Rnw version of the file includes many annotations that are not visible in the PDF version (https://sapa-project.org/research/SPI/SPIdevelopment.pdf) and which may also be useful. If you still have questions, feel free to email me directly. Note that it is unlikely that I will be able to help with technical issues that do not relate of R, Knitr, Sweave, and LaTeX.

Reproducibility of biomedical research – The importance of editorial vigilance

Many journal editors are a failing to implement their own authors’ instructions, resulting in the publication of many articles that do not meet basic standards of transparency, employ unsuitable data analysis methods and report overly optimistic conclusions. This problem is particularly acute where quantitative measurements are made and results in the publication of papers that lack scientific rigor and contributes to the concerns with regard to the reproducibility of biomedical research. This hampers research areas such as biomarker identification, as reproducing all but the most striking changes is challenging and translation to patient care rare.

Most scientists 'can't replicate studies by their peers'

Science is facing a "reproducibility crisis" where more than two-thirds of researchers have tried and failed to reproduce another scientist's experiments, research suggests. This is frustrating clinicians and drug developers who want solid foundations of pre-clinical research to build upon. From his lab at the University of Virginia's Centre for Open Science, immunologist Dr Tim Errington runs The Reproducibility Project, which attempted to repeat the findings reported in five landmark cancer studies.

When Evidence Says No, but Doctors Say Yes

According to Vinay Prasad, an oncologist and one of the authors of the Mayo Clinic Proceedings paper, medicine is quick to adopt practices based on shaky evidence but slow to drop them once they’ve been blown up by solid proof. As a young doctor, Prasad had an experience that left him determined to banish ineffective procedures. He was the medical resident on a team caring for a middle-aged woman with stable chest pain. She underwent a stent procedure and suffered a stroke, resulting in brain damage. Prasad, now at the Oregon Health and Sciences University, still winces slightly when he talks about it. University of Chicago professor and physician Adam Cifu had a similar experience. Cifu had spent several years convincing newly postmenopausal patients to go on hormone therapy for heart health—a treatment that at the millennium accounted for 90 million annual prescriptions—only to then see a well-designed trial show no heart benefit and perhaps even a risk of harm. "I had to basically run back all those decisions with women," he says. "And, boy, that really sticks with you, when you have patients saying, 'But I thought you said this was the right thing.'" So he and Prasad coauthored a 2015 book, Ending Medical Reversal, a call to raise the evidence bar for adopting new medical standards. "We have a culture where we reward discovery; we don’t reward replication," Prasad says, referring to the process of retesting initial scientific findings to make sure they’re valid.