
Data Management Resources for UNLV: Reproducible Research

This guide provides both public and UNLV-specific resources for creating and implementing your data management plan.

Is there a reproducibility crisis? Nature, 2016

"1500 scientists lift the lid on reproducibility," by Monya Baker, Nature (2016)

Computational Tools

Additional Reading

Replication vs. Reproduction

The terms "reproduction" and "replication" are often used interchangeably in discussions of scientific research. However, many argue that there is a subtle but important distinction between them. The exact distinction is still debated, particularly as interest in reproducible research has grown. According to "Reproducibility vs. Replicability: A Brief History of a Confused Terminology" by Hans E. Plesser, which adapts definitions from the Association for Computing Machinery, the terms can be understood as follows in relation to computational research:

Replication - an independent group can obtain the same result using the author's own artifacts.

Reproduction - an independent group can obtain the same result using artifacts which they develop completely independently.

"The spectrum of reproducibility," Peng, R.D. (2011) Science

The reproducibility spectrum image from Roger Peng's 2011 paper in Science shows how the concept of reproducibility lies on a spectrum. At the left end of the spectrum is "Publication only," which is considered "Not reproducible." Moving along the spectrum, the publication is accompanied by code, then by code and data, then by linked and executable code and data. At the right end is "Full replication," which is labelled the "Gold standard."

Reproducibility, both computational and otherwise, is a spectrum. Each step between "publication only" and "full replication" brings a greater degree of reproducibility.

Important (Computational) Elements

Organization - the act of ordering something in a specific way. Have you ever been unable to find a file you know you saved somewhere on your computer? This is an example of an organizational problem. In a short exercise, Woodbridge et al. tried to run a number of code-dependent projects from scratch and were unable to complete the process because of missing files, data, or dependencies. Researchers should bundle together all files, data, and information relevant to a project and make them easily accessible (e.g., together in a repository).
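As an illustration of the bundling advice above, here is a minimal Python sketch that checks whether a project folder contains the pieces someone else would need to rerun the work. The file and folder names in EXPECTED are hypothetical placeholders, not a required layout; adjust them to match your own project.

```python
from pathlib import Path

# Hypothetical layout for illustration only; rename to suit your project.
EXPECTED = ["README.md", "data", "code", "requirements.txt"]


def missing_items(project_dir, expected=EXPECTED):
    """Return the expected files or folders that are absent from project_dir."""
    root = Path(project_dir)
    # Keep the original order so the report reads the same way every time.
    return [name for name in expected if not (root / name).exists()]
```

Calling missing_items(".") from the top of a project returns a list of anything that is absent, so an empty list suggests the bundle is at least structurally complete. It does not check that the contents are correct or up to date.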

Documentation - a record; the process of classifying information. In terms of reproducibility, documentation can take a number of forms. It can be provided through a "readme," a text file that provides information about another file, or through a Dockerfile, "a text document that contains all the commands a user could call on the command line to assemble an image (the basis for a project)." Good documentation increases not only the organization of research but also the transparency of a project and what it entails.
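One small piece of documentation that is easy to forget is a record of the software environment a project was run in. The sketch below, which uses only the Python standard library, gathers the Python version, the operating system, and the installed packages into text that could be pasted into a readme. The function name describe_environment is a hypothetical choice for this example, and the approach assumes the project is written in Python.

```python
import platform
import sys
from importlib import metadata


def describe_environment():
    """Return a plain-text summary of the current Python environment."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
        "Packages:",
    ]
    # Build one "name==version" line per installed distribution.
    # A set removes duplicates; sorting keeps the output stable.
    packages = {
        f"  {dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    }
    lines.extend(sorted(packages))
    return "\n".join(lines)
```

Saving the returned text alongside the code gives readers a starting point for rebuilding the environment, though it is not a substitute for a container recipe such as a Dockerfile.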

Automation - processes carried out by a machine, without human intervention. While not possible in every area of research, greater automation can help eliminate human error and make a project involving code or data easier to replicate. Tools such as Docker or Binder (under Computational Tools) can assist with creating automated projects.
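A simple form of automation is a single script that runs every step of an analysis in a fixed order, so that nobody has to remember the sequence by hand. The Python sketch below assumes the analysis is split into separate scripts; the script names in STEPS are hypothetical placeholders, and the runner parameter exists only so the function can be tried out without real scripts.

```python
import subprocess
import sys

# Hypothetical script names, listed in the order they should run.
STEPS = ["clean_data.py", "run_analysis.py", "make_figures.py"]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each script in order and return the scripts that completed."""
    completed = []
    for script in steps:
        # check=True makes subprocess.run raise CalledProcessError if a
        # script exits with an error, which stops the pipeline at that step.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Stopping at the first failure is a deliberate choice here: later steps usually depend on earlier ones, so continuing could produce results from stale or incomplete inputs.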

Dissemination - the act of spreading or publishing information. The NIH recommends that data should be published in a public repository and that data in the repository "should be bidirectionally linked to the published article."

- Adapted from "Integrating reproducible best practices into your research," April Clyburne-Sherin, Code Ocean

© University of Nevada Las Vegas