Getting Started with Data Cleaning and OpenRefine

This guide introduces readers to the importance of data cleaning by way of OpenRefine, a useful tool for working with "messy" data.

What is it?

Data cleaning is the process of finding and correcting inaccurate data in a dataset (such as a set of records, a project, a database, or a spreadsheet). Data can be cleaned in a number of ways, either through scripting or through dedicated tools such as OpenRefine.

While this guide teaches you how to navigate and use different elements of OpenRefine, many of the techniques it covers (e.g. transforming data, dealing with duplicates, adding or removing columns) are central to data cleaning as a practice. The skills discussed throughout this tutorial are therefore transferable to other software and tools outside of OpenRefine.
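To illustrate how these techniques carry over to scripting, here is a minimal sketch in Python using only the standard library. It is not part of OpenRefine. The sample data, the column names, and the clean_rows function are all hypothetical and chosen for illustration. The sketch transforms values (trims stray spaces and standardizes capitalization) and then drops rows that turn out to be duplicates once they have been tidied.

```python
import csv
import io

# Hypothetical "messy" data: stray spaces, inconsistent case, a hidden duplicate.
RAW = """name,city
 Alice Smith ,Las Vegas
alice smith,las vegas
Bob Jones,Reno
"""


def clean_rows(text):
    """Return tidied, de-duplicated rows from CSV text (illustrative only)."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        # Transform: collapse extra whitespace and standardize capitalization.
        tidy = {key: " ".join(value.split()).title() for key, value in row.items()}
        # Deduplicate: skip any row already seen in its tidied form.
        fingerprint = tuple(tidy.values())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(tidy)
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

Note that the two "Alice Smith" rows only count as duplicates after the transformation step, which is why cleaning values usually comes before removing duplicates.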

Why clean your data?

Cleaning your data ultimately improves your data's quality and enables more accurate analysis.

OpenRefine on Campus

OpenRefine is available on most Lied Library computers. It is also installed on the laptops available for check-out at the Lied Circulation Desk.

For information on downloading a personal copy of OpenRefine or additional extensions, please see the OpenRefine Download Page.

This Guide

This guide is based on the Windows version of OpenRefine 3.3.

Christina Miskey, Scholarly Communication Librarian for Research Infrastructure, contributed to this guide.

© University of Nevada Las Vegas