
Getting Started with Data Cleaning and OpenRefine

This guide introduces the importance of data cleaning through OpenRefine, a useful tool for working with "messy" data.

Changing Header Names

Much as you can remove columns entirely (see below), you can also edit the names of column headers. For instance, in the dataset used in this tutorial, the first column header includes several symbols that we may not want in our final dataset.

To edit a column name, click the dropdown arrow beside the header you want to change. From the dropdown, select "Edit column," then "Rename this column." A popup box will appear where you can enter the new column name.
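OpenRefine handles the rename through its menus. If you ever need the same step in a script, the sketch below does it with Python's standard csv module. It is a minimal sketch under stated assumptions: the sample data, the header "#Name*" (a stand-in for a header containing unwanted symbols), and the function name rename_column are all hypothetical and are not part of OpenRefine.

```python
import csv
import io

# Hypothetical sample data; "#Name*" stands in for a header with unwanted symbols.
SAMPLE = "#Name*,ObjectId,Facility\nA,1,North\nB,2,South\n"


def rename_column(csv_text, old_name, new_name):
    """Return CSV text with the header old_name replaced by new_name (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    if old_name not in header:
        raise KeyError(f"no column named {old_name!r}")
    # Only the header row changes; every data row is left untouched.
    rows[0] = [new_name if name == old_name else name for name in header]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


print(rename_column(SAMPLE, "#Name*", "Name"))
```

Only the header row is rewritten, which matches what the "Rename this column" dialog does: the cell values stay exactly as they were.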

Removing Columns

During this transformation stage, you can also remove any columns that are unnecessary for the cleaned dataset you are creating. For instance, with the dataset used in this demonstration (below), suppose we do not want the column titled "ObjectId" in our final dataset.

To remove that column, click the dropdown arrow beside the "ObjectId" header. From the dropdown, select "Edit column," then "Remove this column."
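For comparison, here is a sketch of the same removal done in a script rather than in OpenRefine. It uses Python's standard csv module and assumes a small hypothetical CSV whose columns borrow the "ObjectId" and "Facility" names from this guide. The row values and the function name remove_column are illustrative only.

```python
import csv
import io

# Hypothetical sample data using column names from this guide.
SAMPLE = "Name,ObjectId,Facility\nA,1,North\nB,2,South\n"


def remove_column(csv_text, column_name):
    """Return CSV text with the named column dropped from every row (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # list.index raises ValueError if the column does not exist.
    index = rows[0].index(column_name)
    for row in rows:
        # Guard against short (ragged) rows that never had a value in this column.
        if index < len(row):
            del row[index]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


print(remove_column(SAMPLE, "ObjectId"))
```

Unlike renaming, removal changes every row, because the header and its values have to be dropped together so that the remaining columns stay aligned.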

Collapsing Columns

Collapsing a column in OpenRefine is similar to hiding a column in a spreadsheet program such as Excel. Only the display changes, and the column and its data remain in the project. For instance, suppose we want to collapse the "Facility" column in our dataset.


To collapse that column, click the dropdown arrow beside the "Facility" header. From the dropdown, select "View," then "Collapse this column."


© University of Nevada Las Vegas