
Getting Started with Data Cleaning and OpenRefine

This guide introduces the importance of data cleaning through OpenRefine, a useful tool for working with "messy" data.


A text filter lets you narrow down a large dataset so that you can look at specific or similar entries.

Let's say you want to work with all FacilityName cells that mention "Vegas" specifically. Open the drop-down menu beside the FacilityName column and select "Text Filter." A text box will then appear in the Facet/Filter panel on the left-hand side of the OpenRefine interface.




When typing in this text box, you can choose whether the search is case-sensitive (if it is, entries coded as "vegas" will not appear when searching "Vegas"). You can also search using a regular expression. A regular expression is a sequence of characters (including symbols) that describes a pattern, which lets you search longer text and documents for anything that matches it.

Once you type "Vegas" in the text box, the data displayed in the center of the interface will change to show only the rows whose FacilityName cell contains the text "Vegas."
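For readers who find code helpful, the filter's behavior can be approximated outside OpenRefine. The sketch below is plain Python, not OpenRefine's own implementation. The sample facility names and the `text_filter` helper are hypothetical and exist only for illustration.

```python
import re

# Hypothetical sample values standing in for the FacilityName column.
facility_names = [
    "Las Vegas Medical Center",
    "North Las Vegas Clinic",
    "las vegas urgent care",
    "Reno Family Health",
]

def text_filter(values, query, case_sensitive=False, regex=False):
    """Return the values that contain `query`, roughly like a text filter."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Without the regex option, escape the query so it is matched literally.
    pattern = re.compile(query if regex else re.escape(query), flags)
    return [value for value in values if pattern.search(value)]

# Case-insensitive search: "vegas" and "Vegas" both match.
insensitive = text_filter(facility_names, "Vegas")

# Case-sensitive search: the lowercase "las vegas urgent care" is excluded.
sensitive = text_filter(facility_names, "Vegas", case_sensitive=True)

# Regular expression: names that begin with "Las" or "North".
by_pattern = text_filter(
    facility_names, r"^(Las|North)\b", case_sensitive=True, regex=True
)

for name in sensitive:
    print(name)
```

The three calls mirror the three choices described above: a plain search, a case-sensitive search, and a regular-expression search.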



You can then facet your data to see which unique facility names are represented in this subset.
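Conceptually, a text facet lists each unique value in a column together with how many rows hold it. The sketch below shows that idea in plain Python, assuming a small, made-up list of already-filtered facility names. It illustrates the concept only and is not how OpenRefine computes facets internally.

```python
from collections import Counter

# Hypothetical rows remaining after filtering FacilityName for "Vegas".
filtered_names = [
    "Las Vegas Medical Center",
    "North Las Vegas Clinic",
    "Las Vegas Medical Center",
    "Las Vegas Medical Center ",  # trailing space makes this a distinct value
]

# Count how many rows hold each unique value, as a text facet would list them.
facet_counts = Counter(filtered_names)

for name, count in facet_counts.most_common():
    print(f"{count}\t{name!r}")
```

Near-duplicates such as the entry with a trailing space show up as separate values, which is what makes a facet useful for spotting "messy" data.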

© University of Nevada Las Vegas