Data cleaning is the process of finding and correcting inaccurate data within a dataset, such as a set of records, a project, a database, or a spreadsheet. Data can be cleaned in a number of ways, either through scripting or with dedicated tools such as OpenRefine.
While this guide teaches you how to navigate and use the different features of OpenRefine, many of the techniques it covers (e.g., transforming data, dealing with duplicates, and adding or removing columns) are central to data cleaning as a practice. The skills and techniques discussed throughout this tutorial are therefore transferable to other software and tools beyond OpenRefine.
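To illustrate how these techniques carry over to scripting, here is a minimal Python sketch (standard library only) that applies three of them to CSV text: collapsing stray whitespace in cells, removing an unwanted column, and dropping exact duplicate rows. The function name clean_records and the sample data are hypothetical illustrations, not part of OpenRefine, which performs similar steps through its own menus and facets.

```python
import csv
import io


def clean_records(csv_text, drop_columns=()):
    """Hypothetical helper: tidy cells, remove columns, and drop exact duplicates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Remove columns: keep every header not listed in drop_columns
    keep = [c for c in (reader.fieldnames or []) if c not in drop_columns]
    seen = set()
    cleaned = []
    for row in reader:
        # Transform: trim leading/trailing whitespace and collapse internal runs of spaces
        record = {c: " ".join((row[c] or "").split()) for c in keep}
        # Deduplicate: skip any row whose kept values match a row already seen
        signature = tuple(record[c] for c in keep)
        if signature in seen:
            continue
        seen.add(signature)
        cleaned.append(record)
    return keep, cleaned


if __name__ == "__main__":
    sample = "name,city,notes\n Ada  Lovelace ,London,x\nAda Lovelace,London,y\n"
    columns, rows = clean_records(sample, drop_columns=("notes",))
    print(columns)  # ['name', 'city']
    print(rows)     # [{'name': 'Ada Lovelace', 'city': 'London'}]
```

Note that the order of operations matters here: once the notes column is removed and whitespace is normalized, the two sample rows become identical, so only one is kept.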
Cleaning your data ultimately improves your data's quality and enables more accurate analysis.
OpenRefine is available on most Lied Library computers. It is also installed on the laptops available for checkout at the Lied Circulation Desk.
For information on downloading a personal copy of OpenRefine or additional extensions, please see the OpenRefine Download Page.
This guide is based on the Windows version of OpenRefine 3.3.
Christina Miskey, Scholarly Communication Librarian for Research Infrastructure, contributed to this guide.