A step-by-step lesson, with activities included, that teaches you to use OpenRefine to clean and format data while automatically tracking every change you make.
These three screencasts give a good overview of the product in just a few minutes. NOTE: Google recorded them when the product was named Google Refine, but they remain a useful introduction, since little has changed apart from the name.
Building on the essential data wrangling skills described in "Cleaning Data with OpenRefine" (another Programming Historian lesson), this lesson focuses on OpenRefine’s ability to fetch URLs and parse web content.
Programmatically Cleaning Data
OpenRefine is just one tool that can be used to clean your data. Another common approach to cleaning messy data is scripting in Python, often with the help of the pandas library.
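To give a sense of what scripted cleaning can look like, here is a minimal sketch that uses only the Python standard library. The sample data, the column names, and the `clean_rows` helper are all hypothetical and chosen for illustration. It trims stray whitespace, normalizes capitalization, and drops rows that become duplicates once tidied.

```python
import csv
import io

# Hypothetical messy input: stray whitespace, inconsistent case, a duplicate row.
RAW_CSV = """name,city
  Ada Lovelace ,london
Ada Lovelace,London
grace hopper, NEW YORK
"""


def clean_rows(text):
    """Trim whitespace, normalize case, and drop duplicate rows."""
    reader = csv.DictReader(io.StringIO(text))
    seen = set()
    cleaned = []
    for row in reader:
        # Collapse runs of whitespace and apply title case to every field.
        tidy = {key: " ".join(value.split()).title() for key, value in row.items()}
        # Rows that match after tidying are treated as duplicates.
        fingerprint = tuple(tidy.values())
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(tidy)
    return cleaned


cleaned = clean_rows(RAW_CSV)
for row in cleaned:
    print(row)
```

Title case is a blunt rule that will mangle values such as "McDonald" or "USA", so real data usually needs more careful normalization. The pandas library offers comparable building blocks, such as `Series.str.strip()` and `DataFrame.drop_duplicates()`, which tend to be more convenient for larger tables.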