Welcome to “Getting Started with Python in OpenRefine”!

Who is this for?

This course is aimed at people who work with large, messy tabular datasets and have found OpenRefine to be a useful tool for wrangling them. You won’t need to be an OpenRefine wizard, but you will need some familiarity with it; otherwise, little of the material will make sense. In particular, you will find it useful if:

  • You have a dataset that you can almost get into the shape you want, but you can’t quite figure out the GREL incantation to get there; I find Python to be a better-structured language for breaking problems like that down into solvable parts
  • You want to take your data wrangling and analysis to the next level and you’ve heard that one tool for that is Python, but taking the leap to installing, configuring and using a whole programming language is somewhat daunting; you’re not alone, but you can start learning Python right there in OpenRefine

It is particularly aimed at those working in Galleries, Libraries, Archives & Museums (the so-called GLAM sector), and many of the examples we will use draw on that background. However, you may find it useful even if the examples aren’t directly relevant to you, and indeed I hope that’s the case.

On a practical note, you will need a working copy of OpenRefine to follow along, and the material assumes that you have one.

What will I learn?

We hope that by the end of this book, you will be able to:

  • Transform & generate data in OpenRefine columns using Python
  • Use elements of the Python standard library in OpenRefine expressions
  • Explain key differences between GREL & Python
  • Understand how Python OpenRefine expressions relate to a Python script
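
As a rough preview of that last point: a Python expression in OpenRefine behaves much like the body of a function that receives the current cell as `value` and hands a result back with `return`. The sketch below wraps such a body in an ordinary function so that it can run as a plain script. The function name `transform`, the clean-up it performs and the sample values are all hypothetical, chosen only for illustration. OpenRefine runs Python through Jython, so some details may differ from a current Python installation.

```python
# In OpenRefine's expression editor (with the language set to Python / Jython),
# you would type only the two indented lines inside the function below;
# OpenRefine supplies `value` for each cell.
def transform(value):
    # Hypothetical clean-up: trim whitespace, then normalise capitalisation.
    cleaned = value.strip()
    return cleaned.title()


# Outside OpenRefine, a script has to supply the cell values itself.
sample_cells = ["  ada lovelace", "GRACE HOPPER  "]
results = [transform(cell) for cell in sample_cells]
print(results)
```

The same two lines of logic work in both settings; what changes is who provides `value` and what happens to the returned result.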

What will I not learn?

These tasks will not be covered in this book, although you will likely find some of them useful as follow-up:

  • Installing additional Python modules for use in OpenRefine
  • Creating & accessing your own Python library
  • Translating an OpenRefine process to a Python script