Welcome to "Getting Started with Python in OpenRefine"!
Who is this for?
This course is aimed at people who work with large, messy tabular datasets and have found OpenRefine to be a useful tool for wrangling them. You won't need to be an OpenRefine wizard, but you will need some familiarity with it; otherwise, little of the material will make sense. In particular, you will find it useful if:
- You have a dataset that you can almost get into the shape you want, but you can't quite figure out the GREL incantation to get there. I find Python to be a better-structured language for breaking problems like that down into solvable parts.
- You want to take your data wrangling and analysis to the next level and you've heard that one tool for that is Python, but taking the leap to installing, configuring and using a whole programming language is somewhat daunting. You're not alone, and you can start learning Python right there in OpenRefine.
It is particularly aimed at those working in Galleries, Libraries, Archives & Museums (the so-called GLAM sector), and many of the examples we will use draw on that background. However, I hope you will find it useful even if the examples aren't directly relevant to you.
On a practical note, you will need a working copy of OpenRefine to follow along; the material assumes that you have one.
What will I learn?
We hope that by the end of this book, you will be able to:
- Transform & generate data in OpenRefine columns using Python
- Use elements of the Python standard library in OpenRefine expressions
- Explain key differences between GREL & Python
- Understand how Python OpenRefine expressions relate to a Python script
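As a small preview of that last point, here is a minimal sketch of how an expression might relate to a script. It assumes OpenRefine's built-in Python (Jython) interpreter, where the current cell is available as `value` and the expression hands back its result with `return`. The function name `transform` and the sample string are hypothetical, chosen only for illustration.

```python
import re


def transform(value):
    # In OpenRefine, only the lines inside this function would go into the
    # expression box; `value` would be supplied by OpenRefine for each cell.
    cleaned = re.sub(r"\s+", " ", value).strip()  # collapse runs of whitespace
    return cleaned.title()  # capitalise the first letter of each word


# In a standalone script, we have to supply the cell content ourselves.
print(transform("  the   british  library "))
```

Wrapping the expression body in a function is one way to think about the relationship: OpenRefine effectively calls it once per row, while a script has to do the looping and the input handling itself.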
What will I not learn?
These tasks will not be covered in this book, although you will likely find some of them useful as a follow-up:
- Installing additional Python modules for use in OpenRefine
- Creating & accessing your own Python library
- Translating an OpenRefine process to a Python script