These past couple weeks I’ve been playing around with the [Datasette] toolkit in an effort to generate some metrics and basic reporting around my reading habits. As part of this exercise I am attempting to build a repeatable data processing pipeline that I can use to rebuild my database and reports from scratch each time.
I could use one of the many orchestration tools we use at work to automate this process, but instead I am trying to keep things fairly simple. With that in mind, I have been reacquainting myself with [GNU Make].
Make is a venerable CLI tool most often used for building C and C++ programs. It is well suited to these tasks because it tracks a dependency graph and only performs actions when it detects changes in the underlying source files. In other words, when you invoke it, Make rebuilds only the targets whose sources have changed, along with anything that depends on those targets. This behavior makes it useful for coordinating just about any multi-step process.
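As a minimal sketch of that behavior (file names here are hypothetical), a rule declares a target, its prerequisites, and the recipe to rebuild it:

```make
# report.txt depends on data.txt; Make re-runs the recipe only
# when data.txt has a newer timestamp than report.txt.
report.txt: data.txt
	sort data.txt > report.txt
```

Run `make report.txt` twice in a row and the second invocation does nothing, because the target is already newer than its prerequisite.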
So far I’ve managed to put together a script that converts the Excel spreadsheet where I track my reading activity into CSV, and the commands needed to import that CSV data into a sqlite3 database file that Datasette will use for the reporting. I like how Make forces me to think about how best to encapsulate each step of the pipeline into a discrete program with defined inputs and outputs. The last thing I want is a messy Makefile that will be difficult to update when I want to make changes three or four months from now.
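A pipeline like this might be wired together with two rules along these lines (the file names, the conversion script, and the use of `sqlite-utils` for the import are all assumptions, standing in for whatever the real steps are):

```make
# reading.db is rebuilt only when reading.csv changes.
# sqlite-utils is a common companion tool to Datasette;
# its `insert ... --csv` command loads a CSV file into a table.
reading.db: reading.csv
	sqlite-utils insert reading.db books reading.csv --csv

# reading.csv is regenerated only when the spreadsheet changes.
# convert_to_csv.py is a hypothetical stand-in for the export script.
reading.csv: reading.xlsx
	python convert_to_csv.py reading.xlsx reading.csv
```

With this in place, `make reading.db` walks the chain backwards: if only the spreadsheet changed, both steps run; if nothing changed, neither does.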
This probably isn’t a scenario the original authors of Make envisioned when it was invented, but I think it’s a really good use of a mature, battle-tested tool. I’d recommend giving Make a try next time you need to orchestrate a multi-step process.