Git scraping demo for Hacks/Hackers Brisbane

Last night at Hacks/Hackers Brisbane’s Show & Tell event I demoed one of my favourite web scraping techniques: Simon Willison’s Git scraping.

My aim was to show how easy it is to get started: with the right data source, it’ll take you less than 5 minutes to get a scraper working. I have a template repository on GitHub if you want to give it a crack.

For the demo, I showed how to collect air pollution data from the Brisbane City Council website. This is a slightly contrived example, because there are better sources for this data. But you can see the scraper, and maybe it’ll provide some inspiration to try the technique yourself.
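If you’re wondering what the core of a Git scraper looks like, here’s a rough sketch in Python. It is not the code from my demo or from the template repository. The URL and output path are placeholders I’ve made up for illustration, and the function names are hypothetical. The idea is simply: download the data, and only rewrite the file when the contents have changed.

```python
import sys
import urllib.request
from pathlib import Path

# Placeholder values (assumptions): swap in your real data source and path.
DATA_URL = "https://example.com/air-quality.json"
OUTPUT_PATH = Path("data/air-quality.json")


def fetch(url: str) -> bytes:
    """Download the raw response body from the given URL."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_if_changed(path: Path, content: bytes) -> bool:
    """Write content to path and return True only if the file is new or changed."""
    if path.exists() and path.read_bytes() == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return True


if __name__ == "__main__":
    # Optionally pass a URL on the command line; otherwise use the placeholder.
    url = sys.argv[1] if len(sys.argv) > 1 else DATA_URL
    changed = save_if_changed(OUTPUT_PATH, fetch(url))
    print("updated" if changed else "no change")
```

Run something like this on a schedule (a GitHub Actions workflow is the usual choice), commit the file whenever it changes, and the repository’s commit history becomes a record of how the data changed over time.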

By some coincidence, Josh Nicholas and some Guardian Australia colleagues have a story out today on Australia’s air pollution hotspots, including a nice interactive map to explore.
