Reverse geocoding without server-side processing
For those unfamiliar with the art, geocoding is the process of taking a text based description of a location (say, an address or the name of a city) and turning it into geographic coordinates.
For example, converting the name of my city,
Reverse geocoding is the opposite of that — taking geographic coordinates and returning the name of a place or region.
While it’s a solved problem, doing these processes at scale can be an expensive proposition — especially the forward geocoding. Either you need big servers or you have to pay a service provider like Mapbox to do it for you. So, for that reason it’s an interesting and worthwhile challenge to see if you can get reasonable results doing it client side.
I’m going to leave the forward version for another day and look at reverse geocoding.
The most common need that we have at work for reverse geocoding is to let a reader place themselves in a geographic region so they can see a version of the story that’s most relevant to them. For example, we can show you more relevant election results if we know where you are. Or we can do useful service journalism, like helping you find out how far you’re allowed travel during a Covid lockdown.
And for a variety of reasons, we need to be able to do this client-side (that is, in the browser, without running code on a server).
Being able to do this client side also has worthwhile privacy advantages.
Why is client-side reverse geocoding a challenge?
While not as technically challenging as forward geocoding, except for fairly simple cases, it’s not all that easy to do without doing some processing on the server.
Typically, reverse geocoding will iterate over a bunch of regions (for example, electorates or municipal boundaries) and use a point-in-polygon calculation to determine which geographic region the point of interest is inside.
There are more sophisticated ways to do it than simple iteration — maybe there’s an index that cuts down the number of polygons that need to be considered — but this is the basic premise.
The problems with doing this purely in front-end code arise mostly from the size of the dataset. Any dataset that encodes geographic regions to a reasonably high resolution gets big quickly. For example, the the Shapefile for the 2021 Australian Commonwealth electoral boundaries is more than 70MB — not something you’d want to send down to the browser.
So how do we do it?
It’s a binary format, which makes it pretty space efficient for storing large geographic datasets. But the interesting thing about it is that it has a spatial index at the start.
- MB: Magic bytes (0x6667620366676201)
- H: Header (variable size flatbuffer)
- I (optional): Static packed Hilbert R-tree index (static size custom buffer)
- DATA: Features (variable size flatbuffers)
This file structure makes it possible to construct HTTP range requests to fetch only the regions of the file that have the information we need (with a bit of overhead for the header and relevant parts of the index)
This is a static file, so it can be cached on a CDN with a very long TTL.
And now, with just a few HTTP requests totalling somewhere between, say 20kb and 100kb, we could find, accurately, what ASGS SA1 region some given coordinates are in.
We can do that without ever running code on a server and (crucially) without having to download all the SA1 regions. That flatgeobuf file that lives on the CDN and can be cached everywhere is 170MB so it’s good every visitor doesn’t have to download it.
So we now have a reverse geocoder that can put coordinates in a bunch of different types of regions.
Now this particular example obviously only works for Australia because that’s the regions we’ve encoded in the flatgeobuf file, but the idea scales fairly well and could be used to encode all kinds of regions in any part of the world.
You can see the code and a bit more about how to prepare the flatgeobuf files yourself at GitHub or have a play with the result.
Taking it one step further
Once I had this working, curiosity got the better of me and I wanted to know if I could do address level reverse geocoding with the help of Australia’s official address database, the GNAF.
The GNAF is huge because it contains literally every address in Australia. So the idea that you could do address level reverse geocoding without running any code on a server seemed unlikely.
Encoding the entire GNAF as a flatgeobuf produces a big file — 7.24GB.
But assuming you can find somewhere on the web (that supports HTTP range requests) to stick this file, we can leverage the way flatgeobuf files work in just the same way as we did to find regions.
This time though we’re looking through a file of points, so instead of a point-in-polygon calculation, where we look for any features which contain our point of interest, we need to make our point of interest into an area, and look for any points that fall inside that.
Which is why the library has a
First make the area fairly small and increase it incrementally until you get at least a single point returned. Then, with the list of returned points, find the one closest to the actual point of interest.
So was it successful? You be the judge (if you’re in Australia). See how close it gets to your actual address on this demo.
I’d love to hear how close it gets if you give it a try.