Writing a Geohash Wrapper using CFFI
Jun 4, 2019
Aldrin Navarro
5 minute read

alt Geohash explorer

While exploring awesome lists scattered around Github, I stumbled upon Awesome C. The list is overwhelming. One library that caught my attention is libgeohash since it’s my first time to hear of such.

Apparently there is another way to express a location aside from latitude and longitude coordinates. Geohashing is a technique that represents a location using alphanumeric strings. The longer the string, the more precise the representation is to the actual location. This representation could also identify it’s neighboring squares from the north, east, south, and west.

Geohashes are commonly useful for URL and electronic storage.

Paris ~= u09tvw0f6szye
Tokyo ~= xn774c06kdtd9
Ottawa ~= f244mkwzxdcuk

Clicking on the geohashes above will bring you to geohash.org which will reveal the location of these cities from the geohash code itself!

CASE. I want to use libgeohash in Python but not necessarily make a strict implementation. So I decided to build a wrapper around it and expose only the common usages such as encoding from latlong coordinates to a geohash, and decoding it back. For this I will be using C Foreign Function Interface for Python

What is CFFI?

Borrowing from Lisp description, CFFI stands for Common Foreign Function Interface. “Foreign function” means taking a function written in another programming language along with it’s data and calling conventions. Python CFFI follows the same principle and outlines the specific goals here

In our case the primary goal is to compile libgeohash as a shared object (.so), load it using cffi in Python, and build an interface for it.

Building the interface

Using a shared object in Python

First, we will need a shared object of libgeohash. Download the source of libgeohash here

gcc -shared -rdynamic -fPIC geohash.c -o geohash.so

-shared
Produce a shared object which can then be linked with other objects to form an executable.

-rdynamic
This instructs the linker to add all symbols, not only used ones, to the dynamic symbol table. This option is needed for some uses of dlopen or to allow obtaining backtraces from within a program.

-fPIC
The generated machine code is not dependent on being located at a specific address in order to work

For more information on these options see GCC Options for Linking.

Then we can load it in Python like this.

# geohash_build.py
from cffi import FFI
ffi = FFI()
geohash_lib = ffi.dlopen('path/to/geohash.so')

Now we can define our interface.

A simplistic anatomy of the builder

The builder follows the structure:

Definition
All declarations including C types, functions, globals needed to use the shared object. A valid C syntax is required.

Source
Give the interface (the python extension module) a name, a valid C source code. #include statements belong here.

Compilation
Finally, let’s run the build!

# geohash_build.py
from cffi import FFI
ffi = FFI()
# DEFINITION
# For reference, the definition is stripped from libgeohash.h where the data structures are defined
ffi.cdef(
    """
    // Metric in meters
    typedef struct GeoBoxDimensionStruct {
        
        double height;
        double width;
    
    } GeoBoxDimension;
    
    typedef struct GeoCoordStruct {
        
        double latitude;
        double longitude;
        
        double north;
        double east;
        double south;
        double west;
    
        GeoBoxDimension dimension;
        
    } GeoCoord;


    char* geohash_encode(double lat, double lng, int precision);
    GeoCoord geohash_decode(char* hash);
"""
)
# SOURCE
# Since the .so is ready, we can simply put None as the valid C source
ffi.set_source("_geohash", None)
# BUILD
if __name__ == "__main__":
    ffi.compile(verbose=True)

We only need to build this once and if the build is successful a _geohash.c is produced and invoked in the compiler. At runtime, we should be able to do

from _geohash import ffi, lib
print(lib.geohash_encode(35.689487, 139.691706, 6))

Completing the wrapper

Notice that in the cdef section we exposed the functions geohash_encode and geohash_decode. In our interface we could do something like

# geohash.py
from _geohash import ffi, lib

def geohash_encode(latitude: float, longitude: float, precision: int = -1):
    """Takes in latitude and longitude with a desired precision and returns the correct hash value.
    If precision < 0 or precision > 20, a default value of 6 will be used."""
    return ffi.string(lib.geohash_encode(latitude, longitude, precision)).decode()


def geohash_decode(hash_code: str):
    """Returns GeoCoord structure which contains the latitude and longitude
    that was decoded from the geohash. A GeoCoord also provides the bounding box for the
    geohash (north, east, south, west, dimension.height, dimension.width)."""
    return lib.geohash_decode(hash_code.encode())

A few points to take note from here:

  • Use ffi.string to interpret the result into a null-terminated string. Notice the C signature char * in our cdef section.
  • What we get from ffi e.g. <cdata 'char *' 0xdc8770> are objects of type cdata. A pointer object on the allocated memory. These pointers, structures, and arrays do not have an obvious mapping to native types. That’s why in this case it’s appropriate for us to use ffi.string to be explicit.
  • The result of geohash_decode is a GeoCoord object e.g. <cdata 'GeoCoord' owning 64 bytes>. Remember this struct from our cdef section?
  • Member access operators obj.x or obj->x normally works as obj.x in Python. So the following works:

    location = geohash.geohash_decode('xn774c')
    location.latitude, location.longitude
    # >> (35.69183349609375, 139.6966552734375) 
    

Testing

To see if our wrapper works as expected, we can create a simple test.

import pytest
import geohash

geohash_map = (
    ((35.689487, 139.691706), "xn774c"),
    ((35.179554, 129.075642), "wy7b1h"),
    ((48.856614, 2.352222), "u09tvw"),
)


@pytest.mark.parametrize("coordinates, expected", geohash_map)
def test_geohash_encode(coordinates, expected):
    lat, long = coordinates
    # precision >= 0 then precision = 12
    assert geohash.geohash_encode(lat, long, -1) == expected
    assert geohash.geohash_encode(lat, long) == expected


@pytest.mark.parametrize("expected, hash_code", geohash_map)
def test_geohash_decode(hash_code, expected):
    location = geohash.geohash_decode(hash_code)
    assert all(
        [
            location.longitude,
            location.latitude,
            location.north,
            location.east,
            location.south,
            location.west,
            location.dimension,
        ]
    )

The test leverages the use of pytest.mark.parametrize to cover a series of input and expected results without the use of excessive loops.

To execute the tests, simply run py.test

Wrapping Up

The complete source code is available at https://github.com/aldnav/geohash And available at PyPi so you can use it directly for your projects! Documentation? 🍰

pip install geohashcx

If you spot issues feel free to submit an issue or better yet a pull request! 👋

References / Further readings


🐋 hello there! If you enjoy this, a "Thank you" is enough.

Or you can also ...

Buy me a teaBuy me a tea

comments powered by Disqus