spatial.data.table 2 - Encoding Spatial Data

In part 1 of this series I spoke about re-writing a lot of geosphere functions to make them quicker and fit more naturally into data.table syntax, and the birth of spatialdatatable.

My next objective with the package was to decrease the amount of memory used for storing spatial objects. The sf package has brought the concept of Well-Known Text (WKT) into R, which is a fantastic way to display spatial information.

As an example, we can look at the North Carolina data that ships with the sf package:

library(sf)
filename <- system.file("shape/nc.shp", package="sf")
nc <- st_read(filename)

head(nc[, 15])
Simple feature collection with 6 features and 0 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
                        geometry
1 MULTIPOLYGON(((-81.47275543...
2 MULTIPOLYGON(((-81.23989105...
3 MULTIPOLYGON(((-80.45634460...
4 MULTIPOLYGON(((-76.00897216...
5 MULTIPOLYGON(((-77.21766662...
6 MULTIPOLYGON(((-76.74506378...

Here we can easily see the objects are Multi-Polygons.

One thing I learned from developing googleway is that Google's Polyline Encoding Algorithm can be used to compress a polyline into a short string. And, since a polygon is made up of one or many polylines, I set about writing an EncodeSF function that takes an sf object as input and converts its geometries into encoded polylines.

sdt <- EncodeSF(nc)
sdt[, .(.id, polyline)]
     .id                polyline
  1:   1 u_d|EtsgpNmmFphLyEbc...
  2:   2 or}|EhdznNyvA~Ce_Dli...
  3:   3 }re|EbcajNakAf|BqKby...
  4:   4 mtt|E`o|nMkpBhs@~I``...
  5:   4 m~b~Ev``oMJcqDfwAe}O...
 ---                            
104:  96 kv|pEhmt|Mv|DvdLkm@`...
105:  97 wyopE~lf{M_|E~jSvJjx...
106:  98 erenEdma_NkodAlxpA_{...
107:  99 wqtoEpuyzMojI`_@wEzy...
108: 100 erenEdma_NcvDgbCmhEw...
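
If you haven't come across the encoding before, here is a minimal sketch of Google's Polyline Encoding Algorithm in plain R. To be clear, this is not the code inside EncodeSF; the function names encode_signed and encode_polyline are made up for illustration, and the sketch assumes coordinates are supplied as separate latitude and longitude vectors. The idea is to scale each coordinate by 1e5, keep only the difference from the previous point, and write each difference as a handful of printable ASCII characters.

```r
# Illustrative helper (hypothetical name): encode one signed integer,
# i.e. a coordinate difference that has already been scaled by 1e5.
encode_signed <- function(v) {
  # Equivalent to left-shifting by one bit and inverting the bits when
  # the value is negative, written with arithmetic so the result is
  # always non-negative.
  u <- if (v < 0) -2 * v - 1 else 2 * v
  codes <- numeric(0)
  # Emit 5-bit chunks, lowest first. Every chunk except the last gets
  # the continuation flag (32); all chunks are offset by 63 to land in
  # printable ASCII.
  while (u >= 32) {
    codes <- c(codes, (u %% 32) + 32 + 63)
    u <- u %/% 32
  }
  codes <- c(codes, u + 63)
  rawToChar(as.raw(codes))
}

# Illustrative helper (hypothetical name): encode a sequence of points.
encode_polyline <- function(lat, lon) {
  # Round to integers first, then difference, so rounding errors
  # do not accumulate along the line.
  dlat <- diff(c(0, round(lat * 1e5)))
  dlon <- diff(c(0, round(lon * 1e5)))
  pieces <- character(2 * length(lat))
  for (i in seq_along(lat)) {
    pieces[2 * i - 1] <- encode_signed(dlat[i])  # latitude comes first
    pieces[2 * i]     <- encode_signed(dlon[i])
  }
  paste(pieces, collapse = "")
}

# The worked example from Google's documentation
encode_polyline(
  lat = c(38.5, 40.7, 43.252),
  lon = c(-120.2, -120.95, -126.453)
)
# "_p~iF~ps|U_ulLnnqC_mqNvxq`@"
```

Three points, stored as six 8-byte doubles, collapse into a 27-character string. The saving comes from the differences between neighbouring points being small, so each one needs only a few characters. The trade-off is precision: anything beyond five decimal places is lost.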

str(sdt)
Classes ‘spatial.data.table’, ‘data.table’ and 'data.frame':    108 obs. of  18 variables:
 $ ...
  $ polyline : atomic  u_d|EtsgpNmmFphLyEbcCibLf{Lk~H`bT}rNmjGihHd[kvL_lEzgBkl~@lyE`MnvCimCbmEqfAxnGieH~gEeTd_DmiCxvA_D|oAdwCidAtsKr_Aji| __truncated__ ...
  ..- attr(*, "sdt_polyline")= chr "polyline"

Here, the object sdt is of class spatial.data.table, with an sdt_polyline attribute on the encoded polyline column.

As this object is an extension of a data.table, all the usual data.table operations should still work (I haven't tested them all!).

So, back to my original goal of decreasing the size of the object: in this case, encoding the polylines cuts it from 135,168 bytes to 50,032 bytes, a reduction of roughly 63%.

sapply(mget(c("nc", "sdt")), function(x) format(object.size(x), units = "b"))
            nc            sdt 
"135168 bytes"  "50032 bytes"

And in practical terms, this is beneficial when plotting polygons in a Shiny dashboard using googleway.