spatial.data.table 2 - Encoding Spatial Data
In part 1 of this series I spoke about re-writing a lot of geosphere functions to make them quicker and fit more naturally into data.table syntax, and the birth of spatialdatatable.
My next objective with the package was to decrease the amount of memory used for storing spatial objects. The sf package has brought the concept of Well-Known Text (WKT) into R, which is a fantastic way to display spatial information.
As an example, from the sf package we can look at their North Carolina data:
library(sf)
filename <- system.file("shape/nc.shp", package="sf")
nc <- st_read(filename)
head(nc[, 15])
Simple feature collection with 6 features and 0 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
epsg (SRID): 4267
proj4string: +proj=longlat +datum=NAD27 +no_defs
geometry
1 MULTIPOLYGON(((-81.47275543...
2 MULTIPOLYGON(((-81.23989105...
3 MULTIPOLYGON(((-80.45634460...
4 MULTIPOLYGON(((-76.00897216...
5 MULTIPOLYGON(((-77.21766662...
6 MULTIPOLYGON(((-76.74506378...
Here we can easily see the objects are Multi-Polygons.
One thing I learned from developing googleway is that Google's Polyline Encoding Algorithm can be used to compress the size of a polyline. And, since a polygon is made up of one or more polylines, I set about writing an EncodeSF function that takes an sf object as input and encodes its spatial objects into encoded polylines.
sdt <- EncodeSF(nc)
sdt[, .(.id, polyline)]
.id polyline
1: 1 u_d|EtsgpNmmFphLyEbc...
2: 2 or}|EhdznNyvA~Ce_Dli...
3: 3 }re|EbcajNakAf|BqKby...
4: 4 mtt|E`o|nMkpBhs@~I``...
5: 4 m~b~Ev``oMJcqDfwAe}O...
---
104: 96 kv|pEhmt|Mv|DvdLkm@`...
105: 97 wyopE~lf{M_|E~jSvJjx...
106: 98 erenEdma_NkodAlxpA_{...
107: 99 wqtoEpuyzMojI`_@wEzy...
108: 100 erenEdma_NcvDgbCmhEw...
str(sdt)
Classes ‘spatial.data.table’, ‘data.table’ and 'data.frame': 108 obs. of 18 variables:
$ ...
$ polyline : atomic u_d|EtsgpNmmFphLyEbcCibLf{Lk~H`bT}rNmjGihHd[kvL_lEzgBkl~@lyE`MnvCimCbmEqfAxnGieH~gEeTd_DmiCxvA_D|oAdwCidAtsKr_Aji| __truncated__ ...
..- attr(*, "sdt_polyline")= chr "polyline"
Here, the object sdt is of class spatial.data.table, with an sdt_polyline attribute on the encoded polyline column.
As this object is an extension of a data.table, all the usual operations on a data.table should still work (I haven't tested them all!).
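For anyone unfamiliar with Google's Polyline Encoding Algorithm mentioned earlier, here is a minimal base-R sketch of the idea. It is not the code behind EncodeSF; the function names encode_value and encode_polyline are made up for illustration, and it only handles a single line of latitude/longitude pairs. Each coordinate is scaled by 1e5 and rounded, stored as the difference from the previous point, and then written out in 5-bit chunks as printable ASCII characters.

```r
# Encode one signed integer delta as a run of ASCII characters.
# (Hypothetical helper, written with plain arithmetic rather than bitwise ops.)
encode_value <- function(delta) {
  # Equivalent to "shift left by one bit, invert if negative":
  # the sign ends up in the lowest bit and the result is non-negative.
  v <- if (delta < 0) -2 * delta - 1 else 2 * delta
  codes <- numeric(0)
  # Emit 5 bits at a time, lowest first; adding 32 flags "more chunks follow".
  while (v >= 32) {
    codes <- c(codes, (v %% 32) + 32 + 63)
    v <- v %/% 32
  }
  codes <- c(codes, v + 63)
  intToUtf8(as.integer(codes))
}

# Encode vectors of latitudes and longitudes into one polyline string.
# (Hypothetical function; assumes lat and lon have the same length.)
encode_polyline <- function(lat, lon) {
  lat_e5 <- round(lat * 1e5)
  lon_e5 <- round(lon * 1e5)
  # Store each point as the offset from the previous one (first is from 0).
  dlat <- diff(c(0, lat_e5))
  dlon <- diff(c(0, lon_e5))
  pieces <- mapply(
    function(a, b) paste0(encode_value(a), encode_value(b)),
    dlat, dlon
  )
  paste0(pieces, collapse = "")
}

# The worked example from Google's documentation
encode_polyline(c(38.5, 40.7, 43.252), c(-120.2, -120.95, -126.453))
# [1] "_p~iF~ps|U_ulLnnqC_mqNvxq`@"
```

Three coordinate pairs, six doubles in total, become a single 27-character string, which is where the memory saving comes from.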
So, back to my original goal of decreasing the size of the object: in this case, encoding the polylines shrinks it to roughly 37% of its original size.
sapply(mget(c("nc", "sdt")), function(x) format(object.size(x), units = "b"))
nc sdt
"135168 bytes" "50032 bytes"
And in practical terms, this is beneficial when plotting polygons in a shiny dashboard using googleway