
Use JTS for ST_Buffer #13824

Merged
rschlussel merged 3 commits into prestodb:master from jagill:jts-buffer
Dec 18, 2019


@jagill
Contributor

@jagill jagill commented Dec 9, 2019

Convert ST_Buffer implementation to use JTS. JTS is generally more efficient (see benchmarks below). Also, ESRI returns an empty geometry when you buffer by small distances (< 1e-9);
JTS handles these correctly.

JTS
Benchmark                                        Mode  Cnt        Score       Error  Units
BenchmarkSTBuffer.stBufferPoint                  avgt   20       14.221       2.408  us/op
BenchmarkSTBuffer.stBufferMultiPointSparse       avgt   20      810.414      43.980  us/op
BenchmarkSTBuffer.stBufferMultiPointDense        avgt   20    18539.600     998.499  us/op
BenchmarkSTBuffer.stBufferMultiPointReallyDense  avgt   20  1650058.575   71342.838  us/op
BenchmarkSTBuffer.stBufferLineStringCircle       avgt   20      562.464      26.039  us/op
BenchmarkSTBuffer.stBufferLineStringDense        avgt   20   316479.586   29001.318  us/op
BenchmarkSTBuffer.stBufferPolygonSimple          avgt   20       11.892       0.711  us/op
BenchmarkSTBuffer.stBufferPolygonNormal          avgt   20    46390.604    2958.109  us/op
BenchmarkSTBuffer.stBufferPolygonDense           avgt   20    72822.143    6841.961  us/op

Esri
Benchmark                                        Mode  Cnt        Score        Error  Units
BenchmarkSTBuffer.stBufferPoint                  avgt   20        8.290        0.541  us/op
BenchmarkSTBuffer.stBufferMultiPointSparse       avgt   20    16521.536      818.169  us/op
BenchmarkSTBuffer.stBufferMultiPointDense        avgt   20   450394.639    94490.620  us/op
BenchmarkSTBuffer.stBufferMultiPointReallyDense  avgt   20  3856960.148   300968.259  us/op
BenchmarkSTBuffer.stBufferLineStringCircle       avgt   20    13470.522     1308.101  us/op
BenchmarkSTBuffer.stBufferLineStringDense        avgt   20   440081.641    37944.614  us/op
BenchmarkSTBuffer.stBufferPolygonSimple          avgt   20       21.796        0.871  us/op
BenchmarkSTBuffer.stBufferPolygonNormal          avgt   20    16648.597     1348.367  us/op
BenchmarkSTBuffer.stBufferPolygonDense           avgt   20  1159728.778    27607.808  us/op

== RELEASE NOTES ==

General Changes
* Use a more efficient implementation for ST_Buffer.  The new implementation produces fewer buffer points on rounded corners, so results will be very similar to, but not identical to, the previous ones.  JTS also handles buffering with small (<1e-9) distances better.
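
To give a feel for why fewer points on a rounded corner changes the result only slightly, here is a minimal sketch that approximates the buffer of a single point as a regular polygon inscribed in the circle of radius d. It is not the JTS or Esri algorithm. The helper names (`point_buffer`, `polygon_area`) and the segment counts 8 and 24 per quadrant are illustrative assumptions, not values taken from either library.

```python
import math


def point_buffer(x, y, d, quadrant_segments):
    """Approximate the buffer of a point by a regular polygon.

    The polygon is inscribed in the circle of radius d and uses
    `quadrant_segments` vertices per quarter circle (hypothetical parameter).
    """
    n = 4 * quadrant_segments
    return [
        (x + d * math.cos(2 * math.pi * i / n),
         y + d * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def polygon_area(ring):
    """Shoelace formula for a simple ring given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Illustrative segment counts only; neither is claimed to be a library default.
coarse = point_buffer(0.0, 0.0, 1.0, quadrant_segments=8)
fine = point_buffer(0.0, 0.0, 1.0, quadrant_segments=24)

coarse_area = polygon_area(coarse)
fine_area = polygon_area(fine)
```

Both polygons slightly underestimate the area of the true disk, and the coarser one underestimates it a little more, which is the kind of "very similar but not identical" difference the release note describes.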

@jagill
Contributor Author

jagill commented Dec 9, 2019

cc @rschlussel @mbasmanova

@rschlussel
Contributor

A few questions (disclaimer: I don't really know what ST_BUFFER does)

  1. why were we using the other library before?
  2. Are there going to be consistency problems because the other geometry functions still use OGCGeometry and EsriGeometrySerde?
  3. how did you test that it was more efficient?

Extract and expand functions that create geometries for benchmarks.
@jagill
Contributor Author

jagill commented Dec 10, 2019

To buffer a geometry g with a distance d is to create a geometry of all points that are within d of some point in g. So buffering a point should make a disk of radius d centered at the point. Conceptually but not computationally, buffering a geometry is the union of all those disks. This is also known as a dilation operation.
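
As a rough illustration of that definition (not of how JTS or Esri compute buffers), the sketch below tests whether a point lies in the buffer of a linestring by checking its distance to each segment. The function names `distance_to_segment` and `in_buffer` are hypothetical, and the geometry is plain Python tuples rather than any library type.

```python
import math


def distance_to_segment(p, a, b):
    """Euclidean distance from point p to the closed segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_buffer(p, linestring, d):
    """True if p is within distance d of some point of the linestring.

    `linestring` is a list of at least two (x, y) vertices.
    """
    return any(
        distance_to_segment(p, a, b) <= d
        for a, b in zip(linestring, linestring[1:])
    )


line = [(0.0, 0.0), (10.0, 0.0)]
beside_line = in_buffer((5.0, 1.0), line, 1.0)        # alongside the segment
inside_end_cap = in_buffer((-0.5, 0.5), line, 1.0)    # inside the rounded end
outside_end_cap = in_buffer((-1.0, 1.0), line, 1.0)   # the corner a square cap would include
```

The last case shows why the ends of a buffered line are rounded: the point (-1, 1) is about 1.41 from the nearest endpoint, so it falls outside the buffer even though it sits in the corner of the bounding box.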

  1. We used Esri for historical reasons in the development of Presto geospatial. One blocker to switching to JTS is simple bandwidth, which I'm providing :).
  2. All geometry inputs and outputs are encoded in slices, which have a library-neutral representation. So using JTS for one function, and ESRI for another, shouldn't be a problem. We have tests for those cases we think could cause a problem, but I'm always happy to write more for anything we're worried about :).
  3. Great point. I've added a benchmark suite and updated the description. ESRI is more performant in certain cases, but in the majority of cases (and all expensive cases), JTS is cheaper. I'm including some bar graphs below; these are generated from the benchmark results included above.
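
For readers who want numbers rather than graphs, here is a small sketch that computes the Esri/JTS time ratio for each benchmark from the scores in the PR description (us/op, error columns ignored). The dictionary layout and variable names are my own; only the scores come from the tables.

```python
# Average time per operation in us/op, copied from the benchmark tables.
jts = {
    "stBufferPoint": 14.221,
    "stBufferMultiPointSparse": 810.414,
    "stBufferMultiPointDense": 18539.600,
    "stBufferMultiPointReallyDense": 1650058.575,
    "stBufferLineStringCircle": 562.464,
    "stBufferLineStringDense": 316479.586,
    "stBufferPolygonSimple": 11.892,
    "stBufferPolygonNormal": 46390.604,
    "stBufferPolygonDense": 72822.143,
}
esri = {
    "stBufferPoint": 8.290,
    "stBufferMultiPointSparse": 16521.536,
    "stBufferMultiPointDense": 450394.639,
    "stBufferMultiPointReallyDense": 3856960.148,
    "stBufferLineStringCircle": 13470.522,
    "stBufferLineStringDense": 440081.641,
    "stBufferPolygonSimple": 21.796,
    "stBufferPolygonNormal": 16648.597,
    "stBufferPolygonDense": 1159728.778,
}

# Ratio > 1 means JTS took less time than Esri on that benchmark.
speedup = {name: esri[name] / jts[name] for name in jts}

# Benchmarks where Esri was the faster library.
esri_faster = sorted(name for name, ratio in speedup.items() if ratio < 1.0)

for name, ratio in sorted(speedup.items(), key=lambda item: item[1]):
    print(f"{name:32s} Esri/JTS = {ratio:6.2f}")
```

This ignores the error bars, so ratios close to 1 should be read with care.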

[Image: BufferBenchmarks]

[Image: BufferBenchmarksReduced]

@rschlussel
Contributor

Nice!

As far as consistency, I guess my main concern would be if there were something like an inverse function where consistency between the functions would matter.
E.g., say there were a function st_geometry_this_is_a_buffer_around where you'd expect st_geometry_this_is_a_buffer_around(st_buffer(my_geometry, d), d) to return my_geometry. In that case I think we'd want to make sure those were using the same library so there wouldn't be any corner cases where you would get a different result. Is there anything like that to worry about?

Other than that seems like a good change.

@jagill
Contributor Author

jagill commented Dec 11, 2019

There's no natural inverse to buffer: the geometry is not directly recoverable. Some libraries allow buffering with a negative distance, but we forbid this. (Even if we allowed it, it's not actually an inverse operation. Consider a square with a small hole in the center: the hole would get filled in by the buffering, but would not be excavated with the negative buffering.)
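
The square-with-a-hole argument can be sketched on a grid. This is a simplified stand-in, not a real buffer: it uses sets of integer cells and square (Chebyshev) neighbourhoods instead of Euclidean disks, and `dilate` and `erode` are hypothetical helpers playing the roles of positive and negative buffering.

```python
def dilate(cells, r):
    """All grid cells within Chebyshev distance r of some cell in `cells`."""
    return {
        (x + dx, y + dy)
        for (x, y) in cells
        for dx in range(-r, r + 1)
        for dy in range(-r, r + 1)
    }


def erode(cells, r):
    """Cells whose whole (2r+1) x (2r+1) neighbourhood lies inside `cells`."""
    return {
        (x, y)
        for (x, y) in cells
        if all(
            (x + dx, y + dy) in cells
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
        )
    }


# A 7x7 square with a one-cell hole in the centre.
square = {(x, y) for x in range(7) for y in range(7)}
with_hole = square - {(3, 3)}

# "Buffer" by 1, then "negative buffer" by 1.
round_trip = erode(dilate(with_hole, 1), 1)
```

After the round trip the outer boundary is back where it started, but the hole at (3, 3) has been filled in and stays filled, so the original geometry is not recovered.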

@rschlussel rschlussel merged commit 8abba9e into prestodb:master Dec 18, 2019
@jagill jagill deleted the jts-buffer branch December 19, 2019 18:32