Replace ESRI geometry library with JTS in geospatial plugin #27881
Merged
15 commits
0cc596d Fix invalid WKT in test data for OGC compliance (dain)
ddfb2f5 Add assertSpatialEquals helper and convert tests (dain)
d5d43fb Convert basic geometry functions from ESRI to JTS (dain)
998ba29 Convert geometry accessor functions from ESRI to JTS (dain)
91f3aa2 Add JTS geometry methods to GeometryUtils (dain)
03ac8fd Convert EncodedPolylineFunctions from ESRI to JTS (dain)
1500a12 Convert BingTileFunctions from ESRI to JTS (dain)
7778d78 Convert aggregation functions from ESRI to JTS (dain)
789711d Convert ST_Union and remaining GeoFunctions from ESRI to JTS (dain)
6767195 Remove unused ESRI methods from GeometryUtils (dain)
2d76117 Convert spatial join and envelope handling from ESRI to JTS (dain)
a1fb7a8 Replace custom geometry serialization with EWKB (dain)
5e224f9 Convert Hadoop geometry reader from ESRI to JTS (dain)
e0f9db1 Use qualified imports for JTS objects (dain)
1306f5d Use JTS Geometry as native stack type (dain)
🧩 Analysis chain
🏁 Script executed:
cat -n ./lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java | head -150
Repository: trinodb/trino
Length of output: 6703
🏁 Script executed:
cat -n ./lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java | tail -50
Repository: trinodb/trino
Length of output: 1910
🌐 Web query:
JTS Geometry getNumPoints() what does it return vertices coordinate points

💡 Result:
In JTS, `Geometry.getNumPoints()` returns the count of the geometry's vertices, i.e., the total number of coordinate positions stored by the geometry (not the number of distinct/unique points). For composite geometries (e.g., `Multi*`, `GeometryCollection`), this is effectively the sum of the vertex counts of the components. [1]

Implications:
- `Point` → 1 (or 0 if empty).
- `LineString` → number of coordinates in the line.
- `Polygon` → counts vertices in the shell and holes; ring-closure coordinates (where the last coordinate equals the first) are included because they are vertices in the coordinate sequence. [1]

Source: JTS `Geometry` Javadoc: "Returns the count of this Geometry's vertices" / "the number of vertices in this Geometry." [1]

🌐 Web query:
LocationTech JTS Geometry memory object graph retained size structure

💡 Result:
In JTS (LocationTech) the retained size you see in a heap/object-graph tool for a `Geometry` is driven mostly by (a) the geometry topology objects (Polygon → rings, collections → components) and (b) the coordinate storage (`CoordinateSequence` implementation), plus a few always-present base fields.

1) Base `Geometry` object graph (common to all)
`Geometry` holds: an `Envelope` field (`protected Envelope envelope`), a `GeometryFactory` reference (`protected final GeometryFactory factory`), an `int SRID`, and optional `userData`. [1]
So the retained graph typically starts: `Geometry` → `Envelope` (may be null until computed) + `GeometryFactory` + `userData`.

2) Concrete geometry topology objects
- `Point` stores exactly one `CoordinateSequence` reference: `private CoordinateSequence coordinates;` [2]
  Graph: `Point` → `CoordinateSequence` → underlying coordinate storage.
- `LineString` stores `protected CoordinateSequence points;` [3]
  Graph: `LineString` → `CoordinateSequence` → underlying coordinate storage.
- `Polygon` stores `protected LinearRing shell` and `protected LinearRing[] holes`. [4]
  Graph: `Polygon` → `shell` (`LinearRing`) → (`LineString.points`) → `CoordinateSequence` → storage, and similarly for each hole.

3) Coordinate storage is usually the biggest driver
JTS uses `CoordinateSequence` to hold vertices. Two common implementations have very different object graphs:

A) `CoordinateArraySequence` (default)
Backed by a `Coordinate[] coordinates` array. [5]
Graph cost: one object for the sequence + one Java array + one `Coordinate` object per vertex.

B) `PackedCoordinateSequence` (memory-lean for many vertices)
Backed by a primitive array (`double[] coords` in `PackedCoordinateSequence.Double`, or `float[]` in `.Float`). [6]
Graph cost: one object for the sequence + one primitive array (much less per-vertex overhead).

Important retained-size detail: `PackedCoordinateSequence` also has a transient `SoftReference` cache (`coordRef`) to a materialized `Coordinate[]` created by `toCoordinateArray()`. [6] If that cache has been populated and not cleared, your geometry's retained graph can temporarily include the cached `Coordinate[]` plus all the per-vertex `Coordinate` objects (even though the "real" storage is packed).

Sources: Geometry base fields [1], Point coordinate sequence field [2], LineString points field [3], Polygon shell/holes fields [4], `CoordinateArraySequence` backing array [5], `PackedCoordinateSequence` packed arrays + soft cache [6].
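To make the coordinate-storage comparison above concrete, here is a rough back-of-the-envelope sketch in plain Java. It does not call JTS. The header, reference, and array sizes are assumptions about a typical 64-bit HotSpot JVM with compressed references, alignment padding is ignored, and the class and method names are hypothetical.

```java
public class CoordinateStorageCost {
    // Assumed JVM layout costs in bytes; illustrative, not measured.
    static final long OBJECT_HEADER = 16;
    static final long ARRAY_HEADER = 16;
    static final long REFERENCE = 4;
    static final long DOUBLE = 8;

    // Array-backed storage: one Coordinate object per vertex (assumed to hold
    // x, y, and z doubles) plus one reference slot in the Coordinate[] array.
    public static long arrayBackedBytes(long vertices) {
        long coordinateObject = OBJECT_HEADER + 3 * DOUBLE;
        return ARRAY_HEADER + vertices * (REFERENCE + coordinateObject);
    }

    // Packed storage: a single double[] holding `dimension` ordinates per vertex.
    public static long packedBytes(long vertices, int dimension) {
        return ARRAY_HEADER + vertices * dimension * DOUBLE;
    }

    public static void main(String[] args) {
        long n = 1_000;
        System.out.println("flat 24 bytes/vertex: " + n * 24);
        System.out.println("array-backed:         " + arrayBackedBytes(n));
        System.out.println("packed (2D):          " + packedBytes(n, 2));
    }
}
```

Under these assumptions, a flat 24 bytes per vertex falls between the two storage styles, so whether it over- or under-counts the coordinate payload depends on which `CoordinateSequence` implementation the geometry actually uses.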
This retained-size estimate significantly underestimates JTS geometry memory, especially for composite shapes.
`getNumPoints() * 24` accounts only for the raw coordinate payload and misses the rest of the object graph: coordinate sequences, linear rings, component geometries, and cached envelopes. For a Polygon with holes, this omits multiple LinearRing objects and their ring arrays; for a MultiPolygon or GeometryCollection, it ignores all component Geometry objects. Because this estimate feeds operator memory accounting (via `PagesSpatialIndexSupplier.getEstimatedSize()`), large multipart geometries will be materially under-reported and can exceed memory limits without triggering spill-over.
Either retain the serialized slice size alongside the geometry (available at deserialization in `buildRTree()`, line 114) or use a deeper retained-size walk of the JTS object graph.