Replace ESRI geometry library with JTS in geospatial plugin#27881
Replace ESRI geometry library with JTS in geospatial plugin#27881
Conversation
b4e8c12 to
aad9700
Compare
b3494aa to
9bec6d6
Compare
9bec6d6 to
45cd417
Compare
|
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack. |
|
Fixes #10179 |
|
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack. |
|
Slight merge conflict with #28592 😅 |
There was a problem hiding this comment.
Pull request overview
This PR migrates Trino’s geospatial stack from the ESRI geometry library to JTS, standardizing on JTS Geometry in the execution stack and EWKB for storage/connector interchange (aligning with PostGIS conventions and enabling future Iceberg geometry support).
Changes:
- Remove ESRI dependencies and switch core geospatial types/functions/aggregations to JTS
Geometry. - Replace legacy geometry serialization with EWKB via
JtsGeometrySerdeand update connectors (e.g., PostgreSQL) accordingly. - Update tests/benchmarks for stricter OGC validity and for geometry equality comparisons that tolerate harmless differences (e.g., vertex ordering).
Reviewed changes
Copilot reviewed 55 out of 57 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| pom.xml | Drops ESRI geometry dependency from root build. |
| plugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClient.java | Switch PostgreSQL geometry mapping to JTS + EWKB serde. |
| plugin/trino-postgresql/pom.xml | Swap to trino-geospatial-toolkit, add JTS, keep trino-geospatial for tests. |
| plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java | Reworks ESRI Hive test to validate WKB→WKT via JTS. |
| plugin/trino-hive/pom.xml | Adds dependency-checker ignores for transitive jts-core used in tests. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryUnionGeoAggregation.java | Updates union aggregation assertions and adds SRID mismatch coverage. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateSerializer.java | Updates state serializer tests from OGCGeometry to JTS. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java | Updates state factory tests from OGCGeometry to JTS. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/AbstractTestGeoAggregationFunctions.java | Switches aggregation test inputs/equals logic to JTS serde + topo equality. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java | Updates spherical geography tests for JTS serde + spatial equality helper. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialPartitioningInternalAggregation.java | Updates spatial partitioning internal agg test to JTS geometry/envelopes. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialJoinOperator.java | Updates spatial join operator test constants to JTS Geometry. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoSpatialQueries.java | Switches assertions to spatial equality helper. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestEncodedPolylineFunctions.java | Updates encoded polyline tests for stricter validity expectations. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestBingTileFunctions.java | Updates Bing tile tests to use spatial equality helper where needed. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/GeoTestUtils.java | Adds shared spatial equality test helpers. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTXMin.java | Updates benchmark stack types to JTS Geometry. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTIntersects.java | Updates benchmark stack types to JTS Geometry. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTEnvelope.java | Updates benchmark to serialize JTS output back to slices. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java | Updates benchmark to JTS geometry + EWKB serde. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTArea.java | Updates benchmark stack types and polygon closure. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkGeometryToBingTiles.java | Updates benchmark stack types to JTS Geometry. |
| plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkEnvelopeIntersection.java | Updates benchmark to JTS + topo equality for verification. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.java | Switches aggregation implementation to JTS + SRID validation + safe union. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateSerializer.java | Switches state serde to JtsGeometrySerde. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateFactory.java | Switches state storage to JTS Geometry and updates size estimation. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryState.java | Changes state interface from OGCGeometry to JTS Geometry. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.java | Switches convex hull aggregation to JTS + SRID validation + safe union. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SphericalGeographyType.java | Converts to new AbstractGeometryType base with JTS stack type. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningStateFactory.java | Updates envelope size estimation to Rectangle-based approach. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningInternalAggregateFunction.java | Switches internal agg input from Slice envelope to JTS geometry envelope. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningAggregateFunction.java | Updates signature to take JTS Geometry (still unsupported entrypoint). |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeometryType.java | Converts to new AbstractGeometryType base with JTS stack type. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.java | Ports encoded polyline funcs to JTS geometry types. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/BingTileFunctions.java | Ports Bing tile funcs to JTS geometry/envelope handling. |
| plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java | New base type: JTS stack type, EWKB storage, binary operators. |
| plugin/trino-geospatial/pom.xml | Drops ESRI geometry dependency from geospatial plugin. |
| lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriDeserializer.java | Updates ESRI JSON deserializer tests to validate EWKB via JTS. |
| lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriJsonParser.java | Adds JTS-based ESRI JSON geometry parsing. |
| lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java | Switches ESRI JSON reader output to EWKB via JtsGeometrySerde. |
| lib/trino-hive-formats/pom.xml | Replaces ESRI dep with trino-geospatial-toolkit + jts-core. |
| lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java | Updates serialization tests to JTS/EWKB expectations. |
| lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.java | Removes ESRI-based serde benchmark. |
| lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java | Replaces custom binary format with EWKB-based serde + SRID utilities. |
| lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.java | Removes legacy serialization type enum (no longer needed). |
| lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.java | Removes legacy ESRI-based serialization implementation. |
| lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java | Adds ESRI Shape binary parser emitting JTS geometries. |
| lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryUtils.java | Ports helpers (contains/disjoint/union/rect checks) to JTS. |
| lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryType.java | Maps JTS LinearRing to internal LINE_STRING type. |
| lib/trino-geospatial-toolkit/pom.xml | Drops ESRI dependency and removes now-unneeded jakarta.annotation optional dep. |
| core/trino-main/src/main/java/io/trino/operator/SpatialIndexBuilderOperator.java | Switches spatial predicate interface from OGCGeometry to JTS Geometry. |
| core/trino-main/src/main/java/io/trino/operator/PagesSpatialIndexSupplier.java | Switches spatial index build path to JTS geometry + envelopes. |
| core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java | Switches index contents to JTS geometry + envelope queries. |
| core/trino-main/pom.xml | Drops ESRI geometry dependency from core engine. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -40,21 +44,21 @@ public GeometryState createGroupedState() | |||
| public static class GroupedGeometryState | |||
| implements GeometryState, GroupedAccumulatorState | |||
| { | |||
| private final ObjectBigArray<OGCGeometry> geometries = new ObjectBigArray<>(); | |||
| private final ObjectBigArray<Geometry> geometries = new ObjectBigArray<>(); | |||
|
|
|||
| private int groupId; | |||
| private long size; | |||
|
|
|||
| @Override | |||
| public OGCGeometry getGeometry() | |||
| public Geometry getGeometry() | |||
| { | |||
| return geometries.get(groupId); | |||
| } | |||
|
|
|||
| @Override | |||
| public void setGeometry(OGCGeometry geometry) | |||
| public void setGeometry(Geometry geometry) | |||
| { | |||
| OGCGeometry previousValue = this.geometries.getAndSet(groupId, geometry); | |||
| Geometry previousValue = this.geometries.getAndSet(groupId, geometry); | |||
| size -= getGeometryMemorySize(previousValue); | |||
| size += getGeometryMemorySize(geometry); | |||
| } | |||
| @@ -79,34 +83,31 @@ public final void setGroupId(int groupId) | |||
| } | |||
|
|
|||
| // Do a best-effort attempt to estimate the memory size | |||
| private static long getGeometryMemorySize(OGCGeometry geometry) | |||
| private static long getGeometryMemorySize(Geometry geometry) | |||
| { | |||
| if (geometry == null) { | |||
| return 0; | |||
| } | |||
| // Due to the following issue: | |||
| // https://github.com/Esri/geometry-api-java/issues/192 | |||
| // We must check if the geometry is empty before calculating its size. Once the issue is resolved | |||
| // and we bring the fix into our codebase, we can remove this check. | |||
| if (geometry.isEmpty()) { | |||
| return OGC_GEOMETRY_BASE_INSTANCE_SIZE; | |||
| return GEOMETRY_BASE_INSTANCE_SIZE; | |||
| } | |||
| return geometry.estimateMemorySize(); | |||
| // Estimate: base size + size per coordinate | |||
| return GEOMETRY_BASE_INSTANCE_SIZE + (long) geometry.getNumPoints() * COORDINATE_SIZE; | |||
| } | |||
There was a problem hiding this comment.
I rewrote the size calculation to consider the exact type.
| public static boolean spatiallyEquals(String wkt1, String wkt2) | ||
| { | ||
| var geom1 = stGeometryFromText(utf8Slice(wkt1)); | ||
| var geom2 = stGeometryFromText(utf8Slice(wkt2)); | ||
|
|
||
| if (stIsEmpty(geom1) && stIsEmpty(geom2)) { | ||
| return true; | ||
| } | ||
| return stEquals(geom1, geom2); | ||
| } |
There was a problem hiding this comment.
Special handling was added for empty geometries to show that they return the expected type. This did show some tests that had the wrong expected type, where wrong is defined as not the default behavior for JTS.
| var expectedGeometry = stGeometryFromText(utf8Slice(expectedWkt)); | ||
| var actualGeometry = stGeometryFromText(utf8Slice(actualWkt)); | ||
| if (stIsEmpty(expectedGeometry)) { | ||
| assertThat(stIsEmpty(actualGeometry)) | ||
| .withFailMessage("Expected empty geometry, but got: %s", actualWkt) | ||
| .isTrue(); | ||
| return; |
| else if (x != null) { | ||
| // Empty point (x is present but null or NaN) | ||
| return GEOMETRY_FACTORY.createPoint(); |
| if ("NaN".equalsIgnoreCase(value)) { | ||
| return null; | ||
| } | ||
| return Double.parseDouble(value); |
There was a problem hiding this comment.
It was rewritten to follow the original Hive code.
| // Check if we have multiple exterior rings (need MultiPolygon) | ||
| List<Polygon> polygons = new ArrayList<>(); | ||
| LinearRing currentShell = null; | ||
| List<LinearRing> currentHoles = new ArrayList<>(); | ||
|
|
||
| for (Coordinate[] ring : rings) { | ||
| LinearRing linearRing = GEOMETRY_FACTORY.createLinearRing(ring); | ||
| boolean isClockwise = isClockwise(ring); | ||
|
|
||
| if (isClockwise) { | ||
| // Clockwise = exterior ring in ESRI format | ||
| // Save previous polygon if exists | ||
| if (currentShell != null) { | ||
| polygons.add(GEOMETRY_FACTORY.createPolygon(currentShell, currentHoles.toArray(new LinearRing[0]))); | ||
| currentHoles.clear(); | ||
| } | ||
| currentShell = linearRing; | ||
| } | ||
| else { | ||
| // Counter-clockwise = interior ring (hole) in ESRI format | ||
| if (currentShell != null) { | ||
| currentHoles.add(linearRing); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Don't forget the last polygon | ||
| if (currentShell != null) { | ||
| polygons.add(GEOMETRY_FACTORY.createPolygon(currentShell, currentHoles.toArray(new LinearRing[0]))); | ||
| } | ||
|
|
||
| if (polygons.size() == 1) { | ||
| return polygons.getFirst(); | ||
| } | ||
| return GEOMETRY_FACTORY.createMultiPolygon(polygons.toArray(new Polygon[0])); | ||
| } |
There was a problem hiding this comment.
This was rewritten to follow the original Hive code. Additionally, I added test coverage for this because none of this was actually tested.
c2652b2 to
2f14401
Compare
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughThe PR migrates the geospatial stack from ESRI OGC types to JTS Geometry across the codebase. ESRI dependencies and custom ESRI serde/serialization enums were removed; new JTS-based serde, utilities, and readers were added (e.g., JtsGeometrySerde, GeometryUtils, EsriShapeReader). Spatial indexes, plugin types, functions, aggregations, and tests were updated to use JTS Geometry and EWKB/WKB serialization. Maven POMs were adjusted to remove ESRI artifacts and add the trino-geospatial-toolkit and JTS artifacts. Comment Tip CodeRabbit can suggest fixes for GitHub Check annotations.Configure the |
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (6)
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java (2)
262-267: Silently dropping orphan holes may mask data issues.If a counter-clockwise ring (hole) appears before any clockwise ring (shell), it's silently ignored. While ESRI spec mandates shells precede their holes, malformed data would lose geometry without any indication.
Consider logging a warning or throwing when encountering a hole without a preceding shell:
🛡️ Suggested defensive logging
else { // Counter-clockwise = interior ring (hole) in ESRI format if (currentShell != null) { currentHoles.add(ring); } + else { + // Log warning: hole ring found before any exterior ring + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java` around lines 262 - 267, The code currently drops counter-clockwise rings when currentShell is null (orphan holes) in EsriShapeReader; update the branch handling holes (the else where currentShell is checked) to detect currentShell == null and either log a warning via the class logger (including ring index/shape id/context from available variables) or throw a descriptive exception instead of silently ignoring; ensure this behavior is applied where currentHoles.add(ring) is invoked and include enough context (e.g., ring, shapeId, stream offset) in the message to aid debugging.
281-292: Javadoc describes wrong sign convention for this formula.The comment states "Positive signed area = counter-clockwise" but the trapezoid formula used here has the opposite sign: positive sum indicates clockwise orientation. The code correctly returns
sum > 0for clockwise, but the documentation is misleading.📝 Suggested Javadoc correction
/** * Determines if a ring is clockwise using the shoelace formula. - * Positive signed area = counter-clockwise, negative signed area = clockwise. + * With the edge-sum variant used here, positive sum indicates clockwise, + * negative sum indicates counter-clockwise. */ private static boolean isClockwise(Coordinate[] ring)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java` around lines 281 - 292, The Javadoc for isClockwise currently states the wrong sign convention; update the comment for the isClockwise(Coordinate[] ring) method to reflect the actual formula used (the implemented shoelace/trapezoid sum yields positive for clockwise and negative for counter-clockwise) so the doc matches the code's return condition (sum > 0). Reference the isClockwise method and the local variable sum in your updated comment to make the correct sign convention explicit.plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java (1)
117-123: Consider catching a more specific exception type.Catching generic
Exception(line 120) is overly broad. TheJtsGeometrySerde.deserialize()method throwsIllegalArgumentExceptionfor invalid WKB, andGeometry.toText()shouldn't throw for valid geometries. Consider catchingIllegalArgumentExceptionorRuntimeExceptionto avoid masking unexpected errors.♻️ Suggested change
try { return JtsGeometrySerde.deserialize(getSlice(block, position)).toText(); } - catch (Exception e) { + catch (IllegalArgumentException e) { return "<invalid geometry>"; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java` around lines 117 - 123, The current catch in the method calling JtsGeometrySerde.deserialize(getSlice(block, position)).toText() is too broad; replace the catch of Exception with a more specific catch for IllegalArgumentException (the error type thrown for invalid WKB) so only invalid-geometry input is mapped to "<invalid geometry>" and other unexpected errors still bubble up; update the catch block that surrounds JtsGeometrySerde.deserialize(...) and .toText() to catch IllegalArgumentException (or IllegalArgumentException and RuntimeException if you want to be slightly broader) and return "<invalid geometry>" there.plugin/trino-hive/pom.xml (1)
580-586: Transitive dependency suppression is acceptable but consider explicit declaration.The configuration correctly suppresses maven-dependency-plugin warnings for
jts-corebeing used via the transitive dependency fromtrino-hive-formats.For improved maintainability, consider explicitly declaring
jts-coreas a<scope>test</scope>dependency instead of relying on the transitive dependency. This would make the dependency explicit and prevent potential breakage iftrino-hive-formatschanges its dependencies.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-hive/pom.xml` around lines 580 - 586, The pom currently suppresses warnings for the transitive dependency org.locationtech.jts:jts-core (referenced in ignoredNonTestScopedDependency and ignoredUsedUndeclaredDependency and used by TestEsriTable via trino-hive-formats); instead add an explicit test-scoped dependency declaration for org.locationtech.jts:jts-core (scope test) in the project POM so the test usage is declared directly (you can keep the suppression entries if desired, but ensure org.locationtech.jts:jts-core appears under <dependencies> with <scope>test</scope> to avoid relying on trino-hive-formats being the transitive provider).plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java (1)
24-25: Consider thread-safety of sharedWKTReader.JTS
WKTReaderis not thread-safe. While JUnit Jupiter runs test methods sequentially by default within the same class, a shared static instance could cause issues if test parallelization is enabled in the future. Consider creating a new instance per test method or using aThreadLocal.♻️ Safer alternative
- private static final WKTReader WKT_READER = new WKTReader(); + private final WKTReader wktReader = new WKTReader();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java` around lines 24 - 25, The shared static WKT_READER in TestGeometryStateFactory is unsafe because JTS WKTReader is not thread-safe; replace the static instance with a thread-safe approach by creating a new WKTReader per test (e.g., construct a WKTReader inside each test method or inside a `@BeforeEach` method) or change it to a ThreadLocal<WKTReader> (referencing WKT_READER) so each thread/test gets its own instance; update all usages in TestGeometryStateFactory to use the new per-test or ThreadLocal instance.lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java (1)
187-199: Add a non-zero SRID round-trip case here.All current fixtures come from plain WKT, so every geometry in this suite has SRID
0. That leaves the new EWKB SRID handling untested in the main serialization coverage.Minimal coverage addition
+ `@Test` + public void testSridRoundTrip() + { + Geometry geometry = createJtsGeometry("POINT (1 2)"); + geometry.setSRID(4326); + + Geometry deserialized = deserialize(serialize(geometry)); + assertThat(deserialized.getSRID()).isEqualTo(4326); + assertThat(deserialized.norm()).isEqualTo(geometry.norm()); + }Also applies to: 216-219
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java` around lines 187 - 199, Add a test case that verifies non-zero SRID round-trip: create a JTS Geometry via createJtsGeometry(...), call geometry.setSRID(e.g. 4326) before serializing with serialize(...), then deserialize(...) and assert both geometry.norm() and deserialized.getSRID() (and/or geometry.getSRID()) match the non-zero SRID; apply the same pattern to the other similar helper/test near the geometryFromText usage (lines around the second occurrence) so EWKB SRID handling is exercised for both serialize/deserialize paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java`:
- Around line 96-100: The current getEstimatedMemorySizeInBytes() in
PagesRTreeIndex underestimates memory by only using geometry.getNumPoints();
update it to account for full retained size by either (A) storing and returning
the serialized Slice size captured during deserialization in buildRTree() (reuse
the existing serialized slice retained at build time and return INSTANCE_SIZE +
serializedSlice.length()), or (B) implement a deep retained-size traversal of
the JTS object graph starting from geometry (walk Geometry, LinearRing,
CoordinateSequence, component Geometries, and cached Envelope) and sum
object+array sizes, then return INSTANCE_SIZE + deepSize; ensure
PagesSpatialIndexSupplier.getEstimatedSize() reads this corrected value so
operator accounting reflects multipart/composite geometries accurately.
In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryUtils.java`:
- Around line 98-108: The contains() method currently unwraps all
GeometryCollection instances and returns true if any single child contains the
argument, which breaks JTS semantics for
MultiPolygon/MultiLineString/MultiPoint; change the logic in
GeometryUtils.contains so that it only iterates children for pure, heterogeneous
GeometryCollection instances, but for instances of MultiPolygon,
MultiLineString, and MultiPoint delegate to geometry.contains(tileGeometry)
(i.e., do not short-circuit by checking individual components for these
multi-types) so the union semantics from JTS are preserved.
In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java`:
- Around line 54-63: The current external-input validation in
JtsGeometrySerde.deserialize (and the other similar methods noted) uses
verify(...) which throws VerifyException; change those checks to throw
IllegalArgumentException instead to be consistent with parse error
handling—replace calls like verify(shape.length() > 0, "shape is empty") with an
explicit if-check that throws new IllegalArgumentException("shape is empty")
(and similarly for null checks or other verify usages) so all invalid WKB/EWKB
input paths use IllegalArgumentException (matching the existing catch block that
wraps ParseException).
- Around line 97-114: The code in JtsGeometrySerde that decodes the WKB type
reads the endianness byte into variable endianness and then falls back to the
little-endian branch for any value != 0, which lets invalid EWKB through; add an
explicit validation immediately after reading endianness (shape.getByte(0)) to
check that the value is either 0 or 1 and throw a descriptive exception (e.g.,
IllegalArgumentException or a Trino-specific parsing error) if it is not, so the
method (where wkbType is computed) rejects invalid byte-order markers rather
than silently interpreting them as little-endian.
In
`@lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriJsonParser.java`:
- Around line 137-143: The coordinate-parsing branch in EsriJsonParser currently
coerces strings and booleans into doubles, causing silent data corruption;
update the method that reads x/y coordinates to accept only numeric tokens
(VALUE_NUMBER_FLOAT or VALUE_NUMBER_INT) and return parser.getDoubleValue() (or
equivalent) for those, treat VALUE_STRING by explicitly parsing the string with
Double.parseDouble and throwing a descriptive parse exception if it fails, and
refuse JsonToken.VALUE_TRUE / JsonToken.VALUE_FALSE (throw an error) rather than
converting them; reference the tokens VALUE_NUMBER_FLOAT, VALUE_NUMBER_INT,
VALUE_STRING, JsonToken.VALUE_TRUE, JsonToken.VALUE_FALSE and the EsriJsonParser
coordinate read method when making the change.
In
`@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.java`:
- Around line 50-54: The SRID validation must run before taking the
empty-geometry fast path; call validateAndGetSrid(...) to reconcile and validate
SRIDs whenever either the input geometry or the accumulator geometry exists
(even if one is empty) so conflicting non-zero SRIDs are rejected; update
ConvexHullAggregation to compute srid = validateAndGetSrid(state.getGeometry(),
geometry) (or equivalent) before any early-return/empty-geometry branches, then
proceed to safeUnion(...).convexHull() and setSRID(srid) in both the non-empty
and empty-input merge/update paths (the blocks around the current use of
validateAndGetSrid, safeUnion, state.getGeometry(), and geometry.isEmpty()).
In
`@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.java`:
- Around line 47-51: GeometryUnionAgg currently skips calling validateAndGetSrid
when one side is empty, allowing mismatched non-zero SRIDs to be silently
accepted; update both branches where one geometry is empty (the blocks around
safeUnion/state.setGeometry) to always call
validateAndGetSrid(state.getGeometry(), geometry) (and the symmetric call in the
other branch) before creating/setting the result so the SRID is reconciled, then
set the returned SRID on the resulting Geometry via result.setSRID(srid) (or on
the empty geometry if safeUnion isn't invoked) and finally call
state.setGeometry(result).
In
`@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.java`:
- Around line 94-100: The decode branch that currently returns an empty
LineString when coordinates.size() < 2 should instead reject the unsupported
single-point case by throwing a clear runtime error; replace the conditional
that returns GEOMETRY_FACTORY.createLineString() with logic that throws an
appropriate Trino-level exception (with a descriptive message) when
coordinates.size() == 1 so the point is not silently dropped. Apply the same
change to the analogous block later (the other coordinates.size() < 2 branch
around lines 106-112) so both single-point decodes fail explicitly rather than
producing LINESTRING EMPTY; locate these in EncodedPolylineFunctions where
CoordinateArraySequence, LineString, and GEOMETRY_FACTORY are used.
In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java`:
- Around line 67-75: The benchmarks deserializeSimpleGeometry and
deserializeEnvelopeSimpleGeometry (and the same pattern at lines 115-123) are
implicitly measuring serialize(...) + allocation + deserialize(...); precompute
the serialized EWKB once in a `@Setup` method on BenchmarkData (e.g., add byte[]
fields like serializedSimpleGeometry, serializedOtherCase and fill them by
calling serialize(data.simpleGeometry) in `@Setup`) and change these `@Benchmark`
methods to call deserialize(serializedSimpleGeometry) and
deserializeEnvelope(serializedSimpleGeometry) so they only measure the
deserialize path; apply the same change for the other similar benchmark methods
referenced.
In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/GeoTestUtils.java`:
- Around line 81-105: The array helper assertSpatialArrayEquals currently passes
the 'NULL' sentinel into stGeometryFromText and skips the empty-geometry type
check present in assertSpatialEquals; update assertSpatialArrayEquals so for
each element it mirrors assertSpatialEquals: first treat the transformed actual
string that equals 'NULL' (or actual null) as a null/empty geometry and assert
accordingly (avoiding calling stGeometryFromText on the sentinel), and for
non-sentinel values perform the empty-geometry type check exactly as
assertSpatialEquals does before calling stGeometryFromText(utf8Slice(...)) and
comparing with stEquals; ensure you reference the same predicates/handling used
in assertSpatialEquals so array and scalar behaviors match.
---
Nitpick comments:
In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java`:
- Around line 262-267: The code currently drops counter-clockwise rings when
currentShell is null (orphan holes) in EsriShapeReader; update the branch
handling holes (the else where currentShell is checked) to detect currentShell
== null and either log a warning via the class logger (including ring
index/shape id/context from available variables) or throw a descriptive
exception instead of silently ignoring; ensure this behavior is applied where
currentHoles.add(ring) is invoked and include enough context (e.g., ring,
shapeId, stream offset) in the message to aid debugging.
- Around line 281-292: The Javadoc for isClockwise currently states the wrong
sign convention; update the comment for the isClockwise(Coordinate[] ring)
method to reflect the actual formula used (the implemented shoelace/trapezoid
sum yields positive for clockwise and negative for counter-clockwise) so the doc
matches the code's return condition (sum > 0). Reference the isClockwise method
and the local variable sum in your updated comment to make the correct sign
convention explicit.
In
`@lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java`:
- Around line 187-199: Add a test case that verifies non-zero SRID round-trip:
create a JTS Geometry via createJtsGeometry(...), call geometry.setSRID(e.g.
4326) before serializing with serialize(...), then deserialize(...) and assert
both geometry.norm() and deserialized.getSRID() (and/or geometry.getSRID())
match the non-zero SRID; apply the same pattern to the other similar helper/test
near the geometryFromText usage (lines around the second occurrence) so EWKB
SRID handling is exercised for both serialize/deserialize paths.
In
`@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java`:
- Around line 117-123: The current catch in the method calling
JtsGeometrySerde.deserialize(getSlice(block, position)).toText() is too broad;
replace the catch of Exception with a more specific catch for
IllegalArgumentException (the error type thrown for invalid WKB) so only
invalid-geometry input is mapped to "<invalid geometry>" and other unexpected
errors still bubble up; update the catch block that surrounds
JtsGeometrySerde.deserialize(...) and .toText() to catch
IllegalArgumentException (or IllegalArgumentException and RuntimeException if
you want to be slightly broader) and return "<invalid geometry>" there.
In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java`:
- Around line 24-25: The shared static WKT_READER in TestGeometryStateFactory is
unsafe because JTS WKTReader is not thread-safe; replace the static instance
with a thread-safe approach by creating a new WKTReader per test (e.g.,
construct a WKTReader inside each test method or inside a `@BeforeEach` method) or
change it to a ThreadLocal<WKTReader> (referencing WKT_READER) so each
thread/test gets its own instance; update all usages in TestGeometryStateFactory
to use the new per-test or ThreadLocal instance.
In `@plugin/trino-hive/pom.xml`:
- Around line 580-586: The pom currently suppresses warnings for the transitive
dependency org.locationtech.jts:jts-core (referenced in
ignoredNonTestScopedDependency and ignoredUsedUndeclaredDependency and used by
TestEsriTable via trino-hive-formats); instead add an explicit test-scoped
dependency declaration for org.locationtech.jts:jts-core (scope test) in the
project POM so the test usage is declared directly (you can keep the suppression
entries if desired, but ensure org.locationtech.jts:jts-core appears under
<dependencies> with <scope>test</scope> to avoid relying on trino-hive-formats
being the transitive provider).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d5f9042b-8dd4-4604-a02f-3dd402b8df8c
📒 Files selected for processing (57)
core/trino-main/pom.xmlcore/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.javacore/trino-main/src/main/java/io/trino/operator/PagesSpatialIndexSupplier.javacore/trino-main/src/main/java/io/trino/operator/SpatialIndexBuilderOperator.javalib/trino-geospatial-toolkit/pom.xmllib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryType.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryUtils.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.javalib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.javalib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.javalib/trino-hive-formats/pom.xmllib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.javalib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriJsonParser.javalib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriDeserializer.javaplugin/trino-geospatial/pom.xmlplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/BingTileFunctions.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeoFunctions.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeometryType.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningAggregateFunction.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningInternalAggregateFunction.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningStateFactory.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SphericalGeographyType.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryState.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateFactory.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateSerializer.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkEnvelopeIntersection.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkGeometryToBingTiles.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTArea.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTEnvelope.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTIntersects.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTXMin.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/GeoTestUtils.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestBingTileFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestEncodedPolylineFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoSpatialQueries.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialJoinOperator.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialPartitioningInternalAggregation.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/AbstractTestGeoAggregationFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateSerializer.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryUnionGeoAggregation.javaplugin/trino-hive/pom.xmlplugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.javaplugin/trino-hive/src/test/resources/esri/counties_expected.txtplugin/trino-postgresql/pom.xmlplugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClient.javapom.xml
💤 Files with no reviewable changes (7)
- core/trino-main/pom.xml
- plugin/trino-geospatial/pom.xml
- lib/trino-geospatial-toolkit/pom.xml
- lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.java
- pom.xml
- lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.java
- lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.java
| public long getEstimatedMemorySizeInBytes() | ||
| { | ||
| return INSTANCE_SIZE + ogcGeometry.estimateMemorySize(); | ||
| // Estimate based on number of coordinates (each coordinate is 2-3 doubles) | ||
| return INSTANCE_SIZE + (long) geometry.getNumPoints() * 24; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd -type f -name "PagesRTreeIndex.java" 2>/dev/null | head -5Repository: trinodb/trino
Length of output: 39
🏁 Script executed:
# First, let's find and examine the file
find . -name "PagesRTreeIndex.java" -type f 2>/dev/nullRepository: trinodb/trino
Length of output: 127
🏁 Script executed:
# Let's search for the file using git
git ls-files | grep -i "PagesRTreeIndex"Repository: trinodb/trino
Length of output: 125
🏁 Script executed:
wc -l core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.javaRepository: trinodb/trino
Length of output: 129
🏁 Script executed:
cat -n core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.javaRepository: trinodb/trino
Length of output: 11294
🏁 Script executed:
# Search for other memory size estimates for geometries in the codebase
rg -i "geometry.*memory\|memory.*geometry" --type java | head -20Repository: trinodb/trino
Length of output: 39
🏁 Script executed:
# Search for JtsGeometrySerde which is imported
find . -name "*JtsGeometrySerde*" -type f 2>/dev/nullRepository: trinodb/trino
Length of output: 251
🏁 Script executed:
# Look for getNumPoints usage in the codebase
rg "getNumPoints" --type javaRepository: trinodb/trino
Length of output: 891
🏁 Script executed:
wc -l ./lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.javaRepository: trinodb/trino
Length of output: 153
🏁 Script executed:
cat -n ./lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java | head -150Repository: trinodb/trino
Length of output: 6703
🏁 Script executed:
cat -n ./lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java | tail -50Repository: trinodb/trino
Length of output: 1910
🏁 Script executed:
# Check if there's any pattern for retaining serialized slice sizes in the codebase
rg "slice.*length\|getSliceLength\|estimatedSize" --type java | grep -i geometry | head -10Repository: trinodb/trino
Length of output: 39
🏁 Script executed:
# Look at how geometries are stored in the spatial index
rg "STRtree\|rtree" --type java -B 2 -A 2 | head -40Repository: trinodb/trino
Length of output: 39
🌐 Web query:
JTS Geometry getNumPoints() what does it return vertices coordinate points
💡 Result:
In JTS, Geometry.getNumPoints() returns the count of the geometry’s vertices—i.e., the total number of coordinate positions stored by the geometry (not the number of distinct/unique points). For composite geometries (e.g., Multi*, GeometryCollection), this is effectively the sum of the vertex counts of the components. [1]
Implications:
Point→1(or0if empty).LineString→ number of coordinates in the line.Polygon→ counts vertices in the shell and holes; ring-closure coordinates (where the last coordinate equals the first) are included because they are vertices in the coordinate sequence. [1]
Source: JTS Geometry Javadoc: “Returns the count of this Geometry’s vertices” / “the number of vertices in this Geometry.” [1]
🌐 Web query:
LocationTech JTS Geometry memory object graph retained size structure
💡 Result:
In JTS (LocationTech) the retained size you see in a heap/object-graph tool for a Geometry is driven mostly by (a) the geometry topology objects (Polygon → rings, collections → components) and (b) the coordinate storage (CoordinateSequence implementation), plus a few always-present base fields.
1) Base Geometry object graph (common to all)
Geometry holds: an Envelope field (protected Envelope envelope), a GeometryFactory reference (protected final GeometryFactory factory), an int SRID, and optional userData. [1]
So the retained graph typically starts:
Geometry → Envelope (may be null until computed) + GeometryFactory + userData.
2) Concrete geometry topology objects
- Point:
Pointstores exactly oneCoordinateSequencereference:private CoordinateSequence coordinates;[2]
Graph:Point→CoordinateSequence→ underlying coordinate storage. - LineString / LinearRing:
LineStringstoresprotected CoordinateSequence points;[3]
Graph:LineString→CoordinateSequence→ underlying coordinate storage. - Polygon:
Polygonstoresprotected LinearRing shellandprotected LinearRing[] holes[4].
Graph:Polygon→shell(LinearRing) → (LineString.points) →CoordinateSequence→ storage, and similarly for each hole.
3) Coordinate storage is usually the biggest driver
JTS uses CoordinateSequence to hold vertices. Two common implementations have very different object graphs:
A) CoordinateArraySequence (default)
Backed by a Coordinate[] coordinates array. [5]
Graph cost: one object for the sequence + one Java array + one Coordinate object per vertex.
B) PackedCoordinateSequence (memory-lean for many vertices)
Backed by a primitive array (double[] coords in PackedCoordinateSequence.Double, or float[] in .Float). [6]
Graph cost: one object for the sequence + one primitive array (much less per-vertex overhead).
Important retained-size detail: PackedCoordinateSequence also has a transient SoftReference cache (coordRef) to a materialized Coordinate[] created by toCoordinateArray(). [6] If that cache has been populated and not cleared, your geometry’s retained graph can temporarily include the cached Coordinate[] plus all the per-vertex Coordinate objects (even though the “real” storage is packed).
Sources: Geometry base fields [1], Point coordinate sequence field [2], LineString points field [3], Polygon shell/holes fields [4], CoordinateArraySequence backing array [5], PackedCoordinateSequence packed arrays + soft cache [6].
🏁 Script executed:
# Search for usage of getEstimatedMemorySizeInBytes
rg "getEstimatedMemorySizeInBytes" --type javaRepository: trinodb/trino
Length of output: 369
🏁 Script executed:
# Check if there's memory tracking or operator memory accounting that uses this
rg "PagesRTreeIndex\|GeometryWithPosition" --type java -B 2 -A 2 | head -60Repository: trinodb/trino
Length of output: 39
🏁 Script executed:
find . -name "PagesSpatialIndexSupplier.java" -type f 2>/dev/nullRepository: trinodb/trino
Length of output: 137
🏁 Script executed:
cat -n core/trino-main/src/main/java/io/trino/operator/PagesSpatialIndexSupplier.javaRepository: trinodb/trino
Length of output: 9260
This retained-size estimate significantly underestimates JTS geometry memory, especially for composite shapes.
getNumPoints() * 24 accounts only for raw coordinate payload but misses the object graph for coordinate sequences, linear rings, component geometries, and envelope caching. For a Polygon with holes, this omits multiple LinearRing objects and their ring arrays; for MultiPolygon or GeometryCollection, it ignores all component Geometry objects. Since this feeds operator memory accounting (via PagesSpatialIndexSupplier.getEstimatedSize()), large multipart geometries will be materially under-reported and can exceed memory limits without triggering spill-over.
Either retain the serialized slice size alongside the geometry (available at deserialization in buildRTree() line 114) or use a deeper retained-size walk of the JTS object graph.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java` around
lines 96 - 100, The current getEstimatedMemorySizeInBytes() in PagesRTreeIndex
underestimates memory by only using geometry.getNumPoints(); update it to
account for full retained size by either (A) storing and returning the
serialized Slice size captured during deserialization in buildRTree() (reuse the
existing serialized slice retained at build time and return INSTANCE_SIZE +
serializedSlice.length()), or (B) implement a deep retained-size traversal of
the JTS object graph starting from geometry (walk Geometry, LinearRing,
CoordinateSequence, component Geometries, and cached Envelope) and sum
object+array sizes, then return INSTANCE_SIZE + deepSize; ensure
PagesSpatialIndexSupplier.getEstimatedSize() reads this corrected value so
operator accounting reflects multipart/composite geometries accurately.
Fix test data that was accepted by ESRI but rejected by JTS which strictly enforces the OGC Simple Features Specification: - Close polygon rings (first point must equal last point) - Fix single-point LINESTRING to have two points (minimum required) - Fix MULTILINESTRING EMPTY syntax (remove extra parentheses) - Replace invalid MULTIPOLYGON with overlapping polygons using ST_Union - Replace degenerate polygons in GEOMETRYCOLLECTION with valid geometries
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (2)
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java (1)
114-116: Fail fast when an expected county fixture entry is missing.Assert key presence before value comparison so failures clearly identify missing test data instead of comparing against
null.Suggested patch
+ assertThat(expectedWkt) + .describedAs("Missing expected WKT fixture for county: %s", name) + .containsKey(name); assertThat(actualWkt) .describedAs("WKT for county: %s", name) .isEqualTo(expectedWkt.get(name));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java` around lines 114 - 116, The test currently compares actualWkt to expectedWkt.get(name) which yields unclear failures when the expected fixture key is missing; before the comparison in TestEsriTable (around variables actualWkt, expectedWkt, name), add an assertion that the expectedWkt contains the key (e.g., assertThat(expectedWkt).containsKey(name) or equivalent) so the test fails fast with a clear message about the missing fixture key, then proceed to the existing describedAs/isEqualTo comparison.plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java (1)
154-157: Consider simplifying redundant field assignments.Since
geometryis already of typeGeometry, the assignmentjtsGeometry = geometrycreates an alias to the same object. The current setup may be intentional for benchmark naming clarity (distinguishingGeoFunctionspaths from direct JTS paths), so this is a minor observation.If simplification is desired, you could inline these or rename for consistency:
♻️ Optional simplification
- private Geometry jtsGeometry; - private Point innerJtsPoint; - private Point outerJtsPointInEnvelope; - private Point outerJtsPointNotInEnvelope; ... - jtsGeometry = geometry; - innerJtsPoint = (Point) innerPoint; - outerJtsPointInEnvelope = (Point) outerPointInEnvelope; - outerJtsPointNotInEnvelope = (Point) outerPointNotInEnvelope;And update the direct JTS benchmark methods to cast inline:
- return data.jtsGeometry.contains(data.innerJtsPoint); + return data.geometry.contains((Point) data.innerPoint);Alternatively, keep the current structure if the explicit separation improves benchmark readability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java` around lines 154 - 157, The field assignments jtsGeometry = geometry and the Point casts (innerJtsPoint = (Point) innerPoint, outerJtsPointInEnvelope = (Point) outerPointInEnvelope, outerJtsPointNotInEnvelope = (Point) outerPointNotInEnvelope) are redundant aliases; either remove the intermediate fields and use geometry/innerPoint/outerPoint... directly in the JTS benchmark methods, or eliminate the jts assignment and perform the casts inline where innerJtsPoint, outerJtsPointInEnvelope, and outerJtsPointNotInEnvelope are used; update usages in the JTS benchmark methods accordingly (referencing jtsGeometry, innerJtsPoint, outerJtsPointInEnvelope, outerJtsPointNotInEnvelope) to avoid duplicate aliases while preserving benchmark naming clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java`:
- Around line 172-180: The loop in EsriShapeReader that builds
LineString/LinearRing parts uses startIndex/endIndex without validating ranges
(in the block creating lineStrings and the analogous block around lines
222-230), which can lead to negative partLength or ArrayIndexOutOfBounds; add
explicit guards in the part-construction loops inside EsriShapeReader: check
that 0 <= startIndex < numPoints, that endIndex is between startIndex and
numPoints, and that partLength = endIndex - startIndex > 0; if any check fails,
throw a parsing exception (or return a parse error) instead of proceeding, so
System.arraycopy and GEOMETRY_FACTORY.createLineString/createLinearRing are only
called with valid indices. Ensure the same validation is applied to the second
block mentioned (lines ~222-230).
- Around line 111-123: The code trusts numPoints/numParts from the input and
allocates arrays directly which can lead to OOM or late read failures; update
the readers (e.g., the block that reads numPoints and returns
GEOMETRY_FACTORY.createMultiPointFromCoords(coords), and the analogous blocks
around lines handling numParts) to validate counts before allocation: ensure
counts are non-negative and within a reasonable maximum (or derived from
remaining input length), throw a clear ParseException/IOException on invalid or
excessively large counts, and only then allocate Coordinate[] (or int[] for
parts) and proceed to read coordinates/part indices; apply the same validation
pattern to the other occurrences noted (the blocks at ~149-168 and ~199-218).
In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkEnvelopeIntersection.java`:
- Around line 51-60: The benchmark currently measures intersection plus EWKB
serialization because the `@Benchmark` methods envelopes and geometries call
serialize(...); to isolate intersection cost, remove serialize(...) from the
timed path: have envelopes and geometries return or consume only the raw result
of stIntersection(data.envelope, data.otherEnvelope) and
stIntersection(data.geometry, data.otherGeometry) (or accept a JMH Blackhole
parameter and consume the returned geometry there), and move EWKB serialization
into a separate benchmark method that calls serialize(...) on the intersection
result; update references to the serialize and stIntersection calls accordingly.
In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoSpatialQueries.java`:
- Around line 71-79: The tests testSphericalGeographyResult and the earlier test
only assert spatial equality of the WKT payload, so update each test that calls
computeActual(...) and spatiallyEquals(...) (e.g., uses ST_GeometryFromText and
to_spherical_geography in these methods) to also assert the returned SQL type
before checking spatial content: call the existing API you use for type checks
on the result of computeActual(...) (or inspect the
MaterializedResult/Block/Type returned by computeActual) to assert the expected
SQL type (e.g., geometry or geography) is correct, then keep the
spatiallyEquals(...) assertion for the WKT payload. Ensure you reference the
same result value used by spatiallyEquals when performing the type assertion so
both checks validate the same query output.
In `@plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java`:
- Around line 132-133: The fixture loader in TestEsriTable uses
content.split("\n") which leaves CR on CRLF files and causes WKT mismatches;
change the line iteration to be line-ending-agnostic (e.g., use Java's
line-break-aware splitting or a BufferedReader/lines() stream) so that the loop
over content -> line and subsequent parsing into parts (the code that calls
content.split("\n") and then line.split("\t", 2) / uses parts) will strip CRs
and parse consistently across platforms.
In
`@plugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClient.java`:
- Around line 1945-1955: Replace the bind expression so PostGIS reads EWKB
(preserving SRID): in PostgreSqlClient, update the getBindExpression() method
(the block paired with the set(PreparedStatement, int, Object) method that calls
serialize(geometry)) to return "ST_GeomFromEWKB(?)" instead of
"ST_GeomFromWKB(?)" so the EWKB emitted by serialize(geometry) is correctly
interpreted.
---
Nitpick comments:
In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java`:
- Around line 154-157: The field assignments jtsGeometry = geometry and the
Point casts (innerJtsPoint = (Point) innerPoint, outerJtsPointInEnvelope =
(Point) outerPointInEnvelope, outerJtsPointNotInEnvelope = (Point)
outerPointNotInEnvelope) are redundant aliases; either remove the intermediate
fields and use geometry/innerPoint/outerPoint... directly in the JTS benchmark
methods, or eliminate the jts assignment and perform the casts inline where
innerJtsPoint, outerJtsPointInEnvelope, and outerJtsPointNotInEnvelope are used;
update usages in the JTS benchmark methods accordingly (referencing jtsGeometry,
innerJtsPoint, outerJtsPointInEnvelope, outerJtsPointNotInEnvelope) to avoid
duplicate aliases while preserving benchmark naming clarity.
In `@plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java`:
- Around line 114-116: The test currently compares actualWkt to
expectedWkt.get(name) which yields unclear failures when the expected fixture
key is missing; before the comparison in TestEsriTable (around variables
actualWkt, expectedWkt, name), add an assertion that the expectedWkt contains
the key (e.g., assertThat(expectedWkt).containsKey(name) or equivalent) so the
test fails fast with a clear message about the missing fixture key, then proceed
to the existing describedAs/isEqualTo comparison.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 09043123-7516-4ed1-8d5b-c17f15247462
📒 Files selected for processing (59)
core/trino-main/pom.xmlcore/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.javacore/trino-main/src/main/java/io/trino/operator/PagesSpatialIndexSupplier.javacore/trino-main/src/main/java/io/trino/operator/SpatialIndexBuilderOperator.javalib/trino-geospatial-toolkit/pom.xmllib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryType.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryUtils.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.javalib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.javalib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/TestGeometryUtils.javalib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.javalib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.javalib/trino-hive-formats/pom.xmllib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.javalib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriJsonParser.javalib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriDeserializer.javaplugin/trino-geospatial/pom.xmlplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/BingTileFunctions.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeoFunctions.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeometryType.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningAggregateFunction.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningInternalAggregateFunction.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningStateFactory.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SphericalGeographyType.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryState.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateFactory.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateSerializer.javaplugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkEnvelopeIntersection.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkGeometryToBingTiles.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTArea.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTEnvelope.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTIntersects.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTXMin.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/GeoTestUtils.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestBingTileFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestEncodedPolylineFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoSpatialQueries.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialJoinOperator.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialPartitioningInternalAggregation.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/AbstractTestGeoAggregationFunctions.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryConvexHullGeoAggregation.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateSerializer.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryUnionGeoAggregation.javaplugin/trino-hive/pom.xmlplugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.javaplugin/trino-hive/src/test/resources/esri/counties_expected.txtplugin/trino-postgresql/pom.xmlplugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClient.javapom.xml
💤 Files with no reviewable changes (7)
- core/trino-main/pom.xml
- plugin/trino-geospatial/pom.xml
- lib/trino-geospatial-toolkit/pom.xml
- lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.java
- lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.java
- pom.xml
- lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.java
✅ Files skipped from review due to trivial changes (4)
- lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateSerializer.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SphericalGeographyType.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java
🚧 Files skipped from review as they are similar to previous changes (22)
- lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryType.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningStateFactory.java
- lib/trino-hive-formats/pom.xml
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTEnvelope.java
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateSerializer.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryState.java
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkGeometryToBingTiles.java
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestEncodedPolylineFunctions.java
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTXMin.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningAggregateFunction.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningInternalAggregateFunction.java
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTIntersects.java
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestBingTileFunctions.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateFactory.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.java
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialPartitioningInternalAggregation.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/BingTileFunctions.java
- lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.java
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTArea.java
- plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeometryType.java
Adds assertSpatialEquals helper to TestGeoFunctions that uses stEquals for geometry comparison. Converts testSTGeometryType and testSTBuffer to use the new helper. testSTBuffer was updated to use property-based assertions (ST_Envelope and ST_Area with tolerance) instead of exact WKT coordinate matching. This makes the tests stable across CPU architectures (ARM vs x86) where trigonometric functions can produce slightly different floating-point results.
Migrate simple geometry functions to use JTS library. Test updates for behavior differences: - ST_Boundary returns LINESTRING instead of MULTILINESTRING for simple polygons - ST_Buffer with infinity returns POLYGON EMPTY instead of MULTIPOLYGON EMPTY - Minor floating-point precision differences in some calculations
Migrate ST_NumPoints and related accessor functions to JTS. Test updates for behavior differences: - ST_NumPoints now counts closing vertices in polygons per OGC standard - Ring vertex ordering may differ cosmetically (same geometry)
Add JTS-compatible overloads for geometry utility methods to support incremental migration from ESRI to JTS. The ESRI versions remain for existing callers until they are converted.
Rewrite stUnion to use JTS UnaryUnionOp instead of ESRI cursors. Behavior differences: - Point-on-line union does not insert vertices - Empty inputs return empty geometry collection instead of null
- Migrate spatial join operator to JTS for intersection and containment tests - Switch GeoFunctions envelope operations to use JTS Envelope (deserializeEnvelope, ST_XMin/XMax/YMin/YMax, ST_IsEmpty)
Use Extended Well-Known Binary (EWKB) format for geometry serialization. EWKB is the standard used by PostGIS and retains the SRID (Spatial Reference System Identifier) for coordinate system information.
Note: TestEsriTable's expected values file was converted from Trino's old internal binary format to WKT. This change cannot be separated into an earlier commit because the old format's deserializer was deleted in the EWKB commit, and circular Maven dependencies prevent adding geospatial as a test dependency to trino-hive.
With ESRI removed JTS objects no longer need fully qualified names
Change the internal representation of geometry values to use JTS Geometry objects directly, avoiding unnecessary serialization cycles between function calls.
Description
Migrate the geospatial plugin from ESRI geometry-api-java to JTS (Java Topology Suite) as the core geometry library. JTS is more widely used, better maintained, and provides the foundation for upcoming Iceberg geometry type support.
Key changes:
Geometryas the native stack type instead of serialized bytesAdditional context and related issues
JTS is the de facto standard geometry library in the Java ecosystem, used by GeoTools, PostGIS, and Apache Sedona. This change aligns Trino's geospatial implementation with the broader ecosystem and enables future improvements like Iceberg geometry support.
Reviewer Guide
Commit Organization
The migration is structured to minimize risk and make review manageable:
like unclosed rings that JTS correctly rejects) from Refactors (adding
assertSpatialEqualstohandle harmless differences like vertex ordering). Any test changes in subsequent commits indicate
actual behavior differences between ESRI and JTS.
the PostGIS standard that preserves SRID.
were needed while both libraries coexisted. Please do not comment on verbose imports in earlier commits.
Sliceto JTSGeometryas the native stack type,eliminating serialization cycles between function calls.
Behavior Differences
Test changes document intentional behavior differences:
default, whereas ESRI used 24. This results in buffer polygons with fewer vertices (e.g., 32 vs 96 for
a point buffer) while maintaining standard precision.
triangle has 4 points: A-B-C-A).
LINESTRINGinstead ofMULTILINESTRINGfor simple polygons (simpler, morestandard return type).
ST_Buffer(infinity)returnsPOLYGON EMPTY(instead ofMULTIPOLYGON EMPTY).GEOMETRYCOLLECTIONinstead ofnullfor empty inputs, improvingnull-safety in downstream SQL.
ST_CentroidandST_Areadue to differentunderlying math implementations (verified via tolerance checks).
ST_Points.Release notes
(x) Release notes are required, with the following suggested text: