Skip to content

Replace ESRI geometry library with JTS in geospatial plugin#27881

Merged
dain merged 15 commits intomasterfrom
user/dain/geo-jts
Mar 25, 2026
Merged

Replace ESRI geometry library with JTS in geospatial plugin#27881
dain merged 15 commits intomasterfrom
user/dain/geo-jts

Conversation

@dain
Copy link
Copy Markdown
Member

@dain dain commented Jan 8, 2026

Description

Migrate the geospatial plugin from ESRI geometry-api-java to JTS (Java Topology Suite) as the core geometry library. JTS is more widely used, better maintained, and provides the foundation for upcoming Iceberg geometry type support.

Key changes:

  • Use JTS Geometry as the native stack type instead of serialized bytes
  • Replace custom geometry serialization with standard EWKB format
  • Convert all geometry functions (ST_*, Bing tiles, aggregations, spatial joins)
  • Simplify Hadoop ESRI JSON reader and PostgreSQL connector geometry handling

Additional context and related issues

JTS is the de facto standard geometry library in the Java ecosystem, used by GeoTools, PostGIS, and Apache Sedona. This change aligns Trino's geospatial implementation with the broader ecosystem and enables future improvements like Iceberg geometry support.

Reviewer Guide

Commit Organization

The migration is structured to minimize risk and make review manageable:

  1. Test infrastructure first: Initial commits separate Repairs (fixing invalid OGC inputs
    like unclosed rings that JTS correctly rejects) from Refactors (adding assertSpatialEquals to
    handle harmless differences like vertex ordering). Any test changes in subsequent commits indicate
    actual behavior differences between ESRI and JTS.
  2. Incremental function conversion: Functions are ported in logical groups:
    • Basic geometry functions (ST_Boundary, ST_Buffer, etc.)
    • Accessor functions (ST_NumPoints, etc.)
    • EncodedPolyline, BingTile, aggregation functions
    • ST_Union and remaining GeoFunctions
    • Spatial join and envelope handling
  3. Serialization change: Replaces custom binary format with EWKB (Extended Well-Known Binary),
    the PostGIS standard that preserves SRID.
  4. Hadoop reader: Converted separately due to Maven dependency constraints.
  5. Import cleanup: Using qualified imports for JTS objects removes the fully-qualified names that
    were needed while both libraries coexisted. Please do not comment on verbose imports in earlier commits.
  6. Stack type change: Final commit switches from Slice to JTS Geometry as the native stack type,
    eliminating serialization cycles between function calls.

Behavior Differences

Test changes document intentional behavior differences:

  • ST_Buffer approximation: JTS uses 8 segments per quadrant (matching PostGIS/GEOS standards) by
    default, whereas ESRI used 24. This results in buffer polygons with fewer vertices (e.g., 32 vs 96 for
    a point buffer) while maintaining standard precision.
  • ST_NumPoints: Now counts the closing vertex of polygons, complying with the OGC standard (where a
    triangle has 4 points: A-B-C-A).
  • ST_Boundary: Returns LINESTRING instead of MULTILINESTRING for simple polygons (simpler, more
    standard return type).
  • Empty Geometries: ST_Buffer(infinity) returns POLYGON EMPTY (instead of MULTIPOLYGON EMPTY).
  • ST_Union: Returns an empty GEOMETRYCOLLECTION instead of null for empty inputs, improving
    null-safety in downstream SQL.
  • Floating Point: Minor precision differences in ST_Centroid and ST_Area due to different
    underlying math implementations (verified via tolerance checks).
  • Vertex Ordering: Polygons are now normalized to standard winding (e.g., Clockwise for exterior rings) during serialization, which may change the order of vertices returned by ST_Points.

Release notes

(x) Release notes are required, with the following suggested text:

## Geospatial
  * Replace ESRI geometry library with JTS for improved ecosystem compatibility. ({issue}`issuenumber`)
  * WKT parsing is now stricter per OGC standards and rejects previously accepted invalid syntax. ({issue}`issuenumber`)
  * `ST_Union` edge case changes: empty inputs return empty geometry collection instead of null, and point-on-line unions no longer insert vertices at intersection points. ({issue}`issuenumber`)

@cla-bot cla-bot Bot added the cla-signed label Jan 8, 2026
@github-actions github-actions Bot added the postgresql PostgreSQL connector label Jan 8, 2026
@dain dain force-pushed the user/dain/geo-jts branch from b4e8c12 to aad9700 Compare January 9, 2026 01:13
@github-actions github-actions Bot added the hive Hive connector label Jan 9, 2026
@dain dain force-pushed the user/dain/geo-jts branch 8 times, most recently from b3494aa to 9bec6d6 Compare January 10, 2026 04:19
@dain dain marked this pull request as ready for review January 10, 2026 05:57
@dain dain force-pushed the user/dain/geo-jts branch from 9bec6d6 to 45cd417 Compare January 11, 2026 00:33
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 2, 2026

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions Bot added the stale label Feb 2, 2026
@martint
Copy link
Copy Markdown
Member

martint commented Feb 4, 2026

Fixes #10179

@github-actions github-actions Bot removed the stale label Feb 4, 2026
@github-actions
Copy link
Copy Markdown

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions Bot added the stale label Feb 25, 2026
@dain dain added the stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. label Feb 25, 2026
@gertjanal
Copy link
Copy Markdown
Contributor

Slight merge conflict with #28592 😅

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates Trino’s geospatial stack from the ESRI geometry library to JTS, standardizing on JTS Geometry in the execution stack and EWKB for storage/connector interchange (aligning with PostGIS conventions and enabling future Iceberg geometry support).

Changes:

  • Remove ESRI dependencies and switch core geospatial types/functions/aggregations to JTS Geometry.
  • Replace legacy geometry serialization with EWKB via JtsGeometrySerde and update connectors (e.g., PostgreSQL) accordingly.
  • Update tests/benchmarks for stricter OGC validity and for geometry equality comparisons that tolerate harmless differences (e.g., vertex ordering).

Reviewed changes

Copilot reviewed 55 out of 57 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
pom.xml Drops ESRI geometry dependency from root build.
plugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClient.java Switch PostgreSQL geometry mapping to JTS + EWKB serde.
plugin/trino-postgresql/pom.xml Swap to trino-geospatial-toolkit, add JTS, keep trino-geospatial for tests.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java Reworks ESRI Hive test to validate WKB→WKT via JTS.
plugin/trino-hive/pom.xml Adds dependency-checker ignores for transitive jts-core used in tests.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryUnionGeoAggregation.java Updates union aggregation assertions and adds SRID mismatch coverage.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateSerializer.java Updates state serializer tests from OGCGeometry to JTS.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java Updates state factory tests from OGCGeometry to JTS.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/AbstractTestGeoAggregationFunctions.java Switches aggregation test inputs/equals logic to JTS serde + topo equality.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java Updates spherical geography tests for JTS serde + spatial equality helper.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialPartitioningInternalAggregation.java Updates spatial partitioning internal agg test to JTS geometry/envelopes.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialJoinOperator.java Updates spatial join operator test constants to JTS Geometry.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoSpatialQueries.java Switches assertions to spatial equality helper.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestEncodedPolylineFunctions.java Updates encoded polyline tests for stricter validity expectations.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestBingTileFunctions.java Updates Bing tile tests to use spatial equality helper where needed.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/GeoTestUtils.java Adds shared spatial equality test helpers.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTXMin.java Updates benchmark stack types to JTS Geometry.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTIntersects.java Updates benchmark stack types to JTS Geometry.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTEnvelope.java Updates benchmark to serialize JTS output back to slices.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java Updates benchmark to JTS geometry + EWKB serde.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTArea.java Updates benchmark stack types and polygon closure.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkGeometryToBingTiles.java Updates benchmark stack types to JTS Geometry.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkEnvelopeIntersection.java Updates benchmark to JTS + topo equality for verification.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.java Switches aggregation implementation to JTS + SRID validation + safe union.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateSerializer.java Switches state serde to JtsGeometrySerde.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateFactory.java Switches state storage to JTS Geometry and updates size estimation.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryState.java Changes state interface from OGCGeometry to JTS Geometry.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.java Switches convex hull aggregation to JTS + SRID validation + safe union.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SphericalGeographyType.java Converts to new AbstractGeometryType base with JTS stack type.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningStateFactory.java Updates envelope size estimation to Rectangle-based approach.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningInternalAggregateFunction.java Switches internal agg input from Slice envelope to JTS geometry envelope.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningAggregateFunction.java Updates signature to take JTS Geometry (still unsupported entrypoint).
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeometryType.java Converts to new AbstractGeometryType base with JTS stack type.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.java Ports encoded polyline funcs to JTS geometry types.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/BingTileFunctions.java Ports Bing tile funcs to JTS geometry/envelope handling.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java New base type: JTS stack type, EWKB storage, binary operators.
plugin/trino-geospatial/pom.xml Drops ESRI geometry dependency from geospatial plugin.
lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriDeserializer.java Updates ESRI JSON deserializer tests to validate EWKB via JTS.
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriJsonParser.java Adds JTS-based ESRI JSON geometry parsing.
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java Switches ESRI JSON reader output to EWKB via JtsGeometrySerde.
lib/trino-hive-formats/pom.xml Replaces ESRI dep with trino-geospatial-toolkit + jts-core.
lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java Updates serialization tests to JTS/EWKB expectations.
lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.java Removes ESRI-based serde benchmark.
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java Replaces custom binary format with EWKB-based serde + SRID utilities.
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.java Removes legacy serialization type enum (no longer needed).
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.java Removes legacy ESRI-based serialization implementation.
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java Adds ESRI Shape binary parser emitting JTS geometries.
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryUtils.java Ports helpers (contains/disjoint/union/rect checks) to JTS.
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryType.java Maps JTS LinearRing to internal LINE_STRING type.
lib/trino-geospatial-toolkit/pom.xml Drops ESRI dependency and removes now-unneeded jakarta.annotation optional dep.
core/trino-main/src/main/java/io/trino/operator/SpatialIndexBuilderOperator.java Switches spatial predicate interface from OGCGeometry to JTS Geometry.
core/trino-main/src/main/java/io/trino/operator/PagesSpatialIndexSupplier.java Switches spatial index build path to JTS geometry + envelopes.
core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java Switches index contents to JTS geometry + envelope queries.
core/trino-main/pom.xml Drops ESRI geometry dependency from core engine.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 27 to 96
@@ -40,21 +44,21 @@ public GeometryState createGroupedState()
public static class GroupedGeometryState
implements GeometryState, GroupedAccumulatorState
{
private final ObjectBigArray<OGCGeometry> geometries = new ObjectBigArray<>();
private final ObjectBigArray<Geometry> geometries = new ObjectBigArray<>();

private int groupId;
private long size;

@Override
public OGCGeometry getGeometry()
public Geometry getGeometry()
{
return geometries.get(groupId);
}

@Override
public void setGeometry(OGCGeometry geometry)
public void setGeometry(Geometry geometry)
{
OGCGeometry previousValue = this.geometries.getAndSet(groupId, geometry);
Geometry previousValue = this.geometries.getAndSet(groupId, geometry);
size -= getGeometryMemorySize(previousValue);
size += getGeometryMemorySize(geometry);
}
@@ -79,34 +83,31 @@ public final void setGroupId(int groupId)
}

// Do a best-effort attempt to estimate the memory size
private static long getGeometryMemorySize(OGCGeometry geometry)
private static long getGeometryMemorySize(Geometry geometry)
{
if (geometry == null) {
return 0;
}
// Due to the following issue:
// https://github.com/Esri/geometry-api-java/issues/192
// We must check if the geometry is empty before calculating its size. Once the issue is resolved
// and we bring the fix into our codebase, we can remove this check.
if (geometry.isEmpty()) {
return OGC_GEOMETRY_BASE_INSTANCE_SIZE;
return GEOMETRY_BASE_INSTANCE_SIZE;
}
return geometry.estimateMemorySize();
// Estimate: base size + size per coordinate
return GEOMETRY_BASE_INSTANCE_SIZE + (long) geometry.getNumPoints() * COORDINATE_SIZE;
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the size calculation to consider the exact type.

Comment on lines +38 to +47
public static boolean spatiallyEquals(String wkt1, String wkt2)
{
var geom1 = stGeometryFromText(utf8Slice(wkt1));
var geom2 = stGeometryFromText(utf8Slice(wkt2));

if (stIsEmpty(geom1) && stIsEmpty(geom2)) {
return true;
}
return stEquals(geom1, geom2);
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Special handling was added for empty geometries to show that they return the expected type. This did show some tests that had the wrong expected type, where wrong is defined as not the default behavior for JTS.

Comment on lines +63 to +69
var expectedGeometry = stGeometryFromText(utf8Slice(expectedWkt));
var actualGeometry = stGeometryFromText(utf8Slice(actualWkt));
if (stIsEmpty(expectedGeometry)) {
assertThat(stIsEmpty(actualGeometry))
.withFailMessage("Expected empty geometry, but got: %s", actualWkt)
.isTrue();
return;
Comment on lines +95 to +97
else if (x != null) {
// Empty point (x is present but null or NaN)
return GEOMETRY_FACTORY.createPoint();
if ("NaN".equalsIgnoreCase(value)) {
return null;
}
return Double.parseDouble(value);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was rewritten to follow the original Hive code.

Comment on lines +227 to +262
// Check if we have multiple exterior rings (need MultiPolygon)
List<Polygon> polygons = new ArrayList<>();
LinearRing currentShell = null;
List<LinearRing> currentHoles = new ArrayList<>();

for (Coordinate[] ring : rings) {
LinearRing linearRing = GEOMETRY_FACTORY.createLinearRing(ring);
boolean isClockwise = isClockwise(ring);

if (isClockwise) {
// Clockwise = exterior ring in ESRI format
// Save previous polygon if exists
if (currentShell != null) {
polygons.add(GEOMETRY_FACTORY.createPolygon(currentShell, currentHoles.toArray(new LinearRing[0])));
currentHoles.clear();
}
currentShell = linearRing;
}
else {
// Counter-clockwise = interior ring (hole) in ESRI format
if (currentShell != null) {
currentHoles.add(linearRing);
}
}
}

// Don't forget the last polygon
if (currentShell != null) {
polygons.add(GEOMETRY_FACTORY.createPolygon(currentShell, currentHoles.toArray(new LinearRing[0])));
}

if (polygons.size() == 1) {
return polygons.getFirst();
}
return GEOMETRY_FACTORY.createMultiPolygon(polygons.toArray(new Polygon[0]));
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was rewritten to follow the original Hive code. Additionally, I added test coverage for this because none of this was actually tested.

@dain dain force-pushed the user/dain/geo-jts branch 3 times, most recently from c2652b2 to 2f14401 Compare March 20, 2026 03:59
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 20, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

The PR migrates the geospatial stack from ESRI OGC types to JTS Geometry across the codebase. ESRI dependencies and custom ESRI serde/serialization enums were removed; new JTS-based serde, utilities, and readers were added (e.g., JtsGeometrySerde, GeometryUtils, EsriShapeReader). Spatial indexes, plugin types, functions, aggregations, and tests were updated to use JTS Geometry and EWKB/WKB serialization. Maven POMs were adjusted to remove ESRI artifacts and add the trino-geospatial-toolkit and JTS artifacts.


Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can suggest fixes for GitHub Check annotations.

Configure the reviews.tools.github-checks setting to adjust the time to wait for GitHub Checks to complete.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (6)
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java (2)

262-267: Silently dropping orphan holes may mask data issues.

If a counter-clockwise ring (hole) appears before any clockwise ring (shell), it's silently ignored. While ESRI spec mandates shells precede their holes, malformed data would lose geometry without any indication.

Consider logging a warning or throwing when encountering a hole without a preceding shell:

🛡️ Suggested defensive logging
             else {
                 // Counter-clockwise = interior ring (hole) in ESRI format
                 if (currentShell != null) {
                     currentHoles.add(ring);
                 }
+                else {
+                    // Log warning: hole ring found before any exterior ring
+                }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java`
around lines 262 - 267, The code currently drops counter-clockwise rings when
currentShell is null (orphan holes) in EsriShapeReader; update the branch
handling holes (the else where currentShell is checked) to detect currentShell
== null and either log a warning via the class logger (including ring
index/shape id/context from available variables) or throw a descriptive
exception instead of silently ignoring; ensure this behavior is applied where
currentHoles.add(ring) is invoked and include enough context (e.g., ring,
shapeId, stream offset) in the message to aid debugging.

281-292: Javadoc describes wrong sign convention for this formula.

The comment states "Positive signed area = counter-clockwise" but the trapezoid formula used here has the opposite sign: positive sum indicates clockwise orientation. The code correctly returns sum > 0 for clockwise, but the documentation is misleading.

📝 Suggested Javadoc correction
     /**
      * Determines if a ring is clockwise using the shoelace formula.
-     * Positive signed area = counter-clockwise, negative signed area = clockwise.
+     * With the edge-sum variant used here, positive sum indicates clockwise,
+     * negative sum indicates counter-clockwise.
      */
     private static boolean isClockwise(Coordinate[] ring)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java`
around lines 281 - 292, The Javadoc for isClockwise currently states the wrong
sign convention; update the comment for the isClockwise(Coordinate[] ring)
method to reflect the actual formula used (the implemented shoelace/trapezoid
sum yields positive for clockwise and negative for counter-clockwise) so the doc
matches the code's return condition (sum > 0). Reference the isClockwise method
and the local variable sum in your updated comment to make the correct sign
convention explicit.
plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java (1)

117-123: Consider catching a more specific exception type.

Catching generic Exception (line 120) is overly broad. The JtsGeometrySerde.deserialize() method throws IllegalArgumentException for invalid WKB, and Geometry.toText() shouldn't throw for valid geometries. Consider catching IllegalArgumentException or RuntimeException to avoid masking unexpected errors.

♻️ Suggested change
         try {
             return JtsGeometrySerde.deserialize(getSlice(block, position)).toText();
         }
-        catch (Exception e) {
+        catch (IllegalArgumentException e) {
             return "<invalid geometry>";
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java`
around lines 117 - 123, The current catch in the method calling
JtsGeometrySerde.deserialize(getSlice(block, position)).toText() is too broad;
replace the catch of Exception with a more specific catch for
IllegalArgumentException (the error type thrown for invalid WKB) so only
invalid-geometry input is mapped to "<invalid geometry>" and other unexpected
errors still bubble up; update the catch block that surrounds
JtsGeometrySerde.deserialize(...) and .toText() to catch
IllegalArgumentException (or IllegalArgumentException and RuntimeException if
you want to be slightly broader) and return "<invalid geometry>" there.
plugin/trino-hive/pom.xml (1)

580-586: Transitive dependency suppression is acceptable but consider explicit declaration.

The configuration correctly suppresses maven-dependency-plugin warnings for jts-core being used via the transitive dependency from trino-hive-formats.

For improved maintainability, consider explicitly declaring jts-core as a <scope>test</scope> dependency instead of relying on the transitive dependency. This would make the dependency explicit and prevent potential breakage if trino-hive-formats changes its dependencies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-hive/pom.xml` around lines 580 - 586, The pom currently
suppresses warnings for the transitive dependency org.locationtech.jts:jts-core
(referenced in ignoredNonTestScopedDependency and
ignoredUsedUndeclaredDependency and used by TestEsriTable via
trino-hive-formats); instead add an explicit test-scoped dependency declaration
for org.locationtech.jts:jts-core (scope test) in the project POM so the test
usage is declared directly (you can keep the suppression entries if desired, but
ensure org.locationtech.jts:jts-core appears under <dependencies> with
<scope>test</scope> to avoid relying on trino-hive-formats being the transitive
provider).
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java (1)

24-25: Consider thread-safety of shared WKTReader.

JTS WKTReader is not thread-safe. While JUnit Jupiter runs test methods sequentially by default within the same class, a shared static instance could cause issues if test parallelization is enabled in the future. Consider creating a new instance per test method or using a ThreadLocal.

♻️ Safer alternative
-    private static final WKTReader WKT_READER = new WKTReader();
+    private final WKTReader wktReader = new WKTReader();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java`
around lines 24 - 25, The shared static WKT_READER in TestGeometryStateFactory
is unsafe because JTS WKTReader is not thread-safe; replace the static instance
with a thread-safe approach by creating a new WKTReader per test (e.g.,
construct a WKTReader inside each test method or inside a `@BeforeEach` method) or
change it to a ThreadLocal<WKTReader> (referencing WKT_READER) so each
thread/test gets its own instance; update all usages in TestGeometryStateFactory
to use the new per-test or ThreadLocal instance.
lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java (1)

187-199: Add a non-zero SRID round-trip case here.

All current fixtures come from plain WKT, so every geometry in this suite has SRID 0. That leaves the new EWKB SRID handling untested in the main serialization coverage.

Minimal coverage addition
+    `@Test`
+    public void testSridRoundTrip()
+    {
+        Geometry geometry = createJtsGeometry("POINT (1 2)");
+        geometry.setSRID(4326);
+
+        Geometry deserialized = deserialize(serialize(geometry));
+        assertThat(deserialized.getSRID()).isEqualTo(4326);
+        assertThat(deserialized.norm()).isEqualTo(geometry.norm());
+    }

Also applies to: 216-219

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java`
around lines 187 - 199, Add a test case that verifies non-zero SRID round-trip:
create a JTS Geometry via createJtsGeometry(...), call geometry.setSRID(e.g.
4326) before serializing with serialize(...), then deserialize(...) and assert
both geometry.norm() and deserialized.getSRID() (and/or geometry.getSRID())
match the non-zero SRID; apply the same pattern to the other similar helper/test
near the geometryFromText usage (lines around the second occurrence) so EWKB
SRID handling is exercised for both serialize/deserialize paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java`:
- Around line 96-100: The current getEstimatedMemorySizeInBytes() in
PagesRTreeIndex underestimates memory by only using geometry.getNumPoints();
update it to account for full retained size by either (A) storing and returning
the serialized Slice size captured during deserialization in buildRTree() (reuse
the existing serialized slice retained at build time and return INSTANCE_SIZE +
serializedSlice.length()), or (B) implement a deep retained-size traversal of
the JTS object graph starting from geometry (walk Geometry, LinearRing,
CoordinateSequence, component Geometries, and cached Envelope) and sum
object+array sizes, then return INSTANCE_SIZE + deepSize; ensure
PagesSpatialIndexSupplier.getEstimatedSize() reads this corrected value so
operator accounting reflects multipart/composite geometries accurately.

In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryUtils.java`:
- Around line 98-108: The contains() method currently unwraps all
GeometryCollection instances and returns true if any single child contains the
argument, which breaks JTS semantics for
MultiPolygon/MultiLineString/MultiPoint; change the logic in
GeometryUtils.contains so that it only iterates children for pure, heterogeneous
GeometryCollection instances, but for instances of MultiPolygon,
MultiLineString, and MultiPoint delegate to geometry.contains(tileGeometry)
(i.e., do not short-circuit by checking individual components for these
multi-types) so the union semantics from JTS are preserved.

In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java`:
- Around line 54-63: The current external-input validation in
JtsGeometrySerde.deserialize (and the other similar methods noted) uses
verify(...) which throws VerifyException; change those checks to throw
IllegalArgumentException instead to be consistent with parse error
handling—replace calls like verify(shape.length() > 0, "shape is empty") with an
explicit if-check that throws new IllegalArgumentException("shape is empty")
(and similarly for null checks or other verify usages) so all invalid WKB/EWKB
input paths use IllegalArgumentException (matching the existing catch block that
wraps ParseException).
- Around line 97-114: The code in JtsGeometrySerde that decodes the WKB type
reads the endianness byte into variable endianness and then falls back to the
little-endian branch for any value != 0, which lets invalid EWKB through; add an
explicit validation immediately after reading endianness (shape.getByte(0)) to
check that the value is either 0 or 1 and throw a descriptive exception (e.g.,
IllegalArgumentException or a Trino-specific parsing error) if it is not, so the
method (where wkbType is computed) rejects invalid byte-order markers rather
than silently interpreting them as little-endian.

In
`@lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriJsonParser.java`:
- Around line 137-143: The coordinate-parsing branch in EsriJsonParser currently
coerces strings and booleans into doubles, causing silent data corruption;
update the method that reads x/y coordinates to accept only numeric tokens
(VALUE_NUMBER_FLOAT or VALUE_NUMBER_INT) and return parser.getDoubleValue() (or
equivalent) for those, treat VALUE_STRING by explicitly parsing the string with
Double.parseDouble and throwing a descriptive parse exception if it fails, and
refuse JsonToken.VALUE_TRUE / JsonToken.VALUE_FALSE (throw an error) rather than
converting them; reference the tokens VALUE_NUMBER_FLOAT, VALUE_NUMBER_INT,
VALUE_STRING, JsonToken.VALUE_TRUE, JsonToken.VALUE_FALSE and the EsriJsonParser
coordinate read method when making the change.

In
`@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.java`:
- Around line 50-54: The SRID validation must run before taking the
empty-geometry fast path; call validateAndGetSrid(...) to reconcile and validate
SRIDs whenever either the input geometry or the accumulator geometry exists
(even if one is empty) so conflicting non-zero SRIDs are rejected; update
ConvexHullAggregation to compute srid = validateAndGetSrid(state.getGeometry(),
geometry) (or equivalent) before any early-return/empty-geometry branches, then
proceed to safeUnion(...).convexHull() and setSRID(srid) in both the non-empty
and empty-input merge/update paths (the blocks around the current use of
validateAndGetSrid, safeUnion, state.getGeometry(), and geometry.isEmpty()).

In
`@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.java`:
- Around line 47-51: GeometryUnionAgg currently skips calling validateAndGetSrid
when one side is empty, allowing mismatched non-zero SRIDs to be silently
accepted; update both branches where one geometry is empty (the blocks around
safeUnion/state.setGeometry) to always call
validateAndGetSrid(state.getGeometry(), geometry) (and the symmetric call in the
other branch) before creating/setting the result so the SRID is reconciled, then
set the returned SRID on the resulting Geometry via result.setSRID(srid) (or on
the empty geometry if safeUnion isn't invoked) and finally call
state.setGeometry(result).

In
`@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.java`:
- Around line 94-100: The decode branch that currently returns an empty
LineString when coordinates.size() < 2 should instead reject the unsupported
single-point case by throwing a clear runtime error; replace the conditional
that returns GEOMETRY_FACTORY.createLineString() with logic that throws an
appropriate Trino-level exception (with a descriptive message) when
coordinates.size() == 1 so the point is not silently dropped. Apply the same
change to the analogous block later (the other coordinates.size() < 2 branch
around lines 106-112) so both single-point decodes fail explicitly rather than
producing LINESTRING EMPTY; locate these in EncodedPolylineFunctions where
CoordinateArraySequence, LineString, and GEOMETRY_FACTORY are used.

In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java`:
- Around line 67-75: The benchmarks deserializeSimpleGeometry and
deserializeEnvelopeSimpleGeometry (and the same pattern at lines 115-123) are
implicitly measuring serialize(...) + allocation + deserialize(...); precompute
the serialized EWKB once in a `@Setup` method on BenchmarkData (e.g., add byte[]
fields like serializedSimpleGeometry, serializedOtherCase and fill them by
calling serialize(data.simpleGeometry) in `@Setup`) and change these `@Benchmark`
methods to call deserialize(serializedSimpleGeometry) and
deserializeEnvelope(serializedSimpleGeometry) so they only measure the
deserialize path; apply the same change for the other similar benchmark methods
referenced.

In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/GeoTestUtils.java`:
- Around line 81-105: The array helper assertSpatialArrayEquals currently passes
the 'NULL' sentinel into stGeometryFromText and skips the empty-geometry type
check present in assertSpatialEquals; update assertSpatialArrayEquals so for
each element it mirrors assertSpatialEquals: first treat the transformed actual
string that equals 'NULL' (or actual null) as a null/empty geometry and assert
accordingly (avoiding calling stGeometryFromText on the sentinel), and for
non-sentinel values perform the empty-geometry type check exactly as
assertSpatialEquals does before calling stGeometryFromText(utf8Slice(...)) and
comparing with stEquals; ensure you reference the same predicates/handling used
in assertSpatialEquals so array and scalar behaviors match.

---

Nitpick comments:
In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java`:
- Around line 262-267: The code currently drops counter-clockwise rings when
currentShell is null (orphan holes) in EsriShapeReader; update the branch
handling holes (the else where currentShell is checked) to detect currentShell
== null and either log a warning via the class logger (including ring
index/shape id/context from available variables) or throw a descriptive
exception instead of silently ignoring; ensure this behavior is applied where
currentHoles.add(ring) is invoked and include enough context (e.g., ring,
shapeId, stream offset) in the message to aid debugging.
- Around line 281-292: The Javadoc for isClockwise currently states the wrong
sign convention; update the comment for the isClockwise(Coordinate[] ring)
method to reflect the actual formula used (the implemented shoelace/trapezoid
sum yields positive for clockwise and negative for counter-clockwise) so the doc
matches the code's return condition (sum > 0). Reference the isClockwise method
and the local variable sum in your updated comment to make the correct sign
convention explicit.

In
`@lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java`:
- Around line 187-199: Add a test case that verifies non-zero SRID round-trip:
create a JTS Geometry via createJtsGeometry(...), call geometry.setSRID(e.g.
4326) before serializing with serialize(...), then deserialize(...) and assert
both geometry.norm() and deserialized.getSRID() (and/or geometry.getSRID())
match the non-zero SRID; apply the same pattern to the other similar helper/test
near the geometryFromText usage (lines around the second occurrence) so EWKB
SRID handling is exercised for both serialize/deserialize paths.

In
`@plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java`:
- Around line 117-123: The current catch in the method calling
JtsGeometrySerde.deserialize(getSlice(block, position)).toText() is too broad;
replace the catch of Exception with a more specific catch for
IllegalArgumentException (the error type thrown for invalid WKB) so only
invalid-geometry input is mapped to "<invalid geometry>" and other unexpected
errors still bubble up; update the catch block that surrounds
JtsGeometrySerde.deserialize(...) and .toText() to catch
IllegalArgumentException (or IllegalArgumentException and RuntimeException if
you want to be slightly broader) and return "<invalid geometry>" there.

In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java`:
- Around line 24-25: The shared static WKT_READER in TestGeometryStateFactory is
unsafe because JTS WKTReader is not thread-safe; replace the static instance
with a thread-safe approach by creating a new WKTReader per test (e.g.,
construct a WKTReader inside each test method or inside a `@BeforeEach` method) or
change it to a ThreadLocal<WKTReader> (referencing WKT_READER) so each
thread/test gets its own instance; update all usages in TestGeometryStateFactory
to use the new per-test or ThreadLocal instance.

In `@plugin/trino-hive/pom.xml`:
- Around line 580-586: The pom currently suppresses warnings for the transitive
dependency org.locationtech.jts:jts-core (referenced in
ignoredNonTestScopedDependency and ignoredUsedUndeclaredDependency and used by
TestEsriTable via trino-hive-formats); instead add an explicit test-scoped
dependency declaration for org.locationtech.jts:jts-core (scope test) in the
project POM so the test usage is declared directly (you can keep the suppression
entries if desired, but ensure org.locationtech.jts:jts-core appears under
<dependencies> with <scope>test</scope> to avoid relying on trino-hive-formats
being the transitive provider).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d5f9042b-8dd4-4604-a02f-3dd402b8df8c

📥 Commits

Reviewing files that changed from the base of the PR and between 40f81aa and 211d506.

📒 Files selected for processing (57)
  • core/trino-main/pom.xml
  • core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java
  • core/trino-main/src/main/java/io/trino/operator/PagesSpatialIndexSupplier.java
  • core/trino-main/src/main/java/io/trino/operator/SpatialIndexBuilderOperator.java
  • lib/trino-geospatial-toolkit/pom.xml
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryType.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryUtils.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java
  • lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.java
  • lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java
  • lib/trino-hive-formats/pom.xml
  • lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
  • lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriJsonParser.java
  • lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriDeserializer.java
  • plugin/trino-geospatial/pom.xml
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/BingTileFunctions.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeoFunctions.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeometryType.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningAggregateFunction.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningInternalAggregateFunction.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningStateFactory.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SphericalGeographyType.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryState.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateFactory.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateSerializer.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkEnvelopeIntersection.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkGeometryToBingTiles.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTArea.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTEnvelope.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTIntersects.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTXMin.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/GeoTestUtils.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestBingTileFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestEncodedPolylineFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoSpatialQueries.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialJoinOperator.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialPartitioningInternalAggregation.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/AbstractTestGeoAggregationFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateSerializer.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryUnionGeoAggregation.java
  • plugin/trino-hive/pom.xml
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java
  • plugin/trino-hive/src/test/resources/esri/counties_expected.txt
  • plugin/trino-postgresql/pom.xml
  • plugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClient.java
  • pom.xml
💤 Files with no reviewable changes (7)
  • core/trino-main/pom.xml
  • plugin/trino-geospatial/pom.xml
  • lib/trino-geospatial-toolkit/pom.xml
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.java
  • pom.xml
  • lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.java

Comment on lines 96 to 100
public long getEstimatedMemorySizeInBytes()
{
return INSTANCE_SIZE + ogcGeometry.estimateMemorySize();
// Estimate based on number of coordinates (each coordinate is 2-3 doubles)
return INSTANCE_SIZE + (long) geometry.getNumPoints() * 24;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd -type f -name "PagesRTreeIndex.java" 2>/dev/null | head -5

Repository: trinodb/trino

Length of output: 39


🏁 Script executed:

# First, let's find and examine the file
find . -name "PagesRTreeIndex.java" -type f 2>/dev/null

Repository: trinodb/trino

Length of output: 127


🏁 Script executed:

# Let's search for the file using git
git ls-files | grep -i "PagesRTreeIndex"

Repository: trinodb/trino

Length of output: 125


🏁 Script executed:

wc -l core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java

Repository: trinodb/trino

Length of output: 129


🏁 Script executed:

cat -n core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java

Repository: trinodb/trino

Length of output: 11294


🏁 Script executed:

# Search for other memory size estimates for geometries in the codebase
rg -i "geometry.*memory\|memory.*geometry" --type java | head -20

Repository: trinodb/trino

Length of output: 39


🏁 Script executed:

# Search for JtsGeometrySerde which is imported
find . -name "*JtsGeometrySerde*" -type f 2>/dev/null

Repository: trinodb/trino

Length of output: 251


🏁 Script executed:

# Look for getNumPoints usage in the codebase
rg "getNumPoints" --type java

Repository: trinodb/trino

Length of output: 891


🏁 Script executed:

wc -l ./lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java

Repository: trinodb/trino

Length of output: 153


🏁 Script executed:

cat -n ./lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java | head -150

Repository: trinodb/trino

Length of output: 6703


🏁 Script executed:

cat -n ./lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java | tail -50

Repository: trinodb/trino

Length of output: 1910


🏁 Script executed:

# Check if there's any pattern for retaining serialized slice sizes in the codebase
rg "slice.*length\|getSliceLength\|estimatedSize" --type java | grep -i geometry | head -10

Repository: trinodb/trino

Length of output: 39


🏁 Script executed:

# Look at how geometries are stored in the spatial index
rg "STRtree\|rtree" --type java -B 2 -A 2 | head -40

Repository: trinodb/trino

Length of output: 39


🌐 Web query:

JTS Geometry getNumPoints() what does it return vertices coordinate points

💡 Result:

In JTS, Geometry.getNumPoints() returns the count of the geometry’s vertices—i.e., the total number of coordinate positions stored by the geometry (not the number of distinct/unique points). For composite geometries (e.g., Multi*, GeometryCollection), this is effectively the sum of the vertex counts of the components. [1]

Implications:

  • Point1 (or 0 if empty).
  • LineString → number of coordinates in the line.
  • Polygon → counts vertices in the shell and holes; ring-closure coordinates (where the last coordinate equals the first) are included because they are vertices in the coordinate sequence. [1]

Source: JTS Geometry Javadoc: “Returns the count of this Geometry’s vertices” / “the number of vertices in this Geometry.” [1]


🌐 Web query:

LocationTech JTS Geometry memory object graph retained size structure

💡 Result:

In JTS (LocationTech) the retained size you see in a heap/object-graph tool for a Geometry is driven mostly by (a) the geometry topology objects (Polygon → rings, collections → components) and (b) the coordinate storage (CoordinateSequence implementation), plus a few always-present base fields.

1) Base Geometry object graph (common to all)

Geometry holds: an Envelope field (protected Envelope envelope), a GeometryFactory reference (protected final GeometryFactory factory), an int SRID, and optional userData. [1]

So the retained graph typically starts:
GeometryEnvelope (may be null until computed) + GeometryFactory + userData.

2) Concrete geometry topology objects

  • Point: Point stores exactly one CoordinateSequence reference: private CoordinateSequence coordinates; [2]
    Graph: PointCoordinateSequence → underlying coordinate storage.
  • LineString / LinearRing: LineString stores protected CoordinateSequence points; [3]
    Graph: LineStringCoordinateSequence → underlying coordinate storage.
  • Polygon: Polygon stores protected LinearRing shell and protected LinearRing[] holes [4].
    Graph: Polygonshell (LinearRing) → (LineString.points) → CoordinateSequence → storage, and similarly for each hole.

3) Coordinate storage is usually the biggest driver

JTS uses CoordinateSequence to hold vertices. Two common implementations have very different object graphs:

A) CoordinateArraySequence (default)
Backed by a Coordinate[] coordinates array. [5]
Graph cost: one object for the sequence + one Java array + one Coordinate object per vertex.

B) PackedCoordinateSequence (memory-lean for many vertices)
Backed by a primitive array (double[] coords in PackedCoordinateSequence.Double, or float[] in .Float). [6]
Graph cost: one object for the sequence + one primitive array (much less per-vertex overhead).

Important retained-size detail: PackedCoordinateSequence also has a transient SoftReference cache (coordRef) to a materialized Coordinate[] created by toCoordinateArray(). [6] If that cache has been populated and not cleared, your geometry’s retained graph can temporarily include the cached Coordinate[] plus all the per-vertex Coordinate objects (even though the “real” storage is packed).


Sources: Geometry base fields [1], Point coordinate sequence field [2], LineString points field [3], Polygon shell/holes fields [4], CoordinateArraySequence backing array [5], PackedCoordinateSequence packed arrays + soft cache [6].


🏁 Script executed:

# Search for usage of getEstimatedMemorySizeInBytes
rg "getEstimatedMemorySizeInBytes" --type java

Repository: trinodb/trino

Length of output: 369


🏁 Script executed:

# Check if there's memory tracking or operator memory accounting that uses this
rg "PagesRTreeIndex\|GeometryWithPosition" --type java -B 2 -A 2 | head -60

Repository: trinodb/trino

Length of output: 39


🏁 Script executed:

find . -name "PagesSpatialIndexSupplier.java" -type f 2>/dev/null

Repository: trinodb/trino

Length of output: 137


🏁 Script executed:

cat -n core/trino-main/src/main/java/io/trino/operator/PagesSpatialIndexSupplier.java

Repository: trinodb/trino

Length of output: 9260


This retained-size estimate significantly underestimates JTS geometry memory, especially for composite shapes.

getNumPoints() * 24 accounts only for raw coordinate payload but misses the object graph for coordinate sequences, linear rings, component geometries, and envelope caching. For a Polygon with holes, this omits multiple LinearRing objects and their ring arrays; for MultiPolygon or GeometryCollection, it ignores all component Geometry objects. Since this feeds operator memory accounting (via PagesSpatialIndexSupplier.getEstimatedSize()), large multipart geometries will be materially under-reported and can exceed memory limits without triggering spill-over.

Either retain the serialized slice size alongside the geometry (available at deserialization in buildRTree() line 114) or use a deeper retained-size walk of the JTS object graph.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java` around
lines 96 - 100, The current getEstimatedMemorySizeInBytes() in PagesRTreeIndex
underestimates memory by only using geometry.getNumPoints(); update it to
account for full retained size by either (A) storing and returning the
serialized Slice size captured during deserialization in buildRTree() (reuse the
existing serialized slice retained at build time and return INSTANCE_SIZE +
serializedSlice.length()), or (B) implement a deep retained-size traversal of
the JTS object graph starting from geometry (walk Geometry, LinearRing,
CoordinateSequence, component Geometries, and cached Envelope) and sum
object+array sizes, then return INSTANCE_SIZE + deepSize; ensure
PagesSpatialIndexSupplier.getEstimatedSize() reads this corrected value so
operator accounting reflects multipart/composite geometries accurately.

Fix test data that was accepted by ESRI but rejected by JTS which
strictly enforces the OGC Simple Features Specification:

- Close polygon rings (first point must equal last point)
- Fix single-point LINESTRING to have two points (minimum required)
- Fix MULTILINESTRING EMPTY syntax (remove extra parentheses)
- Replace invalid MULTIPOLYGON with overlapping polygons using ST_Union
- Replace degenerate polygons in GEOMETRYCOLLECTION with valid geometries
@dain
Copy link
Copy Markdown
Member Author

dain commented Mar 21, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java (1)

114-116: Fail fast when an expected county fixture entry is missing.

Assert key presence before value comparison so failures clearly identify missing test data instead of comparing against null.

Suggested patch
+            assertThat(expectedWkt)
+                    .describedAs("Missing expected WKT fixture for county: %s", name)
+                    .containsKey(name);
             assertThat(actualWkt)
                     .describedAs("WKT for county: %s", name)
                     .isEqualTo(expectedWkt.get(name));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java`
around lines 114 - 116, The test currently compares actualWkt to
expectedWkt.get(name) which yields unclear failures when the expected fixture
key is missing; before the comparison in TestEsriTable (around variables
actualWkt, expectedWkt, name), add an assertion that the expectedWkt contains
the key (e.g., assertThat(expectedWkt).containsKey(name) or equivalent) so the
test fails fast with a clear message about the missing fixture key, then proceed
to the existing describedAs/isEqualTo comparison.
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java (1)

154-157: Consider simplifying redundant field assignments.

Since geometry is already of type Geometry, the assignment jtsGeometry = geometry creates an alias to the same object. The current setup may be intentional for benchmark naming clarity (distinguishing GeoFunctions paths from direct JTS paths), so this is a minor observation.

If simplification is desired, you could inline these or rename for consistency:

♻️ Optional simplification
-        private Geometry jtsGeometry;
-        private Point innerJtsPoint;
-        private Point outerJtsPointInEnvelope;
-        private Point outerJtsPointNotInEnvelope;
...
-            jtsGeometry = geometry;
-            innerJtsPoint = (Point) innerPoint;
-            outerJtsPointInEnvelope = (Point) outerPointInEnvelope;
-            outerJtsPointNotInEnvelope = (Point) outerPointNotInEnvelope;

And update the direct JTS benchmark methods to cast inline:

-        return data.jtsGeometry.contains(data.innerJtsPoint);
+        return data.geometry.contains((Point) data.innerPoint);

Alternatively, keep the current structure if the explicit separation improves benchmark readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java`
around lines 154 - 157, The field assignments jtsGeometry = geometry and the
Point casts (innerJtsPoint = (Point) innerPoint, outerJtsPointInEnvelope =
(Point) outerPointInEnvelope, outerJtsPointNotInEnvelope = (Point)
outerPointNotInEnvelope) are redundant aliases; either remove the intermediate
fields and use geometry/innerPoint/outerPoint... directly in the JTS benchmark
methods, or eliminate the jts assignment and perform the casts inline where
innerJtsPoint, outerJtsPointInEnvelope, and outerJtsPointNotInEnvelope are used;
update usages in the JTS benchmark methods accordingly (referencing jtsGeometry,
innerJtsPoint, outerJtsPointInEnvelope, outerJtsPointNotInEnvelope) to avoid
duplicate aliases while preserving benchmark naming clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java`:
- Around line 172-180: The loop in EsriShapeReader that builds
LineString/LinearRing parts uses startIndex/endIndex without validating ranges
(in the block creating lineStrings and the analogous block around lines
222-230), which can lead to negative partLength or ArrayIndexOutOfBounds; add
explicit guards in the part-construction loops inside EsriShapeReader: check
that 0 <= startIndex < numPoints, that endIndex is between startIndex and
numPoints, and that partLength = endIndex - startIndex > 0; if any check fails,
throw a parsing exception (or return a parse error) instead of proceeding, so
System.arraycopy and GEOMETRY_FACTORY.createLineString/createLinearRing are only
called with valid indices. Ensure the same validation is applied to the second
block mentioned (lines ~222-230).
- Around line 111-123: The code trusts numPoints/numParts from the input and
allocates arrays directly which can lead to OOM or late read failures; update
the readers (e.g., the block that reads numPoints and returns
GEOMETRY_FACTORY.createMultiPointFromCoords(coords), and the analogous blocks
around lines handling numParts) to validate counts before allocation: ensure
counts are non-negative and within a reasonable maximum (or derived from
remaining input length), throw a clear ParseException/IOException on invalid or
excessively large counts, and only then allocate Coordinate[] (or int[] for
parts) and proceed to read coordinates/part indices; apply the same validation
pattern to the other occurrences noted (the blocks at ~149-168 and ~199-218).

In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkEnvelopeIntersection.java`:
- Around line 51-60: The benchmark currently measures intersection plus EWKB
serialization because the `@Benchmark` methods envelopes and geometries call
serialize(...); to isolate intersection cost, remove serialize(...) from the
timed path: have envelopes and geometries return or consume only the raw result
of stIntersection(data.envelope, data.otherEnvelope) and
stIntersection(data.geometry, data.otherGeometry) (or accept a JMH Blackhole
parameter and consume the returned geometry there), and move EWKB serialization
into a separate benchmark method that calls serialize(...) on the intersection
result; update references to the serialize and stIntersection calls accordingly.

In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoSpatialQueries.java`:
- Around line 71-79: The tests testSphericalGeographyResult and the earlier test
only assert spatial equality of the WKT payload, so update each test that calls
computeActual(...) and spatiallyEquals(...) (e.g., uses ST_GeometryFromText and
to_spherical_geography in these methods) to also assert the returned SQL type
before checking spatial content: call the existing API you use for type checks
on the result of computeActual(...) (or inspect the
MaterializedResult/Block/Type returned by computeActual) to assert the expected
SQL type (e.g., geometry or geography) is correct, then keep the
spatiallyEquals(...) assertion for the WKT payload. Ensure you reference the
same result value used by spatiallyEquals when performing the type assertion so
both checks validate the same query output.

In `@plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java`:
- Around line 132-133: The fixture loader in TestEsriTable uses
content.split("\n") which leaves CR on CRLF files and causes WKT mismatches;
change the line iteration to be line-ending-agnostic (e.g., use Java's
line-break-aware splitting or a BufferedReader/lines() stream) so that the loop
over content -> line and subsequent parsing into parts (the code that calls
content.split("\n") and then line.split("\t", 2) / uses parts) will strip CRs
and parse consistently across platforms.

In
`@plugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClient.java`:
- Around line 1945-1955: Replace the bind expression so PostGIS reads EWKB
(preserving SRID): in PostgreSqlClient, update the getBindExpression() method
(the block paired with the set(PreparedStatement, int, Object) method that calls
serialize(geometry)) to return "ST_GeomFromEWKB(?)" instead of
"ST_GeomFromWKB(?)" so the EWKB emitted by serialize(geometry) is correctly
interpreted.

---

Nitpick comments:
In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java`:
- Around line 154-157: The field assignments jtsGeometry = geometry and the
Point casts (innerJtsPoint = (Point) innerPoint, outerJtsPointInEnvelope =
(Point) outerPointInEnvelope, outerJtsPointNotInEnvelope = (Point)
outerPointNotInEnvelope) are redundant aliases; either remove the intermediate
fields and use geometry/innerPoint/outerPoint... directly in the JTS benchmark
methods, or eliminate the jts assignment and perform the casts inline where
innerJtsPoint, outerJtsPointInEnvelope, and outerJtsPointNotInEnvelope are used;
update usages in the JTS benchmark methods accordingly (referencing jtsGeometry,
innerJtsPoint, outerJtsPointInEnvelope, outerJtsPointNotInEnvelope) to avoid
duplicate aliases while preserving benchmark naming clarity.

In `@plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java`:
- Around line 114-116: The test currently compares actualWkt to
expectedWkt.get(name) which yields unclear failures when the expected fixture
key is missing; before the comparison in TestEsriTable (around variables
actualWkt, expectedWkt, name), add an assertion that the expectedWkt contains
the key (e.g., assertThat(expectedWkt).containsKey(name) or equivalent) so the
test fails fast with a clear message about the missing fixture key, then proceed
to the existing describedAs/isEqualTo comparison.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 09043123-7516-4ed1-8d5b-c17f15247462

📥 Commits

Reviewing files that changed from the base of the PR and between 211d506 and e64fa13.

📒 Files selected for processing (59)
  • core/trino-main/pom.xml
  • core/trino-main/src/main/java/io/trino/operator/PagesRTreeIndex.java
  • core/trino-main/src/main/java/io/trino/operator/PagesSpatialIndexSupplier.java
  • core/trino-main/src/main/java/io/trino/operator/SpatialIndexBuilderOperator.java
  • lib/trino-geospatial-toolkit/pom.xml
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryType.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryUtils.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/EsriShapeReader.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java
  • lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/TestGeometryUtils.java
  • lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.java
  • lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java
  • lib/trino-hive-formats/pom.xml
  • lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
  • lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriJsonParser.java
  • lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriDeserializer.java
  • plugin/trino-geospatial/pom.xml
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/BingTileFunctions.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeoFunctions.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeometryType.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningAggregateFunction.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningInternalAggregateFunction.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningStateFactory.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SphericalGeographyType.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryState.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateFactory.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateSerializer.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkEnvelopeIntersection.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkGeometryToBingTiles.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTArea.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTContains.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTEnvelope.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTIntersects.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTXMin.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/GeoTestUtils.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestBingTileFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestEncodedPolylineFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestGeoSpatialQueries.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialJoinOperator.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialPartitioningInternalAggregation.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/AbstractTestGeoAggregationFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryConvexHullGeoAggregation.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateFactory.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateSerializer.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryUnionGeoAggregation.java
  • plugin/trino-hive/pom.xml
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java
  • plugin/trino-hive/src/test/resources/esri/counties_expected.txt
  • plugin/trino-postgresql/pom.xml
  • plugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClient.java
  • pom.xml
💤 Files with no reviewable changes (7)
  • core/trino-main/pom.xml
  • plugin/trino-geospatial/pom.xml
  • lib/trino-geospatial-toolkit/pom.xml
  • lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/BenchmarkGeometrySerde.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerializationType.java
  • pom.xml
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/GeometrySerde.java
✅ Files skipped from review due to trivial changes (4)
  • lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateSerializer.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SphericalGeographyType.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/AbstractGeometryType.java
🚧 Files skipped from review as they are similar to previous changes (22)
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/GeometryType.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningStateFactory.java
  • lib/trino-hive-formats/pom.xml
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTEnvelope.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/aggregation/TestGeometryStateSerializer.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryState.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkGeometryToBingTiles.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestEncodedPolylineFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTXMin.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningAggregateFunction.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/SpatialPartitioningInternalAggregateFunction.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTIntersects.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestBingTileFunctions.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryStateFactory.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/GeometryUnionAgg.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/aggregation/ConvexHullAggregation.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSpatialPartitioningInternalAggregation.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/BingTileFunctions.java
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/EncodedPolylineFunctions.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/BenchmarkSTArea.java
  • plugin/trino-geospatial/src/main/java/io/trino/plugin/geospatial/GeometryType.java

Comment thread plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestEsriTable.java Outdated
dain added 14 commits March 20, 2026 18:09
Adds assertSpatialEquals helper to TestGeoFunctions that uses
stEquals for geometry comparison. Converts testSTGeometryType
and testSTBuffer to use the new helper.

testSTBuffer was updated to use property-based assertions (ST_Envelope
and ST_Area with tolerance) instead of exact WKT coordinate matching.
This makes the tests stable across CPU architectures (ARM vs x86)
where trigonometric functions can produce slightly different
floating-point results.
Migrate simple geometry functions to use JTS library.

Test updates for behavior differences:
- ST_Boundary returns LINESTRING instead of MULTILINESTRING for simple polygons
- ST_Buffer with infinity returns POLYGON EMPTY instead of MULTIPOLYGON EMPTY
- Minor floating-point precision differences in some calculations
Migrate ST_NumPoints and related accessor functions to JTS.

Test updates for behavior differences:
- ST_NumPoints now counts closing vertices in polygons per OGC standard
- Ring vertex ordering may differ cosmetically (same geometry)
Add JTS-compatible overloads for geometry utility methods to support
incremental migration from ESRI to JTS. The ESRI versions remain for
existing callers until they are converted.
Rewrite stUnion to use JTS UnaryUnionOp instead of ESRI cursors.

Behavior differences:
- Point-on-line union does not insert vertices
- Empty inputs return empty geometry collection instead of null
- Migrate spatial join operator to JTS for intersection and
  containment tests
- Switch GeoFunctions envelope operations to use JTS Envelope
  (deserializeEnvelope, ST_XMin/XMax/YMin/YMax, ST_IsEmpty)
Use Extended Well-Known Binary (EWKB) format for geometry serialization.
EWKB is the standard used by PostGIS and retains the SRID (Spatial
Reference System Identifier) for coordinate system information.
Note: TestEsriTable's expected values file was converted from Trino's
old internal binary format to WKT. This change cannot be separated
into an earlier commit because the old format's deserializer was
deleted in the EWKB commit, and circular Maven dependencies prevent
adding geospatial as a test dependency to trino-hive.
With ESRI removed JTS objects no longer need fully qualified names
Change the internal representation of geometry values to use JTS
Geometry objects directly, avoiding unnecessary serialization cycles
between function calls.
@dain dain force-pushed the user/dain/geo-jts branch from e64fa13 to 1306f5d Compare March 21, 2026 04:31
@dain dain merged commit 0ace6d2 into master Mar 25, 2026
120 checks passed
@dain dain deleted the user/dain/geo-jts branch March 25, 2026 17:22
@github-actions github-actions Bot added this to the 481 milestone Mar 25, 2026
@ebyhr ebyhr mentioned this pull request Mar 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed hive Hive connector postgresql PostgreSQL connector stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.

Development

Successfully merging this pull request may close these issues.

5 participants