
Add Hive storage format ESRI GeoJsonSerDe using JTS#28789

Closed
gertjanal wants to merge 15 commits into trinodb:user/dain/geo-jts from gertjanal:jts-geojson

@gertjanal (Contributor)

Description

Support tables created with the row format com.esri.hadoop.hive.serde.GeoJsonSerDe and the input format com.esri.json.hadoop.EnclosedGeoJsonInputFormat.

This work originally started in PR #28592, but this PR is based on the JTS branch by @dain (#27881).

My tests pass, but the destination branch has failing tests.

See

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( X ) Release notes are required, with the following suggested text:

## Hive
* Add support for tables using the Esri GeoJSON storage format (GeoJsonSerDe).

dain added 3 commits March 20, 2026 12:08
Adds assertSpatialEquals helper to TestGeoFunctions that uses
stEquals for geometry comparison. Converts testSTGeometryType
and testSTBuffer to use the new helper.

testSTBuffer was updated to use property-based assertions (ST_Envelope
and ST_Area with tolerance) instead of exact WKT coordinate matching.
This makes the tests stable across CPU architectures (ARM vs x86)
where trigonometric functions can produce slightly different
floating-point results.
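A minimal JDK-only sketch of the property-based idea: the class and method names are hypothetical and are not Trino's `assertSpatialEquals` or its test code. It approximates a buffer of radius 1 around the origin with a 64-sided polygon, then checks the area against π within a tolerance, so last-bit differences in `Math.sin`/`Math.cos` across platforms cannot fail the assertion.

```java
import java.util.Locale;

// Hypothetical helper for illustration only (not Trino's TestGeoFunctions):
// asserts a derived property (area) within a tolerance instead of exact WKT.
public final class ToleranceAssertions
{
    private ToleranceAssertions() {}

    // Shoelace formula for a closed ring given as parallel x/y arrays,
    // where the first vertex is repeated as the last one.
    static double ringArea(double[] xs, double[] ys)
    {
        double twiceArea = 0;
        for (int i = 0; i < xs.length - 1; i++) {
            twiceArea += xs[i] * ys[i + 1] - xs[i + 1] * ys[i];
        }
        return Math.abs(twiceArea) / 2;
    }

    // Approximates a circle of radius r around the origin with a closed
    // ring of `segments` vertices (plus the repeated closing vertex).
    static double[][] approximateCircle(double r, int segments)
    {
        double[] xs = new double[segments + 1];
        double[] ys = new double[segments + 1];
        for (int i = 0; i < segments; i++) {
            double angle = 2 * Math.PI * i / segments;
            xs[i] = r * Math.cos(angle);
            ys[i] = r * Math.sin(angle);
        }
        xs[segments] = xs[0];
        ys[segments] = ys[0];
        return new double[][] {xs, ys};
    }

    // Fails only if the value is outside the tolerance band.
    static void assertApproximately(double actual, double expected, double tolerance)
    {
        if (Math.abs(actual - expected) > tolerance) {
            throw new AssertionError(String.format(Locale.ROOT,
                    "expected %s +/- %s but was %s", expected, tolerance, actual));
        }
    }

    public static void main(String[] args)
    {
        double[][] ring = approximateCircle(1.0, 64);
        // A regular 64-gon inscribed in the unit circle has an area close to PI
        assertApproximately(ringArea(ring[0], ring[1]), Math.PI, 0.01);
        System.out.println("area check passed");
    }
}
```

Asserting on area or envelope bounds trades a little precision for stability: an exact coordinate comparison breaks on any last-bit difference, while the property check only breaks when the shape is actually wrong.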
Migrate simple geometry functions to use the JTS library.

Test updates for behavior differences:
- ST_Boundary returns LINESTRING instead of MULTILINESTRING for simple polygons
- ST_Buffer with infinity returns POLYGON EMPTY instead of MULTIPOLYGON EMPTY
- Minor floating-point precision differences in some calculations
Migrate ST_NumPoints and related accessor functions to JTS.

Test updates for behavior differences:
- ST_NumPoints now counts closing vertices in polygons per OGC standard
- Ring vertex ordering may differ cosmetically (same geometry)
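To make the ST_NumPoints change concrete, here is a hypothetical JDK-only model (not Trino or JTS code) in which each polygon ring stores its first vertex again at the end, the OGC closed-ring layout. Counting every stored vertex gives 5 for a unit square; skipping each ring's closing vertex gives 4, which is presumably the kind of count the earlier behavior produced.

```java
import java.util.List;

// Hypothetical model for illustration: a polygon is a list of rings, and each
// ring repeats its first vertex at the end (OGC closed-ring layout).
public final class NumPointsExample
{
    record Coordinate(double x, double y) {}

    private NumPointsExample() {}

    // OGC-style count: every stored vertex, including each ring's closing vertex.
    static int numPoints(List<List<Coordinate>> rings)
    {
        int total = 0;
        for (List<Coordinate> ring : rings) {
            total += ring.size();
        }
        return total;
    }

    // Count that skips the repeated closing vertex of each non-empty ring.
    static int numDistinctRingVertices(List<List<Coordinate>> rings)
    {
        int total = 0;
        for (List<Coordinate> ring : rings) {
            if (!ring.isEmpty()) {
                total += ring.size() - 1;
            }
        }
        return total;
    }

    public static void main(String[] args)
    {
        // POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
        List<Coordinate> square = List.of(
                new Coordinate(0, 0), new Coordinate(1, 0), new Coordinate(1, 1),
                new Coordinate(0, 1), new Coordinate(0, 0));
        List<List<Coordinate>> polygon = List.of(square);
        System.out.println(numPoints(polygon)); // 5
        System.out.println(numDistinctRingVertices(polygon)); // 4
    }
}
```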
cla-bot added the cla-signed label Mar 20, 2026
github-actions added the docs, hive (Hive connector), and postgresql (PostgreSQL connector) labels Mar 20, 2026
gertjanal changed the title from "Jts geojson" to "Add Hive storage format ESRI GeoJsonSerDe using JTS" Mar 20, 2026
dain added 5 commits March 20, 2026 13:47
Add JTS-compatible overloads for geometry utility methods to support
incremental migration from ESRI to JTS. The ESRI versions remain for
existing callers until they are converted.
Rewrite stUnion to use JTS UnaryUnionOp instead of ESRI cursors.

Behavior differences:
- Point-on-line union does not insert vertices
- Empty inputs return empty geometry collection instead of null
gertjanal force-pushed the jts-geojson branch 2 times, most recently from 3db3acf to df553d8 Mar 20, 2026 21:03
gertjanal requested a review from dain Mar 20, 2026 21:07
gertjanal marked this pull request as ready for review Mar 20, 2026 21:07
case ESRI -> EsriJsonParser.parseGeometry(parser);
case GEO_JSON -> {

String json = mapper.writeValueAsString(mapper.readTree(parser));

@gertjanal (Contributor, Author) Mar 20, 2026

Not very proud of this, but the JTS GeoJsonReader only accepts Reader and String input, so the parsed JSON subtree is serialized back to a String first.

dain added 6 commits March 20, 2026 14:18
- Migrate spatial join operator to JTS for intersection and
  containment tests
- Switch GeoFunctions envelope operations to use JTS Envelope
  (deserializeEnvelope, ST_XMin/XMax/YMin/YMax, ST_IsEmpty)
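The envelope accessors listed above reduce to a bounding box over a geometry's coordinates. The sketch below is a hypothetical JDK-only stand-in, not JTS's `Envelope` class or Trino's code: it shows how min/max tracking covers ST_XMin/XMax/YMin/YMax and an emptiness check, plus the rectangle-overlap test often used as a cheap prefilter before an exact intersection check.

```java
// Hypothetical stand-in for illustration (not org.locationtech.jts.geom.Envelope):
// tracks the axis-aligned bounding box of the coordinates it has seen.
public final class BoundingBox
{
    private double minX = Double.POSITIVE_INFINITY;
    private double minY = Double.POSITIVE_INFINITY;
    private double maxX = Double.NEGATIVE_INFINITY;
    private double maxY = Double.NEGATIVE_INFINITY;

    // Grows the box so that it contains (x, y).
    void expandToInclude(double x, double y)
    {
        minX = Math.min(minX, x);
        minY = Math.min(minY, y);
        maxX = Math.max(maxX, x);
        maxY = Math.max(maxY, y);
    }

    // A box that was never expanded covers no points, like an empty geometry.
    boolean isEmpty()
    {
        return minX > maxX;
    }

    double minX() { return minX; }
    double maxX() { return maxX; }
    double minY() { return minY; }
    double maxY() { return maxY; }

    // Two boxes overlap unless one lies entirely to one side of the other;
    // an empty box overlaps nothing.
    boolean intersects(BoundingBox other)
    {
        if (isEmpty() || other.isEmpty()) {
            return false;
        }
        return other.minX <= maxX && other.maxX >= minX
                && other.minY <= maxY && other.maxY >= minY;
    }

    public static void main(String[] args)
    {
        BoundingBox box = new BoundingBox();
        box.expandToInclude(1, 5);
        box.expandToInclude(-2, 3);
        System.out.println(box.minX() + " " + box.maxX() + " " + box.minY() + " " + box.maxY()); // -2.0 1.0 3.0 5.0
    }
}
```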
Use Extended Well-Known Binary (EWKB) format for geometry serialization.
EWKB is the standard used by PostGIS and retains the SRID (Spatial
Reference System Identifier) for coordinate system information.
Note: TestEsriTable's expected values file was converted from Trino's
old internal binary format to WKT. This change cannot be separated
into an earlier commit because the old format's deserializer was
deleted in the EWKB commit, and circular Maven dependencies prevent
adding geospatial as a test dependency to trino-hive.
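As an illustration of the EWKB layout, this JDK-only sketch encodes a 2D point with an SRID; the class and method names are hypothetical and this is not Trino's serializer. Following the PostGIS EWKB convention, the geometry type word carries the flag 0x20000000 when an SRID is present, and the 4-byte SRID sits between the type and the coordinates.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HexFormat;

// Hypothetical minimal EWKB encoder/decoder for a 2D point, for illustration only.
public final class EwkbPointExample
{
    private static final int WKB_POINT = 1;
    private static final int EWKB_SRID_FLAG = 0x20000000;

    private EwkbPointExample() {}

    static byte[] encodePoint(double x, double y, int srid)
    {
        // 1 byte order marker + 4 byte type + 4 byte SRID + two 8-byte doubles
        ByteBuffer buffer = ByteBuffer.allocate(1 + 4 + 4 + 8 + 8).order(ByteOrder.LITTLE_ENDIAN);
        buffer.put((byte) 1); // byte order marker: 1 = little endian (NDR)
        buffer.putInt(WKB_POINT | EWKB_SRID_FLAG); // point type with the SRID flag set
        buffer.putInt(srid); // SRID follows the type when the flag is set
        buffer.putDouble(x);
        buffer.putDouble(y);
        return buffer.array();
    }

    // Reads the SRID back; returns 0 (PostGIS's "unknown SRID") if the flag is absent.
    static int decodeSrid(byte[] ewkb)
    {
        ByteBuffer buffer = ByteBuffer.wrap(ewkb)
                .order(ewkb[0] == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int type = buffer.getInt(1); // absolute read just after the byte order marker
        if ((type & EWKB_SRID_FLAG) == 0) {
            return 0;
        }
        return buffer.getInt(5);
    }

    public static void main(String[] args)
    {
        byte[] ewkb = encodePoint(1, 2, 4326);
        System.out.println(HexFormat.of().withUpperCase().formatHex(ewkb));
        // 0101000020E6100000000000000000F03F0000000000000040
    }
}
```

Because the SRID travels inside the value, a reader can recover the coordinate system from the bytes alone, which plain WKB cannot do.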
With ESRI removed, JTS objects no longer need fully qualified names.
Change the internal representation of geometry values to use JTS
Geometry objects directly, avoiding unnecessary serialization cycles
between function calls.
dain force-pushed the user/dain/geo-jts branch from 211d506 to e64fa13
gertjanal marked this pull request as draft March 21, 2026 00:16
dain deleted the branch trinodb:user/dain/geo-jts March 25, 2026 17:22
dain closed this Mar 25, 2026
