Skip to content

Add hive support for ESRI GeoJson#28859

Merged
dain merged 1 commit intotrinodb:masterfrom
gertjanal:jts-geojson
Mar 28, 2026
Merged

Add hive support for ESRI GeoJson#28859
dain merged 1 commit intotrinodb:masterfrom
gertjanal:jts-geojson

Conversation

@gertjanal
Copy link
Copy Markdown
Contributor

Description

Support tables created with Row format com.esri.hadoop.hive.serde.GeoJsonSerDe and com.esri.json.hadoop.EnclosedGeoJsonInputFormat.

Originally started with PR #28592 but this PR is based on the JTS branch by @dain #27881

My tests work, but the destination branch has failing tests.

See

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( X ) Release notes are required, with the following suggested text:

## Hive
* Add Hive Esri GeoJson support

@cla-bot cla-bot bot added the cla-signed label Mar 25, 2026
@github-actions github-actions bot added docs hive Hive connector labels Mar 25, 2026
@gertjanal gertjanal force-pushed the jts-geojson branch 2 times, most recently from 78726e7 to 2c93f3a Compare March 25, 2026 20:07
@gertjanal gertjanal requested review from dain and electrum March 26, 2026 08:01
@gertjanal gertjanal marked this pull request as ready for review March 26, 2026 08:01
@dain
Copy link
Copy Markdown
Member

dain commented Mar 26, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 26, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

Walkthrough

This PR adds support for ESRI GeoJSON format to the Hive connector alongside the existing ESRI format. It introduces a Format enum to EsriDeserializer to distinguish between ESRI and GEO_JSON parsing modes, adds JTS library dependency for GeoJSON geometry parsing, creates a new HiveStorageFormat.ESRI_GEO_JSON variant, updates EsriPageSourceFactory to recognize GeoJSON configurations, and includes new test classes and test resources for validating GeoJSON deserialization and reading behavior. The format is marked as read-only in test utilities.

Possibly related PRs

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.41.1)
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java (1)

106-107: Consider making fields final for immutability.

The reader and mapper fields are initialized once in the constructor and never reassigned. Marking them final clarifies intent and prevents accidental reassignment.

♻️ Suggested fix
     private final Format format;
-    private GeoJsonReader reader;
-    private ObjectMapper mapper;
+    private final GeoJsonReader reader;
+    private final ObjectMapper mapper;

Then update the constructor to always assign (use null for ESRI format):

     public EsriDeserializer(List<Column> columns, Format format)
     {
         this.columns = ImmutableList.copyOf(requireNonNull(columns, "columns is null"));
         this.types = columns.stream()
                 .map(Column::type)
                 .collect(toImmutableList());
         this.fieldWritten = new boolean[columns.size()];
         this.format = format;

         if (format == GEO_JSON) {
-            reader = new GeoJsonReader();
-            mapper = new ObjectMapper();
+            this.reader = new GeoJsonReader();
+            this.mapper = new ObjectMapper();
+        }
+        else {
+            this.reader = null;
+            this.mapper = null;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java`
around lines 106 - 107, Mark the instance fields reader and mapper in
EsriDeserializer as final to enforce immutability; update the EsriDeserializer
constructor to always assign both fields (assign null where appropriate for ESRI
format) so they are initialized exactly once and satisfy final semantics,
ensuring no subsequent reassignment occurs.
lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonDeserializer.java (1)

92-94: Consider asserting polygon geometry, not only row count.

Line 92–94 currently verifies only that one row is emitted. Adding assertGeometry(...) here would make this test catch geometry parsing regressions, not just record counting.

📌 Suggested assertion addition
         Page page = parse(json);
         assertThat(page.getPositionCount()).isEqualTo(1);
+        assertGeometry(page, "POLYGON ((100 0, 101 0, 101 1, 100 1, 100 0))", 4326);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonDeserializer.java`
around lines 92 - 94, The test TestGeoJsonDeserializer currently only checks
Page page position count via parse(json) and
assertThat(page.getPositionCount()).isEqualTo(1); update the test to also assert
the parsed polygon geometry so geometry regressions are caught: after creating
Page page = parse(json) add an assertion that validates the geometry column
contents (use the existing assertGeometry helper or equivalent assertion on the
Block returned from page.getBlock(<geometryColumnIndex>) / page.getBlock(0) to
compare against the expected polygon WKB/WKT or shape coordinates), ensuring you
reference parse(json), Page, and assertGeometry (or the Block accessors) when
adding the check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonReader.java`:
- Around line 132-136: The test caches reader.isClosed() into a boolean and
asserts that cached value inside the loop, which can miss an unexpected change
during the current iteration; update the loop to assert directly on
reader.isClosed() after each successful reader.next(pageBuilder) call (replace
the cached-boolean check with assertThat(reader.isClosed()).isFalse()), keeping
use of reader.next(pageBuilder) and pageBuilder as in TestGeoJsonReader.

---

Nitpick comments:
In
`@lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java`:
- Around line 106-107: Mark the instance fields reader and mapper in
EsriDeserializer as final to enforce immutability; update the EsriDeserializer
constructor to always assign both fields (assign null where appropriate for ESRI
format) so they are initialized exactly once and satisfy final semantics,
ensuring no subsequent reassignment occurs.

In
`@lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonDeserializer.java`:
- Around line 92-94: The test TestGeoJsonDeserializer currently only checks Page
page position count via parse(json) and
assertThat(page.getPositionCount()).isEqualTo(1); update the test to also assert
the parsed polygon geometry so geometry regressions are caught: after creating
Page page = parse(json) add an assertion that validates the geometry column
contents (use the existing assertGeometry helper or equivalent assertion on the
Block returned from page.getBlock(<geometryColumnIndex>) / page.getBlock(0) to
compare against the expected polygon WKB/WKT or shape coordinates), ensuring you
reference parse(json), Page, and assertGeometry (or the Block accessors) when
adding the check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: daf225d9-79af-4145-b94e-b900a0dd4024

📥 Commits

Reviewing files that changed from the base of the PR and between 39fca46 and 2c93f3a.

📒 Files selected for processing (15)
  • docs/src/main/sphinx/connector/hive.md
  • lib/trino-hive-formats/pom.xml
  • lib/trino-hive-formats/src/main/java/io/trino/hive/formats/HiveClassNames.java
  • lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java
  • lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriDeserializer.java
  • lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriReader.java
  • lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonDeserializer.java
  • lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonReader.java
  • plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveStorageFormat.java
  • plugin/trino-hive/src/main/java/io/trino/plugin/hive/esri/EsriPageSourceFactory.java
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHivePageSink.java
  • plugin/trino-hive/src/test/resources/geojson/capitals.geojson
  • plugin/trino-hive/src/test/resources/geojson/capitals_expected.txt
  • testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveStorageFormats.java

Copy link
Copy Markdown
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments, mostly about testing and some nits. Otherwise, this is looking good.

@dain dain merged commit 5dc36e8 into trinodb:master Mar 28, 2026
66 checks passed
@github-actions github-actions bot added this to the 481 milestone Mar 28, 2026
@ebyhr ebyhr mentioned this pull request Mar 31, 2026
@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Apr 8, 2026

@gertjanal Please avoid removing ({issue}`issuenumber`) placeholer. You should add a PR number when the corresponding issue doesn't exist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

3 participants