Add hive support for ESRI GeoJson#28859
Conversation
78726e7 to
2c93f3a
Compare
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughThis PR adds support for ESRI GeoJSON format to the Hive connector alongside the existing ESRI format. It introduces a Possibly related PRs
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.41.1)plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.javaComment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java (1)
106-107: Consider making fieldsfinalfor immutability.The
readerandmapperfields are initialized once in the constructor and never reassigned. Marking themfinalclarifies intent and prevents accidental reassignment.♻️ Suggested fix
private final Format format; - private GeoJsonReader reader; - private ObjectMapper mapper; + private final GeoJsonReader reader; + private final ObjectMapper mapper;Then update the constructor to always assign (use
nullfor ESRI format):public EsriDeserializer(List<Column> columns, Format format) { this.columns = ImmutableList.copyOf(requireNonNull(columns, "columns is null")); this.types = columns.stream() .map(Column::type) .collect(toImmutableList()); this.fieldWritten = new boolean[columns.size()]; this.format = format; if (format == GEO_JSON) { - reader = new GeoJsonReader(); - mapper = new ObjectMapper(); + this.reader = new GeoJsonReader(); + this.mapper = new ObjectMapper(); + } + else { + this.reader = null; + this.mapper = null; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java` around lines 106 - 107, Mark the instance fields reader and mapper in EsriDeserializer as final to enforce immutability; update the EsriDeserializer constructor to always assign both fields (assign null where appropriate for ESRI format) so they are initialized exactly once and satisfy final semantics, ensuring no subsequent reassignment occurs.lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonDeserializer.java (1)
92-94: Consider asserting polygon geometry, not only row count.Line 92–94 currently verifies only that one row is emitted. Adding
assertGeometry(...)here would make this test catch geometry parsing regressions, not just record counting.📌 Suggested assertion addition
Page page = parse(json); assertThat(page.getPositionCount()).isEqualTo(1); + assertGeometry(page, "POLYGON ((100 0, 101 0, 101 1, 100 1, 100 0))", 4326);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonDeserializer.java` around lines 92 - 94, The test TestGeoJsonDeserializer currently only checks Page page position count via parse(json) and assertThat(page.getPositionCount()).isEqualTo(1); update the test to also assert the parsed polygon geometry so geometry regressions are caught: after creating Page page = parse(json) add an assertion that validates the geometry column contents (use the existing assertGeometry helper or equivalent assertion on the Block returned from page.getBlock(<geometryColumnIndex>) / page.getBlock(0) to compare against the expected polygon WKB/WKT or shape coordinates), ensuring you reference parse(json), Page, and assertGeometry (or the Block accessors) when adding the check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonReader.java`:
- Around line 132-136: The test caches reader.isClosed() into a boolean and
asserts that cached value inside the loop, which can miss an unexpected change
during the current iteration; update the loop to assert directly on
reader.isClosed() after each successful reader.next(pageBuilder) call (replace
the cached-boolean check with assertThat(reader.isClosed()).isFalse()), keeping
use of reader.next(pageBuilder) and pageBuilder as in TestGeoJsonReader.
---
Nitpick comments:
In
`@lib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.java`:
- Around line 106-107: Mark the instance fields reader and mapper in
EsriDeserializer as final to enforce immutability; update the EsriDeserializer
constructor to always assign both fields (assign null where appropriate for ESRI
format) so they are initialized exactly once and satisfy final semantics,
ensuring no subsequent reassignment occurs.
In
`@lib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonDeserializer.java`:
- Around line 92-94: The test TestGeoJsonDeserializer currently only checks Page
page position count via parse(json) and
assertThat(page.getPositionCount()).isEqualTo(1); update the test to also assert
the parsed polygon geometry so geometry regressions are caught: after creating
Page page = parse(json) add an assertion that validates the geometry column
contents (use the existing assertGeometry helper or equivalent assertion on the
Block returned from page.getBlock(<geometryColumnIndex>) / page.getBlock(0) to
compare against the expected polygon WKB/WKT or shape coordinates), ensuring you
reference parse(json), Page, and assertGeometry (or the Block accessors) when
adding the check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: daf225d9-79af-4145-b94e-b900a0dd4024
📒 Files selected for processing (15)
docs/src/main/sphinx/connector/hive.mdlib/trino-hive-formats/pom.xmllib/trino-hive-formats/src/main/java/io/trino/hive/formats/HiveClassNames.javalib/trino-hive-formats/src/main/java/io/trino/hive/formats/esri/EsriDeserializer.javalib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriDeserializer.javalib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestEsriReader.javalib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonDeserializer.javalib/trino-hive-formats/src/test/java/io/trino/hive/formats/esri/TestGeoJsonReader.javaplugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveStorageFormat.javaplugin/trino-hive/src/main/java/io/trino/plugin/hive/esri/EsriPageSourceFactory.javaplugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.javaplugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHivePageSink.javaplugin/trino-hive/src/test/resources/geojson/capitals.geojsonplugin/trino-hive/src/test/resources/geojson/capitals_expected.txttesting/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveStorageFormats.java
dain
left a comment
There was a problem hiding this comment.
Just a few minor comments, mostly about testing and some nits. Otherwise, this is looking good.
|
@gertjanal Please avoid removing |
Description
Support tables created with Row format
com.esri.hadoop.hive.serde.GeoJsonSerDeandcom.esri.json.hadoop.EnclosedGeoJsonInputFormat.Originally started with PR #28592 but this PR is based on the JTS branch by @dain #27881
My tests work, but the destination branch has failing tests.
See
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( X ) Release notes are required, with the following suggested text: