
[Iceberg v3] Geometry and Geography types #27893

Open
dain wants to merge 2 commits into master from user/dain/iceberg-geo

Conversation

@dain
Member

@dain dain commented Jan 11, 2026

Description

This PR is based on #27881, and only the last commit is relevant to Iceberg.

Geometry

geometry(crs) → Trino Geometry

  • CRS is converted to SRID (e.g., EPSG:3857 → SRID 3857, OGC:CRS84 → SRID 4326)
  • Default CRS (null) is treated as OGC:CRS84 (SRID 4326)
  • On read: SRID is injected into the WKB to produce EWKB
  • On write: SRID is validated against column CRS, then stripped from EWKB to produce WKB (SRID 0 is accepted for any CRS)
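
As a rough sketch of the CRS-to-SRID mapping described above (the helper name and exact error handling are assumptions; the PR's actual implementation lives in JtsGeometrySerde and may differ):

```java
import java.util.Locale;

public final class CrsMapping
{
    private CrsMapping() {}

    // Sketch, not the PR's exact code: map an Iceberg geometry CRS string
    // to a SRID. A null CRS defaults to OGC:CRS84 (WGS84 lon/lat, SRID 4326);
    // EPSG codes are parsed numerically and non-positive codes are rejected.
    public static int crsToSrid(String crs)
    {
        if (crs == null || crs.toUpperCase(Locale.ROOT).equals("OGC:CRS84")) {
            return 4326;
        }
        if (crs.toUpperCase(Locale.ROOT).startsWith("EPSG:")) {
            int srid = Integer.parseInt(crs.substring("EPSG:".length()));
            if (srid <= 0) {
                // Reject EPSG:0 and negative codes rather than treating them as "no SRID"
                throw new IllegalArgumentException("Invalid EPSG code: " + crs);
            }
            return srid;
        }
        throw new IllegalArgumentException("Unsupported CRS: " + crs);
    }
}
```

On read, the resulting SRID would then be injected into the WKB header to form EWKB; on write, the EWKB SRID is checked against this mapping before being stripped.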

Geography

geography(crs, algorithm) → Trino SphericalGeography

  • CRS must be WGS84 (OGC:CRS84, EPSG:4326, or null for default)
  • Algorithm must be SPHERICAL (or null for default)
  • Non-WGS84 CRS is not supported
  • Non-spherical algorithms (e.g., VINCENTY) are not supported
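
The constraints above amount to a small validation step before mapping to SphericalGeography; a minimal sketch (the helper name is hypothetical, not the PR's code):

```java
public final class GeographyValidation
{
    private GeographyValidation() {}

    // Sketch: geography(crs, algorithm) is accepted only when the CRS is
    // WGS84 (OGC:CRS84, EPSG:4326, or null for the default) and the
    // edge-interpolation algorithm is SPHERICAL (or null for the default).
    public static void validate(String crs, String algorithm)
    {
        if (crs != null && !crs.equals("OGC:CRS84") && !crs.equals("EPSG:4326")) {
            throw new IllegalArgumentException("Geography CRS must be WGS84: " + crs);
        }
        if (algorithm != null && !algorithm.equalsIgnoreCase("SPHERICAL")) {
            // e.g. VINCENTY is rejected
            throw new IllegalArgumentException("Unsupported geography algorithm: " + algorithm);
        }
    }
}
```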

Release notes

(X) Release notes are required, with the following suggested text:

## Iceberg
* Add support for reading and writing `geometry` and `geography` column types. ({issue}`issuenumber`)

@cla-bot cla-bot Bot added the cla-signed label Jan 11, 2026
@github-actions github-actions Bot added the iceberg, hive, and postgresql labels Jan 11, 2026
@dain dain force-pushed the user/dain/iceberg-geo branch 2 times, most recently from bf401e7 to b2ccfe0 Compare January 11, 2026 05:07
@dain dain changed the title Add geometry and geography type support for Iceberg v3 [Iceberg v3] Geometry and Geography types Jan 11, 2026
@dain dain marked this pull request as ready for review January 11, 2026 07:43
@github-actions

github-actions Bot commented Feb 2, 2026

This pull request has gone a while without any activity. Ask for help on #core-dev in Trino Slack.

@github-actions github-actions Bot added the stale label Feb 2, 2026
@dain dain added the stale-ignore label Feb 17, 2026
@dain dain force-pushed the user/dain/iceberg-geo branch from b2ccfe0 to 7dfe940 Compare March 25, 2026 18:52
@dain dain removed the stale label Mar 25, 2026
@dain dain force-pushed the user/dain/iceberg-geo branch 2 times, most recently from c5c66d5 to 29af654 Compare March 25, 2026 20:23
Comment on lines +203 to +210
int type = ewkb.getInt(1);
if (bigEndian) {
type = Integer.reverseBytes(type);
}
if ((type & EWKB_SRID_FLAG) == 0) {
return 0;
}
int srid = ewkb.getInt(5);
Contributor

can we do a single getLong instead?

Member Author

It turned out to be more complex.
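
For context on that complexity: a single 8-byte read has to untangle two independently endian-sensitive fields and a conditional SRID. A sketch over a plain byte array (using ByteBuffer rather than the Slice-based PR code, and assuming the buffer is at least 9 bytes):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class EwkbHeader
{
    // PostGIS EWKB flag marking that a 4-byte SRID follows the type word
    private static final int EWKB_SRID_FLAG = 0x20000000;

    private EwkbHeader() {}

    // Sketch: read type and SRID with one 8-byte read of bytes 1..8.
    // The low half of the little-endian word holds the type, the high half
    // the candidate SRID; both must be byte-reversed for big-endian input,
    // and the SRID half is only meaningful when the SRID flag is set --
    // otherwise those bytes already belong to the geometry body.
    public static int readSrid(byte[] ewkb)
    {
        boolean bigEndian = ewkb[0] == 0;
        long word = ByteBuffer.wrap(ewkb, 1, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        int type = (int) word;
        int srid = (int) (word >>> 32);
        if (bigEndian) {
            type = Integer.reverseBytes(type);
            srid = Integer.reverseBytes(srid);
        }
        return (type & EWKB_SRID_FLAG) == 0 ? 0 : srid;
    }

    public static void main(String[] args)
    {
        // Little-endian point header with SRID 3857
        ByteBuffer buffer = ByteBuffer.allocate(9).order(ByteOrder.LITTLE_ENDIAN);
        buffer.put((byte) 1);              // little-endian marker
        buffer.putInt(1 | EWKB_SRID_FLAG); // type = point, SRID flag set
        buffer.putInt(3857);
        System.out.println(readSrid(buffer.array())); // prints 3857
    }
}
```

Note the extra wrinkle: a header without the SRID flag is only 5 bytes, so an unconditional 8-byte read also has to assume enough geometry body follows, which the two-read version does not.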

Comment threads on plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java (3, all outdated)
@dain
Member Author

dain commented Mar 25, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Mar 25, 2026

📝 Walkthrough

Walkthrough

This PR adds EWKB SRID handling utilities to JtsGeometrySerde (extracting/injecting SRID, converting between EWKB and WKB, and mapping CRS strings to SRID). The Iceberg plugin is updated to treat GEOMETRY/GEOGRAPHY as binary for file I/O, convert Trino geometry types to VARBINARY for writers/readers, inject SRIDs on read, strip SRIDs on write with validation, and exclude geospatial types from predicate pushdown and extended statistics. Tests and build deps for geospatial modules were added and Iceberg round-trip tests for geometry/geography were introduced.

Possibly related PRs

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.41.1)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java (1)

172-178: ⚠️ Potential issue | 🟠 Major

Recursively rewrite geospatial types; current conversion misses nested shapes.

toFileSchema(...) / toFileTrinoType(...) only handle top-level geospatial columns. Nested geospatial types (e.g., array(geometry), row(... geography ...)) are left unchanged, so writer schemas/types can still contain unsupported geospatial primitives.

Proposed fix
@@
-        List<String> fileColumnNames = icebergSchema.columns().stream()
+        Schema fileSchema = toFileSchema(icebergSchema);
+        List<String> fileColumnNames = fileSchema.columns().stream()
                 .map(Types.NestedField::name)
                 .collect(toImmutableList());
-        List<Type> fileColumnTypes = icebergSchema.columns().stream()
+        List<Type> fileColumnTypes = fileSchema.columns().stream()
                 .map(column -> toTrinoType(column.type(), typeManager))
-                .map(IcebergFileWriterFactory::toFileTrinoType)
                 .collect(toImmutableList());
@@
-                    convert(toFileSchema(icebergSchema), "table"),
+                    convert(fileSchema, "table"),
@@
-            List<Types.NestedField> columnFields = icebergSchema.columns();
+            Schema fileSchema = toFileSchema(icebergSchema);
+            List<Types.NestedField> columnFields = fileSchema.columns();
@@
                     .map(Types.NestedField::type)
                     .map(type -> toTrinoType(type, typeManager))
-                    .map(IcebergFileWriterFactory::toFileTrinoType)
                     .collect(toImmutableList());
@@
-                    toOrcType(toFileSchema(icebergSchema)),
+                    toOrcType(fileSchema),
@@
-        List<Type> columnTypes = icebergSchema.columns().stream()
+        Schema fileSchema = toFileSchema(icebergSchema);
+        List<Type> columnTypes = fileSchema.columns().stream()
                 .map(column -> toTrinoType(column.type(), typeManager))
-                .map(IcebergFileWriterFactory::toFileTrinoType)
                 .collect(toImmutableList());
@@
-                toFileSchema(icebergSchema),
+                fileSchema,
                 columnTypes,
                 compressionCodec);
@@
-    private static Schema toFileSchema(Schema icebergSchema)
-    {
-        List<Types.NestedField> columns = icebergSchema.columns().stream()
-                .map(IcebergFileWriterFactory::toFileType)
-                .collect(toImmutableList());
-        return new Schema(columns);
-    }
+    private static Schema toFileSchema(Schema icebergSchema)
+    {
+        return new Schema(icebergSchema.columns().stream()
+                .map(IcebergFileWriterFactory::toFileType)
+                .collect(toImmutableList()));
+    }
@@
     private static Types.NestedField toFileType(Types.NestedField field)
     {
-        org.apache.iceberg.types.Type type = field.type();
-        if (type.typeId() == TypeID.GEOMETRY || type.typeId() == TypeID.GEOGRAPHY) {
-            // Replace geometry/geography with binary for file writing
-            return Types.NestedField.of(field.fieldId(), field.isOptional(), field.name(), Types.BinaryType.get(), field.doc());
-        }
-        return field;
-    }
-
-    /**
-     * Convert Trino type for file writing - geometry/geography become varbinary.
-     */
-    private static Type toFileTrinoType(Type type)
-    {
-        if (type instanceof GeometryType || type instanceof SphericalGeographyType) {
-            return VarbinaryType.VARBINARY;
-        }
-        return type;
+        org.apache.iceberg.types.Type rewritten = toFileType(field.type());
+        if (rewritten == field.type()) {
+            return field;
+        }
+        return Types.NestedField.of(field.fieldId(), field.isOptional(), field.name(), rewritten, field.doc());
+    }
+
+    private static org.apache.iceberg.types.Type toFileType(org.apache.iceberg.types.Type type)
+    {
+        return switch (type.typeId()) {
+            case GEOMETRY, GEOGRAPHY -> Types.BinaryType.get();
+            case STRUCT -> Types.StructType.of(type.asStructType().fields().stream()
+                    .map(IcebergFileWriterFactory::toFileType)
+                    .collect(toImmutableList()));
+            case LIST -> {
+                Types.ListType listType = type.asListType();
+                org.apache.iceberg.types.Type elementType = toFileType(listType.elementType());
+                yield listType.isElementRequired()
+                        ? Types.ListType.ofRequired(listType.elementId(), elementType)
+                        : Types.ListType.ofOptional(listType.elementId(), elementType);
+            }
+            case MAP -> {
+                Types.MapType mapType = type.asMapType();
+                org.apache.iceberg.types.Type keyType = toFileType(mapType.keyType());
+                org.apache.iceberg.types.Type valueType = toFileType(mapType.valueType());
+                yield mapType.isValueRequired()
+                        ? Types.MapType.ofRequired(mapType.keyId(), mapType.valueId(), keyType, valueType)
+                        : Types.MapType.ofOptional(mapType.keyId(), mapType.valueId(), keyType, valueType);
+            }
+            default -> type;
+        };
     }

Also applies to: 233-237, 312-315, 332-359

🧹 Nitpick comments (2)
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java (1)

93-93: Consider reversing assertion order for clearer failure messages.

AssertJ convention is assertThat(actual).isEqualTo(expected). The current order places the expected value in assertThat, which would produce a confusing failure message like "expected: [actual] but was: [expected]".

💡 Suggested fix
-            assertThat(wktList.get(i)).isEqualTo(SPHERICAL_GEOGRAPHY.getObjectValue(block, i));
+            assertThat(SPHERICAL_GEOGRAPHY.getObjectValue(block, i)).isEqualTo(wktList.get(i));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java`
at line 93, The assertion currently uses
assertThat(wktList.get(i)).isEqualTo(SPHERICAL_GEOGRAPHY.getObjectValue(block,
i)), which puts the expected value in the assertThat() slot and yields confusing
failure messages; swap the operands so the actual value from
SPHERICAL_GEOGRAPHY.getObjectValue(block, i) is passed to assertThat(...) and
isEqualTo(...) compares to wktList.get(i) (i.e., use
assertThat(SPHERICAL_GEOGRAPHY.getObjectValue(block,
i)).isEqualTo(wktList.get(i))) to follow AssertJ convention.
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java (1)

1780-1790: Assert the geography payload, not just row presence.

This test still passes if the new sphericalgeography read/write path corrupts the payload, because it never inspects geog. Please round-trip a deterministic value instead of only checking that one row exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java`
around lines 1780 - 1790, In testGeographyRoundTrip, change the assertion to
validate the actual geography payload instead of only row presence: insert a
deterministic geometry using to_spherical_geography(ST_Point(...)) (as already
done) and then query the stored column (e.g., SELECT ST_AsText(geog) or SELECT
geog = to_spherical_geography(ST_Point(...))) to assert the round-tripped value
equals the original; update the assertThat(...) call in testGeographyRoundTrip
to compare the retrieved geog text/binary to the expected deterministic value so
corruption would fail the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java`:
- Around line 358-374: transformGeometryColumnsForWrite currently only processes
top-level geometry channels and therefore skips SRID validation and EWKB→WKB
conversion for nested geospatial values; update it to recurse into complex
blocks (RowBlock, ArrayBlock, MapBlock) so nested geometry blocks are visited
and passed to transformGeometryBlockForWrite (or alternately add an early
validation that rejects nested geometry types). Specifically, modify
transformGeometryColumnsForWrite to detect complex Block instances and walk
their children recursively, applying transformGeometryBlockForWrite when a child
block corresponds to a geometry type (preserving position counts and offsets),
and ensure SRID lookup uses the same channel/type mapping used now so nested
geometries get validated and converted.

In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java`:
- Around line 1694-1710: getSridInjectionTransform currently only handles when
the projected column itself is a GeometryType, so selecting structured types
(ROW/ARRAY/MAP) that contain geometry skips SRID injection; modify
getSridInjectionTransform to inspect nested schema/type structures (RowType,
ArrayType, MapType) for GeometryType leaves by recursing the tableSchema
field.type and/or IcebergColumnHandle.getType and build a Block->Block transform
that traverses structured Blocks and calls injectSridIntoBlock for each nested
geometry leaf (preserve positions and offsets for arrays/maps and fields for
rows), or if recursion is undesirable, explicitly detect any nested GeometryType
and throw/return an error disallowing nested geospatial columns; use the
existing symbols getSridInjectionTransform, injectSridIntoBlock,
IcebergColumnHandle, tableSchema.findField and Types.{RowType, ListType,
MapType}/Types.GeometryType to locate and implement the recursion or validation.
- Around line 932-942: getFileReadType currently only converts top-level
GeometryType/SphericalGeographyType to VARBINARY, but for Avro you must recurse
into complex types so nested geospatial leaves are also read as VARBINARY:
update getFileReadType(Type columnType) to detect complex types (RowType,
ArrayType, MapType) and return a new type with the same structure where each
child/field/element/key/value type is replaced by calling getFileReadType
recursively (preserve field names and nullability for RowType, preserve map
key/value semantics for MapType, and element type for ArrayType); keep the
existing simple-case behavior for GeometryType and SphericalGeographyType and
leave other scalar types unchanged so behavior matches getOrcReadType.
- Around line 1715-1754: injectSridIntoBlock currently returns the original
block whenever it isn't a VariableWidthBlock, which allows dictionary-, RLE- or
lazy-backed geometry blocks to bypass SRID injection; update injectSridIntoBlock
to unwrap and handle those wrappers: if block is a DictionaryBlock, call
injectSridIntoBlock on dictionaryBlock.getDictionary() and then return a new
DictionaryBlock that reuses the original ids and positionCount but wraps the
transformed dictionary; if block is a RunLengthEncodedBlock, call
injectSridIntoBlock on runLengthEncodedBlock.getValue() and return a new
RunLengthEncodedBlock with the transformed value and the original positionCount;
if block is a LazyBlock, force/load the underlying block (lazyBlock.getBlock())
and recurse similarly; for any other block types continue to return the original
block. Ensure you still perform the existing VariableWidthBlock SRID
transformation (keeping function name injectSridIntoBlock and reusing the
existing VariableWidthBlock handling) and preserve nulls, offsets and original
wrapper semantics when reconstructing wrapper blocks.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cfe86063-c19c-407a-a2f2-c1f96b5aa41b

📥 Commits

Reviewing files that changed from the base of the PR and between 39fca46 and 29af654.

📒 Files selected for processing (14)
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java
  • plugin/trino-iceberg/pom.xml
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/ExpressionConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/aggregation/IcebergThetaSketchForStats.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java

@dain dain force-pushed the user/dain/iceberg-geo branch from 29af654 to 09bef94 Compare March 25, 2026 23:02
@dain
Member Author

dain commented Mar 26, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 26, 2026

✅ Actions performed

Review triggered.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java (1)

352-362: Avoid rebuilding unchanged struct types.

Unlike the list/map branches, this always allocates a new StructType. That makes toFileSchema() rewrite every struct-typed subtree, even when there are no geometry/geography leaves under it. Mirroring the existing fast-path keeps the rewrite limited to the cases this PR actually needs.

♻️ Suggested refinement
         if (type instanceof Types.StructType structType) {
-            return Types.StructType.of(structType.fields().stream()
-                    .map(IcebergFileWriterFactory::toFileType)
-                    .collect(toImmutableList()));
+            List<Types.NestedField> fields = structType.fields().stream()
+                    .map(IcebergFileWriterFactory::toFileType)
+                    .collect(toImmutableList());
+            if (IntStream.range(0, fields.size()).allMatch(index -> fields.get(index) == structType.fields().get(index))) {
+                return type;
+            }
+            return Types.StructType.of(fields);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java`
around lines 352 - 362, The toFileType method always allocates a new
Types.StructType even when none of its child fields change; update the
StructType branch in toFileType to map each field through
IcebergFileWriterFactory::toFileType but detect whether any child type differs
from the original (by identity/equality) and only call Types.StructType.of(...)
to construct and return a new StructType when at least one field's type changed;
otherwise return the original structType to avoid unnecessary rewrites.
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java (2)

1729-1758: Add a positive custom-SRID round-trip here.

This only proves registration/description on an empty EPSG:3857 table. It never exercises the non-default CRS read/write path, so a regression in SRID injection/stripping would still pass. Please insert a 3857 geometry and assert both the shape and ST_SRID.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java`
around lines 1729 - 1758, Test currently only registers and describes an empty
EPSG:3857 table; add a positive round-trip to exercise SRID injection/stripping
by inserting a geometry with SRID 3857 and then reading it back to assert both
the shape and ST_SRID. In the testGeometryWithCustomSrid method use assertUpdate
to INSERT a row into the registered table (via the registered identifier), then
use query/assertThat on SELECT ST_AsText(geom) and ST_SRID(geom) (or equivalent)
from the same table to verify the geometry WKT matches the inserted shape and
that ST_SRID returns 3857, and keep cleanup (DROP TABLE) as-is.

1829-1839: This doesn't actually round-trip the geography column.

The test inserts geog, but it only reads id, so the geography decode path is never exercised. Please materialize geog itself or run a geography function against it so read-side regressions can't slip through.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java`
around lines 1829 - 1839, The test testGeographyRoundTrip currently inserts into
the geog column but only selects id, so the geography decode path isn't
exercised; update the assertion to materialize the geography (e.g., SELECT geog
FROM <table>.getName() or SELECT ST_AsText(geog) FROM <table>.getName()) or run
a geography function against it and assert the expected result so the read-side
geography decoding in testGeographyRoundTrip is actually exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java`:
- Around line 251-269: The crsToSrid method currently accepts EPSG:0 and
negative EPSG values; update crsToSrid (in class JtsGeometrySerde) so after
parsing the numeric part from "EPSG:" you validate that the parsed int is > 0
and throw new IllegalArgumentException for non-positive values (e.g., "Invalid
EPSG code: " + crs) instead of returning 0; keep the existing OGC:CRS84/CRS84
special-case logic and preserve the same exception message style for unsupported
formats.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 363a8e5b-2119-40d1-85a5-a87b160985c5

📥 Commits

Reviewing files that changed from the base of the PR and between 29af654 and 09bef94.

📒 Files selected for processing (13)
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java
  • plugin/trino-iceberg/pom.xml
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/ExpressionConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
✅ Files skipped from review due to trivial changes (6)
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java
  • plugin/trino-iceberg/pom.xml
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java

@dain dain force-pushed the user/dain/iceberg-geo branch from 09bef94 to a419dd1 Compare March 26, 2026 17:49
@dain
Member Author

dain commented Mar 26, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Mar 26, 2026

✅ Actions performed

Review triggered.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java (1)

1684-1715: Exercise the custom-SRID data path, not just registration.

This test stops at metadata checks, so it never hits the EPSG:3857 read/write logic added in IcebergPageSourceProvider and IcebergPageSink. Adding one row and asserting ST_SRID(geom) = 3857 would cover the new behavior end to end.

Suggested test extension
         assertThat(query("DESCRIBE " + registered))
                 .matches("VALUES (VARCHAR 'id', VARCHAR 'integer', VARCHAR '', VARCHAR ''), " +
                         "(VARCHAR 'geom', VARCHAR 'Geometry', VARCHAR '', VARCHAR '')");
 
+        assertUpdate("INSERT INTO " + registered + " VALUES (1, ST_SetSRID(ST_Point(1, 2), 3857))", 1);
+        assertThat(query("SELECT ST_SRID(geom) FROM " + registered))
+                .matches("VALUES 3857");
+
         assertUpdate("DROP TABLE " + registered);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java`
around lines 1684 - 1715, Extend TestIcebergV3.testGeometryWithCustomSrid to
exercise actual read/write of the EPSG:3857 geometry: after registering the
table (registered), INSERT one row with a geometry value into the registered
table (so IcebergPageSink is used) and then SELECT the geometry back asserting
ST_SRID(geom) = 3857 (and optionally validating the geometry value) to exercise
the IcebergPageSink/IcebergPageSourceProvider code paths for custom SRID
handling.
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java (1)

275-295: Consider defensive check for already-present SRID flag.

If the input slice is EWKB (already has SRID flag set), the method would produce corrupted output by copying the old SRID as part of the geometry data. While the Javadoc clearly states the input should be "WKB", a defensive check would prevent silent data corruption from accidental misuse.

🛡️ Optional defensive check
     public static Slice wkbToEwkb(Slice wkb, int srid)
     {
         checkArgument(wkb.length() >= 5, "WKB too short");
         boolean bigEndian = wkb.getByte(0) == 0;
         int type = wkb.getInt(1);
         if (bigEndian) {
             type = Integer.reverseBytes(type);
         }
+        checkArgument((type & EWKB_SRID_FLAG) == 0, "Input already has SRID flag set (expected WKB, got EWKB)");
 
         // Add SRID flag
         int newType = type | EWKB_SRID_FLAG;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java`
around lines 275 - 295, wkbToEwkb currently blindly injects an SRID flag and
copies bytes, which corrupts input if the incoming Slice already has EWKB SRID
set; add a defensive check at the start of wkbToEwkb by inspecting the parsed
type (the local variable type obtained from wkb.getInt(1) and possibly reversed)
and if (type & EWKB_SRID_FLAG) != 0 either throw an IllegalArgumentException or
simply return the original Slice unchanged (choose consistent behavior with the
codebase), with a clear message mentioning EWKB_SRID_FLAG and the method
JtsGeometrySerde.wkbToEwkb so callers know the input already contains an SRID
and no injection is needed.
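For context, the EWKB header manipulation under discussion can be sketched standalone. The following is a minimal, hypothetical Java illustration using plain byte arrays and `ByteBuffer` in place of airlift's `Slice` (the class name and helper shape are assumptions, not the PR's actual code); it injects an SRID the way `wkbToEwkb` is described to, including the defensive EWKB-flag check suggested above:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WkbToEwkb
{
    private static final int EWKB_SRID_FLAG = 0x20000000;

    // Inject an SRID into a WKB payload, producing EWKB.
    // Layout: [byte-order marker][4-byte type | SRID flag][4-byte SRID][rest of geometry].
    public static byte[] wkbToEwkb(byte[] wkb, int srid)
    {
        if (wkb.length < 5) {
            throw new IllegalArgumentException("WKB too short");
        }
        boolean bigEndian = wkb[0] == 0;
        ByteOrder order = bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        int type = ByteBuffer.wrap(wkb, 1, 4).order(order).getInt();
        if ((type & EWKB_SRID_FLAG) != 0) {
            // the defensive check proposed in the review comment
            throw new IllegalArgumentException("Input already has SRID flag set (expected WKB, got EWKB)");
        }
        ByteBuffer out = ByteBuffer.allocate(wkb.length + 4).order(order);
        out.put(wkb[0]);                    // byte-order marker
        out.putInt(type | EWKB_SRID_FLAG);  // geometry type with SRID flag set
        out.putInt(srid);                   // injected SRID
        out.put(wkb, 5, wkb.length - 5);    // remaining geometry payload
        return out.array();
    }

    public static void main(String[] args)
    {
        // Little-endian WKB for POINT(1 2): marker=1, type=1, then two doubles
        ByteBuffer wkb = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
        wkb.put((byte) 1).putInt(1).putDouble(1.0).putDouble(2.0);
        byte[] ewkb = wkbToEwkb(wkb.array(), 3857);
        int type = ByteBuffer.wrap(ewkb, 1, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        int srid = ByteBuffer.wrap(ewkb, 5, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        System.out.println(((type & EWKB_SRID_FLAG) != 0) + " " + srid);
    }
}
```

The check at the top rejects any input whose type word already carries the `0x20000000` flag, which is exactly the silent-corruption case the comment describes.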
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.java`:
- Around line 119-121: In the GEOMETRY branch of TypeConverter, validate the
geometry CRS before returning GEOMETRY: extract the CRS string from the Iceberg
geometry(crs) type, call crsToSrid(crs) to validate/parse it, and if crsToSrid
indicates an unsupported format or invalid EPSG, reject the type
deterministically (e.g., throw a TrinoException with NOT_SUPPORTED) instead of
silently returning GEOMETRY; only return GEOMETRY when crsToSrid succeeds.
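As a rough sketch of the validation this comment asks for, the CRS-to-SRID mapping described in the PR (EPSG:3857 → 3857, OGC:CRS84 or null → 4326) could look like the following. This is a hypothetical standalone version, not the PR's actual `crsToSrid`; in Trino the failure path would presumably throw a `TrinoException` with `NOT_SUPPORTED` rather than `IllegalArgumentException`:

```java
import java.util.Locale;

public class CrsToSrid
{
    // Map an Iceberg geometry CRS string to an SRID, per the conversion rules
    // in the PR description: EPSG:nnnn -> nnnn, OGC:CRS84 (or null) -> 4326.
    public static int crsToSrid(String crs)
    {
        if (crs == null) {
            return 4326; // default CRS is treated as OGC:CRS84
        }
        String upper = crs.toUpperCase(Locale.ROOT);
        if (upper.equals("OGC:CRS84")) {
            return 4326;
        }
        if (upper.startsWith("EPSG:")) {
            try {
                return Integer.parseInt(upper.substring("EPSG:".length()));
            }
            catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid EPSG code in CRS: " + crs, e);
            }
        }
        throw new IllegalArgumentException("Unsupported CRS format: " + crs);
    }

    public static void main(String[] args)
    {
        System.out.println(crsToSrid("EPSG:3857") + " " + crsToSrid(null) + " " + crsToSrid("OGC:CRS84"));
    }
}
```

Calling a validator like this from the GEOMETRY branch of `TypeConverter` would make unsupported CRS strings fail deterministically at type-mapping time instead of surfacing later on the read/write path.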



📥 Commits

Reviewing files that changed from the base of the PR and between 09bef94 and a419dd1.

📒 Files selected for processing (14)
  • lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java
  • lib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.java
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java
  • plugin/trino-iceberg/pom.xml
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/ExpressionConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
✅ Files skipped from review due to trivial changes (5)
  • plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java
  • plugin/trino-iceberg/pom.xml
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java

@dain dain force-pushed the user/dain/iceberg-geo branch from a419dd1 to 2950790 on March 26, 2026 18:43
@raunaqmorarka
Member

Can we separate the commits for geometry and geography ?
Now that parquet spec has added these types https://github.com/apache/parquet-format/blob/master/Geospatial.md#logical-types https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#geometry, can we use those logical annotations ?

@dain
Member Author

dain commented Mar 27, 2026

Can we separate the commits for geometry and geography ?

@raunaqmorarka Why? Seems like a bunch of work for not much win. This is not a big PR.

Now that parquet spec has added these types https://github.com/apache/parquet-format/blob/master/Geospatial.md#logical-types https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#geometry, can we use those logical annotations ?

I can take a look.

@dain
Member Author

dain commented Mar 27, 2026

Now that parquet spec has added these types https://github.com/apache/parquet-format/blob/master/Geospatial.md#logical-types https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#geometry, can we use those logical annotations ?

I can take a look.

I took a look, and this is not as simple as adding a hard-coded logical annotation. The geometry/geography parameters would need to be carried through the Parquet schema annotations, and as far as I can tell that would require Parquet/Iceberg library support updates on our side to propagate those parameters end-to-end. So I think this is better handled as follow-up work in the Parquet schema conversion stack rather than folded into this PR.

@raunaqmorarka
Member

Can we separate the commits for geometry and geography ?

@raunaqmorarka Why? Seems like a bunch of work for not much win. This is not a big PR.

We usually make separate commits for independent pieces of functionality. It helps with review, nicer git history etc. Unless these functionalities cannot exist separately, I think it makes sense for the commits to be separate. Hopefully claude/codex can do this without too much fuss.

@raunaqmorarka
Member

Now that parquet spec has added these types https://github.com/apache/parquet-format/blob/master/Geospatial.md#logical-types https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#geometry, can we use those logical annotations ?

I can take a look.

I took a look, and this is not as simple as adding a hard-coded logical annotation. The geometry/geography parameters would need to be carried through the Parquet schema annotations, and as far as I can tell that would require Parquet/Iceberg library support updates on our side to propagate those parameters end-to-end. So I think this is better handled as follow-up work in the Parquet schema conversion stack rather than folded into this PR.

I'm mainly worried about cross-engine compatibility. I see that Databricks/Spark doesn't support it yet, but Snowflake does https://docs.snowflake.com/en/en/user-guide/tables-iceberg-data-types#label-tables-iceberg-v3-data-types
It would be nice if we could do this in a way that aligns with other Iceberg/Parquet readers and writers.

@wendigo wendigo force-pushed the user/dain/iceberg-geo branch from 2950790 to bc1fc0c on March 31, 2026 09:23
@wendigo
Contributor

wendigo commented Mar 31, 2026

Rebased to solve conflicts

@wendigo wendigo force-pushed the user/dain/iceberg-geo branch from b59d0c6 to c5e5819 on March 31, 2026 18:09
Implement reading and writing of geometry columns in Iceberg tables
using the Iceberg v3 geometry type specification.
@dain dain force-pushed the user/dain/iceberg-geo branch from c5e5819 to 2964e92 on April 8, 2026 23:33
@dain
Member Author

dain commented Apr 8, 2026

Now that parquet spec has added these types https://github.com/apache/parquet-format/blob/master/Geospatial.md#logical-types https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#geometry, can we use those logical annotations ?

I can take a look.

I took a look, and this is not as simple as adding a hard-coded logical annotation. The geometry/geography parameters would need to be carried through the Parquet schema annotations, and as far as I can tell that would require Parquet/Iceberg library support updates on our side to propagate those parameters end-to-end. So I think this is better handled as follow-up work in the Parquet schema conversion stack rather than folded into this PR.

I'm mainly worried about cross-engine compatibility. I see that Databricks/Spark doesn't support it yet, but Snowflake does https://docs.snowflake.com/en/en/user-guide/tables-iceberg-data-types#label-tables-iceberg-v3-data-types It would be nice if we could do this in a way that aligns with other Iceberg/Parquet readers and writers.

I added a commit that does this.

@dain dain force-pushed the user/dain/iceberg-geo branch from 2964e92 to 52e283b on April 9, 2026 01:48
@dain dain requested a review from wendigo April 24, 2026 16:55
