Conversation
bf401e7 to
b2ccfe0
Compare
|
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack. |
b2ccfe0 to
7dfe940
Compare
c5c66d5 to
29af654
Compare
| int type = ewkb.getInt(1); | ||
| if (bigEndian) { | ||
| type = Integer.reverseBytes(type); | ||
| } | ||
| if ((type & EWKB_SRID_FLAG) == 0) { | ||
| return 0; | ||
| } | ||
| int srid = ewkb.getInt(5); |
There was a problem hiding this comment.
can we do a single getLong instead?
There was a problem hiding this comment.
It turned out more complex.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThis PR adds EWKB SRID handling utilities to Possibly related PRs
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.41.1)plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.javaComment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java (1)
172-178:⚠️ Potential issue | 🟠 MajorRecursively rewrite geospatial types; current conversion misses nested shapes.
toFileSchema(...)/toFileTrinoType(...)only handle top-level geospatial columns. Nested geospatial types (e.g.,array(geometry),row(... geography ...)) are left unchanged, so writer schemas/types can still contain unsupported geospatial primitives.Proposed fix
@@ - List<String> fileColumnNames = icebergSchema.columns().stream() + Schema fileSchema = toFileSchema(icebergSchema); + List<String> fileColumnNames = fileSchema.columns().stream() .map(Types.NestedField::name) .collect(toImmutableList()); - List<Type> fileColumnTypes = icebergSchema.columns().stream() + List<Type> fileColumnTypes = fileSchema.columns().stream() .map(column -> toTrinoType(column.type(), typeManager)) - .map(IcebergFileWriterFactory::toFileTrinoType) .collect(toImmutableList()); @@ - convert(toFileSchema(icebergSchema), "table"), + convert(fileSchema, "table"), @@ - List<Types.NestedField> columnFields = icebergSchema.columns(); + Schema fileSchema = toFileSchema(icebergSchema); + List<Types.NestedField> columnFields = fileSchema.columns(); @@ .map(Types.NestedField::type) .map(type -> toTrinoType(type, typeManager)) - .map(IcebergFileWriterFactory::toFileTrinoType) .collect(toImmutableList()); @@ - toOrcType(toFileSchema(icebergSchema)), + toOrcType(fileSchema), @@ - List<Type> columnTypes = icebergSchema.columns().stream() + Schema fileSchema = toFileSchema(icebergSchema); + List<Type> columnTypes = fileSchema.columns().stream() .map(column -> toTrinoType(column.type(), typeManager)) - .map(IcebergFileWriterFactory::toFileTrinoType) .collect(toImmutableList()); @@ - toFileSchema(icebergSchema), + fileSchema, columnTypes, compressionCodec); @@ - private static Schema toFileSchema(Schema icebergSchema) - { - List<Types.NestedField> columns = icebergSchema.columns().stream() - .map(IcebergFileWriterFactory::toFileType) - .collect(toImmutableList()); - return new Schema(columns); - } + private static Schema toFileSchema(Schema icebergSchema) + { + return new Schema(icebergSchema.columns().stream() + .map(IcebergFileWriterFactory::toFileType) + .collect(toImmutableList())); + } @@ private static Types.NestedField toFileType(Types.NestedField field) { - org.apache.iceberg.types.Type type = field.type(); - if (type.typeId() == TypeID.GEOMETRY || type.typeId() == TypeID.GEOGRAPHY) { - // Replace geometry/geography with binary for file writing - return Types.NestedField.of(field.fieldId(), field.isOptional(), field.name(), Types.BinaryType.get(), field.doc()); - } - return field; - } - - /** - * Convert Trino type for file writing - geometry/geography become varbinary. - */ - private static Type toFileTrinoType(Type type) - { - if (type instanceof GeometryType || type instanceof SphericalGeographyType) { - return VarbinaryType.VARBINARY; - } - return type; + org.apache.iceberg.types.Type rewritten = toFileType(field.type()); + if (rewritten == field.type()) { + return field; + } + return Types.NestedField.of(field.fieldId(), field.isOptional(), field.name(), rewritten, field.doc()); + } + + private static org.apache.iceberg.types.Type toFileType(org.apache.iceberg.types.Type type) + { + return switch (type.typeId()) { + case GEOMETRY, GEOGRAPHY -> Types.BinaryType.get(); + case STRUCT -> Types.StructType.of(type.asStructType().fields().stream() + .map(IcebergFileWriterFactory::toFileType) + .collect(toImmutableList())); + case LIST -> { + Types.ListType listType = type.asListType(); + org.apache.iceberg.types.Type elementType = toFileType(listType.elementType()); + yield listType.isElementRequired() + ? Types.ListType.ofRequired(listType.elementId(), elementType) + : Types.ListType.ofOptional(listType.elementId(), elementType); + } + case MAP -> { + Types.MapType mapType = type.asMapType(); + org.apache.iceberg.types.Type keyType = toFileType(mapType.keyType()); + org.apache.iceberg.types.Type valueType = toFileType(mapType.valueType()); + yield mapType.isValueRequired() + ? Types.MapType.ofRequired(mapType.keyId(), mapType.valueId(), keyType, valueType) + : Types.MapType.ofOptional(mapType.keyId(), mapType.valueId(), keyType, valueType); + } + default -> type; + }; }Also applies to: 233-237, 312-315, 332-359
🧹 Nitpick comments (2)
plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java (1)
93-93: Consider reversing assertion order for clearer failure messages.AssertJ convention is
assertThat(actual).isEqualTo(expected). The current order places the expected value inassertThat, which would produce a confusing failure message like "expected: [actual] but was: [expected]".💡 Suggested fix
- assertThat(wktList.get(i)).isEqualTo(SPHERICAL_GEOGRAPHY.getObjectValue(block, i)); + assertThat(SPHERICAL_GEOGRAPHY.getObjectValue(block, i)).isEqualTo(wktList.get(i));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java` at line 93, The assertion currently uses assertThat(wktList.get(i)).isEqualTo(SPHERICAL_GEOGRAPHY.getObjectValue(block, i)), which puts the expected value in the assertThat() slot and yields confusing failure messages; swap the operands so the actual value from SPHERICAL_GEOGRAPHY.getObjectValue(block, i) is passed to assertThat(...) and isEqualTo(...) compares to wktList.get(i) (i.e., use assertThat(SPHERICAL_GEOGRAPHY.getObjectValue(block, i)).isEqualTo(wktList.get(i))) to follow AssertJ convention.plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java (1)
1780-1790: Assert the geography payload, not just row presence.This test still passes if the new
sphericalgeographyread/write path corrupts the payload, because it never inspectsgeog. Please round-trip a deterministic value instead of only checking that one row exists.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java` around lines 1780 - 1790, In testGeographyRoundTrip, change the assertion to validate the actual geography payload instead of only row presence: insert a deterministic geometry using to_spherical_geography(ST_Point(...)) (as already done) and then query the stored column (e.g., SELECT ST_AsText(geog) or SELECT geog = to_spherical_geography(ST_Point(...))) to assert the round-tripped value equals the original; update the assertThat(...) call in testGeographyRoundTrip to compare the retrieved geog text/binary to the expected deterministic value so corruption would fail the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java`:
- Around line 358-374: transformGeometryColumnsForWrite currently only processes
top-level geometry channels and therefore skips SRID validation and EWKB→WKB
conversion for nested geospatial values; update it to recurse into complex
blocks (RowBlock, ArrayBlock, MapBlock) so nested geometry blocks are visited
and passed to transformGeometryBlockForWrite (or alternately add an early
validation that rejects nested geometry types). Specifically, modify
transformGeometryColumnsForWrite to detect complex Block instances and walk
their children recursively, applying transformGeometryBlockForWrite when a child
block corresponds to a geometry type (preserving position counts and offsets),
and ensure SRID lookup uses the same channel/type mapping used now so nested
geometries get validated and converted.
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java`:
- Around line 1694-1710: getSridInjectionTransform currently only handles when
the projected column itself is a GeometryType, so selecting structured types
(ROW/ARRAY/MAP) that contain geometry skips SRID injection; modify
getSridInjectionTransform to inspect nested schema/type structures (RowType,
ArrayType, MapType) for GeometryType leaves by recursing the tableSchema
field.type and/or IcebergColumnHandle.getType and build a Block->Block transform
that traverses structured Blocks and calls injectSridIntoBlock for each nested
geometry leaf (preserve positions and offsets for arrays/maps and fields for
rows), or if recursion is undesirable, explicitly detect any nested GeometryType
and throw/return an error disallowing nested geospatial columns; use the
existing symbols getSridInjectionTransform, injectSridIntoBlock,
IcebergColumnHandle, tableSchema.findField and Types.{RowType, ListType,
MapType}/Types.GeometryType to locate and implement the recursion or validation.
- Around line 932-942: getFileReadType currently only converts top-level
GeometryType/SphericalGeographyType to VARBINARY, but for Avro you must recurse
into complex types so nested geospatial leaves are also read as VARBINARY:
update getFileReadType(Type columnType) to detect complex types (RowType,
ArrayType, MapType) and return a new type with the same structure where each
child/field/element/key/value type is replaced by calling getFileReadType
recursively (preserve field names and nullability for RowType, preserve map
key/value semantics for MapType, and element type for ArrayType); keep the
existing simple-case behavior for GeometryType and SphericalGeographyType and
leave other scalar types unchanged so behavior matches getOrcReadType.
- Around line 1715-1754: injectSridIntoBlock currently returns the original
block whenever it isn't a VariableWidthBlock, which allows dictionary-, RLE- or
lazy-backed geometry blocks to bypass SRID injection; update injectSridIntoBlock
to unwrap and handle those wrappers: if block is a DictionaryBlock, call
injectSridIntoBlock on dictionaryBlock.getDictionary() and then return a new
DictionaryBlock that reuses the original ids and positionCount but wraps the
transformed dictionary; if block is a RunLengthEncodedBlock, call
injectSridIntoBlock on runLengthEncodedBlock.getValue() and return a new
RunLengthEncodedBlock with the transformed value and the original positionCount;
if block is a LazyBlock, force/load the underlying block (lazyBlock.getBlock())
and recurse similarly; for any other block types continue to return the original
block. Ensure you still perform the existing VariableWidthBlock SRID
transformation (keeping function name injectSridIntoBlock and reusing the
existing VariableWidthBlock handling) and preserve nulls, offsets and original
wrapper semantics when reconstructing wrapper blocks.
---
Nitpick comments:
In
`@plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java`:
- Line 93: The assertion currently uses
assertThat(wktList.get(i)).isEqualTo(SPHERICAL_GEOGRAPHY.getObjectValue(block,
i)), which puts the expected value in the assertThat() slot and yields confusing
failure messages; swap the operands so the actual value from
SPHERICAL_GEOGRAPHY.getObjectValue(block, i) is passed to assertThat(...) and
isEqualTo(...) compares to wktList.get(i) (i.e., use
assertThat(SPHERICAL_GEOGRAPHY.getObjectValue(block,
i)).isEqualTo(wktList.get(i))) to follow AssertJ convention.
In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java`:
- Around line 1780-1790: In testGeographyRoundTrip, change the assertion to
validate the actual geography payload instead of only row presence: insert a
deterministic geometry using to_spherical_geography(ST_Point(...)) (as already
done) and then query the stored column (e.g., SELECT ST_AsText(geog) or SELECT
geog = to_spherical_geography(ST_Point(...))) to assert the round-tripped value
equals the original; update the assertThat(...) call in testGeographyRoundTrip
to compare the retrieved geog text/binary to the expected deterministic value so
corruption would fail the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cfe86063-c19c-407a-a2f2-c1f96b5aa41b
📒 Files selected for processing (14)
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.javaplugin/trino-iceberg/pom.xmlplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/ExpressionConverter.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/aggregation/IcebergThetaSketchForStats.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.javaplugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
29af654 to
09bef94
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java (1)
352-362: Avoid rebuilding unchanged struct types.Unlike the list/map branches, this always allocates a new
StructType. That makestoFileSchema()rewrite every struct-typed subtree, even when there are no geometry/geography leaves under it. Mirroring the existing fast-path keeps the rewrite limited to the cases this PR actually needs.♻️ Suggested refinement
if (type instanceof Types.StructType structType) { - return Types.StructType.of(structType.fields().stream() - .map(IcebergFileWriterFactory::toFileType) - .collect(toImmutableList())); + List<Types.NestedField> fields = structType.fields().stream() + .map(IcebergFileWriterFactory::toFileType) + .collect(toImmutableList()); + if (IntStream.range(0, fields.size()).allMatch(index -> fields.get(index) == structType.fields().get(index))) { + return type; + } + return Types.StructType.of(fields); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java` around lines 352 - 362, The toFileType method always allocates a new Types.StructType even when none of its child fields change; update the StructType branch in toFileType to map each field through IcebergFileWriterFactory::toFileType but detect whether any child type differs from the original (by identity/equality) and only call Types.StructType.of(...) to construct and return a new StructType when at least one field's type changed; otherwise return the original structType to avoid unnecessary rewrites.plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java (2)
1729-1758: Add a positive custom-SRID round-trip here.This only proves registration/description on an empty
EPSG:3857table. It never exercises the non-default CRS read/write path, so a regression in SRID injection/stripping would still pass. Please insert a3857geometry and assert both the shape andST_SRID.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java` around lines 1729 - 1758, Test currently only registers and describes an empty EPSG:3857 table; add a positive round-trip to exercise SRID injection/stripping by inserting a geometry with SRID 3857 and then reading it back to assert both the shape and ST_SRID. In the testGeometryWithCustomSrid method use assertUpdate to INSERT a row into the registered table (via the registered identifier), then use query/assertThat on SELECT ST_AsText(geom) and ST_SRID(geom) (or equivalent) from the same table to verify the geometry WKT matches the inserted shape and that ST_SRID returns 3857, and keep cleanup (DROP TABLE) as-is.
1829-1839: This doesn't actually round-trip the geography column.The test inserts
geog, but it only readsid, so the geography decode path is never exercised. Please materializegeogitself or run a geography function against it so read-side regressions can't slip through.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java` around lines 1829 - 1839, The test testGeographyRoundTrip currently inserts into the geog column but only selects id, so the geography decode path isn't exercised; update the assertion to materialize the geography (e.g., SELECT geog FROM <table>.getName() or SELECT ST_AsText(geog) FROM <table>.getName()) or run a geography function against it and assert the expected result so the read-side geography decoding in testGeographyRoundTrip is actually exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java`:
- Around line 251-269: The crsToSrid method currently accepts EPSG:0 and
negative EPSG values; update crsToSrid (in class JtsGeometrySerde) so after
parsing the numeric part from "EPSG:" you validate that the parsed int is > 0
and throw new IllegalArgumentException for non-positive values (e.g., "Invalid
EPSG code: " + crs) instead of returning 0; keep the existing OGC:CRS84/CRS84
special-case logic and preserve the same exception message style for unsupported
formats.
---
Nitpick comments:
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java`:
- Around line 352-362: The toFileType method always allocates a new
Types.StructType even when none of its child fields change; update the
StructType branch in toFileType to map each field through
IcebergFileWriterFactory::toFileType but detect whether any child type differs
from the original (by identity/equality) and only call Types.StructType.of(...)
to construct and return a new StructType when at least one field's type changed;
otherwise return the original structType to avoid unnecessary rewrites.
In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java`:
- Around line 1729-1758: Test currently only registers and describes an empty
EPSG:3857 table; add a positive round-trip to exercise SRID injection/stripping
by inserting a geometry with SRID 3857 and then reading it back to assert both
the shape and ST_SRID. In the testGeometryWithCustomSrid method use assertUpdate
to INSERT a row into the registered table (via the registered identifier), then
use query/assertThat on SELECT ST_AsText(geom) and ST_SRID(geom) (or equivalent)
from the same table to verify the geometry WKT matches the inserted shape and
that ST_SRID returns 3857, and keep cleanup (DROP TABLE) as-is.
- Around line 1829-1839: The test testGeographyRoundTrip currently inserts into
the geog column but only selects id, so the geography decode path isn't
exercised; update the assertion to materialize the geography (e.g., SELECT geog
FROM <table>.getName() or SELECT ST_AsText(geog) FROM <table>.getName()) or run
a geography function against it and assert the expected result so the read-side
geography decoding in testGeographyRoundTrip is actually exercised.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 363a8e5b-2119-40d1-85a5-a87b160985c5
📒 Files selected for processing (13)
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.javaplugin/trino-iceberg/pom.xmlplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/ExpressionConverter.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.javaplugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
✅ Files skipped from review due to trivial changes (6)
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java
- plugin/trino-iceberg/pom.xml
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.java
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.java
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.java
🚧 Files skipped from review as they are similar to previous changes (2)
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java
09bef94 to
a419dd1
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java (1)
1684-1715: Exercise the custom-SRID data path, not just registration.This test stops at metadata checks, so it never hits the
EPSG:3857read/write logic added inIcebergPageSourceProviderandIcebergPageSink. Adding one row and assertingST_SRID(geom) = 3857would cover the new behavior end to end.Suggested test extension
assertThat(query("DESCRIBE " + registered)) .matches("VALUES (VARCHAR 'id', VARCHAR 'integer', VARCHAR '', VARCHAR ''), " + "(VARCHAR 'geom', VARCHAR 'Geometry', VARCHAR '', VARCHAR '')"); + assertUpdate("INSERT INTO " + registered + " VALUES (1, ST_SetSRID(ST_Point(1, 2), 3857))", 1); + assertThat(query("SELECT ST_SRID(geom) FROM " + registered)) + .matches("VALUES 3857"); + assertUpdate("DROP TABLE " + registered);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java` around lines 1684 - 1715, Extend TestIcebergV3.testGeometryWithCustomSrid to exercise actual read/write of the EPSG:3857 geometry: after registering the table (registered), INSERT one row with a geometry value into the registered table (so IcebergPageSink is used) and then SELECT the geometry back asserting ST_SRID(geom) = 3857 (and optionally validating the geometry value) to exercise the IcebergPageSink/IcebergPageSourceProvider code paths for custom SRID handling.lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java (1)
275-295: Consider defensive check for already-present SRID flag.If the input slice is EWKB (already has SRID flag set), the method would produce corrupted output by copying the old SRID as part of the geometry data. While the Javadoc clearly states the input should be "WKB", a defensive check would prevent silent data corruption from accidental misuse.
🛡️ Optional defensive check
public static Slice wkbToEwkb(Slice wkb, int srid) { checkArgument(wkb.length() >= 5, "WKB too short"); boolean bigEndian = wkb.getByte(0) == 0; int type = wkb.getInt(1); if (bigEndian) { type = Integer.reverseBytes(type); } + checkArgument((type & EWKB_SRID_FLAG) == 0, "Input already has SRID flag set (expected WKB, got EWKB)"); // Add SRID flag int newType = type | EWKB_SRID_FLAG;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java` around lines 275 - 295, wkbToEwkb currently blindly injects an SRID flag and copies bytes, which corrupts input if the incoming Slice already has EWKB SRID set; add a defensive check at the start of wkbToEwkb by inspecting the parsed type (the local variable type obtained from wkb.getInt(1) and possibly reversed) and if (type & EWKB_SRID_FLAG) != 0 either throw an IllegalArgumentException or simply return the original Slice unchanged (choose consistent behavior with the codebase), with a clear message mentioning EWKB_SRID_FLAG and the method JtsGeometrySerde.wkbToEwkb so callers know the input already contains an SRID and no injection is needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.java`:
- Around line 119-121: In the GEOMETRY branch of TypeConverter, validate the
geometry CRS before returning GEOMETRY: extract the CRS string from the Iceberg
geometry(crs) type, call crsToSrid(crs) to validate/parse it, and if crsToSrid
indicates an unsupported format or invalid EPSG, reject the type
deterministically (e.g., throw a TrinoException with NOT_SUPPORTED) instead of
silently returning GEOMETRY; only return GEOMETRY when crsToSrid succeeds.
---
Nitpick comments:
In
`@lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.java`:
- Around line 275-295: wkbToEwkb currently blindly injects an SRID flag and
copies bytes, which corrupts input if the incoming Slice already has EWKB SRID
set; add a defensive check at the start of wkbToEwkb by inspecting the parsed
type (the local variable type obtained from wkb.getInt(1) and possibly reversed)
and if (type & EWKB_SRID_FLAG) != 0 either throw an IllegalArgumentException or
simply return the original Slice unchanged (choose consistent behavior with the
codebase), with a clear message mentioning EWKB_SRID_FLAG and the method
JtsGeometrySerde.wkbToEwkb so callers know the input already contains an SRID
and no injection is needed.
In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java`:
- Around line 1684-1715: Extend TestIcebergV3.testGeometryWithCustomSrid to
exercise actual read/write of the EPSG:3857 geometry: after registering the
table (registered), INSERT one row with a geometry value into the registered
table (so IcebergPageSink is used) and then SELECT the geometry back asserting
ST_SRID(geom) = 3857 (and optionally validating the geometry value) to exercise
the IcebergPageSink/IcebergPageSourceProvider code paths for custom SRID
handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f363ea84-a267-4c8f-b184-cd96420316b7
📒 Files selected for processing (14)
lib/trino-geospatial-toolkit/src/main/java/io/trino/geospatial/serde/JtsGeometrySerde.javalib/trino-geospatial-toolkit/src/test/java/io/trino/geospatial/serde/TestGeometrySerialization.javaplugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.javaplugin/trino-iceberg/pom.xmlplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/ExpressionConverter.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.javaplugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.javaplugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java
✅ Files skipped from review due to trivial changes (5)
- plugin/trino-geospatial/src/test/java/io/trino/plugin/geospatial/TestSphericalGeoFunctions.java
- plugin/trino-iceberg/pom.xml
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.java
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergTypes.java
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
🚧 Files skipped from review as they are similar to previous changes (2)
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java
a419dd1 to
2950790
Compare
|
Can we separate the commits for geometry and geography ? |
@raunaqmorarka Why? Seems like a bunch of what for not much win. This is not a big PR.
I can take a look. |
I took a look, and this is not as simple as adding a hard-coded logical annotation. The geometry/geography parameters would need to be carried through the Parquet schema annotations, and as far as I can tell that would require Parquet/Iceberg library support updates on our side to propagate those parameters end-to-end. So I think this is better handled as follow-up work in the Parquet schema conversion stack rather than folded into this PR. |
We usually make separate commits for independent pieces of functionality. It helps with review, nicer git history etc. Unless these functionalities cannot exist separately, I think it makes sense for the commits to be separate. Hopefully claude/codex can do this without too much fuss. |
I'm mainly worried about cross-engine compatibility. I see that Databricks/Spark doesn't support it yet, but SnowFlake does https://docs.snowflake.com/en/en/user-guide/tables-iceberg-data-types#label-tables-iceberg-v3-data-types |
2950790 to
bc1fc0c
Compare
|
Rebased to solve conflicts |
b59d0c6 to
c5e5819
Compare
Implement reading and writing of geometry columns in Iceberg tables using the Iceberg v3 geometry type specification.
c5e5819 to
2964e92
Compare
I added a commit that does this. |
2964e92 to
52e283b
Compare
Description
This PR is based on #27881, and only the last commit is relevent to Iceberg.
Geometry
geometry(crs)→ TrinoGeometryEPSG:3857→ SRID 3857,OGC:CRS84→ SRID 4326)Geography
geography(crs, algorithm)→ TrinoSphericalGeographyRelease notes
(X) Release notes are required, with the following suggested text: