Support DML operations on Delta tables with name column mapping#15837
Support DML operations on Delta tables with name column mapping#15837mx123 wants to merge 2 commits intotrinodb:masterfrom
name column mapping#15837Conversation
|
Please address the conflicts with Do consider relying now on Trino to perform DML statements instead of doing this over Databricks in |
c6d6b65 to
0a94f95
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeWriter.java
Outdated
Show resolved
Hide resolved
95c6be2 to
1f2d377
Compare
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
1f2d377 to
1e1d87e
Compare
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetSchemaConverter.java
Outdated
Show resolved
Hide resolved
9095186 to
30bf882
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
can we have a test for nested types as well?
There was a problem hiding this comment.
i guess it's unrelated since no SQL syntax change this PR contained.
There was a problem hiding this comment.
+1 to add nested column type case because we sometimes face row type specific issue.
There was a problem hiding this comment.
added tests for nested types. but tests are failed with column mapping id tables. looking into...
There was a problem hiding this comment.
One possibility that I see is to add into DeltaLakeColumnMetadata & DeltaLakeColumnHandle a mapping of <physical name String, field id Optionalnt> to have all the field ids available in the ParquetSchemaConverter.
This construct would replace
lib/trino-parquet/src/main/java/io/trino/parquet/writer/TypedColumn.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakePageSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeWriter.java
Outdated
Show resolved
Hide resolved
30bf882 to
dbe553f
Compare
|
Tests are failing |
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
dbe553f to
ad5d1d9
Compare
fixed by ad5d1d9 |
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakePageSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakePageSink.java
Outdated
Show resolved
Hide resolved
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetTypeMapping.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
ad5d1d9 to
1490323
Compare
1490323 to
a6e5dfb
Compare
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
+1 to add nested column type case because we sometimes face row type specific issue.
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
db22c34 to
3ee29d9
Compare
|
per offline discussion, please split the work into |
3ee29d9 to
b7f8872
Compare
id / name column mappingname column mapping
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
7932ff5 to
9c45512
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/AbstractDeltaLakePageSink.java
Outdated
Show resolved
Hide resolved
9c45512 to
a53a7e8
Compare
|
@mx123 there seem to be some issues discovered by the tests (about collecting stats): see https://github.com/trinodb/trino/pull/15837/checks?check_run_id=11266685083 |
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/AbstractDeltaLakePageSink.java
Outdated
Show resolved
Hide resolved
a53a7e8 to
3716ee9
Compare
3716ee9 to
8536585
Compare
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
...uct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Do the stats settings play any role?
There was a problem hiding this comment.
Does this apply only for column mapping NONE ?
If yes, please specify the column mapping as a parameter for the method and use it in the if statement.
There was a problem hiding this comment.
Agreed, if the behavior is very different for name mapping I'd rewrite this method to only do the name mapping and call it getPartitionColumnsForNameMapping
There was a problem hiding this comment.
In the case of a none mapping you're just returning the input handle.getMetadataEntry().getOriginalPartitionColumns() right?
There was a problem hiding this comment.
nit: there is no need for the "Note:" prefix in the statement. The comment implies that this is a developer note.
If we add the column mapping as a parameter for the method, we may not need this comment anymore.
If the column mapping is NONE perform the mapping of the column names as in the original code, otherwise for NAME use physical column name and for ID throw illegal argument exception.
| return new DeltaLakeColumnHandle(FILE_MODIFIED_TIME_COLUMN_NAME, FILE_MODIFIED_TIME_TYPE, OptionalInt.empty(), FILE_MODIFIED_TIME_COLUMN_NAME, FILE_MODIFIED_TIME_TYPE, SYNTHESIZED); | ||
| } | ||
|
|
||
| public Type getSupportedType() |
There was a problem hiding this comment.
Needs an @JsonIgnore annotation
There was a problem hiding this comment.
I'd also call this getPhysicalType
There was a problem hiding this comment.
Let's try to avoid iterating over the column list twice
| List<String> dataColumnNames = dataColumns.stream().map(DeltaLakeColumnHandle::getPhysicalName).collect(toImmutableList()); | |
| List<Type> parquetTypes = dataColumns.stream().map(DeltaLakeColumnHandle::getSupportedType).collect(toImmutableList()); | |
| ImmutableList.Builder<String> dataColumnNames = ImmutableList.builder(); | |
| ImmutableList.Builder<Type> parquetTypes = ImmutableList.builder(); | |
| for (DeltaLakeColumnHandle column : dataColumns) { | |
| dataColumnNames.add(..); | |
| parquetTypes.add(...); | |
| } |
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
This is O(n^2) on the column list size, not terrible but not awesome. Can we do better by pre-generating a Map lookup for the list traversal you're doing here?
There was a problem hiding this comment.
Agreed, if the behavior is very different for name mapping I'd rewrite this method to only do the name mapping and call it getPartitionColumnsForNameMapping
There was a problem hiding this comment.
In the case of a none mapping you're just returning the input handle.getMetadataEntry().getOriginalPartitionColumns() right?
|
/test-with-secrets sha=853658543ad1a765ba3c89b557fbdad664f4236d |
|
spark> CREATE TABLE default.test (c1 int) using delta LOCATION 's3://trino-ci-test/test' TBLPROPERTIES('delta.columnMapping.mode'='name', 'delta.checkpointInterval' = 1);
trino> INSERT INTO delta.default.test VALUES (1);
INSERT: 1 row |
|
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4168182591 |
|
@mx123 regarding #15837 (comment) |
8536585 to
d469d0c
Compare
|
I'm looking into the above checkpoints creation bug now and taking over this PR. |
|
closed due to #15837 (comment) |

Description
This functionality correspond to writer version 5 for DML operations
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#writer-version-requirements
Relates to #12638
Release notes
(x) Release notes are required, with the following suggested text: