Fix reading variant null values in Delta Lake #26184
Conversation
In cases where a value chunk includes both null and non-null entries, the chunk must be treated as potentially nullable.
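The rule above can be sketched as follows. This is a minimal illustration with hypothetical helper names, not Trino's actual reader API: in Parquet, an entry whose definition level is below the maximum is null, so a chunk that mixes levels below and at the maximum contains both null and non-null entries and must take the nullable read path.

```java
// Sketch: deciding whether a Parquet value chunk must use the nullable
// read path. Definition levels below the maximum indicate null entries.
public class ChunkNullability
{
    // Returns true when the chunk mixes null and non-null entries,
    // i.e. it must be decoded as potentially nullable
    public static boolean mustReadAsNullable(int[] definitionLevels, int maxDefinitionLevel)
    {
        boolean sawNull = false;
        boolean sawValue = false;
        for (int level : definitionLevels) {
            if (level < maxDefinitionLevel) {
                sawNull = true;
            }
            else {
                sawValue = true;
            }
        }
        return sawNull && sawValue;
    }

    public static void main(String[] args)
    {
        // Chunk with only non-null entries: the non-null fast path is safe
        System.out.println(mustReadAsNullable(new int[] {2, 2, 2}, 2)); // false
        // Chunk mixing a null entry (level 1) with non-null entries (level 2)
        System.out.println(mustReadAsNullable(new int[] {2, 1, 2}, 2)); // true
    }
}
```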
Force-pushed from 3ceefed to b06be75 (Compare)
/test-with-secrets sha=b06be75c90801490ebc04fefaa3ce32f0ab7b6ae

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/16253030103
    definitionLevel,
    required,
    valueField,
    new PrimitiveField(valueField.getType(), false, valueField.getDescriptor(), valueField.getId()),
Does this mean the Parquet spec is wrong, or that Databricks violates the spec?
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#variant-in-parquet
The value field must be annotated as required for unshredded Variant values
Here is the metadata of the test Parquet file written by Databricks:
parquet meta part-00000-3dae12c4-61bc-4177-bd36-2c936db81e90-c000.snappy.parquet
File path: part-00000-3dae12c4-61bc-4177-bd36-2c936db81e90-c000.snappy.parquet
Created by: parquet-mr version 1.12.3-databricks-0002 (build 2484a95dbe16a0023e3eb29c201f99ff9ea771ee)
Properties:
org.apache.spark.version: 3.5.0
com.databricks.spark.jobGroupId: 1752418278132_8642402946176337271_3e73399aa411479296f9bc16c62a6681
org.apache.spark.sql.parquet.row.metadata: {"type":"struct","fields":[{"name":"id","type":"integer","nullable":true,"metadata":{}},{"name":"x","type":"variant","nullable":true,"metadata":{}}]}
com.databricks.spark.clusterId: 1002-064054-nbosugsx
Schema:
message spark_schema {
optional int32 id;
optional group x {
required binary value;
required binary metadata;
}
}
I think the schema is correct, but I don't know why our NestedColumnReader#readNonNull can't read it correctly. cc @raunaqmorarka
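To illustrate why a non-null read path breaks on such data, here is a toy decoder (hypothetical, not Trino's actual NestedColumnReader): non-null values are stored densely, so a reader that assumes one stored value per output row would misalign every value after the first null, while a null-aware decode consumes a stored value only for non-null rows.

```java
public class NullableValueDecode
{
    // Toy nullable decode: stored values exist only for non-null rows,
    // so we advance through the dense value array only when the row
    // is non-null. A non-null fast path would instead read one stored
    // value per row and go out of alignment (or out of bounds).
    public static String[] decode(String[] storedValues, boolean[] isNull)
    {
        String[] out = new String[isNull.length];
        int next = 0; // index into the densely packed non-null values
        for (int i = 0; i < isNull.length; i++) {
            out[i] = isNull[i] ? null : storedValues[next++];
        }
        return out;
    }

    public static void main(String[] args)
    {
        // Three rows: "a", null, "b" — only two values are physically stored
        String[] decoded = decode(new String[] {"a", "b"}, new boolean[] {false, true, false});
        System.out.println(java.util.Arrays.toString(decoded)); // [a, null, b]
    }
}
```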
Description
In cases where a value chunk includes both null and non-null entries, the chunk must be treated as potentially nullable. This is another corner case that was missed in #26027.
Release notes