[BUG] [Kernel] Scan.transformPhysicalSchema does not get applied to nested fields #5095

@vpapavas

Bug

Describe the problem

The iterator returned by Scan.transformPhysicalData https://github.com/delta-io/delta/blob/master/kernel/kernel-api/src/main/java/io/delta/kernel/Scan.java#L143 reads with the physical schema and converts the data to the logical schema before returning rows to the user. However, the physical-to-logical transformation is applied only to top-level fields, not to fields nested inside structs. In my test, the schema of the ColumnarBatch had the correct logical names for the nested fields, but the data type of the ColumnVector still carried the physical schema.

Steps to reproduce

  1. Write parquet files with a schema that has nested struct columns
  2. Read using
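final Table table = Table.forPath(engine, rootTablePath);
final Snapshot snapshot = table.getLatestSnapshot(engine);
final ScanBuilder scanBuilder = snapshot.getScanBuilder();
// scanState was left undefined in the original snippet; presumably it is obtained like this.
final Scan scan = scanBuilder.build();
final Row scanState = scan.getScanState(engine);
// scanFileRow and fileStatus come from iterating scan.getScanFiles(engine) (omitted here).
final StructType physicalReadSchema = ScanStateRow.getPhysicalDataReadSchema(engine, scanState);
// Read the raw Parquet data using the physical schema.
final CloseableIterator<ColumnarBatch> physicalDataIter = engine
    .getParquetHandler()
    .readParquetFiles(singletonCloseableIterator(fileStatus), physicalReadSchema, Optional.empty())
    .map(res -> res.getData());
// Convert the physical batches to logical batches.
final CloseableIterator<FilteredColumnarBatch> transformedData = Scan.transformPhysicalData(
    engine,
    scanState,
    scanFileRow,
    physicalDataIter);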
  3. Check the resulting ColumnarBatch. Specifically, look at the data types of its ColumnVectors: for nested columns they still use the physical names.
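
The check in the last step can be sketched as a parallel walk over the two schemas. The snippet below is a standalone illustration, not Kernel code: `Node` is a hypothetical stand-in for a schema field, and `findNameMismatches` is a made-up helper. With the real API, the expected side would presumably come from `batch.getSchema()` and the actual side from `batch.getColumnVector(i).getDataType()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaMismatchSketch {
    /** Hypothetical schema node (NOT a Delta Kernel class); leaves have no children. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Returns the paths of nested fields whose names differ between the two trees. */
    static List<String> findNameMismatches(Node expected, Node actual) {
        List<String> out = new ArrayList<>();
        collect(expected, actual, expected.name, out);
        return out;
    }

    private static void collect(Node expected, Node actual, String path, List<String> out) {
        // Compare children position by position; extra children on either side are ignored.
        int n = Math.min(expected.children.size(), actual.children.size());
        for (int i = 0; i < n; i++) {
            Node e = expected.children.get(i);
            Node a = actual.children.get(i);
            String childPath = path + "." + e.name;
            if (!e.name.equals(a.name)) {
                out.add(childPath + " (found " + a.name + ")");
            }
            collect(e, a, childPath, out);
        }
    }

    public static void main(String[] args) {
        // Logical names as reported by the batch schema.
        Node fromBatchSchema = new Node("address", new Node("city"), new Node("zip"));
        // Physical names as reported by the vector's data type (made-up values).
        Node fromVectorType = new Node("address", new Node("col-3"), new Node("col-4"));
        findNameMismatches(fromBatchSchema, fromVectorType).forEach(System.out::println);
    }
}
```

An empty result would mean the vector's data type agrees with the batch schema at every nesting level.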

Observed results

The schema of the ColumnVector uses the physical names for nested columns.

Expected results

The physical-to-logical transformation should be applied to all fields, including nested ones.
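
To illustrate the expected behaviour, here is a minimal sketch of a rename that recurses into nested structs. It is not the Kernel implementation: `Node` is a hypothetical schema model, `toLogical` is a made-up helper, and the flat physical-to-logical name map is an assumption that relies on physical names being unique across the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NestedRenameSketch {
    /** Hypothetical schema node (NOT a Delta Kernel class); only structs have children. */
    static final class Node {
        final String name;
        final String type;
        final List<Node> children;

        Node(String name, String type, Node... children) {
            this.name = name;
            this.type = type;
            this.children = Arrays.asList(children);
        }

        @Override
        public String toString() {
            return name + ":" + type + (children.isEmpty() ? "" : children.toString());
        }
    }

    /** Renames this field and, recursively, every nested field. */
    static Node toLogical(Node physical, Map<String, String> physicalToLogical) {
        List<Node> renamedChildren = new ArrayList<>();
        for (Node child : physical.children) {
            renamedChildren.add(toLogical(child, physicalToLogical));
        }
        // Fields without a mapping entry keep their name.
        String logicalName = physicalToLogical.getOrDefault(physical.name, physical.name);
        return new Node(logicalName, physical.type, renamedChildren.toArray(new Node[0]));
    }

    public static void main(String[] args) {
        // Made-up physical names, for illustration only.
        Map<String, String> names = new HashMap<>();
        names.put("col-1", "id");
        names.put("col-2", "address");
        names.put("col-3", "city");

        Node physical = new Node("root", "struct",
            new Node("col-1", "long"),
            new Node("col-2", "struct", new Node("col-3", "string")));

        System.out.println(toLogical(physical, names));
    }
}
```

The point of the sketch is only that the rename descends into children before rebuilding each node, so a nested field such as `col-3` is renamed along with its parent.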

Further details

Environment information

  • Delta Lake version: kernel-07-30-2025
  • Spark version:
  • Scala version:

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
