Description
Bug
Describe the problem
The iterator in Scan.transformPhysicalData
https://github.com/delta-io/delta/blob/master/kernel/kernel-api/src/main/java/io/delta/kernel/Scan.java#L143 reads data using the physical schema and converts it to the logical schema before returning rows to the user. However, the physical-to-logical transformation is applied only to top-level fields, not to fields inside nested structs. In my test, the schema of the ColumnarBatch had the correct logical names for the nested fields, but the data type of the ColumnVector still used the physical names.
Steps to reproduce
- Write Parquet files with a schema that has nested struct columns
- Read using
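final Table table = Table.forPath(engine, rootTablePath);
final Snapshot snapshot = table.getLatestSnapshot(engine);
final ScanBuilder scanBuilder = snapshot.getScanBuilder();
// Assumed setup, not shown in the original snippet: scanState comes from the built scan,
// scanFileRow is one row from scan.getScanFiles(engine), and fileStatus is that file's FileStatus.
final Scan scan = scanBuilder.build();
final Row scanState = scan.getScanState(engine);
final StructType physicalReadSchema = ScanStateRow.getPhysicalDataReadSchema(engine, scanState);
// Read the Parquet file using the physical schema.
final CloseableIterator<ColumnarBatch> physicalDataIter = engine
    .getParquetHandler()
    .readParquetFiles(singletonCloseableIterator(fileStatus), physicalReadSchema, Optional.empty())
    .map(res -> res.getData());
// Convert the physical batches to logical batches.
final CloseableIterator<FilteredColumnarBatch> transformedData = Scan.transformPhysicalData(
    engine,
    scanState,
    scanFileRow,
    physicalDataIter
);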
- Inspect the resulting ColumnarBatch. Specifically, look at the data types of its ColumnVectors, which still use physical names for nested fields.
Observed results
The schema of the ColumnVector uses the physical names for nested columns.
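To make the mismatch easier to spot, here is a minimal sketch of a recursive check that compares the field names in the batch schema against the field names in a vector's data type. It does not use the Kernel classes: `Type`, `Primitive`, `Field`, and `Struct` are hypothetical stand-ins for Kernel's `DataType`, `StructType`, and `StructField`, and `col-b2` is only an illustrative physical name. It also assumes both types list their fields in the same order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a schema; not the Delta Kernel API. */
final class NestedNameCheck {
    interface Type {}
    record Primitive(String name) implements Type {}
    record Field(String name, Type type) {}
    record Struct(List<Field> fields) implements Type {}

    static Field field(String name, Type type) { return new Field(name, type); }
    static Struct struct(Field... fields) { return new Struct(List.of(fields)); }

    /**
     * Walks both types position by position and collects the dotted path of every
     * field whose name in `actual` differs from the name in `expected`.
     */
    static List<String> findNameMismatches(Struct expected, Struct actual, String prefix) {
        List<String> mismatches = new ArrayList<>();
        int n = Math.min(expected.fields().size(), actual.fields().size());
        for (int i = 0; i < n; i++) {
            Field e = expected.fields().get(i);
            Field a = actual.fields().get(i);
            String path = prefix.isEmpty() ? e.name() : prefix + "." + e.name();
            if (!e.name().equals(a.name())) {
                mismatches.add(path + " (found " + a.name() + ")");
            }
            // Descend into nested structs, which is where the reported bug shows up.
            if (e.type() instanceof Struct expectedStruct && a.type() instanceof Struct actualStruct) {
                mismatches.addAll(findNameMismatches(expectedStruct, actualStruct, path));
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Batch schema: logical names everywhere.
        Struct batchSchema = struct(field("address", struct(field("city", new Primitive("string")))));
        // Vector data type: top level is logical, nested field is still physical.
        Struct vectorType = struct(field("address", struct(field("col-b2", new Primitive("string")))));
        System.out.println(findNameMismatches(batchSchema, vectorType, ""));
        // prints: [address.city (found col-b2)]
    }
}
```

Against the real API, the same walk would presumably compare the batch schema with the data type reported by each column vector.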
Expected results
The physical-to-logical transformation should apply to all fields, including fields inside nested structs.
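As an illustration of the expected behavior, the following sketch renames fields recursively rather than only at the top level. It is not the Kernel implementation: the types are hypothetical stand-ins for Kernel's schema classes, and the flat `physicalToLogical` map is an assumption made to keep the example short (it relies on physical names being unique across the schema). Map types are left out and would need the same recursive treatment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of a schema; not the Delta Kernel API. */
final class NestedSchemaRename {
    interface Type {}
    record Primitive(String name) implements Type {}
    record ArrayOf(Type element) implements Type {}
    record Field(String name, Type type) {}
    record Struct(List<Field> fields) implements Type {}

    static Field field(String name, Type type) { return new Field(name, type); }
    static Struct struct(Field... fields) { return new Struct(List.of(fields)); }

    /**
     * Returns a copy of `type` in which every struct field, at any nesting depth,
     * is renamed from its physical name to its logical name. Names that are not
     * in the map are kept unchanged.
     */
    static Type toLogical(Type type, Map<String, String> physicalToLogical) {
        if (type instanceof Struct struct) {
            List<Field> renamed = new ArrayList<>();
            for (Field f : struct.fields()) {
                String logicalName = physicalToLogical.getOrDefault(f.name(), f.name());
                // Recurse into the field's type so nested structs are renamed too.
                renamed.add(new Field(logicalName, toLogical(f.type(), physicalToLogical)));
            }
            return new Struct(List.copyOf(renamed));
        }
        if (type instanceof ArrayOf array) {
            return new ArrayOf(toLogical(array.element(), physicalToLogical));
        }
        return type; // primitives carry no field names
    }

    public static void main(String[] args) {
        Type physical = struct(
            field("col-a1", struct(field("col-b2", new Primitive("string")))),
            field("id", new Primitive("long")));
        Map<String, String> names = Map.of("col-a1", "address", "col-b2", "city");
        System.out.println(toLogical(physical, names));
    }
}
```

With this approach, the type reported for a nested column would carry the same logical names as the batch schema.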
Further details
Environment information
- Delta Lake version: kernel-07-30-2025
- Spark version:
- Scala version:
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?
- Yes. I can contribute a fix for this bug independently.
- Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
- No. I cannot contribute a bug fix at this time.