Push parquet select to leaves, add correct reordering (#271)
This PR finally gives us correct semantics when we are asked to read a
schema that differs from the schema in the parquet file. In particular,
this adds:
- Correct sorting of columns to match the specified schema (the previous
  code did not do this correctly)
- Identification of leaf fields that are being selected, so we only read
  exactly what's asked for
- Sorting a struct inside a list.
- Detection of requested columns that don't exist in the parquet file.
  - If nullable, note and then fill them in with a column of nulls
  - Otherwise, error since we're requesting a missing column that can't be
    null
- Detection of timestamp columns that need to be cast to match the delta
  specification, and the code to do the actual casting
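
The column matching described in the list above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: `Field`, `plan_reorder`, and `selected_indices` are hypothetical names, and it only handles top-level columns (no nested leaves, no timestamp casting).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a requested schema field."""
    name: str
    nullable: bool


# A plan entry is either ("read", index_in_file) or ("null", column_name).
PlanEntry = Tuple[str, Union[int, str]]


def plan_reorder(requested: List[Field], file_columns: List[str]) -> List[PlanEntry]:
    """Build a plan, in requested-schema order, for producing each column."""
    file_index = {name: i for i, name in enumerate(file_columns)}
    plan: List[PlanEntry] = []
    for field in requested:
        if field.name in file_index:
            # Column exists in the file: read it from its position there.
            plan.append(("read", file_index[field.name]))
        elif field.nullable:
            # Missing but nullable: note it so it can be filled with nulls.
            plan.append(("null", field.name))
        else:
            # Missing and non-nullable: nothing valid can be produced.
            raise ValueError(
                f"requested column '{field.name}' is missing from the file "
                "and is not nullable"
            )
    return plan


def selected_indices(plan: List[PlanEntry]) -> List[int]:
    """File column indices that actually need to be read, in file order."""
    return sorted(target for kind, target in plan if kind == "read")
```

For example, requesting `b`, a nullable `missing`, and `a` from a file with columns `a, b, c` yields a plan that reads file columns 1 and 0 (in that output order), fills `missing` with nulls, and never reads `c`.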
This turns out to be _way harder_ than anticipated, so it's a lot of
code.
Currently does not support reordering things inside `map`s. This is a
complex PR as it is, and we don't need support for that just yet. Map
content will just be passed through as read from the file.
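
As a rough illustration of the nested behavior (structs reordered, including a struct inside a list, while map content passes through untouched), here is a minimal Python sketch. It operates on plain dicts and lists rather than on columnar data, the schema dictionary layout and the `reorder` function are assumptions made up for this example, and missing-field handling is omitted.

```python
from typing import Any, Dict


def reorder(value: Any, schema: Dict[str, Any]) -> Any:
    """Reorder `value` to match `schema` (hypothetical dict-based schema)."""
    kind = schema["type"]
    if kind == "struct":
        # Emit fields in requested order; fields not requested are dropped.
        return {
            f["name"]: reorder(value[f["name"]], f["schema"])
            for f in schema["fields"]
        }
    if kind == "list":
        # Recurse into each element so a struct inside a list is reordered.
        return [reorder(item, schema["element"]) for item in value]
    # Primitives and maps are passed through exactly as read.
    return value


schema = {
    "type": "struct",
    "fields": [
        {"name": "a", "schema": {
            "type": "list",
            "element": {"type": "struct", "fields": [
                {"name": "x", "schema": {"type": "long"}},
                {"name": "y", "schema": {"type": "long"}},
            ]},
        }},
        {"name": "b", "schema": {"type": "long"}},
        {"name": "m", "schema": {"type": "map"}},
    ],
}

row = {
    "b": 1,
    "extra": 99,
    "a": [{"y": 2, "x": 3}],
    "m": {"k2": 1, "k1": 2},
}
result = reorder(row, schema)
```

Here `result` has its top-level fields in the order `a`, `b`, `m`, the struct inside the list is reordered to `x`, `y`, the unrequested `extra` field is dropped, and the map keeps its keys in file order.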
---------
Co-authored-by: Ryan Johnson <[email protected]>