fix: support nullable columns in pre-sorted data sources #16783
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -130,8 +130,7 @@ STORED AS PARQUET; | |
| ---- | ||
| 3 | ||
|
|
||
| # Check output plan again, expect no "output_ordering" clause in the physical_plan -> ParquetExec, | ||
| # due to there being more files than partitions: | ||
| # Check output plan again | ||
|
Contributor
Author
I think the actual reason why the
Contributor
I agree. I reviewed the test, and the table definition explicitly says `WITH ORDER (string_col ASC NULLS LAST, int_col ASC NULLS LAST)`, so I would expect this plan not to have additional sorting.
||
| query TT | ||
| EXPLAIN SELECT int_col, string_col | ||
| FROM test_table | ||
|
|
@@ -142,8 +141,7 @@ logical_plan | |
| 02)--TableScan: test_table projection=[int_col, string_col] | ||
| physical_plan | ||
| 01)SortPreservingMergeExec: [string_col@1 ASC NULLS LAST, int_col@0 ASC NULLS LAST] | ||
| 02)--SortExec: expr=[string_col@1 ASC NULLS LAST, int_col@0 ASC NULLS LAST], preserve_partitioning=[true] | ||
| 03)----DataSourceExec: file_groups={2 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/test_table/0.parquet, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/test_table/1.parquet], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/test_table/2.parquet]]}, projection=[int_col, string_col], file_type=parquet | ||
| 02)--DataSourceExec: file_groups={2 groups: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/test_table/0.parquet, WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/test_table/1.parquet], [WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/test_table/2.parquet]]}, projection=[int_col, string_col], output_ordering=[string_col@1 ASC NULLS LAST, int_col@0 ASC NULLS LAST], file_type=parquet | ||
|
|
||
|
|
||
| # Perform queries using MIN and MAX | ||
|
|
||
This appears to be the only actual code change: remove these lines.
`git praise` says they came in via #9593 from @suremarc. Do you remember why this condition was added, @suremarc? https://github.com/apache/datafusion/blame/a614716e7d97ff1d3374aef31b9a66fd10141423/datafusion/datasource/src/statistics.rs#L238
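
For readers following the thread, here is a rough sketch of the kind of check that could decide whether a file group keeps its declared `ASC NULLS LAST` ordering when a sort column is nullable. This is not the code in `statistics.rs`: `FileRange`, `cmp_nulls_last`, and `group_preserves_order` are hypothetical names, the single `i64` key is a simplification, and treating equal boundary values as acceptable is an assumption made for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical per-file summary of one sort column; `None` stands for NULL.
/// (Illustrative only, not a DataFusion type.)
#[derive(Debug, Clone, Copy)]
struct FileRange {
    min: Option<i64>,
    max: Option<i64>,
}

/// Compare two values under ASC NULLS LAST: every non-null value sorts before NULL.
fn cmp_nulls_last(a: Option<i64>, b: Option<i64>) -> Ordering {
    match (a, b) {
        (Some(x), Some(y)) => x.cmp(&y),
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    }
}

/// Returns true if reading the files back to back yields ASC NULLS LAST output,
/// assuming each file is already sorted internally. Each file's max must not
/// sort after the next file's min (ties allowed; this is an assumption).
fn group_preserves_order(files: &[FileRange]) -> bool {
    files
        .windows(2)
        .all(|w| cmp_nulls_last(w[0].max, w[1].min) != Ordering::Greater)
}
```

Under this sketch a file that ends in NULLs can only be followed by a file that starts with NULL, which is why a nullable column does not by itself rule out an `output_ordering` on the scan.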