Commit 7ea6713
feat(datafusion): implement the partitioning node for DataFusion to define the partitioning (#1620)
## Which issue does this PR close?
- Closes #1543
## What changes are included in this PR?
Implement a physical execution repartition node that determines the
relevant DataFusion partitioning strategy based on the Iceberg table
schema and metadata.
1. Unpartitioned tables: uses round-robin partitioning.
2. Partitioned tables: the strategy depends on the transform type:
   - Identity or Bucket transforms: uses hash partitioning on the `_partition` column.
   - Temporal transforms (Year, Month, Day, Hour): uses round-robin partitioning.

_Minor change: I created a new `schema_ref()` helper method._
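
The selection rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `Transform`, `Strategy`, and `choose_strategy` are hypothetical stand-in names, the sketch handles a single partition field only, and it does not use the real `iceberg` or `datafusion` types.

```rust
/// Hypothetical stand-in for the partition transforms named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transform {
    Identity,
    Bucket(u32),
    Year,
    Month,
    Day,
    Hour,
}

/// Hypothetical stand-in for the repartitioning strategy that gets chosen.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Strategy {
    /// Spread rows evenly over `n` output partitions.
    RoundRobin(usize),
    /// Hash rows on a column so equal values land in the same partition.
    Hash { column: String, partitions: usize },
}

/// Pick a strategy for a table with at most one partition field.
/// `None` models an unpartitioned table (an assumption of this sketch).
fn choose_strategy(transform: Option<Transform>, partitions: usize) -> Strategy {
    match transform {
        // Unpartitioned tables: round-robin.
        None => Strategy::RoundRobin(partitions),
        // Identity or Bucket transforms: hash on the `_partition` column.
        Some(Transform::Identity | Transform::Bucket(_)) => Strategy::Hash {
            column: "_partition".to_string(),
            partitions,
        },
        // Temporal transforms: round-robin.
        Some(Transform::Year | Transform::Month | Transform::Day | Transform::Hour) => {
            Strategy::RoundRobin(partitions)
        }
    }
}
```

For example, under these assumptions `choose_strategy(Some(Transform::Bucket(16)), 4)` yields a hash strategy on `_partition` with 4 partitions, while `choose_strategy(Some(Transform::Day), 4)` falls back to round-robin. How the actual node combines several partition fields is not stated in the description and is left out of the sketch.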
## Are these changes tested?
Yes, with unit tests
---------
Signed-off-by: Florian Valeye <[email protected]>

1 parent 45c82df commit 7ea6713
File tree

3 files changed, +893 -1 lines changed

- crates
  - iceberg/src
  - integrations/datafusion/src/physical_plan