Commit 1de3315
fix: ensure CoalescePartitionsExec is enabled for IcebergCommitExec (#1723)
## Which issue does this PR close?
This PR fixes partial writes similar to those reported
[here](spiceai/spiceai#7407). Although the
code below wraps the write plan in `CoalescePartitionsExec` to enforce a single
input partition, the DataFusion optimizer can remove that node. A unit test was added
to demonstrate this behavior.
https://github.com/apache/iceberg-rust/blob/dc349284a4204c1a56af47fb3177ace6f9e899a0/crates/integrations/datafusion/src/table/mod.rs#L196-L210
```rust
let write_plan = Arc::new(IcebergWriteExec::new(
self.table.clone(),
input,
self.schema.clone(),
));
// Merge the outputs of write_plan into one so we can commit all files together
let coalesce_partitions = Arc::new(CoalescePartitionsExec::new(write_plan));
Ok(Arc::new(IcebergCommitExec::new(
self.table.clone(),
catalog,
coalesce_partitions,
self.schema.clone(),
)))
```
Example plan (note that there is no `CoalescePartitionsExec`):
```shell
explain format tree insert into task_history_sink select * from runtime.task_history;
+---------------+-------------------------------+
| plan_type | plan |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ IcebergCommitExec │ |
| | │ -------------------- │ |
| | │ IcebergCommitExec: table: │ |
| | │ team_app.task_history │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ IcebergWriteExec │ |
| | │ -------------------- │ |
| | │ IcebergWriteExec: table: │ |
| | │ team_app.task_history │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ ProjectionExec │ |
| | │ -------------------- │ |
| | │ captured_output: │ |
| | │ captured_output │ |
| | │ │ |
| | │ end_time: │ |
| | │ CAST(end_time AS Timestamp│ |
| | │ (Microsecond, None)) │ |
| | │ │ |
| | │ error_message: │ |
| | │ error_message │ |
| | │ │ |
| | │ execution_duration_ms: │ |
| | │ execution_duration_ms │ |
| | │ │ |
| | │ input: input │ |
| | │ │ |
| | │ labels: │ |
| | │ CAST(labels AS Map(Field {│ |
| | │ name: "key_value", │ |
| | │ data_type: Struct( │ |
| | │ [Field { name: "key", │ |
| | │ data_type: Utf8, │ |
| | │ nullable: false, │ |
| | │ dict_id: 0, │ |
| | │ dict_is_ordered │ |
| | │ : false, metadata: { │ |
| | │ "PARQUET:field_id": │ |
| | │ "12"} }, Field { name: │ |
| | │ "value", data_type: │ |
| | │ Utf8, nullable: true │ |
| | │ ... │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ RepartitionExec │ |
| | │ -------------------- │ |
| | │ partition_count(in->out): │ |
| | │ 1 -> 14 │ |
| | │ │ |
| | │ partitioning_scheme: │ |
| | │ RoundRobinBatch(14) │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ BytesProcessedExec │ |
| | │ -------------------- │ |
| | │ BytesProcessedExec │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ SchemaCastScanExec │ |
| | │ -------------------- │ |
| | │ SchemaCastScanExec │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ DataSourceExec │ |
| | │ -------------------- │ |
| | │ bytes: 88176 │ |
| | │ format: memory │ |
| | │ rows: 6 │ |
| | └───────────────────────────┘ |
| | |
+---------------+-------------------------------+
```
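A regression like the one above can be caught by inspecting the indented text rendering of a physical plan. The following is a minimal sketch, not the test added in this PR: `child_follows` is a hypothetical helper that works on plain plan text and only checks that the line directly below the first matching parent node starts with the expected child node name.

```rust
/// Hypothetical helper (not part of iceberg-rust or DataFusion).
/// Returns true if the first line starting with `parent` is immediately
/// followed by a line starting with `child`. Indentation and blank lines
/// are ignored, so this only suits plans that form a single chain of nodes.
fn child_follows(plan: &str, parent: &str, child: &str) -> bool {
    let mut lines = plan.lines().map(str::trim).filter(|l| !l.is_empty());
    while let Some(line) = lines.next() {
        if line.starts_with(parent) {
            // Inspect the very next node in the rendered plan.
            return lines.next().map_or(false, |next| next.starts_with(child));
        }
    }
    // The parent node never appeared in the plan text.
    false
}
```

Calling `child_follows(plan_text, "IcebergCommitExec", "CoalescePartitionsExec")` on a plan like the one above would return `false`, because `IcebergWriteExec` sits directly under `IcebergCommitExec`.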
## What changes are included in this PR?
This PR adds a `required_input_distribution` setting for `IcebergWriteExec` so that
DataFusion automatically coalesces input partitions before the
commit, similar to [DataFusion
DataSinkExec](https://github.com/apache/datafusion/blob/a7b113c45509aae34595b6a62469b3173cac91bd/datafusion/datasource/src/sink.rs#L187).
`test_datafusion_execution_partitioned_source` can be used to observe the
behavior before and after the change.
Before
```text
Physical plan:
IcebergCommitExec: table=test_namespace.test_table_partitioning
  RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=3
    IcebergWriteExec: table=test_namespace.test_table_partitioning
      DataSourceExec: partitions=3, partition_sizes=[1, 1, 1]
```
After
```text
IcebergCommitExec: table=test_namespace.test_table
  CoalescePartitionsExec
    IcebergWriteExec: table=test_namespace.test_table
      DataSourceExec: partitions=3, partition_sizes=[1, 1, 1]
```
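The difference between the two plans comes from declaring the requirement on the operator itself instead of hand-inserting a node that the optimizer is free to drop. The sketch below is a toy model of that idea and does not use DataFusion's API: `Distribution`, `Plan`, `enforce`, and `render` are all hypothetical stand-ins, and `enforce` only imitates, in a very simplified way, what a distribution-enforcement pass does when an operator reports that it needs a single input partition.

```rust
/// Toy stand-in for an operator's input requirement (not DataFusion's type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Distribution {
    Unspecified,
    SinglePartition,
}

/// Toy physical plan: a single chain of nodes.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf that produces `partitions` output partitions.
    Source { partitions: usize },
    /// Pass-through operator that declares what it requires of its input.
    Operator { name: String, requires: Distribution, input: Box<Plan> },
    /// Merges all input partitions into one.
    Coalesce { input: Box<Plan> },
}

impl Plan {
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Source { partitions } => *partitions,
            Plan::Operator { input, .. } => input.output_partitions(),
            Plan::Coalesce { .. } => 1,
        }
    }
}

/// Simplified enforcement pass: wherever an operator requires a single
/// partition and its input produces more than one, insert a `Coalesce`.
fn enforce(plan: Plan) -> Plan {
    match plan {
        source @ Plan::Source { .. } => source,
        Plan::Coalesce { input } => Plan::Coalesce { input: Box::new(enforce(*input)) },
        Plan::Operator { name, requires, input } => {
            let input = enforce(*input);
            let input = if requires == Distribution::SinglePartition
                && input.output_partitions() > 1
            {
                Plan::Coalesce { input: Box::new(input) }
            } else {
                input
            };
            Plan::Operator { name, requires, input: Box::new(input) }
        }
    }
}

/// Renders the chain as indented lines, one node per line.
fn render(plan: &Plan) -> Vec<String> {
    let mut lines = Vec::new();
    let mut node = Some(plan);
    let mut depth = 0;
    while let Some(p) = node {
        let (label, next) = match p {
            Plan::Source { partitions } => (format!("Source: partitions={partitions}"), None),
            Plan::Operator { name, input, .. } => (name.clone(), Some(input.as_ref())),
            Plan::Coalesce { input } => ("Coalesce".to_string(), Some(input.as_ref())),
        };
        lines.push(format!("{}{}", "  ".repeat(depth), label));
        node = next;
        depth += 1;
    }
    lines
}
```

In this model, a commit-like operator that requires `SinglePartition` over a three-partition source always ends up with a `Coalesce` beneath it after `enforce` runs, while an operator that leaves the requirement `Unspecified` gets no such guarantee.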
## Are these changes tested?
Added the `test_datafusion_execution_partitioned_source` unit test and tested
manually.

1 parent 05d9122 commit 1de3315
2 files changed, +120 -1 lines changed
Lines changed: 110 additions & 1 deletion
Lines changed: 10 additions & 0 deletions