Move `repartition_file_scans` out of `enable_round_robin` check in `EnforceDistribution` rule #8731

viirya · 2024-01-03T08:23:54Z

Which issue does this PR close?

Closes #.

Rationale for this change

This patch cleans up some code in EnforceDistribution rule. In particular, it also does one major changes by moving repartition_file_scans out of the check by enable_round_robin.

What changes are included in this PR?

Are these changes tested?

Existing tests

Are there any user-facing changes?

No

viirya · 2024-01-03T08:25:00Z

datafusion/core/src/physical_optimizer/enforce_distribution.rs

-                // When `repartition_file_scans` is set, attempt to increase
-                // parallelism at the source.
-                if repartition_file_scans {
-                    if let Some(new_child) =
-                        child.plan.repartitioned(target_partitions, config)?
-                    {
-                        child.plan = new_child;
-                    }


repartition_file_scans doesn't look like it should be guarded by enable_round_robin config.

viirya · 2024-01-03T08:41:21Z

datafusion/core/src/physical_optimizer/enforce_distribution.rs

-                // Increase parallelism by adding round-robin repartitioning
-                // on top of the operator. Note that we only do this if the
-                // partition count is not already equal to the desired partition
-                // count.
-                child = add_roundrobin_on_top(child, target_partitions)?;


This doesn't look correct to me. The produced query plan also looks confused.

If there are already distribution requirements like single partition or hash partitioning (i.e., it will be repartitioning), why we add round-robin repartitioning below it to result in two repartitioning?

From the comment, looks like it is to increase parallelism? But the parallelism is bound to cpus, however here the partitioning number is target_partitions. That's said it could go to round-robin into much higher partitions (9000 for example as I saw such in one existing test case). But I don't think we will have parallelism as same as target_partitions. Not to mention that additional round-robin partitioning might have their cost.

Regarding parallelism, it sounds more correct to produce partitions at scans (i.e., repartition_file_scans) and results in required parallelism on following operators naturally instead of inserting arbitrary round-robin repartitioning around distribution requirements.

I did benchmark for this change. Overall I don't see significant downgrade.

For example:

Benchmark clickbench_1.json -------------------- ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Query ┃ main ┃ cleanup ┃ Change ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ QQuery 0 │ 3.19ms │ 3.13ms │ no change │ │ QQuery 1 │ 22.84ms │ 24.64ms │ 1.08x slower │ │ QQuery 2 │ 71.05ms │ 79.33ms │ 1.12x slower │ │ QQuery 3 │ 60.94ms │ 67.83ms │ 1.11x slower │ │ QQuery 4 │ 453.96ms │ 470.30ms │ no change │ │ QQuery 5 │ 1089.41ms │ 877.61ms │ +1.24x faster │ │ QQuery 6 │ 24.80ms │ 27.15ms │ 1.09x slower │ │ QQuery 7 │ 24.79ms │ 26.62ms │ 1.07x slower │ │ QQuery 8 │ 1001.42ms │ 1000.47ms │ no change │ │ QQuery 9 │ 1625.50ms │ 1653.77ms │ no change │ │ QQuery 10 │ 201.92ms │ 237.07ms │ 1.17x slower │ │ QQuery 11 │ 226.95ms │ 251.00ms │ 1.11x slower │ │ QQuery 12 │ 1023.53ms │ 791.79ms │ +1.29x faster │ │ QQuery 13 │ 1672.11ms │ 1435.32ms │ +1.16x faster │ │ QQuery 14 │ 1117.68ms │ 881.34ms │ +1.27x faster │ │ QQuery 15 │ 554.72ms │ 574.87ms │ no change │ │ QQuery 16 │ 1900.79ms │ 1840.62ms │ no change │ │ QQuery 17 │ 1884.34ms │ 1791.93ms │ no change │ │ QQuery 18 │ 4313.52ms │ 3465.75ms │ +1.24x faster │ │ QQuery 19 │ 43.34ms │ 53.79ms │ 1.24x slower │ │ QQuery 20 │ 1976.39ms │ 1850.32ms │ +1.07x faster │ │ QQuery 21 │ 2127.49ms │ 2010.90ms │ +1.06x faster │ │ QQuery 22 │ 4457.89ms │ 4859.12ms │ 1.09x slower │ │ QQuery 23 │ 8203.33ms │ 8276.48ms │ no change │ │ QQuery 24 │ 511.62ms │ 426.74ms │ +1.20x faster │ │ QQuery 25 │ 486.66ms │ 366.92ms │ +1.33x faster │ │ QQuery 26 │ 588.12ms │ 473.45ms │ +1.24x faster │ │ QQuery 27 │ 1692.65ms │ 1595.15ms │ +1.06x faster │ │ QQuery 28 │ 12640.52ms │ 12382.83ms │ no change │ │ QQuery 29 │ 407.09ms │ 447.76ms │ 1.10x slower │ │ QQuery 30 │ 968.69ms │ 770.84ms │ +1.26x faster │ │ QQuery 31 │ 1066.54ms │ 876.63ms │ +1.22x faster │ │ QQuery 32 │ 10732.72ms │ 8554.12ms │ +1.25x faster │ │ QQuery 33 │ 7478.91ms │ 6086.03ms │ +1.23x faster │ │ QQuery 34 │ 6516.26ms │ 6733.28ms │ no change │ │ QQuery 35 │ 1236.72ms │ 1207.24ms │ no change │ │ QQuery 36 │ 198.37ms │ 195.67ms │ no change │ │ QQuery 37 │ 99.59ms │ 99.33ms │ no change │ │ QQuery 38 │ 111.45ms │ 111.15ms │ no change │ │ QQuery 39 │ 432.10ms │ 432.58ms │ no change │ │ QQuery 40 │ 45.72ms │ 43.84ms │ no change │ │ QQuery 41 │ 41.19ms │ 41.65ms │ no change │ │ QQuery 42 │ 48.21ms │ 48.99ms │ no change │ └──────────────┴────────────┴────────────┴───────────────┘

The idea as you said is to increase parallelism of the hash-repartition operator. target_partitions in most cases / by design is currently bound to the CPU.

Are you sure the clickbench benchmark actually has different query plans? As we push down repartitioning to e.g. parquet/csv, we shouldn't introduce the round Robin repartitioning anymore in those plans, as target_partitions requirement might already be met.

I'm not sure as these benchmarks don't check query plans. I may need to manually check it. I will post back what I see later. However, my first idea might be, doesn't it mean mostly we don't need to add these round robin? I'm not sure if the addition of the round robin is driven by seeing better performance number.

Btw, I know this change might be a bit controversial before I change it. I change it because the query plan looks weird to have two-level repartitions always (round robin + hashing) and some of them looks unreasonable (round robin 9000 partitions?) so want to raise the discussion. If I cannot get consensus on the removal, I can restore it for sure and keep other change in this PR only.

The idea as you said is to increase parallelism of the hash-repartition operator. target_partitions in most cases / by design is currently bound to the CPU.

Yea, however, I think in most cases hash repartitioning should already take not-single partitioned input (unless its input is intentionally partitioned into single partition). If the scans are well partitioned, later operators follow the partitioning.

Also, I don't see it can obviously affect performance so far (based on the benchmark result locally). Actually, I think mostly the addition round robin doesn't help but adds a little cost (the round robin partitioning is not zero-cost op).

If I remember correctly, before partitions were pushed down to scans, adding round robin repartitioning after e.g. single files was better than doing it only in hash repartition. I think if you disable the file repartitioning or test with formats that don't support it you should be able to measure the effect.
The additional cost of round-robin is extremely small (invisible in profiles), especially compared to hash repartitioning.

Let me restore this change first and make this PR (other parts) forward.

viirya · 2024-01-03T08:42:16Z

datafusion/core/src/physical_optimizer/enforce_distribution.rs

@@ -1198,37 +1191,25 @@ fn ensure_distribution(
    )
    .map(
        |(mut child, requirement, required_input_ordering, would_benefit, maintains)| {
-            // Don't need to apply when the returned row count is not greater than 1:


Based on what value is compared with (batch_size), I think this should be batch_size instead of 1.

viirya · 2024-01-03T08:43:59Z

datafusion/core/src/physical_optimizer/enforce_distribution.rs

-            child_context.distribution_connection = match child_context.plan.as_any() {
-                plan_any if plan_any.is::<RepartitionExec>() => matches!(
-                    plan_any
-                        .downcast_ref::<RepartitionExec>()
-                        .unwrap()
-                        .partitioning(),
-                    Partitioning::RoundRobinBatch(_) | Partitioning::Hash(_, _)
-                ),


Unnecessary matching. For all supported partitioning in RepartitionExec, distribution_connection should be true.

And changed to use utility functions like is_repartition.

Dandandan · 2024-01-03T10:36:28Z

datafusion/sqllogictest/test_files/tpch/q11.slt.part

------------------------------------RepartitionExec: partitioning=Hash([s_suppkey@0], 4), input_partitions=4
--------------------------------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
----------------------------------------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/tpch/data/supplier.tbl]]}, projection=[s_suppkey, s_nationkey], has_header=false
+------------------------------------RepartitionExec: partitioning=Hash([s_suppkey@0], 4), input_partitions=1


I think these plans in these tests are less optimal parallelism-wise as the (compute intensive) hash-repartition is now running on one partition.

My reason to make the change: #8731 (comment)

viirya · 2024-01-03T20:43:18Z

datafusion/core/src/physical_optimizer/enforce_distribution.rs

-        if repartition_beneficial_stats {
-            // Since hashing benefits from partitioning, add a round-robin repartition
-            // before it:
-            input = add_roundrobin_on_top(input, n_target)?;


This looks redundant as we already have it.

Reverted this first. I will clean these up in other PR.

viirya · 2024-01-03T20:44:48Z

datafusion/core/src/physical_optimizer/enforce_distribution.rs

@@ -1237,12 +1231,7 @@ fn ensure_distribution(
                    child = add_spm_on_top(child);


@Dandandan Do we need the additional round robin for single partition? This looks unnecessary as single partitioning is not costly as hash partitioning.

Let me focus on repartition_file_scans change first in this PR. I will open another PR to clean up these round robin stuffs.

viirya · 2024-01-03T21:24:34Z

datafusion/sqllogictest/test_files/repartition_scan.slt

@@ -147,7 +147,7 @@ WITH HEADER ROW
 LOCATION 'test_files/scratch/repartition_scan/csv_table/';

 query I
-select * from csv_table;
+select * from csv_table ORDER BY column1;


This makes the test result stable, otherwise every time I rerun it locally, it gets different results...

viirya · 2024-01-03T21:25:35Z

datafusion/sqllogictest/test_files/window.slt

@@ -3794,7 +3794,7 @@ select a,
 1 1
 2 1

-# support scalar value in ORDER BY 
+# support scalar value in ORDER BY


Simply ran all sqllogictests locally. Seems some space chars left there.

Dandandan · 2024-01-03T22:10:32Z

datafusion/core/src/physical_optimizer/enforce_distribution.rs

            } else {
                true
            };

+            // When `repartition_file_scans` is set, attempt to increase
+            // parallelism at the source.
+            if repartition_file_scans {


Should we also check for repartition_beneficial_stats maybe?

Okay, looks like we have filled file scan statistics.

alamb

This looks like a good code change to me, but I am surprised that there are no (real) test changes needed other than parallelization_two_partitions

The changes to parallelization_two_partitions are pretty subtle to me in that if they changed I might not catch the regression (the plan would simply look like it had an implicit rather than an explicit range)

Is there some way we can add an explain plan to prevent this code from being broken in the future?

viirya · 2024-01-03T22:51:43Z

Is there some way we can add an explain plan to prevent this code from being broken in the future?

Let me see how to add one.

viirya · 2024-01-04T17:29:33Z

datafusion/sqllogictest/test_files/repartition_scan.slt

+# disable round robin repartitioning
+statement ok
+set datafusion.optimizer.enable_round_robin_repartition = false;
+
+## Expect to see the scan read the file as "4" groups with even sizes (offsets) again
+query TT
+EXPLAIN SELECT column1 FROM parquet_table WHERE column1 <> 42;
+----


I verified that this test case fails on current main branch.

alamb

This looks good to me -- thank you @viirya

cc @mustafasrepo / @ozankabak as you have expertise in this area of the code I think

mustafasrepo

Thanks @viirya LGTM!.

viirya · 2024-01-05T15:30:01Z

Thank you @alamb @Dandandan @mustafasrepo @ozankabak

Cleanup

b7395a3

github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jan 3, 2024

viirya commented Jan 3, 2024

View reviewed changes

Dandandan reviewed Jan 3, 2024

View reviewed changes

viirya added 2 commits January 3, 2024 12:25

More

ebec14e

Restore add_roundrobin_on_top

5ea68e6

viirya force-pushed the cleanup branch from 49ee4aa to 5ea68e6 Compare January 3, 2024 20:28

viirya added 2 commits January 3, 2024 12:36

Restore test files

d702b0f

More

a4ab9c2

viirya commented Jan 3, 2024

View reviewed changes

viirya added 4 commits January 3, 2024 12:49

Restore

aeb3f42

More

210e8df

More

f95e29e

Make test stable

17d2b87

viirya commented Jan 3, 2024

View reviewed changes

alamb mentioned this pull request Jan 3, 2024

DataFusion weekly project plan (Andrew Lamb) - Jan 1, 2024 #8704

Closed

9 tasks

Dandandan reviewed Jan 3, 2024

View reviewed changes

alamb reviewed Jan 3, 2024

View reviewed changes

For review

54f7bf0

viirya added 2 commits January 4, 2024 09:27

Add test

74a687f

Merge remote-tracking branch 'upstream/main' into cleanup

aba4dc0

viirya commented Jan 4, 2024

View reviewed changes

alamb approved these changes Jan 4, 2024

View reviewed changes

ozankabak requested a review from mustafasrepo January 4, 2024 18:50

mustafasrepo approved these changes Jan 5, 2024

View reviewed changes

alamb merged commit 29f23eb into apache:main Jan 5, 2024
22 checks passed

matthewgapp mentioned this pull request Jan 11, 2024

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move `repartition_file_scans` out of `enable_round_robin` check in `EnforceDistribution` rule #8731

Move `repartition_file_scans` out of `enable_round_robin` check in `EnforceDistribution` rule #8731

viirya commented Jan 3, 2024 •

edited

Loading

viirya Jan 3, 2024

viirya Jan 3, 2024 •

edited

Loading

Dandandan Jan 3, 2024 •

edited

Loading

Dandandan Jan 3, 2024 •

edited

Loading

viirya Jan 3, 2024

viirya Jan 3, 2024

Dandandan Jan 3, 2024

viirya Jan 3, 2024

viirya Jan 3, 2024

viirya Jan 3, 2024

viirya Jan 3, 2024

Dandandan Jan 3, 2024

viirya Jan 3, 2024

viirya Jan 3, 2024

viirya Jan 3, 2024

viirya Jan 3, 2024

viirya Jan 3, 2024

viirya Jan 3, 2024

viirya Jan 3, 2024

Dandandan Jan 3, 2024

viirya Jan 3, 2024

alamb left a comment

viirya commented Jan 3, 2024

viirya Jan 4, 2024

alamb left a comment

mustafasrepo left a comment

viirya commented Jan 5, 2024 •

edited

Loading

		@@ -1237,12 +1231,7 @@ fn ensure_distribution(
		child = add_spm_on_top(child);

Move repartition_file_scans out of enable_round_robin check in EnforceDistribution rule #8731

Move repartition_file_scans out of enable_round_robin check in EnforceDistribution rule #8731

Conversation

viirya commented Jan 3, 2024 • edited Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Choose a reason for hiding this comment

viirya Jan 3, 2024 • edited Loading

Choose a reason for hiding this comment

Dandandan Jan 3, 2024 • edited Loading

Choose a reason for hiding this comment

Dandandan Jan 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alamb left a comment

Choose a reason for hiding this comment

viirya commented Jan 3, 2024

Choose a reason for hiding this comment

alamb left a comment

Choose a reason for hiding this comment

mustafasrepo left a comment

Choose a reason for hiding this comment

viirya commented Jan 5, 2024 • edited Loading

Move `repartition_file_scans` out of `enable_round_robin` check in `EnforceDistribution` rule #8731

Move `repartition_file_scans` out of `enable_round_robin` check in `EnforceDistribution` rule #8731

viirya commented Jan 3, 2024 •

edited

Loading

viirya Jan 3, 2024 •

edited

Loading

Dandandan Jan 3, 2024 •

edited

Loading

Dandandan Jan 3, 2024 •

edited

Loading

viirya commented Jan 5, 2024 •

edited

Loading