Fix Possible Congestion Scenario in `SortPreservingMergeExec` #12302

berkaysynnada · 2024-09-03T11:05:20Z

Which issue does this PR close?

Rationale for this change

Please see the issue.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Update merge_fuzz.rs Update merge.rs Update merge.rs Update merge.rs Update merge_fuzz.rs Update merge.rs Update merge_fuzz.rs Update merge.rs Counter is not enough Termination logic with counter

berkaysynnada · 2024-09-03T11:47:46Z

datafusion/physical-plan/src/sorts/merge.rs

+                    }
+                    Poll::Pending => {
+                        self.uninitiated_partitions.rotate_left(1);
+                        cx.waker().wake_by_ref();


I am not sure whether this usage has side effects or decreases performance, but without it I cannot wake the SPM poll again once it receives a Pending from its first partition.

I did some research -- see https://github.com/synnada-ai/datafusion-upstream/pull/34/files#r1743621057

I think calling wake_by_ref effectively tells tokio to schedule this poll loop again after handling other tasks, which makes sense to me (as I am not sure how else we would signal to tokio that the merge is ready to go)

But I share your concern that this will cause some sort of performance issue
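
To make the self-wake pattern under discussion concrete, here is a minimal sketch using only the standard library. `YieldsFirst`, `WakeCounter` and `run_until_stalled` are hypothetical names invented for this illustration; they are not part of DataFusion or tokio, and the toy executor only approximates what a real runtime does.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future: returns `Pending` for the first `remaining` polls.
struct YieldsFirst {
    remaining: usize,
    self_wake: bool,
}

impl Future for YieldsFirst {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            return Poll::Ready("done");
        }
        self.remaining -= 1;
        if self.self_wake {
            // Ask the executor to schedule this task again.
            cx.waker().wake_by_ref();
        }
        Poll::Pending
    }
}

/// Counts wake-ups so the toy executor can tell whether to poll again.
struct WakeCounter(AtomicUsize);

impl Wake for WakeCounter {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Toy executor (an assumption, not how tokio is implemented): it re-polls
/// only if the previous poll triggered a wake-up, and returns `None` when the
/// future is pending with nobody left to wake it.
fn run_until_stalled<F: Future + Unpin>(mut fut: F) -> Option<F::Output> {
    let counter = Arc::new(WakeCounter(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    loop {
        let before = counter.0.load(Ordering::SeqCst);
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Ready(value) => return Some(value),
            Poll::Pending if counter.0.load(Ordering::SeqCst) == before => return None,
            Poll::Pending => {}
        }
    }
}
```

With `self_wake: true` the future is polled again until it completes; with `self_wake: false` it stalls after the first `Pending`. The cost is that a self-waking task is rescheduled even when no input has made progress, which is the performance concern raised above.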

berkaysynnada · 2024-09-03T12:15:25Z

@alamb, you might be more familiar with these parts of the code. Do you have any ideas about this solution, or perhaps you could suggest a different approach?

alamb

Thanks @berkaysynnada -- it seems to me like the intent of the current code is to poll all the streams, but after this change the streams are only polled until they first return Ready.

I left a suggestion for an approach that may be closer to the original intent.

alamb · 2024-09-03T20:18:37Z

datafusion/physical-plan/src/sorts/merge.rs

@@ -97,6 +100,10 @@ pub(crate) struct SortPreservingMergeStream<C: CursorValues> {

    /// number of rows produced
    produced: usize,
+
+    /// Unitiated partitions. They are stored in a vector to keep them in


Can you please document what an "uninitiated partition" means in this context? I think it means partitions whose streams have been polled but have not yet returned Ready.

alamb · 2024-09-03T20:24:04Z

datafusion/physical-plan/src/sorts/merge.rs

-                if let Err(e) = ready!(self.maybe_poll_stream(cx, i)) {
-                    self.aborted = true;
-                    return Poll::Ready(Some(Err(e)));
+            // Ensure all non-exhausted streams have a cursor from which rows can be pulled


This comment implies to me that the code would / should poll all the streams. However, the code seems to ensure now that only streams that had previously not returned Ready for a poll are now polled.

IMO, the behavior is more correct now. In the previous version, let's assume the 1st partition is exhausted and returns None without setting its cursor. Then, the 2nd partition returns Pending. When poll_next_inner() is polled again, the iteration starts from the 1st partition, which has already returned None (AFAIK, polling exhausted streams could cause problems). Therefore, I track which streams have returned a result (either None or Some()), and which ones have returned only Pending.

see synnada-ai#34 for alternate idea

I've tried to explain my concern with that: synnada-ai#34 (comment)

👍 -- response in synnada-ai#34 (comment)
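
The tracking scheme described in this thread can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in merge.rs: `ScriptedPartition` and `init_pass` are hypothetical names, each partition replays a fixed script of poll results instead of being a real stream, and error handling is left out.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for an input stream: replays scripted poll results
/// and counts how often it was polled.
struct ScriptedPartition {
    script: VecDeque<Poll<Option<i32>>>,
    polls: usize,
}

impl ScriptedPartition {
    fn new(script: Vec<Poll<Option<i32>>>) -> Self {
        Self { script: script.into(), polls: 0 }
    }

    fn poll(&mut self) -> Poll<Option<i32>> {
        self.polls += 1;
        self.script.pop_front().expect("polled past the end of its script")
    }
}

/// One initialization pass over the partitions that have not yet returned
/// `Ready`. A partition leaves the queue as soon as it returns `Ready`
/// (with a first value or with `None`); on `Pending` it moves to the back
/// of the queue and the pass stops.
fn init_pass(
    uninitiated: &mut VecDeque<usize>,
    partitions: &mut [ScriptedPartition],
    cursors: &mut [Option<i32>],
) -> Poll<()> {
    while let Some(&idx) = uninitiated.front() {
        match partitions[idx].poll() {
            Poll::Ready(first) => {
                cursors[idx] = first; // `None` means the partition is exhausted
                uninitiated.pop_front();
            }
            Poll::Pending => {
                // Try a different partition first on the next pass. The diff
                // quoted earlier also calls `cx.waker().wake_by_ref()` here.
                uninitiated.rotate_left(1);
                return Poll::Pending;
            }
        }
    }
    Poll::Ready(())
}
```

In the scenario from the comment above (partition 0 exhausted immediately, partition 1 pending once), the second pass starts with the remaining partitions and never touches partition 0 again.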

tustvold · 2024-09-04T11:04:49Z

I'm afraid I don't have capacity to review this, and am not likely to in the foreseeable future. However, one thing to be aware of is that SortPreservingMerge must be stable. Therefore, if the first stream is not ready, the merge must wait for it to become ready before it can proceed.
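
As an illustration of the stability requirement, here is a minimal in-memory sketch that assumes ties are broken in favor of the lower input index. `stable_merge` is a hypothetical helper written for this example; the real operator works on streams of record batches and is considerably more involved.

```rust
/// Merge already-sorted inputs. On equal keys the input with the lower index
/// wins, so rows from earlier inputs come out first.
fn stable_merge(inputs: &[Vec<(i32, &'static str)>]) -> Vec<(i32, &'static str)> {
    let mut pos = vec![0usize; inputs.len()];
    let mut out = Vec::new();
    loop {
        // Pick the non-exhausted input whose current head has the smallest key.
        // `min_by_key` returns the first minimum, i.e. the lowest index on ties.
        let next = (0..inputs.len())
            .filter(|&i| pos[i] < inputs[i].len())
            .min_by_key(|&i| inputs[i][pos[i]].0);
        match next {
            Some(i) => {
                out.push(inputs[i][pos[i]]);
                pos[i] += 1;
            }
            None => return out,
        }
    }
}
```

Choosing the next row requires the head of every non-exhausted input. If input 0 has not produced its first row yet, a row from input 1 cannot be emitted safely, because input 0 might still deliver an equal key that has to come first.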

alamb · 2024-09-05T11:03:03Z

datafusion/core/tests/fuzz_cases/merge_fuzz.rs

+}
+
+#[tokio::test]
+async fn test_spm_congestion() -> Result<()> {


I read this test a bit more -- it doesn't seem like it is actually a fuzz test (for example, it doesn't seem to have any random inputs).

I think it would make more sense to put it with the other sort preserving merge tests:

datafusion/datafusion/physical-plan/src/sorts/sort_preserving_merge.rs

Lines 301 to 302 in 6034be4

    #[cfg(test)]
    mod tests {

alamb

Per the discussion on synnada-ai#34 I am now convinced that this design is reasonable and an improvement over the current one.

In my opinion it is important for this PR to:

- Encode the rationale for polling streams
- Fix the comment about "ensure all non-exhausted streams" to reflect the new intent (something like "ensure all streams have been polled at least once to start processing")

It would also be nice to:

- Move the tests in with the existing sort preserving merge tests (rather than the fuzz tests)
- Add a new test that verifies the "doesn't poll after Ready(None) is returned" behavior, as that seems an important property that is not currently covered
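
One possible shape for the last item is a wrapper that records any poll arriving after exhaustion. The sketch below is only an assumption about how such a check could be structured: `ExhaustionGuard` is a hypothetical helper, it wraps a plain closure rather than a real stream, and it is not the test that was added in this PR.

```rust
use std::task::Poll;

/// Hypothetical test helper: wraps a poll function and records whether it was
/// polled again after it had already returned `Poll::Ready(None)`.
struct ExhaustionGuard<F> {
    inner: F,
    exhausted: bool,
    polled_after_exhaustion: bool,
}

impl<F: FnMut() -> Poll<Option<i32>>> ExhaustionGuard<F> {
    fn new(inner: F) -> Self {
        Self { inner, exhausted: false, polled_after_exhaustion: false }
    }

    fn poll(&mut self) -> Poll<Option<i32>> {
        if self.exhausted {
            // Record the violation instead of touching the inner source again.
            self.polled_after_exhaustion = true;
            return Poll::Ready(None);
        }
        let result = (self.inner)();
        if matches!(result, Poll::Ready(None)) {
            self.exhausted = true;
        }
        result
    }
}
```

A test would hand the wrapped inputs to the code under test, drain the output, and then assert that `polled_after_exhaustion` is still `false` for every input.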

Thank you for bearing with me @berkaysynnada

berkaysynnada · 2024-09-06T07:37:32Z

I’ve incorporated the final feedback. If there aren’t any more suggestions, I think we are good to merge this PR once the tests pass.

alamb

I think the PR looks (really) good to me now -- thank you @berkaysynnada

berkaysynnada added 6 commits September 2, 2024 18:22

Ready

4382945

Add comments

d14dde3

Increase threshold

1567c0c

Debug

02cdbfe

Remove merge.rs changes

eb068a0

Use waker

93a9c7c

github-actions bot added the physical-expr and core labels Sep 3, 2024

berkaysynnada added 2 commits September 3, 2024 14:29

Update merge.rs

5640da8

Simplify the test

07bf172

berkaysynnada commented Sep 3, 2024

Update merge_fuzz.rs

2e95923

alamb reviewed Sep 3, 2024

Update merge.rs

9f32d2b

alamb changed the title ~~Fix Possible Congestion Scenario in SPM~~ Fix Possible Congestion Scenario in SortPreservingMergeExec Sep 4, 2024

alamb mentioned this pull request Sep 4, 2024

Fix Possible Congestion Scenario in SPM #12230

Closed

alamb mentioned this pull request Sep 4, 2024

Poll all streams when not ready synnada-ai/datafusion-upstream#34

Closed

alamb reviewed Sep 5, 2024

alamb approved these changes Sep 5, 2024

berkaysynnada added 2 commits September 6, 2024 10:29

Addresses the latest review

c8b32a5

fix clippy

4bdbafa

Use VecDeque for rotation

e0bf209

alamb approved these changes Sep 6, 2024

alamb merged commit f2a8b07 into apache:main Sep 6, 2024
24 checks passed

alamb mentioned this pull request Sep 11, 2024

DataFusion weekly project plan (Andrew Lamb) - Sep 2, 2024 #12336

Closed

4 tasks
