
Fix no value present scheduling failure with grouped execution#22538

Merged
rschlussel merged 2 commits into prestodb:master from rschlussel:debug-scheduler-failure
Apr 17, 2024

Conversation

@rschlussel (Contributor) commented Apr 16, 2024

Description

Only use dynamic scheduling if all scans in a stage are using grouped execution

Motivation and Context

Previously queries with a mix of ungrouped and grouped scans in the same fragment could fail with the following stack trace:

java.util.NoSuchElementException: No value present
	at java.base/java.util.Optional.get(Optional.java:148)
	at com.facebook.presto.execution.scheduler.NodeScheduler.selectDistributionNodes(NodeScheduler.java:414)
	at com.facebook.presto.execution.scheduler.nodeSelection.SimpleNodeSelector.computeAssignments(SimpleNodeSelector.java:231)
	at com.facebook.presto.execution.scheduler.FixedSourcePartitionedScheduler$BucketedSplitPlacementPolicy.computeAssignments(FixedSourcePartitionedScheduler.java:316)
	at com.facebook.presto.execution.scheduler.SourcePartitionedScheduler.schedule(SourcePartitionedScheduler.java:273)
	at com.facebook.presto.execution.scheduler.FixedSourcePartitionedScheduler$AsGroupedSourceScheduler.schedule(FixedSourcePartitionedScheduler.java:353)
	at com.facebook.presto.execution.scheduler.FixedSourcePartitionedScheduler.schedule(FixedSourcePartitionedScheduler.java:240)
	at com.facebook.presto.execution.scheduler.SqlQueryScheduler.schedule(SqlQueryScheduler.java:434)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Example query shape:
SELECT count(*) FROM bucketed_table1 t1 JOIN (SELECT * FROM bucketed_table2 WHERE some_column IN (SELECT key FROM my_small_broadcast_table)) t2 ON t1.key = t2.key GROUP BY t1.key;

The plan for the relevant fragment would look as follows:

          AGGREGATION
              |
           JOIN
           /    \
          /      \
TableScan(grouped) SemiJoin
                    /   \
 TableScan(ungrouped)   RemoteSource (replicated exchange)
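The essence of the fix can be sketched as follows. This is a minimal, simplified illustration with hypothetical types, not Presto's actual scheduler classes: dynamic lifespan scheduling is chosen only when every scan in the fragment uses grouped execution, so a fragment mixing grouped and ungrouped scans (as in the SemiJoin branch above) falls back to the static bucket-to-node mapping.

```java
import java.util.Map;

// Simplified sketch (hypothetical class and method names, not Presto's actual
// code) of the check this PR tightens: dynamic scheduling is only safe when
// *all* scans in the stage use grouped execution. A single ungrouped scan must
// force the non-dynamic bucket-to-node mapping, or node selection fails with
// "No value present" when it looks up a bucket it never assigned.
public class GroupedExecutionCheck
{
    // scanGrouped maps each table-scan plan-node id to whether that scan
    // runs with grouped execution in this fragment
    public static boolean useDynamicScheduling(Map<String, Boolean> scanGrouped)
    {
        return scanGrouped.values().stream().allMatch(grouped -> grouped);
    }

    public static void main(String[] args)
    {
        // Mixed fragment from the example plan:
        // grouped join scan plus ungrouped semi-join scan
        Map<String, Boolean> mixed = Map.of("scan_t1", true, "scan_t2", false);
        Map<String, Boolean> allGrouped = Map.of("scan_t1", true, "scan_t2", true);
        System.out.println("mixed -> " + useDynamicScheduling(mixed));       // false
        System.out.println("allGrouped -> " + useDynamicScheduling(allGrouped)); // true
    }
}
```

With the previous behavior, the presence of any grouped scan could steer the stage onto the dynamic path, and the ungrouped scan then hit the `Optional.get` failure shown in the stack trace above.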

Impact

Fixes a bug affecting queries that mix joins and semi-joins under grouped execution

Test Plan

Added new test

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix an error for some queries using a mix of joins and semi-joins when grouped execution is enabled.

@rschlussel rschlussel requested review from a team and jaystarshot as code owners April 16, 2024 21:13
@rschlussel rschlussel requested a review from presto-oss April 16, 2024 21:13
@rschlussel rschlussel force-pushed the debug-scheduler-failure branch from 85c8225 to 043ec4b on April 16, 2024 21:15
@rschlussel rschlussel requested a review from arhimondr April 16, 2024 21:16
@rschlussel rschlussel force-pushed the debug-scheduler-failure branch 4 times, most recently from dca7f13 to 9d7620f on April 17, 2024 00:48
@hantangwangd (Member) left a comment

Very nice catch, overall looks good to me. One small question for discussion.

Comment on lines +345 to +347
if (plan.getFragment().getRemoteSourceNodes().stream().allMatch(node -> node.getExchangeType() == REPLICATE)
&& schedulingOrder.stream().allMatch(id -> plan.getFragment().getStageExecutionDescriptor().isScanGroupedExecution(id))) {
// no non-replicated remote source and all scans are grouped

A small question: do we need this change? Should a fragment that has only broadcast remote sources but can't support dynamic lifespan scheduling still go through this path?

@rschlussel (Contributor, Author) commented Apr 17, 2024

oh, I think you are right that we don't need this! The case you describe does go through here, but the stage execution descriptor won't be dynamicLifespanSchedule, and so we'll get a non-dynamic bucket node map in the code two lines below. Now I see that the else path is for stages that need a nodePartitionMap, which I believe is only needed for a partitioned remote source.

@rschlussel (Contributor, Author):

fixed

Dynamic lifespan scheduling for grouped execution assumes all scans are using
grouped execution. Queries like the following can produce a plan where
some scans are grouped and some are ungrouped:

SELECT count(*) FROM bucketed_table1 t1 JOIN (SELECT * FROM bucketed_table2
WHERE some_column IN (SELECT key FROM my_small_broadcast_table)) t2 ON
t1.key = t2.key GROUP BY t1.key;

The relevant fragment would look as follows:

          AGGREGATION
              |
           JOIN
           /    \
          /      \
TableScan(grouped) SemiJoin
                    /   \
 TableScan(ungrouped)   RemoteSource (replicated exchange)

This fixes an error where such queries could fail with a "no value
present" error during node scheduling.
@rschlussel rschlussel force-pushed the debug-scheduler-failure branch from 9d7620f to 2c4c693 on April 17, 2024 13:43
@hantangwangd (Member) left a comment

LGTM!

@rschlussel rschlussel merged commit b398569 into prestodb:master Apr 17, 2024
@rschlussel rschlussel deleted the debug-scheduler-failure branch April 17, 2024 19:17
@wanglinsong wanglinsong mentioned this pull request Jun 25, 2024