[scheduler] Another task-stealing optimization #9984
Conversation
CT Test Results: 3 files, 135 suites, 49m 11s ⏱️. Results for commit 09c6a67. ♻️ This comment has been updated with latest results. — Erlang/OTP GitHub Action Bot
I appear to have missed a failed assert in debug builds; I'm looking into it.
I added a new commit, "Track waiting schedulers", removing the assert that was hit and adding another way of keeping track of waiting schedulers. It is not used everywhere it could be, but I think the legacy configurations for which this is the case will behave more like before if left unchanged.

Please remove the meta internal stuff like the following from the commit messages:
If we steal exactly one task, then we directly execute it, and it never ends up in our runqueue. So it is wrong to update the flag telling everyone that we have work they can steal. If we steal more than one task, then we will call erts_add_runq_len, which will take care of calling erts_non_empty_runq for us.
There are two benefits:
- Being FIFO rather than LIFO, we are much more likely to find the runqueue uncontended on the second pass (since it is a runqueue that was found contended a lot earlier).
- We can now do several passes, rather than defaulting to a blocking call to lock on the second pass.

There was just one small issue: the implementation of equeues did not support passing them across function boundaries, since it used a preprocessor macro to find the default queue (when trying to grow the queue). I fixed that by adding a field to the queue itself.
Force-pushed 65348b0 to 09c6a67.
Thanks for fixing the assertion, and sorry for the messed-up commit messages.
The main win here is the third commit.
It is a followup to #9594, and was tested in prod on multiple kinds of machines and multiple services.
We observed some small but significant wins in scheduler utilisation, including at peak load which was the weak point of the previous batch of optimisations.
cc @tomas-abrahamsson since you saw large wins from the previous batch of scheduler enhancements.