Skip to content

Conversation

@RobinMorisset
Copy link
Contributor

The main win here is the third commit.
It is a followup to #9594, and was tested in prod on multiple kinds of machines and multiple services.
We observed some small but significant wins in scheduler utilisation, including at peak load which was the weak point of the previous batch of optimisations.
cc @tomas-abrahamsson since you saw large wins from the previous batch of scheduler enhancements.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 20, 2025

CT Test Results

    3 files    135 suites   49m 11s ⏱️
1 649 tests 1 592 ✅ 57 💤 0 ❌
2 285 runs  2 209 ✅ 76 💤 0 ❌

Results for commit 09c6a67.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@jhogberg jhogberg added the team:VM Assigned to OTP team VM label Jun 23, 2025
@RobinMorisset
Copy link
Contributor Author

I appear to have missed some failed assert in debug builds, I'm looking into it.

@rickard-green
Copy link
Contributor

rickard-green commented Jul 23, 2025

The ASSERT(!is_normal_sched || (flags & ERTS_RUNQ_FLG_NONEMPTY)); hit since the ERTS_RUNQ_FLG_NONEMPTY flag more or less has been used as "the scheduler is not waiting" as well as "the run queue isn't empty", in the accompanying no_empty_run_queues variable, which are not the same thing. Since the fix in your first commit, the assert hits.

I added a new commit "Track waiting schedulers" removing the assert that hit and added another way of keeping track of waiting schedulers. It is not used everywhere where it could be used, but I think that the legacy configurations that this is the case for will behave more as before by not changing them.

Please remove the meta internal stuff like the following from the commit messages:

    Test Plan: CI + canary
    
    Reviewers:
    
    Subscribers:
    
    Tasks:
    
    Tags:
    
    
    Differential Revision: https://phabricator.intern.facebook.com/D72957513

@rickard-green rickard-green added the testing currently being tested, tag is used by OTP internal CI label Jul 23, 2025
RobinMorisset and others added 4 commits September 10, 2025 11:43
If we steal exactly one task, then we directly execute it, and it never
ends up in our runqueue. So it is wrong to update the flag telling
everyone that we have work they can steal.
If we steal more than one task, then we will call erts_add_runq_len,
which will take care of calling erts_non_empty_runq for us.
There are two benefits:
- Being FIFO rather than LIFO, we are much more likely to find the
  runqueue uncontended on the second pass (since it is a runqueue that
  was found contended a lot earlier)
- We can now do several passes, rather than defaulting to a blocking
  call to lock on the second pass.

There was just one small issue: the implementation of equeues did not
support passing them across function boundaries, since it used a
preprocesor macro to find the default queue (when trying to grow the
queue). I fixed that by adding a field to the queue itself.
@RobinMorisset
Copy link
Contributor Author

Thanks for fixing the assertion, and sorry for the messed-up commit messages.
I've fixed them and pushed the new version of the PR.

@rickard-green rickard-green merged commit b9bd4df into erlang:master Sep 30, 2025
28 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants