Feature request: Adding a 'Thread Affinity' when using parallel features to control Thread usage by jobs #630
Comments
Your short jobs still have the potential to stall among themselves - for example, when waiting to join the result from another thread. At a stall, that thread will try to steal other jobs. If the affinity allows it, it might steal one of the long jobs, and then you'd have a priority inversion, delaying the completion of the short job. Similarly, if threads in the pool are already busy on a long job, they're not going to be preempted when a new short job comes, so the short job won't have the full potential resources available. We don't have direct support for priorities, nor any design to do so. I would suggest using separate thread pools.
@cuviper Thanks for the answer. Concerning resource usage, I think I should rephrase what I tried to do, so I'll go with an example: I know that I will constantly issue a lot of short jobs, but long jobs will only be issued sometimes. That way, short jobs will always be consumed by 6 to 8 threads, and long jobs, if any, will be consumed by at most 2 threads. Concerning using different thread pools, that is the way I am going currently, before attempting any modification to rayon. But I have two issues with it:
@fredpointzero As @cuviper said, the way we intended to deal with this was via

In fact, parallel iterators do use the thread pool of the current worker. It might be useful to get a few more details about how your 'long, slow' jobs work -- are they parallel iterators? Is that something where you might want a "future" -- i.e., the ability to run the job asynchronously and then demand it later? Can you use the
Hi @nikomatsakis, thanks for the update. There was indeed a confusion about how

A bit of context

I am currently building a realtime engine and therefore need to be accurate with task scheduling.
So, you end up with a graph of tasks to execute and identify clusters of tasks by queue.

The task contract

The contract I want to enforce is:
Current implementation

I am currently using two thread pools. So my goal with this issue is to maximize the usage of CPU cores while ensuring the above contract.

Currently, my setup for an 8-core CPU is: Execution of tasks is as follows: Implementation of each task will probably itself use

Issue

If I want to push a bit further, I will want to avoid context switches for each thread and have one thread per core. But this implies that frame tasks and long tasks are competing for the same threads, and I don't know if this is possible. (Although, I need to finish my implementation and profile it to understand what the possible gains of removing context switches are in that case.)
An update from my side about the implementation of my scheduling system. I ended up dropping the "long task" scheduling feature: trying to mix short- and long-task scheduling in a single system is not worth the effort:
Finally, concerning my usage of rayon, I use it like this:
Although, this does not cancel the initial feature request: it would be nice to somehow have a single ThreadPool with a way to describe how threads would compete to take jobs. Another way of thinking about this is:
I would fork-join short jobs on ThreadPoolView n°1 and spawn long jobs on ThreadPoolView n°2. In that way, I am sure that:
Maybe this kind of API is better suited to fit with the existing rayon API?
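A rough sketch of what such an API could look like. None of these types exist in rayon today; `ThreadPoolView` and its methods are hypothetical, and the stub `spawn` simply runs the job inline so the example stays self-contained.

```rust
// Hypothetical API sketch -- none of these types exist in rayon.
// A ThreadPoolView would expose a subset of a pool's threads, so that
// jobs submitted through it only compete for that subset.

struct ThreadPool {
    num_threads: usize,
}

struct ThreadPoolView<'a> {
    #[allow(dead_code)]
    pool: &'a ThreadPool,
    // indices of the worker threads this view is allowed to use
    threads: Vec<usize>,
}

impl ThreadPool {
    fn new(num_threads: usize) -> Self {
        ThreadPool { num_threads }
    }

    // Borrow a view restricted to the given worker-thread indices.
    fn view(&self, threads: &[usize]) -> ThreadPoolView<'_> {
        assert!(threads.iter().all(|&t| t < self.num_threads));
        ThreadPoolView { pool: self, threads: threads.to_vec() }
    }
}

impl ThreadPoolView<'_> {
    // A real implementation would enqueue `job` so that only the
    // view's threads may steal it; this stub just runs it inline.
    fn spawn<F: FnOnce() + Send>(&self, job: F) {
        job();
    }
}

fn main() {
    let pool = ThreadPool::new(8);
    let short_view = pool.view(&[0, 1, 2, 3, 4, 5]); // short jobs: up to 6 threads
    let long_view = pool.view(&[6, 7]);              // long jobs: at most 2 threads
    assert_eq!(short_view.threads.len(), 6);
    assert_eq!(long_view.threads.len(), 2);
    short_view.spawn(|| println!("short job"));
    long_view.spawn(|| println!("long job"));
}
```

The design question this sketch raises is the one from the comment above: the views share one set of worker threads (so there is one thread per core), but the stealing rules differ per view.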
After #636 it's possible to implement at least the first part manually via

```rust
ThreadPoolBuilder::new()
    .spawn_handler(|t| {
        std::thread::Builder::new().spawn(move || {
            affinity::set_thread_affinity(match t.index() {
                0..=3 => &[1usize],
                _ => &[3],
            })
            .expect("failed to set thread affinity");
            t.run();
        })?;
        Ok(())
    })
```

Apparently the second part could be resolved via multiple
@l4l a
Motivation
To optimize thread usage, at the start of your application you would instantiate one thread per core and use work stealing to avoid thread context switches. This fits perfectly with Rayon.
However, it is currently impossible to restrict the number of threads used for specific jobs in a pool.
Use case
In my application, I will have two kinds of jobs: short and long jobs.
Short jobs need to be processed in real time and can't stall, whereas long jobs are less constrained in time.
Currently, if I schedule a parallel iterator that executes a long job, it can run on all the threads simultaneously, and no other short jobs can be executed.
Proposal
When building a ThreadPool, one can provide the ThreadAffinity per spawned thread.

Example of the API: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=08f720e91377bc8d6c823a69b10ef610
Highlights of the example:
Implementation suggestion
(These are the main spots I am thinking of; there will probably be other places to update.)
Updating registry::in_worker

Updating WorkerThread::take_local_job
What do you think of it?