[core] Use 1 thread for all fibers for an actor scheduling queue. #37949
Conversation
In src/ray/core_worker/transport/concurrency_group_manager.cc:

```diff
@@ -95,7 +94,6 @@ void ConcurrencyGroupManager<ExecutorType>::Stop() {
   }
 }

-template class ConcurrencyGroupManager<FiberState>;
```
Am I missing anything? Shouldn't we only have 1 fiber state (1 thread) per concurrency group? What do you mean by 1 thread per submitter worker?
Oh, we have an actor scheduling queue per caller worker. All concurrency groups from the same worker use the same thread. However, different workers use different ConcurrencyGroupManager<FiberState> instances, which means different threads.
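For illustration, here is a minimal sketch of that pre-PR layout. All types below are hypothetical, heavily simplified stand-ins for Ray's real classes, reduced to the ownership structure only:

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>

class FiberState {};  // in the real code this owns the fiber thread

template <typename ExecutorType>
struct ConcurrencyGroupManager {
  // All concurrency groups managed by one instance share its thread.
  std::unordered_map<std::string, std::unique_ptr<ExecutorType>> groups;
};

class ActorSchedulingQueue {
 public:
  // Pre-PR shape: every queue builds its own manager, so N caller workers
  // -> N managers -> N fiber threads, and the count is unbounded.
  std::unique_ptr<ConcurrencyGroupManager<FiberState>> fiber_state_manager_ =
      std::make_unique<ConcurrencyGroupManager<FiberState>>();
};

int main() {
  // One ActorSchedulingQueue per caller worker:
  std::unordered_map<std::string, ActorSchedulingQueue> queue_per_caller;
  queue_per_caller["caller-A"];  // its own manager, its own thread
  queue_per_caller["caller-B"];  // another manager, another thread
  std::cout << "queues (thread owners): " << queue_per_caller.size() << "\n";
  return 0;
}
```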
In src/ray/core_worker/transport/actor_scheduling_queue.cc:

```diff
@@ -39,17 +39,17 @@ ActorSchedulingQueue::ActorSchedulingQueue(
     ss << "\t" << concurrency_group.name << " : " << concurrency_group.max_concurrency;
   }
   RAY_LOG(INFO) << ss.str();
-  fiber_state_manager_ = std::make_unique<ConcurrencyGroupManager<FiberState>>(
```

I think a simple fix is: instead of creating a fiber_state_manager per ActorSchedulingQueue, we should have a global one and pass it in to the ActorSchedulingQueue constructor, similar to how we do it for the std::shared_ptr<ConcurrencyGroupManager<BoundedExecutor>> pool_manager?

This requires changes to ConcurrencyGroupManager. Right now it creates FiberState with no constructor arguments, and hence any proxy object must reference a global FiberState and keep a concurrency group name and FDs. I feel this approach does not add much value compared to not using ConcurrencyGroupManager at all.
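To make the suggested shape concrete, here is a hedged sketch with simplified stand-in signatures (these are not Ray's real constructors): one shared ConcurrencyGroupManager<FiberState> is injected into every ActorSchedulingQueue, mirroring how pool_manager is already passed in:

```cpp
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-ins; the real signatures in Ray differ.
class FiberState {};

template <typename ExecutorType>
class ConcurrencyGroupManager {};

class ActorSchedulingQueue {
 public:
  explicit ActorSchedulingQueue(
      std::shared_ptr<ConcurrencyGroupManager<FiberState>> fiber_state_manager)
      : fiber_state_manager_(std::move(fiber_state_manager)) {}

 private:
  // Shared across all queues: one set of fiber threads per actor process
  // instead of one per caller worker.
  std::shared_ptr<ConcurrencyGroupManager<FiberState>> fiber_state_manager_;
};

int main() {
  auto shared_manager =
      std::make_shared<ConcurrencyGroupManager<FiberState>>();
  std::vector<ActorSchedulingQueue> queues;
  for (int caller = 0; caller < 3; ++caller) {
    queues.emplace_back(shared_manager);  // every queue shares one manager
  }
  return 0;
}
```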
Can we add a test in the Python layer with the repro script?
Force-pushed from 6ed925b to 30738e2.
Nice.
```diff
@@ -18,6 +18,8 @@
 #include <chrono>

+#include "ray/util/logging.h"
+#include "ray/util/macros.h"
```
Is this needed?
Needed for RAY_UNUSED. Not sure why it did not complain before.
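For context, this is roughly what RAY_UNUSED is for. The macro is re-defined locally here as a cast to void, which is an assumption about its definition in ray/util/macros.h, so that the snippet is self-contained:

```cpp
#include <cstdio>

// Assumed to mirror ray/util/macros.h: discard a value on purpose.
#define RAY_UNUSED(x) ((void)(x))

[[nodiscard]] int MustCheck() { return 42; }

int main() {
  RAY_UNUSED(MustCheck());  // silences the unused-result warning
  std::puts("done");
  return 0;
}
```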
… rate limiters
…added python tests.
Force-pushed from 16a6333 to 3dcf8ad.
There's an ASAN & TSAN error that I believe originates from boostorg/fiber#214. Trying to add flags to solve it...
Turns out the ASAN error is not from boost issues but from our own variable lifetime management. Fixed.
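The actual bug isn't spelled out in this thread, but as a generic illustration of the class of lifetime issue ASAN flags here (a background thread outliving the object it references), consider the following; all names are hypothetical:

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <thread>

struct State {
  std::string name = "actor";
};

int main() {
  std::thread t;
  {
    auto state = std::make_shared<State>();

    // BAD (what ASAN reports as heap-use-after-free): hand the thread a
    // raw pointer and let `state` be destroyed before the thread reads it.
    // t = std::thread([p = state.get()] { std::cout << p->name << "\n"; });

    // GOOD: capture the shared_ptr by value so the thread co-owns the
    // object and its lifetime covers the access.
    t = std::thread([state] { std::cout << state->name << "\n"; });
  }
  t.join();
  return 0;
}
```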
Force-pushed from 18bd679 to 88c30d0.
```diff
@@ -124,7 +134,7 @@ class FiberState {
     // no fibers can run after this point as we don't yield here.
     // This makes sure this thread won't accidentally
     // access being destructed core worker.
-    fiber_stopped_event_.Notify();
+    fiber_stopped_event->Notify();
```
Let's add a fiber_stopped_event->clear() to explicitly free the pointer.
Can't do clear because the lambda captures the shared_ptr as const. We can mark it mutable, but I guess that's overkill.
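A self-contained sketch of the constraint being discussed, using a hypothetical Event type built on std::condition_variable and std::mutex (as this PR's fiber_stopped_event is): a by-value shared_ptr capture is const inside a non-mutable lambda, so operator-> works but reset() does not compile:

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the event type: Notify() wakes waiters via
// std::condition_variable + std::mutex.
struct Event {
  std::mutex mu;
  std::condition_variable cv;
  bool set = false;
  void Notify() {
    {
      std::lock_guard<std::mutex> lk(mu);
      set = true;
    }
    cv.notify_all();
  }
  void Wait() {
    std::unique_lock<std::mutex> lk(mu);
    cv.wait(lk, [this] { return set; });
  }
};

int main() {
  auto fiber_stopped_event = std::make_shared<Event>();

  // Non-mutable lambda: the captured copy is a const std::shared_ptr<Event>.
  auto on_stop = [fiber_stopped_event] {
    fiber_stopped_event->Notify();  // ok: operator-> on a const shared_ptr
    // fiber_stopped_event.reset(); // does not compile: capture is const
  };

  // Marking the lambda mutable would allow dropping the reference early.
  auto on_stop_mutable = [fiber_stopped_event]() mutable {
    fiber_stopped_event->Notify();
    fiber_stopped_event.reset();  // explicitly drops this copy's ownership
  };

  on_stop();
  fiber_stopped_event->Wait();
  on_stop_mutable();
  return 0;
}
```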
Currently we have 1 thread per submitter worker per actor to handle the fibers that submit the actor tasks. This design, together with the fact that we don't stop these threads (for lack of a means to stop boost fibers), means we can end up with an unbounded number of threads in an actor process.
This PR makes all fibers for an actor run in the same thread, which bounds the number of threads in an actor process.
Also changed fiber_stopped_event to a std::condition_variable and std::mutex.
Fixes #33957.
Fixes #38240.
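As a standalone illustration of the new model (assuming Boost.Fiber is available; this is not Ray's actual scheduling code), many fibers from different callers can run cooperatively on a single thread:

```cpp
#include <boost/fiber/all.hpp>
#include <iostream>
#include <thread>
#include <vector>

int main() {
  // One dedicated thread hosts every fiber, regardless of which caller
  // the task came from; fibers yield to each other cooperatively.
  std::thread fiber_thread([] {
    std::vector<boost::fibers::fiber> fibers;
    for (int caller = 0; caller < 4; ++caller) {
      fibers.emplace_back([caller] {
        std::cout << "task from caller " << caller << " on thread "
                  << std::this_thread::get_id() << "\n";
        boost::this_fiber::yield();  // same thread, cooperative switch
      });
    }
    for (auto &f : fibers) {
      f.join();
    }
  });
  fiber_thread.join();
  return 0;
}
```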