Flush out commQueue before stopping listener #830
base: main
👋 Hi 2dm! Thank you for contributing to ai-dynamo/nixl. Your PR reviewers will review your contribution and then trigger the CI to test your changes. 🚀
/ok to test 458e800
/build
458e800 to f30a139
f30a139 to 7a7cdb7
/ok to test 7a7cdb7
/build
7a7cdb7 to 3b82789
/build
/ok to test 99297c6
/build
Co-authored-by: Adit Ranadive <[email protected]>
Signed-off-by: Micha Dery <[email protected]>
/ok to test 4387963
/build
/build
/build
```cpp
nixlAgent::~nixlAgent() {
    if (data && (data->useEtcd || data->config.useListenThread)) {
        data->agentShutdown = true;
```
IIUC, `agentShutdown` is accessed from different threads, so it should be atomic.
@ovidiusm Same for commThreadStop then?
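A minimal sketch of the suggestion in this thread, assuming a simplified agent that owns a single background thread. `Agent`, `run()`, and `iterations()` are hypothetical stand-ins rather than NIXL's actual `nixlAgent` internals; only the flag names `agentShutdown` and `commThreadStop` come from the discussion. With both flags declared `std::atomic<bool>`, the destructor's writes and the worker thread's reads no longer form a data race.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical, simplified stand-in for an agent that owns a background thread.
// Only the flag names agentShutdown and commThreadStop come from the PR discussion.
class Agent {
public:
    // worker_ is declared last, so the atomics are initialized before the thread starts.
    Agent() : worker_([this] { run(); }) {}

    ~Agent() {
        // These stores happen on the destroying thread while run() reads the
        // flags on worker_; std::atomic makes that concurrent access well-defined.
        agentShutdown.store(true);
        commThreadStop.store(true);
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    int iterations() const { return iterations_.load(); }

    std::atomic<bool> agentShutdown{false};
    std::atomic<bool> commThreadStop{false};

private:
    void run() {
        // Polls the stop flag that the destructor sets from another thread.
        while (!commThreadStop.load()) {
            iterations_.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<int> iterations_{0};
    std::thread worker_;
};
```

Sequentially consistent `load()`/`store()` are the simplest choice here; a release store paired with acquire loads would also be sufficient for a plain stop flag.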
I still see 2 tests failing with SIGPIPE during cleanup, so I would stress-test this locally to make sure there is no remaining issue.
What?
Verify that all enqueued work has completed.
Why?
There is a race between the agent destructor and jobs on `commQueue`, since the background thread might finish without checking for work still in the queue. In particular, the missed job is probably `invalidateLocalMD`, which is called right before; skipping it leaves the MD intact and prevents that rank from being re-added later.
How?
commThreadStop
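The description of the fix is cut off after `commThreadStop`, so this is not the PR's diff. It is a hedged illustration of the general drain-before-exit pattern the description points at, using a hypothetical `CommWorker` with a mutex-protected queue: the worker exits only once a stop has been requested and the queue is empty, so a job enqueued just before shutdown (such as the `invalidateLocalMD` call mentioned in the description) still runs.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-consumer work queue; not NIXL's actual commQueue code.
class CommWorker {
public:
    // thread_ is declared last, so the queue and sync primitives exist before it starts.
    CommWorker() : thread_([this] { run(); }) {}

    ~CommWorker() { stop(); }

    void enqueue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push_back(std::move(job));
        }
        cv_.notify_one();
    }

    // Request shutdown, then wait for the worker to drain everything already queued.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopRequested_ = true;
        }
        cv_.notify_one();
        if (thread_.joinable()) {
            thread_.join();
        }
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            cv_.wait(lock, [this] { return stopRequested_ || !queue_.empty(); });
            // An empty queue here implies stopRequested_ is set. Exiting on the
            // flag alone would be the race described above: pending jobs dropped.
            if (queue_.empty()) {
                return;
            }
            auto job = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();
            job();  // run the job without holding the lock
            lock.lock();
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    bool stopRequested_ = false;
    std::thread thread_;
};
```

Because `stop()` joins the worker thread, every job enqueued before it has finished by the time `stop()` returns; in this sketch, jobs enqueued after `stop()` are never run.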