[core] Serve microbenchmarks occasionally crash with segfault or invalid memory access #50802
Comments
I am able to reproduce this issue by running the handle throughput microbenchmark in a loop using the same instance type as the release test. From the core dump, here is the full stack trace:
|
And another core dump:
|
The first stack trace should be fixed by #50740. I re-ran the tests many times with this fix and did not see any crashes. However, the fix doesn't seem related to the second stack trace. I'm planning to downgrade this issue to a P1 after the linked PR is merged, then follow up and wrap the async callbacks we're passing to C++ in a |
The serve microbenchmark has been sporadically failing due to memory corruption issues (see the linked issue). One of the tracebacks captured pointed to the fact that the `deleted_generator_ids_` map was being accessed concurrently by multiple threads. Fixed by adding a mutex. Verified that it at least dramatically reduces the frequency of the crashes. I've also renamed a few fields for clarity.

## Related issue number

#50802

---------

Signed-off-by: Edward Oakes <[email protected]>
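For readers following along, here is a minimal sketch of the general pattern the commit message describes: guarding a container that several threads touch with a mutex. It is not the actual Ray change. The class name `GeneratorTracker`, its methods, the `int64_t` ID type, the use of `std::unordered_set` as the container, and the helper `RunConcurrentInserts` are all hypothetical; only the member name `deleted_generator_ids_` comes from the message above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for whatever component owns `deleted_generator_ids_`.
// Every access to the container takes the same mutex, so concurrent callers
// can no longer corrupt its internal state.
class GeneratorTracker {
 public:
  void MarkDeleted(int64_t generator_id) {
    std::lock_guard<std::mutex> lock(mu_);
    deleted_generator_ids_.insert(generator_id);
  }

  bool IsDeleted(int64_t generator_id) const {
    std::lock_guard<std::mutex> lock(mu_);
    return deleted_generator_ids_.count(generator_id) > 0;
  }

  std::size_t NumDeleted() const {
    std::lock_guard<std::mutex> lock(mu_);
    return deleted_generator_ids_.size();
  }

 private:
  // `mutable` lets const readers lock the mutex as well.
  mutable std::mutex mu_;
  // Assumed container and key type, chosen only for illustration.
  std::unordered_set<int64_t> deleted_generator_ids_;
};

// Hypothetical stress helper: each thread inserts a disjoint range of IDs.
// Returns the number of IDs recorded once all threads have joined.
std::size_t RunConcurrentInserts(int num_threads, int ids_per_thread) {
  GeneratorTracker tracker;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&tracker, t, ids_per_thread] {
      for (int i = 0; i < ids_per_thread; ++i) {
        tracker.MarkDeleted(static_cast<int64_t>(t) * ids_per_thread + i);
      }
    });
  }
  for (auto &th : threads) {
    th.join();
  }
  return tracker.NumDeleted();
}
```

Without the lock, concurrent `insert` calls on a standard unordered container are a data race and can surface as heap corruption, which would be consistent with the malloc-related aborts reported in this issue.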
The Serve `serve_microbenchmark.aws` test has been failing periodically with some very nasty stack traces related to either a segfault or `SIGABRT` due to a malloc-related issue.

Example: https://buildkite.com/ray-project/release/builds/30788#019487a6-dd46-4e84-bb84-66b09f24ab97/787-841
Through trial and error I've found: