Fix initializing two recordings with the same recording id causing SDK hangs #10201
thanks for looking into this! Needs some more polish though
…str -> str failure
Running the Python tests with `pixi run py-test` shows that a lot of tests are now broken because of the emitted warning. I haven't looked through the exact cause for all of them (whether the warning should be handled, or strict mode is leaking, etc.), but this clearly needs some attention.
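One way an emitted warning can break otherwise passing tests: if the warning filter is set to `error` (for instance through pytest's `filterwarnings = error` setting), every warning is raised as an exception. Whether that is the cause here is not established in this thread. A minimal standard-library sketch, where `init_recording` and its message are hypothetical stand-ins rather than the SDK's API:

```python
import warnings


def init_recording(recording_id: str) -> str:
    # Hypothetical stand-in for an SDK call that emits a warning.
    warnings.warn(f"Recording id: {recording_id} already exists.", UserWarning)
    return recording_id


def run_with_filter(action: str) -> str:
    """Call init_recording under the given warning filter and report the outcome."""
    with warnings.catch_warnings():
        warnings.simplefilter(action)
        try:
            init_recording("rerun_example_test_recording")
        except UserWarning:
            # Under the "error" filter the warning surfaces as an exception.
            return "raised"
    return "returned"


if __name__ == "__main__":
    print(run_with_filter("ignore"))
    print(run_with_filter("error"))
```

The same call either returns normally or raises, depending only on the active filter, which is why a newly added warning can fail tests that never touched the changed code path.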
Which pixi version are you using? it says in the toml file you're using
I found that returning the warning result to Python caused multiple errors on the Python side, so I will ignore it as before. I believe it's enough for the warning to show up in the terminal and logs and to be detectable on the Python side, instead of raising it as an error and halting the whole process.
I really don't want to create a precedent for ignoring warnings. This means that
yikes. Turns out we never actually delete Rust-side recordings, ever. I.e. this:
with rr.RecordingStream("rerun_example_test_recording") as rec:
    pass
will leak a recording on the Rust side 😱 Looking into this as well now, separately.
The leaking recordings are actually a known issue 😬 I tried to tackle it, but I need help from @abey79 for that. It wasn't too hard to get things to work out without it, so this should do now if CI passes.
Why not delete the recording on …? The implementation in Rust and in Python should be straightforward.

/// Delete a recording stream by id.
#[allow(clippy::fn_params_excessive_bools)]
#[allow(clippy::struct_excessive_bools)]
#[pyfunction]
#[pyo3(signature = (
    recording_id,
))]
fn delete_recording(_py: Python<'_>, recording_id: String) -> PyResult<()> {
    if all_recordings()
        .remove(&StoreId::from_string(
            StoreKind::Recording,
            recording_id.clone(),
        ))
        .is_none()
    {
        utils::py_rerun_warn(format!("Recording id: {recording_id} not found.").as_str())?;
    };
    Ok(())
}
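The behavior of the proposed `delete_recording` can be modeled in a few lines of Python. This is only a sketch of the semantics (remove by id, warn if the id is unknown). The module-level `all_recordings` dict is a hypothetical stand-in for the Rust-side map, and nothing here is the actual binding:

```python
import warnings

# Hypothetical stand-in for the Rust-side `all_recordings()` map.
all_recordings: dict[str, object] = {}


def delete_recording(recording_id: str) -> None:
    """Remove a recording by id, warning (not raising) if it is unknown."""
    if all_recordings.pop(recording_id, None) is None:
        warnings.warn(f"Recording id: {recording_id} not found.", UserWarning)


if __name__ == "__main__":
    all_recordings["rerun_example_test_recording"] = object()
    delete_recording("rerun_example_test_recording")  # removed silently
    delete_recording("rerun_example_test_recording")  # already gone: warns
```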
I think that's what we're going to end up doing in some form or capacity, but I found untangling the ramifications rather complicated. See:
Recording streams are internally already refcounted but we have this
Ok, so we don't know what the garbage collector is doing at any given time, but can't we invoke garbage collection in the destructor?
There are no guarantees whatsoever on what
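A small illustration of why destructor timing is hard to rely on, assuming CPython: an object that is part of a reference cycle is not finalized when its last name is deleted, only when the cycle collector happens to run. `Stream` is a hypothetical class for this sketch, and automatic collection is disabled only to make the demo deterministic:

```python
import gc


class Stream:
    """Hypothetical stand-in for a recording stream with a destructor."""

    finalized: list[str] = []

    def __init__(self, name: str) -> None:
        self.name = name
        self.me = self  # reference cycle: refcounting alone cannot free this

    def __del__(self) -> None:
        Stream.finalized.append(self.name)


def demo() -> tuple[list[str], list[str]]:
    gc.disable()  # keep the demo deterministic
    try:
        Stream.finalized.clear()
        s = Stream("cyclic")
        del s
        before = list(Stream.finalized)  # destructor has not run yet
        gc.collect()
        after = list(Stream.finalized)  # the cycle collector ran __del__
    finally:
        gc.enable()
    return before, after


if __name__ == "__main__":
    print(demo())
```

Because the destructor only runs once a collection happens, explicit cleanup (for example on context-manager exit or through an explicit delete call) tends to be more predictable than cleanup tied to `__del__`.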
### Related
- Resolves: #10562
- Resolves: #10561
- While fixing the deadlock originally addressed in: #10201

### Also
In order to preserve backwards compatibility in environments where `rr.init()` was depended upon to clean up existing resources semi-deterministically due to creation of a new stream, this now does a cleanup sweep of any recordings that only exist in the `all_recordings` list.
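The idea of a cleanup sweep can be sketched as follows. This is a toy model, not the SDK's implementation: the `Inner` class, the explicit `handles` counter (standing in for the internal refcount), and the helper names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Inner:
    recording_id: str
    handles: int = 0  # live user-facing handles (explicit counter, for the sketch)


# Hypothetical registry that keeps every recording alive by itself.
all_recordings: dict[str, Inner] = {}


def open_handle(recording_id: str) -> Inner:
    """Create or look up a recording and count one more user handle."""
    inner = all_recordings.setdefault(recording_id, Inner(recording_id))
    inner.handles += 1
    return inner


def close_handle(inner: Inner) -> None:
    """Drop one user handle; the registry entry stays until the next sweep."""
    inner.handles -= 1


def sweep_orphans() -> list[str]:
    """Remove every recording that is referenced by the registry alone."""
    orphans = [rid for rid, inner in all_recordings.items() if inner.handles == 0]
    for rid in orphans:
        del all_recordings[rid]
    return orphans


if __name__ == "__main__":
    a = open_handle("a")
    b = open_handle("b")
    close_handle(a)
    print(sweep_orphans(), list(all_recordings))
```

Running the sweep when a new stream is created would approximate the semi-deterministic cleanup described above without depending on garbage-collector timing.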
Related
What
The hang reported in "Initializing twice with same recording id hangs the SDK" (#9948) occurs when trying to insert a duplicate recording. Since we don't need this behavior, this PR simply avoids inserting the new recording. The real reason for the hang is not known yet, and it can't be replicated in PyTests.
- Moved the `py_rerun_warn` function from `rerun_py/src/dataframe.rs` to `rerun_py/src/utils.rs` for readability.
- Added a new function that both logs the warning using the `re_log::warn` macro and calls `PyErr::warn` so the warning can be detected in Python. Using the `re_log::warn` macro alone isn't enough to detect the warning in Python, and using `PyErr::warn` alone doesn't log the warning as per your code convention. I suggest using this function for other warnings in the Python bridge.
- Added a unit test for detecting the duplicate recording stream warning. Unfortunately, the hanging behavior can't be replicated in PyTests; still not sure why.