Fix initializing two recordings with the same recording id causing SDK hangs #10201

AhmedMousa-ag · 2025-06-11T07:39:37Z

What

The hang described in "Initializing twice with same recording id hangs the SDK" #9948 occurs when a duplicate recording is inserted. Since we don't need that behavior, this PR simply avoids inserting the new recording. The root cause of the hang is not known yet, and it could not be reproduced in pytest.
Moved the py_rerun_warn function from rerun_py/src/dataframe.rs to rerun_py/src/utils.rs for readability.
Added a new function that both logs the warning with the re_log::warn macro and raises it with PyErr::warn, so that it can be detected from Python. Using re_log::warn alone isn't enough to detect the warning in Python, and using PyErr::warn alone doesn't log it as your code convention requires. I suggest using this function for other warnings in the Python bridge.
Added a unit test that detects the warning for a duplicated recording stream. Unfortunately, the hang itself can't be reproduced in pytest; I'm still not sure why.


Wumpf

thanks for looking into this! Needs some more polish though

rerun_py/src/python_bridge.rs

rerun_py/src/utils.rs

rerun_py/tests/unit/test_expected_warnings.py


Wumpf

Running the Python tests with pixi run py-test shows that a lot of tests are now broken because of the emitted warning. I haven't looked into the exact cause for all of them (whether the warning should be handled, whether strict mode is leaking, etc.), but this clearly needs some attention.

AhmedMousa-ag · 2025-06-12T16:25:36Z

Which pixi version are you using? The toml file says 0.34.0, but pixi.lock refers to a higher version.


AhmedMousa-ag · 2025-06-12T17:26:35Z

I found that returning the warning result to Python caused multiple errors on the Python side, so I will ignore it as before. I believe it's enough for the warning to show up in the terminal and the logs and to be visible on the Python side, instead of raising it as an error and halting the whole process.


Wumpf · 2025-06-13T08:45:33Z

I really don't want to set a precedent of ignoring warnings. It would mean that strict_mode no longer works as expected everywhere.
I'm looking into fixing up those tests instead
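
To make the concern concrete, here is a small sketch of why swallowing the warning result defeats any mode that promotes warnings to errors. It uses only the standard `warnings` module as a stand-in for strict mode; it is not rerun's actual strict_mode implementation, and all function names are hypothetical.

```python
import warnings


def emit_duplicate_warning() -> None:
    """Stand-in for the bridge emitting the duplicate-recording warning."""
    warnings.warn("duplicate recording id", RuntimeWarning, stacklevel=2)


def bridge_propagating() -> str:
    # Any exception raised by the warning machinery reaches the caller.
    emit_duplicate_warning()
    return "created"


def bridge_swallowing() -> str:
    # The warning result is ignored, so a promoted warning is lost.
    try:
        emit_duplicate_warning()
    except RuntimeWarning:
        pass
    return "created"


def run_strict(fn) -> str:
    """Call `fn` with warnings promoted to errors (stand-in for strict mode)."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            return fn()
        except RuntimeWarning:
            return "raised"
```

With this model, `run_strict(bridge_propagating)` surfaces the problem as an error, while `run_strict(bridge_swallowing)` silently succeeds even though the caller asked for strict behavior.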

Wumpf · 2025-06-13T09:07:16Z

yikes. Turns out we never actually delete Rust-side recordings, ever. I.e. this:

        with rr.RecordingStream("rerun_example_test_recording") as rec:
            pass

will leak a recording on the rust side 😱

Looking into this as well now separately


Wumpf · 2025-06-13T11:34:32Z

the leaking recordings is actually a known issue 😬

Drop unused python recording streams #2116

tried to tackle it, but I need help from @abey79 for that. Wasn't too hard to get things to work out without. So this should do now if ci passes.

AhmedMousa-ag · 2025-06-13T15:55:53Z

Why not delete the recording in the __del__() destructor when the recording stream is deleted?

The implementation in Rust and in Python should be straightforward.

/// Delete a recording stream by id.
#[pyfunction]
#[pyo3(signature = (recording_id))]
fn delete_recording(_py: Python<'_>, recording_id: String) -> PyResult<()> {
    // `remove` returns `None` when no recording with this id is registered.
    let store_id = StoreId::from_string(StoreKind::Recording, recording_id.clone());
    if all_recordings().remove(&store_id).is_none() {
        utils::py_rerun_warn(format!("Recording id: {recording_id} not found.").as_str())?;
    }
    Ok(())
}


Wumpf · 2025-06-13T22:08:00Z

I think that's what we're going to end up doing in some form or capacity, but I found untangling the ramifications rather complicated. See:

rerun/rerun_py/src/python_bridge.rs, line 47 in ae4c7e2:

// The bridge needs to have complete control over the lifetimes of the individual recordings,

rerun/rerun_py/src/python_bridge.rs, line 334 in ae4c7e2:

// NOTE: Do **NOT** try and drain() `all_recordings` here.

rerun/rerun_py/src/python_bridge.rs, line 465 in ae4c7e2:

// Swapping the active data recording might drop the refcount of the currently active recording

Recording streams are already refcounted internally, but we have the all_recordings registry on top, and the Python object that wraps the recording is rather confusing as well. Removing an entry from the all_recordings registry doesn't do the actual drop, which (see the comments linked above) is "dangerous" if done in the wrong spot. The "real" drop of the inner recording stream (the one wrapped by the recording) therefore happens at another point in time that we don't control 😵‍💫
We might first need to make the Python objects hold handles rather than "real" recording streams.

AhmedMousa-ag · 2025-06-14T15:41:51Z

Ok, so we don't know what the garbage collector is doing at any given time, but can't we invoke garbage collection in the destructor with gc.collect()?
Invoking the garbage collector manually should usually be avoided, but is it worth it here? Will it solve the issue?

Wumpf · 2025-06-15T17:26:58Z

There are no guarantees whatsoever about what exactly gc.collect does; it's more of a hint.

### Related

- Resolves: #10562
- Resolves: #10561
- While fixing the deadlock originally addressed in: #10201

### Also

In order to preserve backwards compatibility in environments where `rr.init()` was depended upon to clean up existing resources semi-deterministically due to creation of a new stream, this now does a cleanup sweep of any recordings that only exist in the all_recordings list.
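
The cleanup sweep described above can be sketched as follows. This is a pure-Python model with explicit holder counts, not the actual Rust implementation: `Registry`, `register`, `release`, `sweep`, and `init` are hypothetical names, and running the sweep before registering the new recording is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical model of `all_recordings`: id -> number of outside holders."""

    holders: dict = field(default_factory=dict)

    def register(self, recording_id: str) -> None:
        # Count one outside holder (e.g. a Python wrapper) per registration.
        self.holders[recording_id] = self.holders.get(recording_id, 0) + 1

    def release(self, recording_id: str) -> None:
        # An outside holder went away; the registry entry itself stays.
        if self.holders.get(recording_id, 0) > 0:
            self.holders[recording_id] -= 1

    def sweep(self) -> list:
        """Drop entries that only the registry itself still refers to."""
        orphaned = [rid for rid, count in self.holders.items() if count == 0]
        for rid in orphaned:
            del self.holders[rid]
        return orphaned

    def init(self, recording_id: str) -> list:
        """Model of `rr.init()`: sweep orphans, then register the new recording."""
        removed = self.sweep()
        self.register(recording_id)
        return removed
```

In this model, a recording whose last outside holder has been released stays in the registry until the next `init` call, which removes it as part of the sweep.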

AhmedMousa-ag added 4 commits June 10, 2025 19:16

feat(rerun_py, rust, utils): log warning to python func

5d2e85f

fix(rerun_py, rust bridge, new_recording): avoid creating new recording with warning

80b26ca

feat(rerun_py, unit test, new_recording): test warning of duplicated recording

ccfd85f

chore(rerun_py, rust bridge, utils): formatted

d1f73db

Wumpf self-requested a review June 11, 2025 08:17

Wumpf changed the title ~~Rerun py avoid duplicated recording streams~~ Fix initializing two recordings with the same recording id causing SDK hangs Jun 11, 2025

Wumpf added the 🪳 bug (Something isn't working), sdk-python (Python logging API), and include in changelog labels Jun 11, 2025

Wumpf requested changes Jun 11, 2025


AhmedMousa-ag added 5 commits June 11, 2025 22:42

fix(py run, rust bridge, warning): rename warn cstr, ommit error on cstr -> str failure

0937ada

fix(py run, python bridge, new recording): ommit warning error res

97e7a4c

chore(py run, tests, expected warnings): seperate duplicated warning test

eee0478

chore(py run, python bridge): rename py_rerun_warn

f0596cd

chore(py run, python bridge, warnings): docs

4cefefa

AhmedMousa-ag requested a review from Wumpf June 11, 2025 19:53

Wumpf requested changes Jun 12, 2025


feat(py run, tests, init twice): test the global recording id

0535b0d

fix(py run, python bridge, test): not raising the warning error

07282a4

Merge remote-tracking branch 'origin/main' into rerun_py_duplicated_recording

58720df

Wumpf added 2 commits June 13, 2025 10:46

fix duplicated warning

c905d04

fix up & improve test_init_twice test

687a6c3

Wumpf added 2 commits June 13, 2025 13:33

handle strict mode correctly for duplicated recording id

7bdca56

Merge remote-tracking branch 'origin/main' into rerun_py_duplicated_recording

9ca4b43

Wumpf approved these changes Jun 13, 2025


fix application id lint

7a6def0

Merge branch 'main' into rerun_py_duplicated_recording

1c6dfa7

Wumpf merged commit 298049f into rerun-io:main Jun 15, 2025
34 checks passed

This was referenced Jul 8, 2025

Running rr.init() no longer resets notebook state correctly #10561

Closed

Fix deadlock while preserving recording isolation #10563

Merged
