Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging large values to rerun.set_time_seconds() causes hang/crash #2582

Closed
Nicholas-Autio-Mitchell opened this issue Jul 2, 2023 · 1 comment · Fixed by #2588
Closed
Assignees
Labels
🪳 bug Something isn't working 💣 crash crash, deadlock/freeze, do-no-start

Comments

@Nicholas-Autio-Mitchell
Copy link

Describe the bug

Using the Python SDK, logging a time before logging an entity causes the GUI to hang then crash. The process itself never exits.

I am logging various sensor data to the same stream via rerun.log_points(), rerun.log_image(), etc.

For each call, I first set the time of that recorded sensor data.

  • Using a "frame ID" works fine e.g. rr.set_time_sequence("frame_id", step).
  • I tried logging my dataset's timestamps as nanoseconds: rr.set_time_nanos("timestamp", timestamp). This wasn't correct for my case.
  • I instead logged timestamps as seconds: rr.set_time_nanos("timestamp", timestamp). This causes the hang/crash.

[sidenote: turned out my dataset uses a custom time reference: nanoseconds since something domain specific]

To Reproduce
Reproduced on the clock example: examples/python/clock/main.py. Apply the diff below, where we scale up the step by 1e9 e.g. nanoseconds:

@ examples/python/clock/main.py:46 @ def log_clock(steps: int) -> None:
    for step in range(steps):
        t_secs = step

-        rr.set_time_seconds("sim_time", t_secs)
+        rr.set_time_seconds("seconds", t_secs * 1e9)        # causes the app to hang
+        # rr.set_time_nanos("nano seconds", t_secs * 1e9)   # this works fine
  1. The full process:
pip install -U rerun-sdk  # get 0.7.0 at time of writing
git clone [email protected]:rerun-io/rerun.git  # Clone the repository
cd rerun/
git checkout 0.7.0
# code -n examples/python/clock/main.py  # open file to apply the diff above
python examples/python/clock/main.py  # opens the GUI

Interestingly, the logging itself takes a really high, suggesting some type checks/conversions are slowing things down?

  1. In the GUI, change the logging time to choose seconds, which causes the app to hang
    image

  2. Wait ~7 minutes, the GUI app does crash and close... but the process does not. You can still find python -m rerun --port 9876 in htop still chugging along doing something (for me CPU% floats around ~25%). Watching htop as I switch to seconds in the GUI to enter failure mode, I notice that memory spikes to fill my memory, then seems to clear itself and again spike, then clear itself and become stable.

Playing with datatypes on the Python side had no effect. E.g rr.set_time_seconds("seconds", float(t_secs * 1e9)), ends with the same result. I haven't starting poking around into the bindings/rust code.

Expected behavior

An error, or perhaps a sane conversion? The docs already speak of some assumptions on timestamps based on their magnitude.

Backtrace

No backtrace is given.

Desktop (please complete the following information):

  • OS: MacOS Ventura 13.4.1

Rerun version

rerun_py 0.7.0 [rustc 1.69.0 (84c898d65 2023-04-16), LLVM 15.0.7] aarch64-apple-darwin prepare-0.7 9cf3033, built 2023-06-16T15:39:37Z

@Nicholas-Autio-Mitchell Nicholas-Autio-Mitchell added 👀 needs triage This issue needs to be triaged by the Rerun team 🪳 bug Something isn't working labels Jul 2, 2023
@emilk emilk added 💣 crash crash, deadlock/freeze, do-no-start and removed 👀 needs triage This issue needs to be triaged by the Rerun team labels Jul 3, 2023
@emilk
Copy link
Member

emilk commented Jul 3, 2023

Whoa, thanks for the report and the reproduce!

I can reproduce on main, and in debug builds we get a panic:

thread 'ThreadId(1)' panicked at 'attempt to add with overflow', re_time_panel/src/paint_ticks.rs:213

   8: core::panicking::panic_fmt
             at core/src/panicking.rs:64:14
   9: core::panicking::panic
             at core/src/panicking.rs:114:5
  10: re_time_panel::paint_ticks::paint_ticks
             at re_time_panel/src/paint_ticks.rs:213:9
  11: re_time_panel::paint_ticks::paint_time_range_ticks
      re_time_panel::paint_ticks::paint_time_ranges_and_ticks
             at re_time_panel/src/paint_ticks.rs:49:21
  12: re_time_panel::TimePanel::expanded_ui
             at re_time_panel/src/lib.rs:263:9

I'll take a look right away

@emilk emilk self-assigned this Jul 3, 2023
emilk added a commit that referenced this issue Jul 4, 2023
### What
If the user logged times that were at the limit of i64 we would
sometimes get freezes and crashes.

Closes #2582

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/2588) (if
applicable)

- [PR Build Summary](https://build.rerun.io/pr/2588)
- [Docs
preview](https://rerun.io/preview/pr%3Aemilk%2Ffix-freeze-on-large-times/docs)
- [Examples
preview](https://rerun.io/preview/pr%3Aemilk%2Ffix-freeze-on-large-times/examples)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🪳 bug Something isn't working 💣 crash crash, deadlock/freeze, do-no-start
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants