Skip to content

Fix crash at shutdown of rqt_console plugin#30

Closed
mjeronimo wants to merge 1 commit intoros2from
mjeronimo/bugfix/shutdown-crash
Closed

Fix crash at shutdown of rqt_console plugin#30
mjeronimo wants to merge 1 commit intoros2from
mjeronimo/bugfix/shutdown-crash

Conversation

@mjeronimo
Copy link
Copy Markdown

Don't use the rqt node for subscribers. Instead, create a local node for this purpose.

Signed-off-by: Michael Jeronimo michael.jeronimo@openrobotics.org

Don't use the rqt node for subscribers. Instead, create a local
node for this purpose.

Signed-off-by: Michael Jeronimo <michael.jeronimo@openrobotics.org>
Copy link
Copy Markdown

@ivanpauno ivanpauno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code looks reasonable, but I don't understand a few things:

  • What was the crash like (traceback, how to reproduce, etc)? What was causing it?
  • Why is this patch avoiding the crash?

@mjeronimo
Copy link
Copy Markdown
Author

@ivanpauno Sorry, should've had the info here as well (it was in the Galactic testing matrix).

To reproduce: Start rqt, insert the rqt_console plug-in, then 'x' the rqt_console plugin (not the overall rqt window). This generates a segfault.

What's happening: The RosPyPluginProvider creates a ROS2 node and it is available to plug-ins, like rqt_console. Rqt_console uses this node to create a subscription. When closing the rqt_console plug-in, the subscription is removed. However, as the node is still spinning on another thread, this creates the issue. Similar to this: ros2/rclpy#255.

The fix: Instead of using the node from the RosPyPluginProvider, create a ROS node when creating an instance of the plug-in and use this one. That way, we can control the node directly and ensure that we don't encounter this situation where the node is actively processing messages on one thread while the subscriber is destroyed on another.

@ivanpauno
Copy link
Copy Markdown

What's happening: The RosPyPluginProvider creates a ROS2 node and it is available to plug-ins, like rqt_console. Rqt_console uses this node to create a subscription. When closing the rqt_console plug-in, the subscription is removed. However, as the node is still spinning on another thread, this creates the issue. Similar to this: ros2/rclpy#255.

Thanks for clarifying!
That should actually work, i.e. removing a subscription while spinning in another thread is supposed to be safe.
I guess we reintroduced the issue recently (?).

We can introduce this as a workaround but it would be great if we can find the root cause in rclpy.

@mjeronimo
Copy link
Copy Markdown
Author

@ivanpauno @cottsay
Ok, I didn't realize it was supposed to work. Perhaps we should root cause the issue then before integrating any of these fixes.

@mjeronimo
Copy link
Copy Markdown
Author

Just confirmed that this issue only happens with Cyclone. This PR and ros-visualization/rqt_topic#30 are work-arounds, but we should get to the root of the problem w/ Cyclone and hold off on these two.

@ivanpauno
Copy link
Copy Markdown

ivanpauno commented Apr 7, 2021

Just confirmed that this issue only happens with Cyclone. This PR and ros-visualization/rqt_topic#30 are work-arounds, but we should get to the root of the problem w/ Cyclone and hold off on these two.

I investigated a little, the issue is that one of the subscriptions "event handlers" is being destroyed while the waitset still keeps a dangling pointer to it.
This seems to be poor lifetime management in rclpy, and not an issue in cyclonedds.
(the fact that it doesn't segfault with other implementations seems to be only by chance, depending on what the implementation is doing with the dangling rmw_event_t pointer you're more likely to get a segfault or not)

(edit) I will open an issue in rclpy and try to find a solution.
ros2/rclpy#757 should solve half of the problem at least (which will be merged to master in ros2/rclpy#756)

@sloretz
Copy link
Copy Markdown

sloretz commented Apr 7, 2021

@mjeronimo mind checking if ros2/rclpy#761 fixes the issue this is trying to solve?

@mjeronimo
Copy link
Copy Markdown
Author

Yes, I'll check now, thanks.

@mjeronimo
Copy link
Copy Markdown
Author

mjeronimo commented Apr 7, 2021

@sloretz I tried it out and ros2/rclpy#761 fixes both rqt_console and rqt_topic, which both create subscriptions on the rqt node and them also explicitly destroy them. From a quick scan I see that rqt_bag, rqt_reconfigure, and rqt_plot also destroy subscriptions, but I didn't test those out previously. I'll try them out too, just to see if this fixes some other issues as well.

Thanks @ivanpauno and @sloretz for getting to the root if this issue. I'll close and/or modify the two PRs (rqt_console and rqt_topic). rqt_topic is outputting a message that can now be removed with this fix:

TopicInfo.stop_monitoring(): Unsubscribing from a topic while the executor is spinning may segfault. This is a known bug in rclpy and is being addressed
TopicInfo.stop_monitoring(): See: https://github.com/ros2/rclpy/issues/255

@mjeronimo
Copy link
Copy Markdown
Author

mjeronimo commented Apr 7, 2021

FYI - I wasn't able find similar issues with rqt_bag, rqt_plot, or rqt_reconfigure. Not that it would never happen, but I couldn't reproduce.

@mjeronimo
Copy link
Copy Markdown
Author

I'll close this PR now in favor of ros2/rclpy#761.

@mjeronimo mjeronimo closed this Apr 7, 2021
@clalancette clalancette deleted the mjeronimo/bugfix/shutdown-crash branch August 11, 2023 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants