  // Update the active goal
  RealtimeGoalHandlePtr rt_goal = std::make_shared<RealtimeGoalHandle>(goal_handle);
  rt_goal->preallocated_feedback_->joint_names = joint_names_;
  rt_goal->execute();
  rt_active_goal_.writeFromNonRT(rt_goal);

  // Setup goal status checking timer
  goal_handle_timer_ = node_->create_wall_timer(
    action_monitor_period_.to_chrono<std::chrono::seconds>(),
-   std::bind(&RealtimeGoalHandle::runNonRealtime, rt_active_goal_));
+   std::bind(&RealtimeGoalHandle::runNonRealtime, rt_goal));
  }
I think this needs further work, probably in a follow-up PR.
In both the old and new code, I worry that we might end up overwriting goal_handle_timer_ while the previous rt_active_goal_ still has pending operations, thus dropping those operations. I think we should call goal_handle_timer_->execute_callback() or (*rt_active_goal_.readFromNonRT())->runNonRealtime() before creating the new timer.
Similarly, I think we could probably end up in a race condition where something is resetting the rt_active_goal_ right after this function writes the new rt_goal, and we could end up losing the goal handle. This would involve a little more work to solve, for example only resetting the rt_active_goal_ if new_data_available_ == false. That variable is private, but something along those lines.
Would appreciate your thoughts so we can consider appropriate follow-up PRs. This PR leaves the behaviour unchanged from before, and I haven't yet run into either of those cases, so I think we're ok to leave it to a future task.
This write-up above is a perfect start for the description of a follow-up issue ;)
The CI failure looks unrelated: ros-tooling/setup-ros v0.1.3 was just released a few days ago, maybe something changed? Edit: Indeed, looks like a bump in Edit 2: PR opened, see #165
bmagyar left a comment
looks good to me, just 2 notes about creating follow-up issues
Thanks a bunch for this fix, a big step toward a green pipeline! :D
Purpose
Addresses #132. Fixes a segfault that could occur in regular use of the joint_trajectory_controller, as exposed by the unit tests.
Summary
There was a race condition in the JTC, where rt_active_goal_ could be reset() in one thread (joint_trajectory_controller.cpp:568) but then dereferenced elsewhere. This would cause a segfault.
For example, in JointTrajectoryController::update(), we check that rt_active_goal_ is non-null, and then dereference it a couple of lines later. But if we get unlucky with timing, another thread can reset the shared_ptr between the non-null check and the dereference. We don't currently have the appropriate thread safety mechanisms in place.
By taking a copy of the rt_active_goal_ shared_ptr before checking and using it, we ensure the local copy will never become invalid while we're holding it. The RealtimeBuffer is required since we need to read and write the shared_ptr concurrently from multiple threads.
Testing done
✔️
colcon test --packages-select joint_trajectory_controller --retest-until-fail 100
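For readers who want the idea from the Summary in isolation, here is a minimal sketch of the copy-before-check pattern. It is not the controller code: Goal, GoalSlot, and update_once are hypothetical names, and the mutex-based GoalSlot is only an assumed stand-in for the RealtimeBuffer that guards rt_active_goal_ in the PR.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the goal handle type (not the real RealtimeGoalHandle).
struct Goal
{
  int id;
};
using GoalPtr = std::shared_ptr<Goal>;

// Hypothetical, mutex-based stand-in for the buffer guarding rt_active_goal_.
// It only mimics the idea of handing out copies of the pointer.
class GoalSlot
{
public:
  // Replace the stored goal; passing nullptr plays the role of reset().
  void write(GoalPtr goal)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    goal_ = std::move(goal);
  }

  // Return a copy of the shared_ptr; the copy shares ownership of the goal.
  GoalPtr read() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return goal_;
  }

private:
  mutable std::mutex mutex_;
  GoalPtr goal_;
};

// Same shape as the fixed update(): take a local copy first, then do the
// null check and the dereference on that copy only.
// Returns the goal id, or -1 when there is no active goal.
int update_once(const GoalSlot & slot)
{
  const GoalPtr active_goal = slot.read();
  if (!active_goal) {
    return -1;
  }
  // Safe even if another thread clears the slot now: the local copy
  // keeps the Goal alive until active_goal goes out of scope.
  return active_goal->id;
}
```

The point of the sketch is that the null check and the dereference both act on the local copy, so clearing the slot in between cannot invalidate it. A mutex would not be suitable on the real-time path, which is presumably why the PR relies on RealtimeBuffer instead.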