Fix collison_monitor performance by HovorunB · Pull Request #11 · logivations/navigation2

HovorunB · 2023-02-07T14:42:08Z

Basic Info

Info	Fix collison_monitor performance
Ticket(s) this addresses
Primary OS tested on	Ubuntu
Robotic platform tested on	gazebo, AMR47

Description of contribution in a few bullet points

The cmd_vel and velocity_smoother/cmd_vel topics have the frequency of 20hz. The collison_monitor/cmd_vel has 10hz in gazebo and ~16hz basement AMR. This is related to the collison_monitor waiting for the transform base_link->rgb_camera_link every time it receives input cmd_vel.

Save the transform in pointcloud object and reuse it instead of getting it on every callback

This was tested on basement AMR, collision monitor is able to keep up with 20hz frequency. The CPU usage by the collision monitor is around 25% while driving, and the logic still works

jplapp

nice find!

jplapp · 2023-02-07T14:44:49Z

nav2_collision_monitor/src/pointcloud.cpp

-  if (!getTransform(data_->header.frame_id, data_->header.stamp, curr_time, tf_transform)) {
-    return;
+  if (!tf_transform_received) {
+    tf_transform_received = getTransform(data_->header.frame_id, data_->header.stamp, curr_time, tf_transform);


It might be good to keep the return here, if it was not received after trying

jplapp · 2023-02-07T14:45:23Z

nav2_collision_monitor/include/nav2_collision_monitor/pointcloud.hpp

  /// @brief Latest data obtained from pointcloud
  sensor_msgs::msg::PointCloud2::ConstSharedPtr data_;

+  // Transform from the data source frame to the robot base frame


you could add a comment here why this is cached and which assumption is made

tonynajjar · 2023-02-08T09:17:50Z

Could you open a PR to upstream in parallel? I generally like this approach so that we get early feedback from the maintainers -> better than merging, opening a PR upstream, making changes, merging upstream and then getting these changes back into our fork. If they take too long too answer we won't wait of course

tonynajjar

@HovorunB very nice find! Could you run a small experiment to confirm your hypothesis:

Set the transform_tolerance parameter to 0, I think this is where the delay comes from. You will likely get extrapolation error messages, but this would be useful to confirm and to see by how much.

As discussed, If we want to merge this urgently and it works, feel free but let's make a follow up ticket to get this fix upstream (and potentially change the fix for us)

HovorunB · 2023-02-08T10:26:21Z

Set the transform_tolerance parameter to 0, I think this is where the delay comes from. You will likely get extrapolation error messages, but this would be useful to confirm and to see by how much.

@tonynajjar
With transform_tolerance=0 I'm getting extrapolation errors on every callback.

With our publishing frequency of 20hz the maximum we can allow is transform_tolerance=0.05, when i set it I was still getting errors every second, moreover the frequency of collision_monitor/cmd_vel topic fell back to 10hz

tonynajjar · 2023-02-08T12:59:11Z

That's an interesting log, it says it fails to get camera_rgb_link -> base_link and then that it's because it fails to get odom -> base_link. It's likely because they use the Advanced API of lookup_transform.

I'm not sure yet but I'm starting to think that this might not be a suitable solution, there must be a reason why we need odom -> base_link and by just saving the transform we skip that reason -> maybe you can figure out that reason?

Now, knowing that odom->base_link is actually the issue, publishing odom at a faster rate could also fix the issue.

Another interesting thing: I first thought this tolerance argument means that: "If I request a TF at a specific time but it's not available then grab the latest tf with time > time_requested - tolerance."

but looks like it's: "If I request a TF at a specific time but it's not available then wait tolerance until the tf at that time exists, before returning an error". This tolerance is actually called timeout in the lower level api

I'm not 100% sure of what I'm saying yet, just sharing my thoughts. Would be cool if you can do this investigation and confirm or deny my hypothesis.

tonynajjar · 2023-02-08T13:05:57Z

Another interesting thing: I first thought this tolerance argument means that: "If I request a TF at a specific time but it's not available then grab the latest tf with time > now - tolerance."
but looks like it's: "If I request a TF at a specific time but it's not available then wait tolerance before returning an error". This tolerance is actually called timeout in the lower level api

Okay I think this is confirmed here

It could be interesting to implement an argument to act like I first described 🤔

jplapp · 2023-02-08T13:23:48Z

From the docs:

so, in our case, it computes rgb_link to map, then travels in map, then computes map to base_link. So, it traverses odom for which we don't have a transform yet.

This all sounds way to complex given that in nearly all cases the TF between base_link and camera will be static anyway.
So maybe we could use transform = tf_->lookupTransform(base_frame_id_, data_->header.frame_id, data_->header.stamp, transform_tolerance_); instead?

This reverts commit 0d737ec.

This reverts commit 56c1766.

This reverts commit 11ab57c.

HovorunB · 2023-02-08T16:50:40Z

Changed to "simple API of lookup transform", the frequency looks great. it is able to keep up and I get no extrapolation errors even when running it at 100hz

jplapp · 2023-02-08T17:00:34Z

nice, I think this is now a good candidate to merge and at the same time to ask for feedback from nav2 team

tonynajjar · 2023-02-08T17:00:46Z

Changed to "simple API of lookup transform", the frequency looks great. it is able to keep up and I get no extrapolation errors even when running it at 100hz

Sure that will work as with the simple API you don't get odom->base_link, but there is probably a reason why they use the advanced API, e.g. to solve the same problem described here

tonynajjar · 2023-02-08T17:02:55Z

From the docs:

so, in our case, it computes rgb_link to map, then travels in map, then computes map to base_link. So, it traverses odom for which we don't have a transform yet.

This all sounds way to complex given that in nearly all cases the TF between base_link and camera will be static anyway. So maybe we could use transform = tf_->lookupTransform(base_frame_id_, data_->header.frame_id, data_->header.stamp, transform_tolerance_); instead?

Ah sorry I just read @jplapp's answer here, ignore my previous comment until I check

tonynajjar · 2023-02-08T17:14:02Z

nice, I think this is now a good candidate to merge and at the same time to ask for feedback from nav2 team

Alright sounds good to me, I still have to wrap my head around this advanced API....

But if the answer is: to account for a dynamic camera_link->base_link, then we have to think:

1- if we keep these changes
2- revert changes & publish odom faster
3- revert changes & accept the timeout -> not possible, too much delay

jplapp · 2023-02-08T17:32:47Z

I think this new change does support a dynamic camera_link->base_link - it gets the transform between base_link and camera_link at the time of capture of the pointcloud and then projects the pointcloud into the baselink frame at the current time. (This is the simple example from the docs “What was the pose of carrot1 5 seconds ago, relative to turtle2 5 seconds ago?”). It seems like what we need here, if there would somehow be a moving camera on the robot.

Honestly I don't understand what the advanced API in this case even tries to answer, but maybe we will know soon 😄

HovorunB · 2023-02-08T18:25:00Z

projects the pointcloud into the baselink frame at the current time.

I think this is the problem they tried to solve what the advanced API, because we want to publish the pointcloud into the base_link frame at the time of capture and not at the current time. So I feel like it also affects our case with the static sensor.
Imagine AMR driving with very high speed towards the obstacle, by the time it receives the info that the obstacle is 2m away and publishes it into current base_link frame, the robot would already be closer than 2m, but would still think that it is 2m away.
Or did I miss something?

jplapp · 2023-02-08T20:35:34Z

that makes sense!

HovorunB · 2023-02-09T12:01:13Z

Follow up ticket: https://lvserv01.logivations.com/browse/AMRNAV-4141

HovorunB added 2 commits February 7, 2023 15:15

Fix collison_monitor performance

11ab57c

move

56c1766

HovorunB added the bug Something isn't working label Feb 7, 2023

HovorunB requested review from jplapp and tonynajjar February 7, 2023 14:42

HovorunB self-assigned this Feb 7, 2023

HovorunB requested a review from andriimaistruk February 7, 2023 14:42

jplapp approved these changes Feb 7, 2023

View reviewed changes

tonynajjar reviewed Feb 8, 2023

View reviewed changes

add return, docs

0d737ec

HovorunB added 4 commits February 8, 2023 17:20

Revert "add return, docs"

3aae34a

This reverts commit 0d737ec.

Revert "move"

d34b8da

This reverts commit 56c1766.

Revert "Fix collison_monitor performance"

dabed0d

This reverts commit 11ab57c.

Change lookuptransform parameters

e165fda

tonynajjar approved these changes Feb 8, 2023

View reviewed changes

HovorunB merged commit 22a3498 into humble Feb 9, 2023

HovorunB mentioned this pull request Jul 19, 2023

Nav2 fork update #38

Merged

7 tasks

Conversation

HovorunB commented Feb 7, 2023

Basic Info

Description of contribution in a few bullet points

Uh oh!

jplapp left a comment

Choose a reason for hiding this comment

Uh oh!

jplapp Feb 7, 2023

Choose a reason for hiding this comment

Uh oh!

jplapp Feb 7, 2023

Choose a reason for hiding this comment

Uh oh!

tonynajjar commented Feb 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tonynajjar left a comment

Choose a reason for hiding this comment

Uh oh!

HovorunB commented Feb 8, 2023

Uh oh!

tonynajjar commented Feb 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tonynajjar commented Feb 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jplapp commented Feb 8, 2023

Uh oh!

HovorunB commented Feb 8, 2023

Uh oh!

jplapp commented Feb 8, 2023

Uh oh!

tonynajjar commented Feb 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tonynajjar commented Feb 8, 2023

Uh oh!

tonynajjar commented Feb 8, 2023

Uh oh!

jplapp commented Feb 8, 2023

Uh oh!

HovorunB commented Feb 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jplapp commented Feb 8, 2023

Uh oh!

HovorunB commented Feb 9, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tonynajjar commented Feb 8, 2023 •

edited

Loading

tonynajjar commented Feb 8, 2023 •

edited

Loading

tonynajjar commented Feb 8, 2023 •

edited

Loading

tonynajjar commented Feb 8, 2023 •

edited

Loading

HovorunB commented Feb 8, 2023 •

edited

Loading