Commit f8d80cb

Add Camera Benchmark Tool and Allow Correct Unprojection of distance_to_camera depth image (#976)
1 parent e6f4ed1 commit f8d80cb

7 files changed: +1136 -1 lines changed

CONTRIBUTORS.md

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ Guidelines for modifications:
 * Calvin Yu
 * Chenyu Yang
 * David Yang
+* Gary Lvov
 * HoJin Jeon
 * Jean Tampon
 * Jia Lin Yuan

docs/source/how-to/estimate_how_many_cameras_can_run.rst

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
.. _how-to-estimate-how-cameras-can-run:


Find How Many/What Cameras You Should Train With
================================================

.. currentmodule:: omni.isaac.lab

Currently in Isaac Lab, there are several camera types: USD Cameras (standard), Tiled Cameras,
and Ray Caster cameras. These camera types differ in functionality and performance. The ``benchmark_cameras.py``
script can be used to understand the differences between camera types, as well as to characterize their relative
performance at different parameters such as camera quantity, image dimensions, and data types.

This utility is provided so that one can easily find the camera type and parameters that are the most performant
while meeting the requirements of the user's scenario. This utility also helps estimate
the maximum number of cameras one can realistically run, assuming that one wants to maximize the number
of environments while minimizing step time.

This utility can inject cameras into an existing task from the gym registry,
which can be useful for benchmarking cameras in a specific scenario. Also,
if you install ``pynvml``, you can let this utility automatically find the maximum
number of cameras that can run in your task environment up to a
specified system resource utilization threshold (without training; taking zero actions
at each timestep).

This guide accompanies the ``benchmark_cameras.py`` script in the ``IsaacLab/source/standalone/tutorials/04_sensors``
directory.

.. dropdown:: Code for benchmark_cameras.py
   :icon: code

   .. literalinclude:: ../../../source/standalone/tutorials/04_sensors/benchmark_cameras.py
      :language: python
      :linenos:

Possible Parameters
-------------------

First, run

.. code-block:: bash

   ./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py -h

to see all possible parameters you can vary with this utility.

See the command line parameters related to ``autotune`` for more information about
automatically determining the maximum camera count.


Compare Performance in Task Environments and Automatically Determine Task Max Camera Count
-------------------------------------------------------------------------------------------

Currently, tiled cameras are the most performant camera type that can handle multiple dynamic objects.

For example, to see how your system handles 100 tiled cameras in
the cartpole environment, with 2 cameras per environment (so 50 environments total),
in RGB mode only, run

.. code-block:: bash

   ./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
   --task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
   --task_num_cameras_per_env 2 \
   --tiled_camera_data_types rgb

If you have ``pynvml`` installed (``./isaaclab.sh -p -m pip install pynvml``), you can also
find the maximum number of cameras that you could run in the specified environment up to
a certain performance threshold (specified by max CPU utilization percent, max RAM utilization percent,
max GPU compute percent, and max GPU memory percent). For example, to find the maximum number of cameras
you can run with cartpole, you could run:

.. code-block:: bash

   ./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
   --task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
   --task_num_cameras_per_env 2 \
   --tiled_camera_data_types rgb --autotune \
   --autotune_max_percentage_util 100 80 50 50

Autotune may cause the program to crash, which means it tried to run too many cameras at once.
However, the max percentage utilization parameters are meant to prevent this from happening.

The output of the benchmark doesn't include the overhead of training the network, so consider
decreasing the maximum utilization percentages to account for this overhead. The final output camera
count is for all cameras, so to get the total number of environments, divide the output camera count
by the number of cameras per environment. For example, if autotune reports 80 cameras with 2 cameras
per environment, that corresponds to 40 environments.


Compare Camera Type and Performance (Without a Specified Task)
--------------------------------------------------------------

This tool can also assess performance without a task environment.
For example, to view 100 random objects with 2 standard cameras, one could run

.. code-block:: bash

   ./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
   --height 100 --width 100 --num_standard_cameras 2 \
   --standard_camera_data_types instance_segmentation_fast normals --num_objects 100 \
   --experiment_length 100

If your system cannot handle this for performance reasons, the process will be killed.
It's recommended to monitor CPU/RAM utilization and GPU utilization while running this script, to get
an idea of how many resources rendering the desired cameras requires. On Ubuntu, you can use tools like
``htop`` and ``nvtop`` to monitor resources live while running this script; on Windows, you can use the Task Manager.

If your system has a hard time handling the desired cameras, you can try the following
(a combined example is shown after this list):

- Switch to headless mode (supply ``--headless``)
- Ensure you are using the GPU pipeline, not the CPU pipeline
- If you aren't using Tiled Cameras, switch to Tiled Cameras
- Decrease the camera resolution
- Decrease the number of data types requested for each camera
- Decrease the number of cameras
- Decrease the number of objects in the scene
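
As a rough sketch combining several of these suggestions (reusing only the flags shown earlier in this
guide, with illustrative values), the no-task benchmark above could be re-run headless, at a lower
resolution, with tiled cameras and fewer objects:

.. code-block:: bash

   ./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
   --headless --height 64 --width 64 \
   --num_tiled_cameras 2 --tiled_camera_data_types rgb \
   --num_objects 10 --experiment_length 100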

If your system is able to handle the number of cameras, then the timing statistics will be printed to the terminal.
After the simulation stops, it can be closed with CTRL+C.

docs/source/how-to/index.rst

Lines changed: 11 additions & 0 deletions
@@ -46,6 +46,17 @@ This guide explains how to save the camera output in Isaac Lab.
    save_camera_output

+Estimate How Many Cameras Can Run On Your Machine
+-------------------------------------------------
+
+This guide demonstrates how to estimate the number of cameras that can run on your machine under the desired parameters.
+
+.. toctree::
+    :maxdepth: 1
+
+    estimate_how_many_cameras_can_run
+
+
 Drawing Markers
 ---------------

source/extensions/omni.isaac.lab/docs/CHANGELOG.rst

Lines changed: 12 additions & 0 deletions
@@ -2,6 +2,18 @@ Changelog
 ---------


+0.24.14 (2024-09-20)
+~~~~~~~~~~~~~~~~~~~~
+
+Added
+^^^^^
+
+* Added :meth:`convert_perspective_depth_to_orthogonal_depth`. :meth:`unproject_depth` assumes
+  that the input depth image is orthogonal. The new :meth:`convert_perspective_depth_to_orthogonal_depth`
+  can be used to convert a perspective depth image into an orthogonal depth image, so that the point cloud
+  can be unprojected correctly with :meth:`unproject_depth`.
+
+
 0.24.13 (2024-09-08)
 ~~~~~~~~~~~~~~~~~~~~

source/extensions/omni.isaac.lab/omni/isaac/lab/utils/math.py

Lines changed: 105 additions & 1 deletion
@@ -988,7 +988,12 @@ def transform_points(

 @torch.jit.script
 def unproject_depth(depth: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tensor:
-    r"""Unproject depth image into a pointcloud.
+    r"""Unproject depth image into a pointcloud. This method assumes that depth
+    is provided orthogonally relative to the image plane, as opposed to absolutely relative to the camera's
+    principal point (perspective depth). To unproject a perspective depth image, use
+    :meth:`convert_perspective_depth_to_orthogonal_depth` to convert
+    to an orthogonal depth image prior to calling this method. Otherwise, the
+    created point cloud will be distorted, especially around the edges.

     This function converts depth images into points given the calibration matrix of the camera.
@@ -1059,6 +1064,105 @@ def unproject_depth(depth: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tens
    return points_xyz


@torch.jit.script
def convert_perspective_depth_to_orthogonal_depth(
    perspective_depth: torch.Tensor, intrinsics: torch.Tensor
) -> torch.Tensor:
    r"""Provided depth image(s) where depth is provided as the distance to the principal
    point of the camera (perspective depth), this function converts it so that depth
    is provided as the distance to the camera's image plane (orthogonal depth).

    This is helpful because `unproject_depth` assumes that depth is expressed in
    the orthogonal depth format.

    If `perspective_depth` is a batch of depth images and `intrinsics` is a single intrinsic matrix,
    the same calibration matrix is applied to all depth images in the batch.

    The function assumes that the width and height are both greater than 1.

    Args:
        perspective_depth: The depth measurement obtained with the distance_to_camera replicator.
            Shape is (H, W) or (H, W, 1) or (N, H, W) or (N, H, W, 1).
        intrinsics: A tensor providing camera's calibration matrix. Shape is (3, 3) or (N, 3, 3).

    Returns:
        The depth image as if obtained by the distance_to_image_plane replicator. Shape
        matches the input shape of depth.

    Raises:
        ValueError: When depth is not of shape (H, W) or (H, W, 1) or (N, H, W) or (N, H, W, 1).
        ValueError: When intrinsics is not of shape (3, 3) or (N, 3, 3).
    """

    # Clone inputs to avoid in-place modifications
    perspective_depth_batch = perspective_depth.clone()
    intrinsics_batch = intrinsics.clone()

    # Check if inputs are batched
    is_batched = perspective_depth_batch.dim() == 4 or (
        perspective_depth_batch.dim() == 3 and perspective_depth_batch.shape[-1] != 1
    )

    # Track whether the last dimension was singleton
    add_last_dim = False
    if perspective_depth_batch.dim() == 4 and perspective_depth_batch.shape[-1] == 1:
        add_last_dim = True
        perspective_depth_batch = perspective_depth_batch.squeeze(dim=3)  # (N, H, W, 1) -> (N, H, W)
    if perspective_depth_batch.dim() == 3 and perspective_depth_batch.shape[-1] == 1:
        add_last_dim = True
        perspective_depth_batch = perspective_depth_batch.squeeze(dim=2)  # (H, W, 1) -> (H, W)

    if perspective_depth_batch.dim() == 2:
        perspective_depth_batch = perspective_depth_batch[None]  # (H, W) -> (1, H, W)

    if intrinsics_batch.dim() == 2:
        intrinsics_batch = intrinsics_batch[None]  # (3, 3) -> (1, 3, 3)

    if is_batched and intrinsics_batch.shape[0] == 1:
        intrinsics_batch = intrinsics_batch.expand(perspective_depth_batch.shape[0], -1, -1)  # (1, 3, 3) -> (N, 3, 3)

    # Validate input shapes
    if perspective_depth_batch.dim() != 3:
        raise ValueError(f"Expected perspective_depth to have 2, 3, or 4 dimensions; got {perspective_depth.shape}.")
    if intrinsics_batch.dim() != 3:
        raise ValueError(f"Expected intrinsics to have shape (3, 3) or (N, 3, 3); got {intrinsics.shape}.")

    # Image dimensions
    im_height, im_width = perspective_depth_batch.shape[1:]

    # Get the intrinsics parameters
    fx = intrinsics_batch[:, 0, 0].view(-1, 1, 1)
    fy = intrinsics_batch[:, 1, 1].view(-1, 1, 1)
    cx = intrinsics_batch[:, 0, 2].view(-1, 1, 1)
    cy = intrinsics_batch[:, 1, 2].view(-1, 1, 1)

    # Create meshgrid of pixel coordinates
    u_grid = torch.arange(im_width, device=perspective_depth.device, dtype=perspective_depth.dtype)
    v_grid = torch.arange(im_height, device=perspective_depth.device, dtype=perspective_depth.dtype)
    u_grid, v_grid = torch.meshgrid(u_grid, v_grid, indexing="xy")

    # Expand the grids for batch processing
    u_grid = u_grid.unsqueeze(0).expand(perspective_depth_batch.shape[0], -1, -1)
    v_grid = v_grid.unsqueeze(0).expand(perspective_depth_batch.shape[0], -1, -1)

    # Compute the squared terms for efficiency
    x_term = ((u_grid - cx) / fx) ** 2
    y_term = ((v_grid - cy) / fy) ** 2

    # Calculate the orthogonal (normal) depth
    normal_depth = perspective_depth_batch / torch.sqrt(1 + x_term + y_term)

    # Restore the last dimension if it was present in the input
    if add_last_dim:
        normal_depth = normal_depth.unsqueeze(-1)

    # Return to original shape if input was not batched
    if not is_batched:
        normal_depth = normal_depth.squeeze(0)

    return normal_depth


@torch.jit.script
def project_points(points: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tensor:
    r"""Projects 3D points into 2D image plane.

source/extensions/omni.isaac.lab/test/utils/test_math.py

Lines changed: 18 additions & 0 deletions
@@ -376,6 +376,24 @@ def iter_old_quat_rotate_inverse(q: torch.Tensor, v: torch.Tensor) -> torch.Tens
            iter_old_quat_rotate_inverse(q_rand, v_rand),
        )

    def test_depth_perspective_conversion(self):
        # Create a sample perspective depth image (N, H, W)
        perspective_depth = torch.tensor([[[10.0, 0.0, 100.0], [0.0, 3000.0, 0.0], [100.0, 0.0, 100.0]]])

        # Create sample intrinsic matrix (3, 3)
        intrinsics = torch.tensor([[500.0, 0.0, 5.0], [0.0, 500.0, 5.0], [0.0, 0.0, 1.0]])

        # Convert perspective depth to orthogonal depth
        orthogonal_depth = math_utils.convert_perspective_depth_to_orthogonal_depth(perspective_depth, intrinsics)

        # Manually compute expected orthogonal depth based on the formula for comparison
        expected_orthogonal_depth = torch.tensor(
            [[[9.9990, 0.0000, 99.9932], [0.0000, 2999.8079, 0.0000], [99.9932, 0.0000, 99.9964]]]
        )

        # Assert that the output is close to the expected result
        torch.testing.assert_close(orthogonal_depth, expected_orthogonal_depth)


if __name__ == "__main__":
    run_tests()
