Adds documentation and example scripts for sensors (#1443)
## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------

Signed-off-by: Michael Gussert <[email protected]>
Co-authored-by: David Hoeller <[email protected]>
Co-authored-by: Kelly Guo <[email protected]>
3 people authored Nov 25, 2024
1 parent d603b67 commit 4d99147
Showing 21 changed files with 994 additions and 26 deletions.
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -86,6 +86,7 @@ Table of Contents

source/overview/developer-guide/index
source/overview/core-concepts/index
source/overview/sensors/index
source/overview/environments
source/overview/reinforcement-learning/index
source/overview/teleop_imitation
@@ -98,7 +99,7 @@ Table of Contents

source/features/hydra
source/features/multi_gpu
source/features/tiled_rendering
Tiled Rendering</source/overview/sensors/camera>
source/features/reproducibility

.. toctree::
@@ -1,7 +1,15 @@
Tiled-Camera Rendering
======================
.. _overview_sensors_camera:

.. currentmodule:: omni.isaac.lab

Camera
========

Camera sensors are uniquely defined by their use of the ``render_product``, a structure for managing data generated by the rendering pipeline (images). Isaac Lab provides full control over how these renderings are created through camera parameters such as focal length, pose, and type, and over what kind of data is rendered through the use of Annotators, allowing you to record not only RGB but also instance segmentation, object pose, object ID, and more.

Rendered images are unique among the supported data types in Isaac Lab due to the inherently large bandwidth requirements for moving those data. A single 800 x 600 image with 32-bit color (a single float per pixel) clocks in at just under 2 MB. If we render at 60 fps and record every frame, that camera needs to move 120 MB/s. Multiply this by the number of cameras in an environment and environments in a simulation, and you can quickly see how scaling a naive vectorization of camera data could lead to bandwidth challenges. NVIDIA's Isaac Lab leverages our expertise in GPU hardware to provide an API that specifically addresses these scaling challenges in the rendering pipeline.
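
As a quick sanity check on these numbers, a minimal sketch of the arithmetic (assuming 4 bytes per pixel and no compression; the environment and camera counts are illustrative):

.. code-block:: python

   # Back-of-the-envelope bandwidth estimate for a single camera stream.
   width, height = 800, 600
   bytes_per_pixel = 4                             # 32-bit color, one float per pixel
   fps = 60

   frame_bytes = width * height * bytes_per_pixel  # 1,920,000 bytes, just under 2 MB
   camera_mb_s = frame_bytes * fps / 1e6           # ~115 MB/s for one camera

   # Scaling to many cameras and environments multiplies this linearly.
   num_envs, cams_per_env = 1024, 2
   total_gb_s = camera_mb_s * num_envs * cams_per_env / 1e3
   print(f"{camera_mb_s:.0f} MB/s per camera, {total_gb_s:.0f} GB/s in total")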

Tiled Rendering
~~~~~~~~~~~~~~~~~

.. note::

@@ -10,19 +18,9 @@ Tiled-Camera Rendering
Tiled rendering in combination with image processing networks requires heavy memory resources, especially
at larger resolutions. We recommend running 512 cameras in the scene on RTX 4090 GPUs or similar.

The Tiled Rendering APIs provide a vectorized interface for collecting data from camera sensors. This is useful for reinforcement learning environments where parallelization can be exploited to accelerate data collection and thus the training loop. Tiled rendering works by using a single ``render_product`` for **all** clones of a single camera in the scene. The desired dimensions of a single image and the number of environments are used to compute a much larger ``render_product``, consisting of the tiled individual renders from the separate clones of the camera. When all cameras have populated their buffers, the render product is "completed" and can be moved around as a single large image, dramatically reducing the overhead of moving the data from the host to the device, for example. Only a single call is needed to synchronize the device data, instead of one call per camera, and this is a big part of what makes the Tiled Rendering API more efficient for working with vision data.

Tiled rendering APIs provide a vectorized interface for collecting data from camera sensors.
This is useful for reinforcement learning environments requiring vision in the loop.
Tiled rendering works by concatenating camera outputs from multiple cameras and rendering
one single large image instead of multiple smaller images that would have been produced
by each individual camera. This reduces the amount of time required for rendering and
provides a more efficient API for working with vision data.

Isaac Lab provides tiled rendering APIs for RGB, depth, along with other annotators through the :class:`~sensors.TiledCamera`
class. Configurations for the tiled rendering APIs can be defined through the :class:`~sensors.TiledCameraCfg`
class, specifying parameters such as the regex expression for all camera paths, the transform
for the cameras, the desired data type, the type of cameras to add to the scene, and the camera
resolution.
Isaac Lab provides tiled rendering APIs for RGB, depth, along with other annotators through the :class:`~sensors.TiledCamera` class. Configurations for the tiled rendering APIs can be defined through the :class:`~sensors.TiledCameraCfg` class, specifying parameters such as the regex expression for all camera paths, the transform for the cameras, the desired data type, the type of cameras to add to the scene, and the camera resolution.

.. code-block:: python
@@ -37,28 +35,25 @@ resolution.
height=80,
)
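
As a rough sketch (the parameter values are illustrative, not prescriptive), a complete ``TiledCameraCfg`` along these lines would be:

.. code-block:: python

   import omni.isaac.lab.sim as sim_utils
   from omni.isaac.lab.sensors import TiledCameraCfg

   tiled_camera: TiledCameraCfg = TiledCameraCfg(
       prim_path="/World/envs/env_.*/Camera",
       offset=TiledCameraCfg.OffsetCfg(pos=(-7.0, 0.0, 3.0), rot=(0.9945, 0.0, 0.1045, 0.0), convention="world"),
       data_types=["rgb"],
       spawn=sim_utils.PinholeCameraCfg(
           focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 20.0)
       ),
       width=80,
       height=80,
   )
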
To access the tiled rendering interface, a :class:`~sensors.TiledCamera` object can be created and used
to retrieve data from the cameras.
To access the tiled rendering interface, a :class:`~sensors.TiledCamera` object can be created and used to retrieve data from the cameras.

.. code-block:: python
tiled_camera = TiledCamera(cfg.tiled_camera)
data_type = "rgb"
data = tiled_camera.data.output[data_type]
The returned data will be transformed into the shape (num_cameras, height, width, num_channels), which
can be used directly as observation for reinforcement learning.
The returned data will be transformed into the shape (num_cameras, height, width, num_channels), which can be used directly as observation for reinforcement learning.
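
For example, a hedged sketch of packaging the batched output as a reinforcement-learning observation (the dictionary key is illustrative):

.. code-block:: python

   import torch

   # `tiled_camera` as created above, with "rgb" requested in data_types.
   rgb = tiled_camera.data.output["rgb"]          # (num_cameras, height, width, 3), torch.uint8
   # a common pattern is to normalize and hand the batch straight to the policy network
   observations = {"policy": rgb.to(torch.float32) / 255.0}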

When working with rendering, make sure to add the ``--enable_cameras`` argument when launching the
environment. For example:
When working with rendering, make sure to add the ``--enable_cameras`` argument when launching the environment. For example:

.. code-block:: shell
python source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-RGB-Camera-Direct-v0 --headless --enable_cameras
Annotators and Data Types
-------------------------
Annotators
~~~~~~~~~~~~~~~~~

Both :class:`~sensors.TiledCamera` and :class:`~sensors.Camera` classes provide APIs for retrieving various types of annotator data from Replicator:

@@ -76,6 +71,11 @@ Both :class:`~sensors.TiledCamera` and :class:`~sensors.Camera` classes provide
RGB and RGBA
~~~~~~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_rgb.png
:align: center
:figwidth: 100%
:alt: A scene captured in RGB

``rgb`` data type returns a 3-channel RGB colored image of type ``torch.uint8``, with dimension (B, H, W, 3).

``rgba`` data type returns a 4-channel RGBA colored image of type ``torch.uint8``, with dimension (B, H, W, 4).
@@ -85,6 +85,11 @@ To convert the ``torch.uint8`` data to ``torch.float32``, divide the buffer by 2
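
A minimal sketch of that conversion, together with dropping the alpha channel when ``rgba`` was requested (assumes the corresponding entries are present in ``data_types``):

.. code-block:: python

   import torch

   rgba = tiled_camera.data.output["rgba"]          # (B, H, W, 4), torch.uint8
   rgb = rgba[..., :3]                              # drop the alpha channel -> (B, H, W, 3)
   rgb_float = rgb.to(torch.float32) / 255.0        # float32 values in [0, 1]
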
Depth and Distances
~~~~~~~~~~~~~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_depth.png
:align: center
:figwidth: 100%
:alt: A scene captured as a depth image

``distance_to_camera`` returns a single-channel depth image with distance to the camera optical center. The dimension for this annotator is (B, H, W, 1) and has type ``torch.float32``.

``distance_to_image_plane`` returns a single-channel depth image with distances of 3D points from the camera plane along the camera's Z-axis. The dimension for this annotator is (B, H, W, 1) and has type ``torch.float32``.
@@ -94,6 +99,11 @@ Depth and Distances
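
Depth buffers are often post-processed before being fed to a network. A hedged sketch, assuming ``distance_to_image_plane`` was added to ``data_types`` and that pixels with no hit come back as ``inf`` (as is commonly the case):

.. code-block:: python

   import torch

   depth = tiled_camera.data.output["distance_to_image_plane"]   # (B, H, W, 1), torch.float32
   far_clip = 20.0                                                # illustrative clipping distance
   depth = torch.where(torch.isfinite(depth), depth, torch.full_like(depth, far_clip))
   depth = depth.clamp(max=far_clip) / far_clip                   # normalize to [0, 1]
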
Normals
~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_normals.png
:align: center
:figwidth: 100%
:alt: A scene's surface normals

``normals`` returns an image containing the local surface normal vectors at each pixel. The buffer has dimension (B, H, W, 3), containing the (x, y, z) information for each vector, and has data type ``torch.float32``.
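
Since the components lie in [-1, 1], a quick way to inspect them is to remap the vectors into image space; a small sketch:

.. code-block:: python

   import torch

   normals = tiled_camera.data.output["normals"]                    # (B, H, W, 3), torch.float32
   normals_rgb = ((normals * 0.5 + 0.5) * 255.0).to(torch.uint8)    # remap [-1, 1] -> [0, 255] for viewing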

Motion Vectors
@@ -104,6 +114,11 @@ Motion Vectors
Semantic Segmentation
~~~~~~~~~~~~~~~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_semantic.png
:align: center
:figwidth: 100%
:alt: A scene with semantic segmentation

``semantic_segmentation`` outputs semantic segmentation of each entity in the camera’s viewport that has semantic labels. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['semantic_segmentation']`` containing ID to labels information.

- If ``colorize_semantic_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to semantic labels.
@@ -113,6 +128,11 @@ Semantic Segmentation
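
A brief sketch of reading the colorized buffer together with its label mapping (the exact nesting of the ``info`` dictionary is assumed from the description above):

.. code-block:: python

   seg = tiled_camera.data.output["semantic_segmentation"]              # (B, H, W, 4) when colorized
   id_to_labels = tiled_camera.data.info["semantic_segmentation"]["idToLabels"]
   for key, label in id_to_labels.items():
       print(key, "->", label)
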
Instance ID Segmentation
~~~~~~~~~~~~~~~~~~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_instanceID.png
:align: center
:figwidth: 100%
:alt: A scene with instance ID segmentation

``instance_id_segmentation_fast`` outputs instance ID segmentation of each entity in the camera’s viewport. The instance ID is unique for each prim in the scene with different paths. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_id_segmentation_fast']`` containing ID to labels information.

The main difference between ``instance_id_segmentation_fast`` and ``instance_segmentation_fast`` is that the instance segmentation annotator goes down the hierarchy to the lowest-level prim that has semantic labels, whereas instance ID segmentation always goes down to the leaf prim.
@@ -124,11 +144,18 @@ The main difference between ``instance_id_segmentation_fast`` and ``instance_seg
Instance Segmentation
"""""""""""""""""""""

.. figure:: ../../_static/overview/overview_sensors_instance.png
:align: center
:figwidth: 100%
:alt: A scene with instance segmentation

``instance_segmentation_fast`` outputs instance segmentation of each entity in the camera’s viewport. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_segmentation_fast']`` containing ID to labels and ID to semantic information.

- If ``colorize_instance_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from color to semantic labels of that semantic entity.
- If ``colorize_instance_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``.

- If ``colorize_instance_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel.

- If ``colorize_instance_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from instance ID to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from instance ID to semantic labels of that semantic entity.
The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from color to semantic labels of that semantic entity.


Current Limitations
145 changes: 145 additions & 0 deletions docs/source/overview/sensors/contact_sensor.rst
@@ -0,0 +1,145 @@
.. _overview_sensors_contact:

Contact Sensor
================

.. figure:: ../../_static/overview/overview_sensors_contact_diagram.png
:align: center
:figwidth: 100%
:alt: A contact sensor with filtering

The contact sensor is designed to return the net contact force acting on a given rigid body. The sensor is written to behave as a physical object, and so the "scope" of the contact sensor is limited to the body (or bodies) that defines it. There are multiple ways to define this scope, depending on your need to filter the forces coming from the contact.

By default, the reported force is the total contact force, but your application may only care about contact forces due to specific objects. Retrieving contact forces from specific objects requires filtering, and this can only be done in a "many-to-one" way. A multi-legged robot that needs filterable contact information for its feet would require one sensor per foot to be defined in the environment, but a robotic hand with contact sensors on the tips of each finger can be defined with a single sensor.

Consider a simple environment with an ANYmal quadruped and a block:

.. code-block:: python
@configclass
class ContactSensorsSceneCfg(InteractiveSceneCfg):
"""Design the scene with sensors on the robot."""
# ground plane
ground = AssetBaseCfg(prim_path="/World/defaultGroundPlane", spawn=sim_utils.GroundPlaneCfg())
# lights
dome_light = AssetBaseCfg(
prim_path="/World/Light", spawn=sim_utils.DomeLightCfg(intensity=3000.0, color=(0.75, 0.75, 0.75))
)
# robot
robot = ANYMAL_C_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
# Rigid Object
cube = RigidObjectCfg(
prim_path="{ENV_REGEX_NS}/Cube",
spawn=sim_utils.CuboidCfg(
size=(0.5,0.5,0.1),
rigid_props=sim_utils.RigidBodyPropertiesCfg(),
mass_props=sim_utils.MassPropertiesCfg(mass=100.0),
collision_props=sim_utils.CollisionPropertiesCfg(),
physics_material=sim_utils.RigidBodyMaterialCfg(static_friction=1.0),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.0, 1.0, 0.0), metallic=0.2),
),
init_state=RigidObjectCfg.InitialStateCfg(pos=(0.5, 0.5, 0.05)),
)
contact_forces_LF = ContactSensorCfg(
prim_path="{ENV_REGEX_NS}/Robot/LF_FOOT",
update_period=0.0,
history_length=6,
debug_vis=True,
filter_prim_paths_expr=["{ENV_REGEX_NS}/Cube"],
)
contact_forces_RF = ContactSensorCfg(
prim_path="{ENV_REGEX_NS}/Robot/RF_FOOT",
update_period=0.0,
history_length=6,
debug_vis=True,
filter_prim_paths_expr=["{ENV_REGEX_NS}/Cube"],
)
contact_forces_H = ContactSensorCfg(
prim_path="{ENV_REGEX_NS}/Robot/.*H_FOOT",
update_period=0.0,
history_length=6,
debug_vis=True,
)
We define the sensors on the feet of the robot in two different ways. The front feet are independent sensors (one sensor body per foot) and the "Cube" is placed under the left foot. The hind feet are defined as a single sensor with multiple bodies.

We can then run the scene and print the data from the sensors:

.. code-block:: python
def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene):
.
.
.
# Simulate physics
while simulation_app.is_running():
.
.
.
# print information from the sensors
print("-------------------------------")
print(scene["contact_forces_LF"])
print("Received force matrix of: ", scene["contact_forces_LF"].data.force_matrix_w)
print("Received contact force of: ", scene["contact_forces_LF"].data.net_forces_w)
print("-------------------------------")
print(scene["contact_forces_RF"])
print("Received force matrix of: ", scene["contact_forces_RF"].data.force_matrix_w)
print("Received contact force of: ", scene["contact_forces_RF"].data.net_forces_w)
print("-------------------------------")
print(scene["contact_forces_H"])
print("Received force matrix of: ", scene["contact_forces_H"].data.force_matrix_w)
print("Received contact force of: ", scene["contact_forces_H"].data.net_forces_w)
Here, we print both the net contact force and the filtered force matrix for each contact sensor defined in the scene. The front left and front right feet report the following:

.. code-block:: bash
-------------------------------
Contact sensor @ '/World/envs/env_.*/Robot/LF_FOOT':
view type : <class 'omni.physics.tensors.impl.api.RigidBodyView'>
update period (s) : 0.0
number of bodies : 1
body names : ['LF_FOOT']
Received force matrix of: tensor([[[[-1.3923e-05, 1.5727e-04, 1.1032e+02]]]], device='cuda:0')
Received contact force of: tensor([[[-1.3923e-05, 1.5727e-04, 1.1032e+02]]], device='cuda:0')
-------------------------------
Contact sensor @ '/World/envs/env_.*/Robot/RF_FOOT':
view type : <class 'omni.physics.tensors.impl.api.RigidBodyView'>
update period (s) : 0.0
number of bodies : 1
body names : ['RF_FOOT']
Received force matrix of: tensor([[[[0., 0., 0.]]]], device='cuda:0')
Received contact force of: tensor([[[1.3529e-05, 0.0000e+00, 1.0069e+02]]], device='cuda:0')
Notice that even with filtering, both sensors report the net contact force acting on the foot. However, only the left foot has a nonzero "force matrix", because the right foot isn't standing on the filtered body, ``/World/envs/env_.*/Cube``. Now, check out the data coming from the hind feet!

.. code-block:: bash
-------------------------------
Contact sensor @ '/World/envs/env_.*/Robot/.*H_FOOT':
view type : <class 'omni.physics.tensors.impl.api.RigidBodyView'>
update period (s) : 0.0
number of bodies : 2
body names : ['LH_FOOT', 'RH_FOOT']
Received force matrix of: None
Received contact force of: tensor([[[9.7227e-06, 0.0000e+00, 7.2364e+01],
[2.4322e-05, 0.0000e+00, 1.8102e+02]]], device='cuda:0')
In this case, the contact sensor has two bodies: the left and right hind feet. When the force matrix is queried, the result is ``None`` because this is a many-body sensor, and presently Isaac Lab only supports "many-to-one" contact force filtering. Unlike the single-body contact sensor, the reported force tensor has multiple entries, with each "row" corresponding to the contact force on a single body of the sensor (matching the ordering at construction).
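
To make this concrete, a small sketch of how the two-body sensor's output might be post-processed, for example to flag which hind feet are in contact (the force threshold is illustrative):

.. code-block:: python

   import torch

   # net_forces_w has shape (num_envs, num_bodies, 3); bodies are ordered as ['LH_FOOT', 'RH_FOOT']
   net_forces = scene["contact_forces_H"].data.net_forces_w
   force_magnitude = torch.norm(net_forces, dim=-1)      # (num_envs, num_bodies)
   in_contact = force_magnitude > 1.0                    # boolean contact flags per foot
   print("Hind feet in contact:", in_contact)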

.. dropdown:: Code for contact_sensor.py
:icon: code

.. literalinclude:: ../../../../source/standalone/demos/sensors/contact_sensor.py
:language: python
:linenos: