[Feat] Add new data form support for dataset queries by screw-44 · Pull Request #2688 · huggingface/lerobot

screw-44 · 2025-12-20T02:51:58Z

Type / Scope

Type: (Feature)
Scope: lerobot/datasets/lerobot_dataset.py, lerobot/datasets/utils.py

Summary / Motivation

Add a feature that returns the full manipulation trajectory, from the current frame to the end of the episode, under a new 'affordance' key.
This supports a new data form for VLA training.

What changed

Add special handling for the 'affordance' key in _get_query_indices()
Return variable-length indices from the current frame to the episode end for affordance
Implement affordance-specific query in _query_hf_dataset()
Map affordance queries to 'action' column
Update get_delta_indices() to handle affordance key specially

How was this tested

Manual checks / dataset runs performed on my local devices.

How to run locally (reviewer)

Initialize the dataset like this:

      self.train_dataset = LeRobotDataset(
          repo_id,
          root=self.root,
          episodes=self.train_episode,
          # delta_timestamps={"affordance": []}  
      )

Checklist (required before merge)

Linting/formatting run (pre-commit run -a)
All tests pass locally (pytest)
Documentation updated
CI is green

Reviewer notes

This is a small feature patch.


Copilot

Pull request overview

This PR adds affordance support to LeRobot datasets, enabling queries that return variable-length action trajectories from the current frame to the end of an episode. This feature supports new forms of VLA (Vision-Language-Action) training by providing complete manipulation trajectories.

Key Changes:

Special handling for "affordance" key in dataset queries to return variable-length sequences
Mapping of affordance queries to the "action" column in the underlying dataset
Support for empty delta_timestamps list via {"affordance": []}

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
src/lerobot/datasets/utils.py	Added special handling in `get_delta_indices()` to preserve empty list for affordance key
src/lerobot/datasets/lerobot_dataset.py	Implemented affordance-specific logic in `_get_query_indices()` to generate indices from current frame to episode end, added affordance mapping in `_query_hf_dataset()`, and updated documentation


Copilot · 2025-12-20T02:55:25Z

src/lerobot/datasets/lerobot_dataset.py

+        query_indices = {}
+        padding = {}
+
+        for key, delta_idx in self.delta_indices.items():
+            # Special handling for "affordance" key: return all frames from current to episode end
+            if key == "affordance":
+                # Generate indices from idx to ep_end (exclusive), clamped to valid range
+                affordance_indices = list(range(max(idx, ep_start), ep_end))
+                # If range is empty, return at least the current frame
+                if not affordance_indices:
+                    affordance_indices = [max(idx, ep_start)]


The logic in line 939 has a potential issue. When idx < ep_start, the code uses max(idx, ep_start) which would start from ep_start. However, if idx is before the episode start, this means the current frame is outside the episode bounds, which seems like an invalid state. The code should either handle this edge case more explicitly or document why this behavior is acceptable. Additionally, the condition on line 941 checking if affordance_indices is empty would only be true if ep_start >= ep_end, which suggests an invalid episode. This edge case should be handled more robustly or validated earlier.

Suggested change

-        query_indices = {}
-        padding = {}
-
-        for key, delta_idx in self.delta_indices.items():
-            # Special handling for "affordance" key: return all frames from current to episode end
-            if key == "affordance":
-                # Generate indices from idx to ep_end (exclusive), clamped to valid range
-                affordance_indices = list(range(max(idx, ep_start), ep_end))
-                # If range is empty, return at least the current frame
-                if not affordance_indices:
-                    affordance_indices = [max(idx, ep_start)]
+        # Validate episode bounds and requested index to avoid querying outside the episode.
+        if ep_start >= ep_end:
+            raise ValueError(f"Invalid episode bounds: dataset_from_index ({ep_start}) "
+                             f"is not less than dataset_to_index ({ep_end}).")
+        if not (ep_start <= idx < ep_end):
+            raise IndexError(
+                f"Frame index {idx} is outside the episode range "
+                f"[{ep_start}, {ep_end})."
+            )
+        query_indices = {}
+        padding = {}
+
+        for key, delta_idx in self.delta_indices.items():
+            # Special handling for "affordance" key: return all frames from current to episode end
+            if key == "affordance":
+                # Generate indices from the current frame to ep_end (exclusive).
+                # The range is guaranteed to be non-empty by the bounds check above.
+                affordance_indices = list(range(idx, ep_end))
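To make the suggested behavior concrete, here is a minimal standalone sketch of the bounds check followed by the index generation. The function name `affordance_query_indices` is hypothetical and not part of LeRobot; only the parameter names `idx`, `ep_start`, and `ep_end` follow the diff above.

```python
def affordance_query_indices(idx: int, ep_start: int, ep_end: int) -> list[int]:
    """Return dataset indices from the current frame to the episode end (exclusive).

    Hypothetical helper that mirrors the validation proposed in the review.
    """
    if ep_start >= ep_end:
        raise ValueError(f"Invalid episode bounds: [{ep_start}, {ep_end}).")
    if not (ep_start <= idx < ep_end):
        raise IndexError(
            f"Frame index {idx} is outside the episode range [{ep_start}, {ep_end})."
        )
    # Non-empty by construction, because ep_start <= idx < ep_end holds here.
    return list(range(idx, ep_end))


# An episode occupying dataset rows 100..104, queried at its middle and last frame.
print(affordance_query_indices(102, 100, 105))  # [102, 103, 104]
print(affordance_query_indices(104, 100, 105))  # [104]
```

With the check in place, the fallback to `[max(idx, ep_start)]` is no longer needed, since an out-of-range `idx` fails loudly instead of being clamped.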

Copilot · 2025-12-20T02:55:25Z

src/lerobot/datasets/utils.py

+        # Special handling for "affordance" key: keep empty list (will be handled specially in _get_query_indices)
+        if key == "affordance":
+            delta_indices[key] = []
+        else:
+            delta_indices[key] = [round(d * fps) for d in delta_ts]


The new affordance feature lacks test coverage. Given that the repository has comprehensive test suites for dataset functionality (as seen in tests/datasets/test_delta_timestamps.py, test_datasets.py, etc.), this new feature should include tests to verify: 1) affordance queries return correct variable-length sequences from current frame to episode end, 2) empty list handling in get_delta_indices, 3) affordance mapping to action column in _query_hf_dataset, and 4) edge cases like querying at the last frame of an episode.
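As a starting point for item 2, the empty-list handling can be checked in isolation. The sketch below re-implements the loop from the diff above as a standalone function; `get_delta_indices_sketch` is a hypothetical name, and the signature (a dict of offsets plus an integer `fps`) is an assumption based on the hunk shown.

```python
def get_delta_indices_sketch(
    delta_timestamps: dict[str, list[float]], fps: int
) -> dict[str, list[int]]:
    """Convert per-key time offsets in seconds to frame offsets (sketch only)."""
    delta_indices = {}
    for key, delta_ts in delta_timestamps.items():
        if key == "affordance":
            # Mirrors the PR: an empty list marks "full remaining trajectory".
            delta_indices[key] = []
        else:
            delta_indices[key] = [round(d * fps) for d in delta_ts]
    return delta_indices


result = get_delta_indices_sketch({"action": [0.0, 0.1, 0.2], "affordance": []}, fps=10)
print(result)  # {'action': [0, 1, 2], 'affordance': []}
```

Note that the general branch would also map an empty list to an empty list, so in this sketch the special case only changes behavior when non-empty offsets are passed under `"affordance"`, in which case they are discarded.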

Copilot · 2025-12-20T02:55:25Z

src/lerobot/datasets/lerobot_dataset.py

+            if key == "affordance":
+                # Generate indices from idx to ep_end (exclusive), clamped to valid range
+                affordance_indices = list(range(max(idx, ep_start), ep_end))
+                # If range is empty, return at least the current frame
+                if not affordance_indices:
+                    affordance_indices = [max(idx, ep_start)]
+                query_indices[key] = affordance_indices
+                # No padding needed for affordance (variable length by design)
+                padding[f"{key}_is_pad"] = torch.BoolTensor([False] * len(affordance_indices))


The affordance-specific logic in _get_query_indices lacks test coverage. Tests should verify: 1) correct generation of indices from current frame to episode end, 2) behavior when querying at different positions within an episode (start, middle, end), 3) padding tensor generation with all False values, and 4) integration with the rest of the dataset querying pipeline.
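The first three checks could be sketched roughly as follows. This is a standalone illustration, not a test against the real `LeRobotDataset`: `affordance_query` is a hypothetical helper, and a plain list of booleans stands in for the `torch.BoolTensor` used in the diff.

```python
def affordance_query(idx: int, ep_end: int) -> tuple[list[int], list[bool]]:
    """Return (indices, is_pad) for an affordance query (hypothetical helper)."""
    indices = list(range(idx, ep_end))
    # Every returned frame is real data, so no position is marked as padding.
    is_pad = [False] * len(indices)
    return indices, is_pad


ep_start, ep_end = 0, 5
lengths = {}
for name, idx in {"start": 0, "middle": 2, "end": 4}.items():
    indices, is_pad = affordance_query(idx, ep_end)
    # The sequence starts at the current frame and ends at the last episode frame.
    assert indices[0] == idx and indices[-1] == ep_end - 1
    # The padding mask has the same length and is all False.
    assert len(is_pad) == len(indices) and not any(is_pad)
    lengths[name] = len(indices)

print(lengths)  # {'start': 5, 'middle': 3, 'end': 1}
```

The varying lengths are the point worth testing: a query at the episode start returns the whole episode, while a query at the last frame returns a single element.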

Copilot · 2025-12-20T02:55:26Z

src/lerobot/datasets/lerobot_dataset.py

                torchvision.transforms.v2 here which will be applied to visual modalities (whether they come
                from videos or images). Defaults to None.
-            delta_timestamps (dict[list[float]] | None, optional): _description_. Defaults to None.
+            delta_timestamps (dict[list[float]] | None, optional): _description_. Defaults to None. Use {"affordance":[]} to get full trajectory.


The documentation for the delta_timestamps parameter is incomplete. It still shows "_description_" as a placeholder and the added note "Use {"affordance":[]} to get full trajectory" is unclear. The documentation should: 1) replace "_description_" with a proper explanation of what delta_timestamps does, 2) explain what "affordance" means in this context, 3) clarify what "full trajectory" refers to, and 4) describe the expected structure and behavior of the affordance feature more thoroughly.

Suggested change

-            delta_timestamps (dict[list[float]] | None, optional): _description_. Defaults to None. Use {"affordance":[]} to get full trajectory.
+            delta_timestamps (dict[str, list[float]] | None, optional): Mapping from feature names to lists
+                of relative time offsets (in seconds) that control which timesteps are materialized from the
+                underlying recorded trajectory. Each key should be the name of a feature or feature group
+                present in the dataset (for example, an observation, action, or label namespace), and each
+                value should be a list of floats representing time deltas relative to the reference control
+                timestamp for a given step. When ``delta_timestamps`` is ``None`` (default), all modalities are
+                read at their native timestamps without additional temporal offsets.
+                In this context, ``"affordance"`` refers to a special feature namespace used to store
+                task‑specific targets or labels that are defined per timestep along the trajectory. Passing
+                ``{"affordance": []}`` disables sub‑sampling for that namespace and exposes the full trajectory,
+                i.e. the complete sequence of timesteps recorded for each episode, rather than a reduced set of
+                timesteps derived from non‑empty offset lists. The dictionary is expected to use feature names
+                as keys and lists of float offsets (typically sorted, in seconds) as values; these offsets are
+                validated against the dataset frame rate so that each offset corresponds to an existing frame
+                or step when added to the reference timestamps. Defaults to None.

Copilot · 2025-12-20T02:55:26Z

src/lerobot/datasets/utils.py

    delta_indices = {}
    for key, delta_ts in delta_timestamps.items():
-        delta_indices[key] = [round(d * fps) for d in delta_ts]
+        # Special handling for "affordance" key: keep empty list (will be handled specially in _get_query_indices)


The comment refers to "_get_query_indices" but should use proper backticks for code references to improve readability and follow documentation best practices. The comment should be: "Special handling for 'affordance' key: keep empty list (will be handled specially in `_get_query_indices`)"

Suggested change

-        # Special handling for "affordance" key: keep empty list (will be handled specially in _get_query_indices)
+        # Special handling for "affordance" key: keep empty list (will be handled specially in `_get_query_indices`)

Copilot · 2025-12-20T02:55:26Z

src/lerobot/datasets/lerobot_dataset.py

+                if not affordance_indices:
+                    affordance_indices = [max(idx, ep_start)]
+                query_indices[key] = affordance_indices
+                # No padding needed for affordance (variable length by design)


The comment on line 944 states "No padding needed for affordance (variable length by design)" but this is misleading. The code still creates a padding tensor with all False values, so padding is technically still present, just never set to True. The comment should clarify this: "Affordance sequences are variable length by design - padding tensor created with all False values since all frames are valid"

Suggested change

-                # No padding needed for affordance (variable length by design)
+                # Affordance sequences are variable length by design - padding tensor created with all False values since all frames are valid

Copilot · 2025-12-20T02:55:26Z

src/lerobot/datasets/lerobot_dataset.py

+            # Special handling for "affordance": query from "action" column
+            query_key = "action" if key == "affordance" else key
            try:
                result[key] = torch.stack(self.hf_dataset[key][relative_indices])


The first try block attempts to access self.hf_dataset[key][relative_indices] but should use query_key instead of key to properly handle the affordance case. When key is "affordance", the code should query the "action" column, but the try block still uses the original key name which will fail to find the "affordance" column.

Suggested change

-                result[key] = torch.stack(self.hf_dataset[key][relative_indices])
+                result[key] = torch.stack(self.hf_dataset[query_key][relative_indices])
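The effect of the fix can be illustrated without the real dataset. In this sketch a plain dict of lists stands in for `self.hf_dataset`, and `query_columns` is a hypothetical function; the real method works on relative indices and stacks tensors with `torch.stack`, which is omitted here.

```python
def query_columns(
    columns: dict[str, list], query_indices: dict[str, list[int]]
) -> dict[str, list]:
    """Look up rows per key, reading the virtual "affordance" key from "action"."""
    result = {}
    for key, indices in query_indices.items():
        # "affordance" has no column of its own: it is served from "action".
        query_key = "action" if key == "affordance" else key
        # Read with query_key, but store under the key the caller asked for.
        result[key] = [columns[query_key][i] for i in indices]
    return result


columns = {"action": [[0.0], [0.1], [0.2], [0.3]]}
out = query_columns(columns, {"action": [1], "affordance": [1, 2, 3]})
print(out["affordance"])  # [[0.1], [0.2], [0.3]]
```

Indexing with `columns[key]` instead of `columns[query_key]` would raise a `KeyError` for `"affordance"` in this sketch, which is the analogue of the failure the review comment describes.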

screw-44 added 4 commits December 20, 2025 02:23

add documentation

ee099aa

add documentation

32a9d53

Merge branch 'affordance-support' of https://github.com/screw-44/lerobot

399132a

into affordance-support

Copilot AI review requested due to automatic review settings December 20, 2025 02:51

github-actions bot added the dataset Issues regarding data inputs, processing, or datasets label Dec 20, 2025

Copilot started reviewing on behalf of screw-44 December 20, 2025 02:52

screw-44 changed the title ~~[Feat] Add affordance support for dataset queries~~ [Feat] Add new data form support for dataset queries Dec 20, 2025


screw-44 added 2 commits December 20, 2025 03:26

clean the code and fix minor issues

d01e919

abs_afff

b40fd2f

screw-44 wants to merge 6 commits into huggingface:main from screw-44:affordance-support
