Skip to content

Fix loss_fn_outputs right-aligned slicing in Tinker API path#1367

Merged
CharlieFRuan merged 3 commits intomainfrom
fix/loss-fn-outputs-right-aligned-slicing
Mar 23, 2026
Merged

Fix loss_fn_outputs right-aligned slicing in Tinker API path#1367
CharlieFRuan merged 3 commits intomainfrom
fix/loss-fn-outputs-right-aligned-slicing

Conversation

@CharlieFRuan
Copy link
Copy Markdown
Member

@CharlieFRuan CharlieFRuan commented Mar 23, 2026

Summary

  • Fix loss_fn_outputs extraction using [:valid_len] (left-aligned) on right-aligned tensors, returning padding/prompt logprobs instead of actual response values
  • Change to [-valid_len:] at all 4 affected sites: FSDP worker SFT path, FSDP worker RL path, Megatron worker SFT path, Megatron worker RL path

Details

After #1285 unified the padding layout to left-pad (right-aligned response tensors), the loss_fn_outputs extraction in the Tinker API path still used [:valid_len], which grabs values from the left (padding region) instead of the right (actual response region).

Example with action_mask = [0, 0, 1, 1, 1] and action_log_probs = [pad, pad, real_0, real_1, real_2]:

  • [:3][pad, pad, real_0] (WRONG)
  • [-3:][real_0, real_1, real_2] (CORRECT)

Affected entrypoints: The Tinker API (skyrl.tinker.api) — the standard RL training loop discards loss_fn_outputs and is unaffected.

Note that #1285 does not affect the existing bug in the tinker entrypoint, since it calls _to_training_batch(), which also does right padding.

Fixes #1304

🤖 Generated with Claude Code


Open with Devin

After PR #1285 unified the padding layout to left-pad (right-aligned
response tensors), the loss_fn_outputs extraction in the Tinker API
path still sliced with [:valid_len] (left-aligned), returning
padding/prompt logprobs instead of actual response logprobs.

Change [:valid_len] to [-valid_len:] at all 4 affected sites in both
the FSDP worker and Megatron worker (SFT and RL paths each).

Add CPU unit tests that verify right-aligned slicing returns the
correct response values and would fail with the old left-aligned
slicing.

Fixes #1304

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gemini-code-assist[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@CharlieFRuan CharlieFRuan merged commit 50ebff5 into main Mar 23, 2026
5 of 7 checks passed
@CharlieFRuan CharlieFRuan deleted the fix/loss-fn-outputs-right-aligned-slicing branch March 23, 2026 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[tinker][bug] loss_fn_outputs extraction uses left-aligned slicing on right-aligned tensors (Tinker API path)

1 participant