[train] Extract pad_batch() to training_batch.py#1523
Closed
CharlieFRuan wants to merge 1 commit into `main` from
Conversation
- … support
- Moves `pad_batch` out of `RayPPOTrainer` into a module-level function in `training_batch.py` so that dispatch-level callers can share it.
- Adds a `mode` kwarg: `train_batch` (callers own the full batch and want uids/trajectory_ids metadata extended with synthetic pad entries) vs `mini_batch` (callers pad a transient slice and must not touch parent metadata, which would not correspond to the slice anyway).
- Asserts the batch lives on CPU. Both real callers already stage on CPU, and padding allocates/concatenates; it is not something we want on the GPU hot path.
- Allows `pad_size > batch_size` by cycling row 0 (regression: mini-batch padding can see `mb_size=1, dp_size=4` → `pad_size=3`, and the old `tensor[:pad_size]` silently returned a shorter slice).
- Handles `TensorList` fields (`pixel_values`, `image_grid_thw`) via cyclic cloning, matching the VLM path introduced in #1498.
- Adds `is_last_step` to the `TrainingInput` TypedDict (it's already used everywhere; this makes the schema match reality).
- Field-exhaustive unit tests mirror `test_generator_output_concatenation`: they enumerate `TrainingInput.__annotations__` and fail loudly if a new field is added without updating `pad_batch()`. Also covers the `pad_size > batch_size` edge case, the CPU-only assertion, both modes, and input immutability.
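The cyclic padding described above can be sketched as follows. This is a minimal illustration, not the actual `pad_batch()` from `training_batch.py`; the helper name `pad_rows` and the example tensors are assumptions. The key point is that indexing with `arange(pad_size) % batch_size` cycles through existing rows, so `pad_size` may exceed the batch size (which the old `tensor[:pad_size]` slice silently mishandled), and with a batch of one it reduces to repeating row 0:

```python
import torch

def pad_rows(tensor: torch.Tensor, pad_size: int) -> torch.Tensor:
    """Append pad_size rows by cycling existing rows (illustrative sketch)."""
    # Padding is CPU-only by design: both real callers stage on CPU,
    # and this allocates/concatenates.
    assert tensor.device.type == "cpu", "pad_rows expects a CPU tensor"
    batch_size = tensor.shape[0]
    # Cycle row indices 0, 1, ..., batch_size-1, 0, 1, ... for pad_size entries,
    # so pad_size > batch_size is handled instead of silently truncated.
    idx = torch.arange(pad_size) % batch_size
    return torch.cat([tensor, tensor[idx].clone()], dim=0)

# Regression case from the description: mb_size=1, dp_size=4 -> pad_size=3.
mb = torch.tensor([[1.0, 2.0]])   # mini-batch with a single row
padded = pad_rows(mb, 3)
print(padded.shape)               # torch.Size([4, 2]): row 0 repeated 3 times
```

The `.clone()` mirrors the cyclic-cloning treatment the description gives `TensorList` fields, so padded rows do not alias the caller's storage.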
Closed by #1527
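The field-exhaustive test pattern the PR describes can be sketched like this. The schema and the `HANDLED_FIELDS` set below are stand-ins, not the real `TrainingInput` or test code; the idea is that enumerating `TrainingInput.__annotations__` makes the test fail loudly whenever a new field is added without updating the padding code:

```python
from typing import TypedDict

class TrainingInput(TypedDict):  # stand-in for the real schema
    sequences: list
    uids: list
    is_last_step: bool

# Fields the (hypothetical) pad_batch() knows how to pad.
HANDLED_FIELDS = {"sequences", "uids", "is_last_step"}

def test_pad_batch_covers_all_fields() -> None:
    # Enumerate the TypedDict's declared fields; any field absent from
    # HANDLED_FIELDS means pad_batch() was not updated for it.
    missing = set(TrainingInput.__annotations__) - HANDLED_FIELDS
    assert not missing, f"pad_batch() does not handle new fields: {missing}"

test_pad_batch_covers_all_fields()
```

This keeps the test in sync with the schema by construction rather than by reviewer vigilance, the same property `test_generator_output_concatenation` relies on.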