Implement feature to allow maximum sequences per segment for packed inputs #1039

gabeweisz · 2025-09-11T16:20:05Z

This PR implements the feature request described in #1000

MaxText and MaxDiffusion (and probably others) use TransformerEngine for Flash Attention on GPUs.
When using packed inputs in THD format, TransformerEngine requires that the user specify the maximum number of segments packed into a sequence.

Grain did not previously support specifying a maximum number of segments per sequence, so exceeding the limit would cause data corruption in TransformerEngine.

This PR allows the user to specify max segments per sequence in both the FirstFitPackIterDataset class and the deprecated PackAndBatchOperation class and includes tests for both.
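To illustrate the constraint for reviewers, here is a minimal sketch of first-fit packing with a cap on segments per packed row. It is not the code in this PR and does not use Grain's API: the function name `first_fit_pack` and the parameters `bin_capacity` and `max_segments_per_bin` are hypothetical names chosen for this example only.

```python
from typing import List, Sequence


def first_fit_pack(
    lengths: Sequence[int],
    bin_capacity: int,
    max_segments_per_bin: int,
) -> List[List[int]]:
    """Place each example in the first bin that has room for it.

    Hypothetical helper for illustration only. A bin has room when both the
    token budget and the segment cap would still be respected.
    """
    if max_segments_per_bin < 1:
        raise ValueError("max_segments_per_bin must be at least 1")

    bins: List[List[int]] = []  # example indices assigned to each bin
    used: List[int] = []        # tokens already used in each bin

    for idx, length in enumerate(lengths):
        if length > bin_capacity:
            raise ValueError(f"example {idx} is longer than bin_capacity")
        for b in range(len(bins)):
            fits_tokens = used[b] + length <= bin_capacity
            fits_segments = len(bins[b]) < max_segments_per_bin
            if fits_tokens and fits_segments:
                bins[b].append(idx)
                used[b] += length
                break
        else:
            # No existing bin can take this example, so open a new one.
            bins.append([idx])
            used.append(length)
    return bins


lengths = [3, 2, 1, 1, 4]
capped = first_fit_pack(lengths, bin_capacity=8, max_segments_per_bin=2)
uncapped = first_fit_pack(lengths, bin_capacity=8, max_segments_per_bin=len(lengths))
```

With the cap set to 2, the third example opens a new bin even though the first bin still has token capacity. That is the behavior a fixed segment limit downstream requires.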

PR #1028 makes this PR largely redundant: it lets the user override the packing and implement #1000 without duplicating a lot of code from Grain. This PR is still useful, though, because multiple consumers of Grain need this feature and would otherwise each have to implement it themselves.

Should #1028 be accepted, I'm happy to integrate this functionality with that one. I think that PR is great from the end-user perspective.

📚 Documentation preview 📚: https://google-grain--1039.org.readthedocs.build/


gabeweisz added 5 commits August 26, 2025 14:01

Add support for max examples per row

de29004

add test for max_examples_per_row

7ba4d61

fix error in turning classmethod into member

e9d1a40

add test for max sequences per bin

66ecd2f

Merge branch 'main' of https://github.com/google/grain into gw_packing_add_max_inputs

9edba56
