Implement feature to allow maximum sequences per segment for packed inputs #1039
This PR implements the feature request described in #1000.
MaxText and MaxDiffusion (and probably others) use TransformerEngine for Flash Attention on GPUs.
When using packed inputs in THD format, TransformerEngine requires the user to specify the maximum number of segments packed into a sequence.
Grain did not previously support specifying a maximum number of segments per sequence, which could cause data corruption in TransformerEngine if the limit was exceeded.
This PR allows the user to specify max segments per sequence in both the FirstFitPackIterDataset class and the deprecated PackAndBatchOperation class and includes tests for both.
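To illustrate the constraint, here is a minimal, library-independent sketch of first-fit packing with a cap on segments per sequence. It is not Grain's implementation and does not use Grain's API; the function name `first_fit_pack` and the parameters `seq_len` and `max_segments` are hypothetical, chosen only for this example. Each example is represented by its token length alone.

```python
from typing import List


def first_fit_pack(lengths: List[int], seq_len: int, max_segments: int) -> List[List[int]]:
    """Place each example into the first row that has both token room and segment room.

    Hypothetical helper for illustration; not part of Grain.
    """
    rows: List[List[int]] = []  # each row holds the lengths of the segments packed into it
    for n in lengths:
        if n > seq_len:
            raise ValueError(f"example of length {n} does not fit in seq_len={seq_len}")
        for row in rows:
            fits_tokens = sum(row) + n <= seq_len
            fits_segments = len(row) < max_segments  # the extra cap on segments per row
            if fits_tokens and fits_segments:
                row.append(n)
                break
        else:
            # No existing row can take this example, so open a new one.
            rows.append([n])
    return rows


# With a cap of 3, the first row closes after three segments even though tokens remain.
capped = first_fit_pack([3, 2, 1, 1, 1, 4], seq_len=8, max_segments=3)
# With a loose cap, the first row takes five segments.
uncapped = first_fit_pack([3, 2, 1, 1, 1, 4], seq_len=8, max_segments=10)
```

In this sketch the capped run never produces a row with more than three segments, which is the guarantee a downstream consumer such as TransformerEngine would rely on.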
PR #1028 makes this PR mostly obsolete, because it lets users override the packing and implement #1000 without duplicating a lot of code from Grain. This PR is still useful, though: multiple consumers of Grain need this feature, and with it they will not each have to implement it themselves.
Should #1028 be accepted, I'm happy to integrate this functionality with that one. I think that PR is great from the end-user perspective.
📚 Documentation preview 📚: https://google-grain--1039.org.readthedocs.build/