Implement bucket corrector for Mamba chunk size - v0.14.1 #885
Due to MambaMixer2 implementation requirements, all buckets used for mamba must be a multiple of mamba chunk size. Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Pull request overview
This PR adds correction logic to ensure all bucket sizes used for Mamba models are multiples of the Mamba chunk size, as required by the MambaMixer2 implementation.
Changes:
- Added initialization of Mamba layer count and chunk size in the HPU model runner
- Updated bucket generation to accept and apply Mamba chunk size corrections
- Implemented a corrector function that rounds query sizes up to the nearest multiple of Mamba chunk size
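
The rounding described in the last bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `round_up_to_multiple` and `correct_buckets` are hypothetical names, the buckets are assumed to be `(bs, query, ctx)` tuples as in the corrector signatures, and collapsing duplicates after rounding is an assumption about how the result might be used.

```python
def round_up_to_multiple(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple of `multiple` (hypothetical helper)."""
    return ((value + multiple - 1) // multiple) * multiple


def correct_buckets(buckets, mamba_chunk_size):
    """Round the query dimension of each (bs, query, ctx) bucket up to the chunk size.

    Assumption: buckets that become identical after rounding are collapsed;
    the actual PR may treat duplicates differently.
    """
    corrected = {
        (bs, round_up_to_multiple(query, mamba_chunk_size), ctx)
        for bs, query, ctx in buckets
    }
    return sorted(corrected)
```

With a chunk size of 256, query sizes 100 and 200 both land on 256, so two distinct input buckets can collapse into one after correction.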
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_model_runner.py | Retrieves Mamba configuration and passes chunk size to bucket initialization |
| vllm_gaudi/extension/bucketing/common.py | Adds Mamba chunk size parameter throughout bucket generation and applies correction logic |
    def correct_for_max_model_len(bs, query, ctx):
        return (bs, query, min(ctx, bs * math.ceil(max_model_len / block_size)))

    def correct_for_mamba_chunk_size(bs, query, ctx):
Division by zero will occur if mamba_chunk_size is 0. While the corrector is only called when mamba_chunk_size > 0 (line 433), the function itself doesn't enforce this constraint. Add a guard condition at the start of the function to prevent potential misuse.
Suggested change:

    def correct_for_mamba_chunk_size(bs, query, ctx):
        if mamba_chunk_size <= 0:
            raise ValueError("mamba_chunk_size must be greater than 0 to avoid division by zero.")
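
One alternative to checking on every call is to validate the chunk size once, when the corrector is built. The sketch below is hypothetical: `make_mamba_chunk_corrector` is not part of the PR, and the body of the corrector is an assumption based on the overview's description of rounding query sizes up, since the diff shown here does not include it.

```python
import math


def make_mamba_chunk_corrector(mamba_chunk_size: int):
    """Build a corrector bound to a validated chunk size (hypothetical factory)."""
    if mamba_chunk_size <= 0:
        raise ValueError("mamba_chunk_size must be greater than 0")

    def correct_for_mamba_chunk_size(bs, query, ctx):
        # Assumed behaviour: round the query length up to the next
        # multiple of the chunk size, leaving bs and ctx unchanged.
        return (bs, math.ceil(query / mamba_chunk_size) * mamba_chunk_size, ctx)

    return correct_for_mamba_chunk_size
```

Used this way, `make_mamba_chunk_corrector(256)(2, 300, 8)` yields `(2, 512, 8)`, while a zero or negative chunk size fails at construction time, before any bucket is processed.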
✅ CI Passed: all checks passed successfully against the following vllm commit:
Merged commit edb9b73 into vllm-project:releases/v0.14.1
…ct#885) Due to MambaMixer2 implementation requirements, all buckets used for mamba must be a multiple of mamba chunk size. Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai> Signed-off-by: slokesha <slokeshappa@habana.ai>