Implement bucket corrector for Mamba chunk size - v0.14.1#885

Merged
wpyszka merged 1 commit into vllm-project:releases/v0.14.1 from jbyczkow:mamba_chunk_bucket_corrector_0_14_1
Jan 28, 2026

Conversation

@jbyczkow (Collaborator)

Due to MambaMixer2 implementation requirements, all buckets used for mamba must be a multiple of mamba chunk size.

Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Copilot AI review requested due to automatic review settings January 27, 2026 02:10

Copilot AI left a comment

Pull request overview

This PR adds correction logic to ensure all bucket sizes used for Mamba models are multiples of the Mamba chunk size, as required by the MambaMixer2 implementation.

Changes:

  • Added initialization of Mamba layer count and chunk size in the HPU model runner
  • Updated bucket generation to accept and apply Mamba chunk size corrections
  • Implemented a corrector function that rounds query sizes up to the nearest multiple of Mamba chunk size
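
The round-up behavior described in the last bullet can be sketched as follows. This is an illustrative standalone helper, not the actual corrector from the PR: the real function operates on (bs, query, ctx) bucket tuples inside vllm_gaudi/extension/bucketing/common.py, and the helper name and example chunk size below are hypothetical.

```python
import math

def round_up_to_multiple(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple of `multiple`.

    Guarding against multiple <= 0 mirrors the division-by-zero
    concern raised in the Copilot review below.
    """
    if multiple <= 0:
        raise ValueError("multiple must be greater than 0")
    return multiple * math.ceil(value / multiple)

# Example: with a hypothetical Mamba chunk size of 256, a query
# bucket of 300 tokens would be corrected up to 512.
mamba_chunk_size = 256
print(round_up_to_multiple(300, mamba_chunk_size))  # 512
```

Rounding up (rather than down) preserves capacity: a corrected bucket can always hold the original query size, at the cost of some padding.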

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files changed:

  • vllm_gaudi/v1/worker/hpu_model_runner.py — Retrieves the Mamba configuration and passes the chunk size to bucket initialization
  • vllm_gaudi/extension/bucketing/common.py — Adds a Mamba chunk size parameter throughout bucket generation and applies the correction logic


def correct_for_max_model_len(bs, query, ctx):
    return (bs, query, min(ctx, bs * math.ceil(max_model_len / block_size)))

def correct_for_mamba_chunk_size(bs, query, ctx):

Copilot AI Jan 27, 2026


Division by zero will occur if mamba_chunk_size is 0. While the corrector is only called when mamba_chunk_size > 0 (line 433), the function itself doesn't enforce this constraint. Add a guard condition at the start of the function to prevent potential misuse.

Suggested change:

def correct_for_mamba_chunk_size(bs, query, ctx):
    if mamba_chunk_size <= 0:
        raise ValueError("mamba_chunk_size must be greater than 0 to avoid division by zero.")

@jbyczkow jbyczkow changed the title Mamba chunk bucket corrector - v0.14.1 Implement bucket corrector for Mamba chunk size - v0.14.1 Jan 27, 2026
@adobrzyn (Collaborator) left a comment


lgtm

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
d7de043d55d1dd629554467e23874097e1c48993

@wpyszka (Collaborator) left a comment


LGTM

@wpyszka wpyszka merged commit edb9b73 into vllm-project:releases/v0.14.1 Jan 28, 2026
64 of 65 checks passed
slokesha pushed a commit to libinta/vllm-gaudi that referenced this pull request Jan 29, 2026
…ct#885)

Due to MambaMixer2 implementation requirements, all buckets used for
mamba must be a multiple of mamba chunk size.

Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Signed-off-by: slokesha <slokeshappa@habana.ai>
4 participants