Fix decode bucket filter issues from #1122#1447
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes two regressions introduced in #1122 affecting decode bucket filtering: (1) avoid IndexError when file-based bucketing passes an empty ctx_range, and (2) stop incorrectly filtering valid contiguous-PA decode buckets based on the ctx filter (since contiguous-PA block ranges are already bounded by max_blocks).
Changes:
- Guard
ctx_range[0]access innum_ctx_tokens_less_or_equal_batched_max_model_lenby deriving a safectx_minwhenctx_rangeis empty. - Remove the ctx filter from contiguous-PA decode bucket filtering.
- Add/adjust unit tests to cover the empty-
ctx_rangefile-bucketing path and contiguous-PA decode bucket retention.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
vllm_gaudi/extension/bucketing/common.py |
Makes ctx filtering safe for empty ctx_range and disables ctx filtering for contiguous-PA decode buckets. |
tests/unit_tests/test_bucketing.py |
Adds regression tests for the two reported issues and scopes ctx-filter assertions to non-contiguous PA. |
✅ CI PassedAll checks passed successfully against the following vllm commit: |
✅ CI PassedAll checks passed successfully against the following vllm commit: |
|
@iboiko-habana @kamil-kaczor |
|
Finding 1 🔴 Critical · With Worked example using parameters from the new
3 of 5 explicitly-configured file buckets are silently filtered out. The new test only asserts Suggestion: Skip filters entirely for file-provided buckets (file = explicit user intent), e.g. in filters = [] if file_buckets else get_filters(is_prompt, use_merged_prefill, use_contiguous_pa)This preserves #1122's filter for auto-generated buckets (its actual target) and avoids any behavior change on the file path beyond "don't crash". Also strengthen the regression test to assert all five file buckets are preserved: assert len(buckets) == len(file_buckets)
for fb in file_buckets:
assert fb in buckets[- Reviewed by Awesome ChlOpus] |
…eting) When VLLM_BUCKETING_FROM_FILE is used, ctx_range is passed as an empty list to generate_buckets(). The num_ctx_tokens_less_or_equal_batched_max_model_len filter accessed ctx_range[0] unconditionally, causing IndexError. Fix: use safe access with fallback to 0 when ctx_range is empty. Signed-off-by: Youlei Yang <youlei.yang@intel.com>
For contiguous PA, the block range is already bounded by max_blocks in the bucketing strategies, so the num_ctx_tokens_less_or_equal_batched_max_model_len filter is unnecessary and incorrectly drops valid buckets. Example: with max_model_len=2048, block_size=256, max_num_seqs=256, bucket (256, 1, 2112) was filtered because 2112 > ceil(2048/256)*256=2048, but 2112 is a valid user-configured VLLM_DECODE_BLOCK_BUCKET_MAX. Signed-off-by: Youlei Yang <youlei.yang@intel.com>
- Set filters to empty list when file_buckets is provided in generate_buckets(), since file-based bucketing should pass all user-specified buckets through without filtering. - Rename test_file_buckets_with_empty_ctx_range_no_crash to test_file_buckets_bypass_filters to reflect the actual behavior. - Add bucket (512,1,256) that would be rejected by batch_size_smaller_than_blocks filter to prove filters are skipped. - Update test_decode_buckets_satisfy_ctx_filter to only run for non-contiguous PA since contiguous PA decode buckets are not filtered. - Update docstring for test_exponential_decode_block_limit_uncapped. Signed-off-by: Youlei Yang <youlei.yang@intel.com>
Done by Skip filters for file-based bucketing and refactor tests. |
✅ CI PassedAll checks passed successfully against the following vllm commit: |
…project#1447)" This reverts commit e9b8f08. Signed-off-by: Youlei Yang <youlei.yang@intel.com>
## Fixes Two bugs introduced by #1122 (commit f24f3f9): ### 1. IndexError when using file-based bucketing (GAUDISW-248587) When `VLLM_BUCKETING_FROM_FILE` is used (e.g. GraniteMoeHybrid model), `ctx_range` is passed as an empty list to `generate_buckets()`. The `num_ctx_tokens_less_or_equal_batched_max_model_len` filter accessed `ctx_range[0]` unconditionally, causing `IndexError: list index out of range`. **Fix**: Safe access with fallback to 0 when `ctx_range` is empty. ### 2. Contiguous PA decode buckets incorrectly filtered (GAUDISW-248598) The ctx filter was applied to contiguous PA decode buckets, incorrectly dropping valid buckets. For example, with `max_model_len=2048`, `block_size=256`, `max_num_seqs=256`, bucket `(256, 1, 2112)` was filtered because `2112 > ceil(2048/256)*256 = 2048`, but 2112 is a valid user-configured `VLLM_DECODE_BLOCK_BUCKET_MAX`. **Fix**: Remove the ctx filter from contiguous PA decode buckets. For contiguous PA, the block range is already bounded by `max_blocks` in the bucketing strategies. ## Tests - Added `test_file_buckets_with_empty_ctx_range_no_crash` — reproduces the server.log IndexError - Added `test_contiguous_pa_decode_buckets_not_filtered_by_ctx` — reproduces the std_out.txt issue - Narrowed `test_decode_buckets_satisfy_ctx_filter` to non-contiguous PA only - Updated docstrings Signed-off-by: Youlei Yang <youlei.yang@intel.com> --------- Signed-off-by: Youlei Yang <youlei.yang@intel.com>
Fixes
Two bugs introduced by #1122 (commit f24f3f9):
1. IndexError when using file-based bucketing (GAUDISW-248587)
When
VLLM_BUCKETING_FROM_FILEis used (e.g. GraniteMoeHybrid model),ctx_rangeis passed as an empty list togenerate_buckets(). Thenum_ctx_tokens_less_or_equal_batched_max_model_lenfilter accessedctx_range[0]unconditionally, causingIndexError: list index out of range.Fix: Safe access with fallback to 0 when
ctx_rangeis empty.2. Contiguous PA decode buckets incorrectly filtered (GAUDISW-248598)
The ctx filter was applied to contiguous PA decode buckets, incorrectly dropping valid buckets. For example, with
max_model_len=2048,block_size=256,max_num_seqs=256, bucket(256, 1, 2112)was filtered because2112 > ceil(2048/256)*256 = 2048, but 2112 is a valid user-configuredVLLM_DECODE_BLOCK_BUCKET_MAX.Fix: Remove the ctx filter from contiguous PA decode buckets. For contiguous PA, the block range is already bounded by
max_blocksin the bucketing strategies.Tests
test_file_buckets_with_empty_ctx_range_no_crash— reproduces the server.log IndexErrortest_contiguous_pa_decode_buckets_not_filtered_by_ctx— reproduces the std_out.txt issuetest_decode_buckets_satisfy_ctx_filterto non-contiguous PA onlySigned-off-by: Youlei Yang youlei.yang@intel.com