-
Notifications
You must be signed in to change notification settings - Fork 203
[AMD] Optimize MiniMax-M2.5 FP8 MI355X vLLM search-space #1002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
65fe884
faf1537
772019f
2eaf88a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -345,15 +345,15 @@ minimaxm2.5-fp8-mi355x-vllm: | |
| - isl: 1024 | ||
| osl: 1024 | ||
| search-space: | ||
| - { tp: 2, conc-start: 4, conc-end: 64 } | ||
| - { tp: 4, conc-start: 4, conc-end: 64 } | ||
| - { tp: 8, ep: 8, conc-start: 32, conc-end: 256 } | ||
| - { tp: 2, ep: 2, conc-start: 2, conc-end: 512 } | ||
| - { tp: 4, ep: 4, conc-start: 4, conc-end: 256 } | ||
| - { tp: 8, ep: 8, conc-start: 2, conc-end: 2 } | ||
| - isl: 8192 | ||
| osl: 1024 | ||
| search-space: | ||
| - { tp: 2, conc-start: 4, conc-end: 64 } | ||
| - { tp: 4, conc-start: 4, conc-end: 64 } | ||
| - { tp: 8, ep: 8, conc-start: 32, conc-end: 256 } | ||
| - { tp: 2, ep: 2, conc-start: 2, conc-end: 256 } | ||
| - { tp: 4, ep: 4, conc-start: 4, conc-end: 512 } | ||
| - { tp: 8, ep: 8, conc-start: 2, conc-end: 2 } | ||
|
Comment on lines
+350
to
+356
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 The tp:8/ep:8 entries for both ISL scenarios (1k1k and 8k1k) have conc-start:2 and conc-end:2, collapsing the search space to a single concurrency point. This is a dramatic reduction from the previous conc-start:32, conc-end:256 range and is undocumented—if intentional (e.g., tp8/ep8 is known non-optimal and just needs a sanity check at conc=2), a brief comment in the YAML would prevent future confusion; if unintentional, conc-end should be raised to provide a meaningful search range. Extended reasoning...What the bug is: In the updated The specific change: The diff shows the tp:8/ep:8 lines for both 1k1k (line 350) and 8k1k (line 356) changed from Why existing code doesn't prevent it: The YAML schema appears to allow Addressing the intentionality argument: One verifier argued this is intentional search-space optimization—narrowing tp8/ep8 to a reference point while widening tp2/ep2 (2-512) and tp4/ep4 (4-256). This is plausible: the PR title is "Optimize MiniMax-M2.5 FP8 MI355X vLLM search-space," and the consistency across both ISL scenarios suggests deliberate authorship. However, the sibling Step-by-step proof: Before the PR, tp8/ep8 in How to fix: If intentional, add an inline comment such as |
||
|
|
||
| minimaxm2.5-fp8-mi355x-atom: | ||
| image: rocm/atom:rocm7.2.1-ubuntu24.04-pytorch2.9.1-atom0.1.2 | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 The tp:4/ep:4
conc-endvalues are accidentally swapped between the 1k1k and 8k1k ISL scenarios: 1k1k getsconc-end:256while 8k1k getsconc-end:512, which is backwards. Longer context requires more KV-cache memory per request, so 8k1k should support fewer concurrent requests (lowerconc-end); the values should be 512 for 1k1k and 256 for 8k1k, matching the stated PR intent and the tp:2/ep:2 entries in this same PR.Extended reasoning...
What the bug is and how it manifests
In the
minimaxm2.5-fp8-mi355x-vllmconfig, the newtp:4/ep:4entries have theirconc-endvalues reversed relative to ISL scenario:tp:4, ep:4, conc-end: 256tp:4, ep:4, conc-end: 512This means the benchmark will explore lower maximum concurrency for short contexts and higher maximum concurrency for long contexts — the opposite of what is physically appropriate.
The specific code path
The diff modifies the
seq-len-configsforminimaxm2.5-fp8-mi355x-vllm. Underisl: 1024,conc-end: 256was written fortp:4/ep:4. Underisl: 8192,conc-end: 512was written fortp:4/ep:4. These two values appear to have been copy-pasted with the wrong values for the wrong ISL bucket.Why existing code doesn't prevent it
The YAML config is not validated for semantic ordering of
conc-endacross ISL scenarios. Both values are syntactically valid; only a human review catches that the relationship between them is inverted.What the impact would be
The benchmarking harness will test tp:4/ep:4 up to concurrency 512 at 8k ISL, where memory pressure is highest, likely causing OOM errors, degraded throughput, or misleading results. Conversely, it will stop at concurrency 256 for 1k ISL, potentially missing the optimal peak-throughput operating point for shorter contexts.
How to fix it
Swap the two
conc-endvalues so that:conc-end: 512conc-end: 256Step-by-step proof
tp:2/ep:2entries added in this same PR correctly implement that intent: 1k1k getsconc-end:512, 8k1k getsconc-end:256.tp:4/ep:4entries are reversed: 1k1k getsconc-end:256, 8k1k getsconc-end:512— contradicting point 1.minimaxm2.5-fp8-mi325x-vllmconfig (already in the file) confirms the expected pattern:tp:8/ep:8hasconc-end:512for isl:1024 andconc-end:256for isl:8192.conc-endfor 8k1k must be ≤conc-endfor 1k1k at the same TP/EP.