[Perf/Fix] Reimplement Batched CFG Forward for Bagel by alex-jw-brooks · Pull Request #4098 · vllm-project/vllm-omni

alex-jw-brooks · 2026-06-03T08:27:26Z

Purpose

Related: #3977

#3728 removed the hard-coded flash attention code from Bagel, but broke the batched forward when there are multiple CFG branches. This is largely because the text_cfg branch has no KV values, so the KVs are uneven across the branches and were handled incorrectly.
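To illustrate why uneven KVs are awkward to batch: every row of a batched attention call has to share one key length, so a branch with fewer (or no) cached keys needs padding plus a validity mask. The sketch below is plain Python with toy values; `pad_and_mask` and the key lists are hypothetical and are not taken from this PR, which may handle the uneven case differently.

```python
def pad_and_mask(per_branch_keys, pad_value=0.0):
    """Pad each branch's key list to the longest length.

    Returns (padded_keys, mask), where mask[b][j] is True for real keys
    and False for padding that attention must ignore.
    Assumes at least one branch has at least one key.
    """
    max_len = max(len(keys) for keys in per_branch_keys)
    dim = next(len(keys[0]) for keys in per_branch_keys if keys)
    padded, mask = [], []
    for keys in per_branch_keys:
        n_pad = max_len - len(keys)
        padded.append(list(keys) + [[pad_value] * dim for _ in range(n_pad)])
        mask.append([True] * len(keys) + [False] * n_pad)
    return padded, mask


# Hypothetical toy data: the conditional branch has one cached key plus two
# current tokens; the text_cfg branch has no cached keys, only two current tokens.
cond_keys = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
text_cfg_keys = [[0.7, 0.8], [0.9, 1.0]]

padded, mask = pad_and_mask([cond_keys, text_cfg_keys])
```

Without the mask, the padded slot in the text_cfg row would receive attention weight like any real key, which is one way an uneven batch can silently produce wrong outputs.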

The Lance PR restored correctness by calling the CFG branches sequentially, so the outputs on main should be correct for gen mode on Bagel. However, the tests are still disabled, and the test pixel values have not been updated to reflect some changes made in the Lance PR.

We should probably merge #4081 before this PR, since it updates the ground truth pixels and turns the e2e bagel tti/i2i tests back on, which lets us validate that this PR won't change the outputs. For testing, though, I've also copied the updated pixel values over to this PR.

Test Plan

Will add some more details with testing & examples tomorrow.

CC @Gaohan123 @lishunyang12 @zhangj1an @natureofnature @princepride

chatgpt-codex-connector · 2026-06-03T08:27:31Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Gaohan123 · 2026-06-03T09:51:48Z

@zhangj1an @princepride PTAL

zhangj1an · 2026-06-04T08:11:30Z

LGTM, this is good to merge.

Previously, all CFG branches' tokens in Bagel were concatenated into a single sequence. Each forward step ran a single dense attention over that sequence, which let the branches cross-attend and contaminate each other. This PR instead stacks the branches on the batch dimension (q_4d/k_4d/v_4d), which is equivalent to each branch using a separate matrix multiplication. As a result, each branch attends only to its own cache + text + vae.
This PR does not reuse CFGParallelMixin, because the batched-CFG-with-KV-cache logic is specific to AR models like Bagel/Lance (per-branch KV cache, fused into one batched forward). CFGParallelMixin is meant for DiT models: it assumes stateless, independent per-branch forwards and does not deal with a KV cache.

Example: main branch (token concatenation) vs. this branch (batch-dim stacking)

Assume 2 CFG branches, each with 2 query tokens (real Bagel tokens are [text…, vae…] per branch, but 2 each is enough to show the idea):

Branch A (conditional): tokens a1, a2
Branch B (unconditional / text_cfg): tokens b1, b2

Previously, the tokens were concatenated into one sequence (batch 1, seq 4):

q = [a1, a2, b1, b2]   ->  shape (1, 4, d)
k = [a1, a2, b1, b2]   ->  shape (1, 4, d)

q @ kᵀ is a 4×4 score matrix.

          key: a1   a2   b1   b2
   q a1  [     ok   ok   XX   XX  ]
     a2  [     ok   ok   XX   XX  ]
     b1  [     XX   XX   ok   ok  ]
     b2  [     XX   XX   ok   ok  ]

The XX cells should not be there, because CFG branches should not attend to each other.

In this PR, the tokens are stacked on the batch dimension, so each branch is its own batch row (batch 2, seq 2):

q_4d = [ [a1, a2],     ->  shape (2, 2, d)
         [b1, b2] ]
k_4d = [ [a1, a2],
         [b1, b2] ]

Batched attention runs q @ kᵀ independently per row, which yields two separate 2×2 score matrices. This keeps each branch independent, and it is more lightweight than my previously proposed method (still use one large score matrix, but add block-diagonal masks).

   batch row 0 (branch A)        batch row 1 (branch B)
        key: a1   a2                  key: b1   b2
  q a1 [    ok   ok ]           q b1 [    ok   ok ]
    a2 [    ok   ok ]             b2 [    ok   ok ]
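The comparison above can be checked numerically. The following is a minimal sketch in plain Python, not the Bagel implementation: the helper names (`softmax`, `attention`, `batched_attention`), the toy token vectors, and the use of the same vectors for q, k, and v are all assumptions made for illustration. It runs the unmasked concatenated form, the concatenated form with a block-diagonal mask, and the batch-stacked form.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask=None):
    """Single-head dense attention over one sequence.

    q, k, v are lists of d-dimensional vectors; mask[i][j] is True where
    query i may attend to key j (None means everything is allowed).
    """
    d = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d) for k_j in k]
        if mask is not None:
            scores = [s if mask[i][j] else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))])
    return out


def batched_attention(q_b, k_b, v_b):
    """Run attention independently for each batch row (one row per CFG branch)."""
    return [attention(q, k, v) for q, k, v in zip(q_b, k_b, v_b)]


# Toy tokens (arbitrary values): branch A = [a1, a2], branch B = [b1, b2].
branch_a = [[1.0, 0.0], [0.5, 0.5]]
branch_b = [[0.0, 2.0], [1.5, -1.0]]

# Token concatenation: one sequence of 4 tokens, one 4x4 score matrix.
concat = branch_a + branch_b
out_concat = attention(concat, concat, concat)

# Same concatenated layout, with a block-diagonal mask removing the XX cells.
block_mask = [[(i < 2) == (j < 2) for j in range(4)] for i in range(4)]
out_masked = attention(concat, concat, concat, mask=block_mask)

# Batch-dim stacking: two independent 2x2 score matrices.
stacked = [branch_a, branch_b]
out_batched = batched_attention(stacked, stacked, stacked)
```

In this toy setup, `out_batched` agrees with `out_masked`, while `out_concat` differs because each query also mixes in values from the other branch. The stacked form reaches the same result as masking without computing the cross-branch scores at all.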

Gaohan123

Thanks. Could you please post the generation results before and after the PR to show the image correction intuitively? It would also be better to provide a table comparing the performance before and after the PR.

Gaohan123 · 2026-06-04T09:42:22Z

-    {"position": (400, 700), "rgb": (130, 96, 77)},
-    {"position": (700, 700), "rgb": (247, 203, 140)},
-    {"position": (256, 256), "rgb": (167, 156, 150)},
+    {"position": (100, 100), "rgb": (64, 45, 35)},


Does this mean the previous ground truth is wrong? cc @princepride ?

zhangj1an

Thanks for the catch. Following @princepride's reply in #4081, we agreed not to change the reference image pixels, since they were already correct, so I think @alex-jw-brooks will undo this part in test_bagel_mooncake_connector.py and test_bagel_shared_memory_connector.py.
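For readers unfamiliar with this style of test, here is a minimal sketch of how reference entries shaped like the ones in the diff above could be compared against a generated image. It is an assumption-laden illustration, not the code in the test files named here: `mismatched_pixels`, the `tolerance` parameter, the nested-list image, and the `image[y][x]` indexing convention are all hypothetical.

```python
def mismatched_pixels(image, reference_pixels, tolerance=0):
    """Return the reference entries whose RGB differs from the image.

    A pixel counts as a mismatch when any channel differs by more than
    `tolerance`. Assumes `image[y][x]` holds an (r, g, b) tuple and that
    "position" is (x, y); the real tests may use a different convention.
    """
    failures = []
    for ref in reference_pixels:
        x, y = ref["position"]
        actual = image[y][x]
        if any(abs(a - e) > tolerance for a, e in zip(actual, ref["rgb"])):
            failures.append({"position": (x, y), "expected": ref["rgb"], "actual": actual})
    return failures


# Toy 2x2 "image" reusing RGB values from the diff purely as sample data.
image = [
    [(64, 45, 35), (130, 96, 77)],
    [(247, 203, 140), (167, 156, 150)],
]
refs = [
    {"position": (0, 0), "rgb": (64, 45, 35)},
    {"position": (1, 0), "rgb": (131, 96, 77)},  # red channel off by one
]

exact = mismatched_pixels(image, refs)               # strict comparison
loose = mismatched_pixels(image, refs, tolerance=2)  # allow small drift
```

A check like this only samples a few positions, so changed reference values signal that the generated image changed, not which version is correct. That is why the question of whether the old or new values are right had to be settled separately in #4081.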

princepride · 2026-06-04T10:15:00Z

I don't think the previous pixels are wrong😑

alex-jw-brooks · 2026-06-04T15:27:59Z

Hey @princepride @Gaohan123 @zhangj1an, yes: when I opened this PR, the pixel values had just been updated to reflect main at the time, which is why they were changed 😅

Rebased now after the discussion in #4081. Since the latent generation changes were reverted, the old values are correct.

alex-jw-brooks requested review from Isotr0py, RuixiangMa, SamitHuang, ZJY0516, david6666666, princepride, wtomin and yenuo26 as code owners June 3, 2026 08:27

Gaohan123 added this to the v0.22.0 milestone Jun 3, 2026

zhangj1an mentioned this pull request Jun 3, 2026

[CI] Update Bagel Pixels #4081

Merged

Gaohan123 reviewed Jun 4, 2026

Gaohan123 added the ready (label to trigger buildkite CI) label Jun 4, 2026

alex-jw-brooks added 12 commits June 4, 2026 14:37

bagel fix wip

6a9a21a

Signed-off-by: Alex Brooks <albrooks@redhat.com> minor Signed-off-by: Alex Brooks <albrooks@redhat.com> fix ref Signed-off-by: Alex Brooks <albrooks@redhat.com>

revert lance changes (passing bagel tests)

7a3fb26

Signed-off-by: Alex Brooks <albrooks@redhat.com>

pop kwarg to fix lance tests

cd3935a

Signed-off-by: Alex Brooks <albrooks@redhat.com>

remove merge

de884d5

Signed-off-by: Alex Brooks <albrooks@redhat.com>

more consolidation and refactoring

9804e8d

Signed-off-by: Alex Brooks <albrooks@redhat.com>

add naivecache tests

8068161

Signed-off-by: Alex Brooks <albrooks@redhat.com>

add from obj tests for naive cache (for kv transfer)

cb89593

Signed-off-by: Alex Brooks <albrooks@redhat.com>

batch vae passes

c78366a

Signed-off-by: Alex Brooks <albrooks@redhat.com>

update pixel refs

78c83b0

Signed-off-by: Alex Brooks <albrooks@redhat.com>

minor

5dfce68

Signed-off-by: Alex Brooks <albrooks@redhat.com>

remove outdated comments

47aeee0

Signed-off-by: Alex Brooks <albrooks@redhat.com>

rebase pixels

4666ee3

Signed-off-by: Alex Brooks <albrooks@redhat.com>

alex-jw-brooks force-pushed the bagel_fixes branch from 530fa35 to 4666ee3 June 4, 2026 14:48

alex-jw-brooks wants to merge 12 commits into vllm-project:main from alex-jw-brooks:bagel_fixes
