[Cache Refactor 1/N] Simplify CacheDiT Integration #2527
alex-jw-brooks wants to merge 15 commits into vllm-project:main from
Conversation
It's better to create an RFC to explain the design.
lishunyang12 left a comment
Review: [Cache Refactor 1/N] Simplify CacheDiT Integration
Overall direction is great -- collapsing ~10 near-identical enabler functions into a declarative `_block_fwd_patterns` class attribute plus generic `enable_cache_for_dit` / `build_cache_context_refresh` is a big win for maintainability. Net -511 lines with cleaner abstractions. A few issues need addressing before merging:
Behavioral regressions
1. Flux2-Klein lost its `Fn_compute_blocks = 2` override (regression)

   The old `enable_cache_for_flux2_klein` explicitly set `db_cache_config.Fn_compute_blocks = 2` after building the config, with a comment about quality degradation at `Fn=1`. The new generic path has no mechanism for this model-specific override, so it will silently change Flux2-Klein caching behavior. Consider either:
   - keeping Flux2-Klein in `CUSTOM_DIT_ENABLERS`, or
   - extending `_block_fwd_patterns` (or adding a sibling attribute like `_cache_config_overrides`) to let models declare config tweaks.

2. `check_forward_pattern=False` lost for LTX2 and HunyuanImage3

   Both old enablers passed `check_forward_pattern=False` to `BlockAdapter`. The new `maybe_build_block_adapter` never passes this flag. The TODO comments in the model files acknowledge the uncertainty but don't resolve it. If the validation fails at runtime for these models, cache enablement will break. Please either:
   - add `check_forward_pattern` to the `_block_fwd_patterns` declaration (e.g., make it a richer descriptor), or
   - pass `check_forward_pattern=False` in `maybe_build_block_adapter` for now and file a follow-up issue.

3. HunyuanImage3 pipeline accessor changed from `pipeline.model` to `pipeline.transformer`

   The old `enable_cache_for_hunyuan_image3` accessed `pipeline.model` (with `pipeline.model.layers`). The new generic path uses `default_get_pipeline_transformer(pipeline)`, which returns `pipeline.transformer`. `_block_fwd_patterns` is defined on `HunyuanImage3Model`, but if the pipeline stores the model at `.model` rather than `.transformer`, this will fail. Please verify that the pipeline attribute name is correct.
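To make the suggestions in items 1 and 2 concrete, a richer `_block_fwd_patterns` descriptor could look roughly like this. Only the `_block_fwd_patterns` name comes from this PR; the descriptor fields, the `Flux2KleinTransformer` class, and the `apply_pattern_overrides` helper are hypothetical sketches, not the actual API:

```python
from dataclasses import dataclass, field


# Hypothetical descriptor; field names are illustrative assumptions.
@dataclass(frozen=True)
class BlockFwdPattern:
    blocks_attr: str                     # e.g. "transformer_blocks"
    check_forward_pattern: bool = True   # lets LTX2 / HunyuanImage3 opt out
    config_overrides: dict = field(default_factory=dict)


class Flux2KleinTransformer:
    # Keeps the Fn=2 quality override declarative instead of burying it
    # in a custom enabler function.
    _block_fwd_patterns = BlockFwdPattern(
        blocks_attr="transformer_blocks",
        config_overrides={"Fn_compute_blocks": 2},
    )


def apply_pattern_overrides(model_cls, db_cache_config):
    """Generic enabler step: apply any declared per-model config tweaks."""
    pattern = model_cls._block_fwd_patterns
    for key, value in pattern.config_overrides.items():
        setattr(db_cache_config, key, value)
    return pattern.check_forward_pattern
```

The generic enabler would then read the overrides and the `check_forward_pattern` flag from the declaration instead of hard-coding either per model.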
Minor issues
4. Double `_build_db_cache_config` call in the `enable_cache_for_wan22` single-transformer path

   `db_cache_config` is built once at the top of the function and again inside the `if getattr(pipeline, "transformer_2", None) is None:` branch. The second call shadows the first unnecessarily.

5. Naming convention for the `refresh_cache_context_func` TypeAlias

   PEP 8 and common practice use PascalCase for type aliases (e.g., `RefreshCacheContextFunc`). The lowercase `refresh_cache_context_func` reads like a variable name rather than a type. Minor, but worth aligning with conventions.

6. Flux2-Klein SCM refresh also lost the `Fn_compute_blocks` override

   The old refresh function for Flux2-Klein passed `Fn_compute_blocks=db_cache_config.Fn_compute_blocks` in the `DBCacheConfig().reset(...)` call during SCM-enabled refreshes. The generic `build_cache_context_refresh` does not do this, so SCM refreshes will also regress.
Looks good
- The `build_cache_context_refresh` factory is clean and well-structured. The `_resolve_calibrator_config` extraction is a nice touch.
- Keeping Wan22 and Bagel as custom enablers makes sense given their multi-transformer complexity.
- Removing the broken GLM Image enabler is the right call.
- The declarative `_block_fwd_patterns` approach on models is a good foundation for the rest of the refactor series.
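For readers following along, the closure-factory shape in question is roughly the following. Only the `build_cache_context_refresh` name, the `reset(...)` call, and the `Fn_compute_blocks` kwarg come from this review; the signature and body are an assumed sketch:

```python
def build_cache_context_refresh(db_cache_config, extra_reset_kwargs=None):
    """Return a closure that resets the cache context between generations."""
    extra_reset_kwargs = extra_reset_kwargs or {}

    def refresh(cache_context):
        # Per-model overrides (e.g. Fn_compute_blocks for Flux2-Klein)
        # are threaded through extra_reset_kwargs, so SCM refreshes
        # do not silently drop them.
        cache_context.reset(**extra_reset_kwargs)

    return refresh
```

Threading the per-model kwargs through the factory once, at build time, is what lets the ~10 copy-pasted refresh functions collapse into a single closure.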
Please address items 1-3 (the behavioral regressions) before merge. Items 4-6 are nice-to-fix.
Signed-off-by: Alex Brooks <albrooks@redhat.com>
I think it is a great design, but a lot of validation work is needed, both unit tests and end-to-end. I don't think this PR is in a ready state yet. Please show your test plan in more detail, and let's put it after the 0.20.0 cut.
Purpose
See RFC #2535
Currently, integration with Cache DiT involves copying a lot of code, but the underlying library that implements Cache DiT is very generic. We should not have to write much code to support new models in vLLM Omni unless they need to handle special cases, e.g., Wan2's multi-transformer architecture.
This PR is one of a few intended to simplify transformer caching in vLLM Omni, so that we can hopefully have more generic integrations and wider model support essentially for free, or at least make progress towards supporting a new model being as simple as adding a new block adapter object.
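As a rough illustration of the target shape: the names `_block_fwd_patterns`, `BlockAdapter`, `maybe_build_block_adapter`, and `enable_cache_for_dit` mirror this PR, but the bodies below are simplified assumptions, not the real implementation:

```python
class BlockAdapter:
    """Pairs a transformer with the blocks that cache-dit should wrap."""
    def __init__(self, transformer, blocks):
        self.transformer = transformer
        self.blocks = blocks


def maybe_build_block_adapter(pipeline):
    """Build a BlockAdapter from the model's declared forward pattern."""
    transformer = getattr(pipeline, "transformer", None)
    pattern = getattr(type(transformer), "_block_fwd_patterns", None)
    if pattern is None:
        return None  # model has not opted into generic caching
    # In this sketch the pattern is just the name of the blocks attribute.
    return BlockAdapter(transformer, getattr(transformer, pattern))


def enable_cache_for_dit(pipeline):
    """One generic enabler instead of ~10 near-identical per-model ones."""
    adapter = maybe_build_block_adapter(pipeline)
    if adapter is None:
        raise ValueError("model does not declare _block_fwd_patterns")
    # ... hand the adapter to cache-dit and install the refresh closure ...
    return adapter
```

Under this shape, supporting a new model means declaring `_block_fwd_patterns` on its transformer class rather than writing another enabler function.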
For this PR:
- Adds `build_cache_context_refresh`, which we can use to build the cache refresh closure for different models; currently we have ~10 identical copy-pastes of this function.
- Adds `enable_cache_for_dit`, which takes as input a `BlockAdapter`, since this is the main thing that changes across models.

Test Plan
We should be able to run the DiT cache on all models on main and on this branch and verify that we get the same output for each. However, some of these models are currently broken, so testing is still in progress.
CC @lishunyang12 @wtomin, could you please take a look when you get the chance?