[Cache Refactor 1/N] Simplify CacheDiT Integration #2527

Open
alex-jw-brooks wants to merge 15 commits into vllm-project:main from alex-jw-brooks:cache_dit_refactor

Conversation

@alex-jw-brooks
Contributor

@alex-jw-brooks alex-jw-brooks commented Apr 6, 2026

Purpose

See RFC #2535

Currently, integrating with Cache DiT involves copying a lot of code, even though the underlying library that implements Cache DiT is very generic. We should not have to write much code to support new models in vLLM Omni unless they need special-case handling, e.g., Wan2's multi-transformer architecture.

This PR is one of several intended to simplify transformer caching in vLLM Omni, so that integrations become more generic and more models are supported for free, or at least so that adding support comes down to adding a new block adapter object.

For this PR:

  • Creates a generic build_cache_context_refresh that builds the cache refresh closure for different models; currently there are ~10 identical copy-pasted versions of this function.
  • Moves common parts, such as creating the calibration config, into shared helpers.
  • Adds a common enable_cache_for_dit that takes a BlockAdapter as input, since the adapter is the main thing that changes across models.
  • Removes the GLM Image DiT cache integration, which is fully broken: it does not currently return a refresh function, and it appears to be essentially the default case anyway.
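
To illustrate the shape of the two generic helpers listed above, here is a minimal sketch. It does not use the real cache-dit or vLLM Omni types: `BlockAdapter`, `CacheState`, the `forward_pattern` string, and the refresh signature are hypothetical stand-ins, assumed only for illustration.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable


@dataclass
class BlockAdapter:
    """Hypothetical stand-in for the adapter; not the real cache-dit class."""
    transformer: Any
    blocks: list
    forward_pattern: str


@dataclass
class CacheState:
    """Hypothetical per-transformer cache state that a refresh resets."""
    num_inference_steps: int = 0
    refresh_count: int = 0


# Assumed signature: the refresh closure takes the step count for the request.
RefreshFn = Callable[[int], None]


def build_cache_context_refresh(state: CacheState) -> RefreshFn:
    """Build one refresh closure instead of copy-pasting it per model."""
    def refresh(num_inference_steps: int) -> None:
        state.num_inference_steps = num_inference_steps
        state.refresh_count += 1
    return refresh


def enable_cache_for_dit(adapter: BlockAdapter) -> RefreshFn:
    """Generic enabler: only the adapter differs between models."""
    state = CacheState()
    # A real implementation would hand `adapter` to the caching library here.
    adapter.transformer.cache_state = state
    return build_cache_context_refresh(state)


# Usage with a dummy transformer object.
dummy = SimpleNamespace(blocks=["block_0", "block_1"])
refresh = enable_cache_for_dit(BlockAdapter(dummy, dummy.blocks, "pattern_a"))
refresh(28)
```

Under these assumptions, supporting a new model amounts to constructing a different adapter; the enabler and the refresh factory stay the same.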

Test Plan

We should be able to run DiT cache on all models on both main and this branch and verify that the output is the same for each. However, some of these models are currently broken, so I am still working on testing.
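
A rough sketch of what such a parity check could look like, using only the standard library. The model names, the flattened-output representation, and the tolerances are assumptions for illustration; the real comparison would run the actual pipelines on main and on this branch.

```python
import math
from typing import Dict, List, Sequence


def outputs_match(
    a: Sequence[float],
    b: Sequence[float],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-8,
) -> bool:
    """Element-wise comparison of two flattened outputs within a tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


def find_mismatches(
    main_outputs: Dict[str, Sequence[float]],
    branch_outputs: Dict[str, Sequence[float]],
) -> List[str]:
    """Return the models whose branch output is missing or differs from main."""
    return [
        name
        for name, expected in main_outputs.items()
        if name not in branch_outputs
        or not outputs_match(expected, branch_outputs[name])
    ]


# Hypothetical flattened outputs keyed by illustrative model names.
main_run = {"model_a": [0.1, 0.2, 0.3], "model_b": [1.0, 2.0]}
branch_run = {"model_a": [0.1, 0.2, 0.3], "model_b": [1.0, 2.5]}
mismatched = find_mismatches(main_run, branch_run)
```

With a fixed seed and identical cache settings on both sides, an empty result would indicate that the refactor preserved behavior for every model checked.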

CC @lishunyang12 @wtomin, could you please take a look when you get the chance?

@gcanlin
Collaborator

gcanlin commented Apr 7, 2026

It would be better to create an RFC to explain the design.

@alex-jw-brooks
Contributor Author

Sure, thanks @gcanlin - I opened one with some details here

@wtomin wtomin requested review from SamitHuang and wtomin April 13, 2026 04:15
Collaborator

@lishunyang12 lishunyang12 left a comment

Review: [Cache Refactor 1/N] Simplify CacheDiT Integration

Overall direction is great -- collapsing ~10 near-identical enabler functions into a declarative _block_fwd_patterns class attribute + generic enable_cache_for_dit / build_cache_context_refresh is a big win for maintainability. Net -511 lines with cleaner abstractions. A few issues need addressing before merging:

Behavioral regressions

  1. Flux2-Klein lost its Fn_compute_blocks = 2 override (regression)
    The old enable_cache_for_flux2_klein explicitly set db_cache_config.Fn_compute_blocks = 2 after building the config, with a comment about quality degradation at Fn=1. The new generic path has no mechanism for this model-specific override. This will silently change Flux2-Klein caching behavior. Consider either:

    • Keeping Flux2-Klein in CUSTOM_DIT_ENABLERS, or
    • Extending _block_fwd_patterns (or adding a sibling attribute like _cache_config_overrides) to let models declare config tweaks.
  2. check_forward_pattern=False lost for LTX2 and HunyuanImage3
    Both old enablers passed check_forward_pattern=False to BlockAdapter. The new maybe_build_block_adapter never passes this flag. The TODO comments in the model files acknowledge the uncertainty but don't resolve it. If the validation fails at runtime for these models, cache enablement will break. Please either:

    • Add check_forward_pattern to the _block_fwd_patterns declaration (e.g., make it a richer descriptor), or
    • Pass check_forward_pattern=False in maybe_build_block_adapter for now and file a follow-up issue.
  3. HunyuanImage3 pipeline accessor changed from pipeline.model to pipeline.transformer
    The old enable_cache_for_hunyuan_image3 accessed pipeline.model (with pipeline.model.layers). The new generic path uses default_get_pipeline_transformer(pipeline) which returns pipeline.transformer. The _block_fwd_patterns is defined on HunyuanImage3Model, but if the pipeline stores it at .model rather than .transformer, this will fail. Please verify the pipeline attribute name is correct.
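
One way the three regressions above could be addressed declaratively is a richer per-model descriptor. The sketch below is only an illustration of that idea: `BlockPatternSpec`, `DBCacheConfigSketch`, the attribute names, and the default `Fn_compute_blocks = 1` are hypothetical and are not the real cache-dit or vLLM Omni API.

```python
from dataclasses import dataclass, field, replace
from types import SimpleNamespace
from typing import Any, Dict


@dataclass(frozen=True)
class DBCacheConfigSketch:
    """Hypothetical stand-in for the cache config; defaults are assumed."""
    Fn_compute_blocks: int = 1
    Bn_compute_blocks: int = 0


@dataclass(frozen=True)
class BlockPatternSpec:
    """Hypothetical richer form of a model's block/forward-pattern declaration."""
    blocks_attr: str
    forward_pattern: str
    transformer_attr: str = "transformer"   # item 3: where the pipeline keeps the model
    check_forward_pattern: bool = True       # item 2: opt out of validation per model
    cache_config_overrides: Dict[str, Any] = field(default_factory=dict)  # item 1


def apply_overrides(cfg: DBCacheConfigSketch, spec: BlockPatternSpec) -> DBCacheConfigSketch:
    """Return a copy of the config with the model's declared overrides applied."""
    return replace(cfg, **spec.cache_config_overrides)


def get_pipeline_transformer(pipeline: Any, spec: BlockPatternSpec) -> Any:
    """Look up the transformer under the attribute name the model declares."""
    return getattr(pipeline, spec.transformer_attr)


# Illustrative declarations mirroring the three cases in the review.
flux2_klein = BlockPatternSpec(
    "transformer_blocks", "pattern_a",
    cache_config_overrides={"Fn_compute_blocks": 2},
)
ltx2 = BlockPatternSpec("blocks", "pattern_a", check_forward_pattern=False)
hunyuan_image3 = BlockPatternSpec(
    "layers", "pattern_a", transformer_attr="model", check_forward_pattern=False
)

klein_cfg = apply_overrides(DBCacheConfigSketch(), flux2_klein)
hunyuan_pipeline = SimpleNamespace(model=SimpleNamespace(layers=[]))
hunyuan_model = get_pipeline_transformer(hunyuan_pipeline, hunyuan_image3)
```

Because the refresh path would read the same overridden config, a design like this would also keep the override in effect during SCM-enabled refreshes, provided the refresh closure is built from the overridden config rather than a fresh default one.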

Minor issues

  4. Double _build_db_cache_config call in enable_cache_for_wan22 single-transformer path
    _build_db_cache_config is called once at the top of the function to build db_cache_config, and again inside the if getattr(pipeline, "transformer_2", None) is None: branch. The second assignment shadows the first unnecessarily.

  5. Naming convention for refresh_cache_context_func TypeAlias
    PEP 8 and common practice use PascalCase for type aliases (e.g., RefreshCacheContextFunc). The lowercase refresh_cache_context_func reads like a variable name rather than a type. Minor, but worth aligning with conventions.

  6. Flux2-Klein SCM refresh also lost the Fn_compute_blocks override
    The old refresh function for Flux2-Klein passed Fn_compute_blocks=db_cache_config.Fn_compute_blocks in the DBCacheConfig().reset(...) call during SCM-enabled refreshes. The generic build_cache_context_refresh does not do this, so SCM refreshes will also regress.

Looks good

  • The build_cache_context_refresh factory is clean and well-structured.
  • _resolve_calibrator_config extraction is a nice touch.
  • Keeping Wan22 and Bagel as custom enablers makes sense given their multi-transformer complexity.
  • Removing the broken GLM Image enabler is the right call.
  • The _block_fwd_patterns declarative approach on models is a good foundation for the rest of the refactor series.

Please address items 1-3 (the behavioral regressions) before merge. Items 4-6 are nice-to-fix.

Signed-off-by: Alex Brooks <albrooks@redhat.com>
@lishunyang12
Collaborator

lishunyang12 commented Apr 28, 2026

I think it is a great design, but a lot of validation work is still needed, both unit tests and end-to-end tests. I don't think this PR is ready yet. Please describe your test plan in more detail, and let's defer it until after the 0.20.0 cut.
