Add Teacache Support for LongCat Image#1487

Merged
hsliuustc0106 merged 18 commits into vllm-project:main from alex-jw-brooks:longcat_teacache
Apr 17, 2026
Conversation

@alex-jw-brooks
Contributor

@alex-jw-brooks alex-jw-brooks commented Feb 25, 2026

  • Enables TeaCache support for LongCat Image. The model coefficients and speedups were calculated with the current config, not main (see fix).
  • Includes some fixes to the coefficient estimator to avoid computing gradients and avoid dtype casting issues from running bf16 models
  • Updates docs to add some notes on estimating coefficients for models that have layers that require vLLM's fwd context and parallel groups to be set up, since it was needed for this one
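A minimal sketch of what the estimator fixes amount to in practice (the helper name and shapes here are invented for illustration; the real changes live in the coefficient estimator script):

```python
import torch

def collect_estimator_outputs(forward_fn):
    """Illustrative helper: run a forward pass for coefficient estimation
    without tracking gradients, and convert bf16 outputs safely."""
    # inference_mode avoids building autograd graphs (and the OOMs they
    # cause) since the pipeline is not created through the model runner here
    with torch.inference_mode():
        out = forward_fn()
    # bf16 tensors do not support .numpy() directly; upcast to float32 first
    return out.to(torch.float32).cpu().numpy()

arr = collect_estimator_outputs(lambda: torch.ones(4, dtype=torch.bfloat16))
```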

Example Outputs

For both text-to-image and image edit, the TeaCache output is on the left.

$ python text_to_image.py --cache-backend tea_cache --model meituan-longcat/LongCat-Image --output coffee_tc.png
$ python text_to_image.py --model meituan-longcat/LongCat-Image --output coffee.png
[images: coffee_tc (TeaCache) | coffee (baseline)]

For Image edit (using the coffee image above):

$ python image_edit.py --model meituan-longcat/LongCat-Image-Edit --image coffee.png --prompt "make the coffee cup transparent"  --cache-backend tea_cache --output edit_coffee_tc.png
$ python image_edit.py --model meituan-longcat/LongCat-Image-Edit --image coffee.png --prompt "make the coffee cup transparent" --output edit_coffee.png
[images: edit_coffee_tc (TeaCache) | edit_coffee (baseline)]

Speed Benchmarks

With a threshold of 0.2, the speedup is ~1.7x on an H100. I didn't benchmark image edit, but the speedup looked comparable in a quick check afterwards.

Here is the full script I used for testing, which can be used to reproduce the text-to-image numbers; it reports the average speedup over 3 images.

import os
import gc
import time
import torch
from vllm_omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

# Configuration
MODEL_ID = "meituan-longcat/LongCat-Image"
PROMPT = "A cup of coffee sitting on a table."
STEPS = 50
SEEDS = [444, 111, 3919]

TEACACHE_DIR = "cache_results"
NO_CACHE_DIR = "no_cache_results"
os.makedirs(TEACACHE_DIR, exist_ok=True)
os.makedirs(NO_CACHE_DIR, exist_ok=True)


def run_benchmark(use_cache=False):
    print(f"\n{'Testing with TeaCache' if use_cache else 'Testing without TeaCache'}...")
    times = []
    # Configure cache based on requirement
    out_dir = TEACACHE_DIR if use_cache else NO_CACHE_DIR
    cache_config = {
        "rel_l1_thresh": 0.2,
    } if use_cache else {}
    cache_backend = "tea_cache" if use_cache else None

    omni = Omni(
        model=MODEL_ID,
        cache_backend=cache_backend,
        cache_config=cache_config,
        dtype="bfloat16",
    )

    for seed in SEEDS:
        sampling_params = OmniDiffusionSamplingParams(num_inference_steps=STEPS, seed=seed)
        start = time.time()
        outputs = omni.generate(PROMPT, sampling_params)
        end = time.time()
        run_time = end - start
        times.append(run_time)
        # Save the generated image
        image = outputs[0].request_output[0].images[0]
        filename = f"{out_dir}/seed_{seed}.png"
        print(f"Run time: {run_time:.2f}s for seed: {seed} [use_cache={use_cache}]")
        image.save(filename)

    avg_time = sum(times) / len(times)
    print(f"Average latency [use_cache={use_cache}]: {avg_time:.2f}s")
    return avg_time


if __name__ == "__main__":
    # Run tests
    time_no_cache = run_benchmark(use_cache=False)
    torch.cuda.empty_cache()
    gc.collect()
    time_with_cache = run_benchmark(use_cache=True)

    print("\nResults:")
    print(f"Speedup: {time_no_cache / time_with_cache:.2f}x")

@alex-jw-brooks alex-jw-brooks changed the title from "[WIP]" to "[WIP] Add Teacache Support for LongCat Image" Feb 25, 2026
@alex-jw-brooks alex-jw-brooks marked this pull request as ready for review February 27, 2026 17:48

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 003d3b9f61


Comment thread vllm_omni/diffusion/cache/teacache/extractors.py Outdated
Comment thread vllm_omni/diffusion/cache/teacache/extractors.py
Comment on lines +584 to +587
sp_size = module.parallel_config.sequence_parallel_size
get_forward_context().sequence_parallel_size = sp_size

hidden_states = module.x_embedder(hidden_states)


P1: Preserve sequence-parallel sharding in LongCat extractor

In the SP case (sequence_parallel_size > 1), this code enables SP in the forward context but does not replicate the required LongCat preprocessing (chunking image hidden_states and RoPE by rank, as done in LongCatImageTransformer2DModel.forward). As a result, SP attention paths run on unsharded layouts, which yields invalid coefficient-collection behavior and can break distributed estimation runs.
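For context, the per-rank preprocessing the review refers to is roughly a chunk-by-rank split along the sequence dimension. A hedged sketch (function name, dims, and shapes are assumptions, not the model's actual code):

```python
import torch

def shard_sequence(hidden_states, rope, sp_size, rank):
    """Toy version of SP preprocessing: each rank keeps only its slice of
    the image sequence (and the matching RoPE), so attention runs on
    sharded layouts rather than the full sequence."""
    hs_local = torch.chunk(hidden_states, sp_size, dim=1)[rank]
    rope_local = torch.chunk(rope, sp_size, dim=1)[rank]
    return hs_local, rope_local

hs = torch.zeros(2, 8, 16)    # (batch, seq, hidden) -- made-up shapes
rope = torch.zeros(2, 8, 4)   # (batch, seq, rope_dim)
hs_local, rope_local = shard_sequence(hs, rope, sp_size=2, rank=0)
```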


Contributor Author


Useful, but I think there are larger underlying problems with SP for this model at the moment (see #1556). I will investigate a fix for that as well, but I see the same error with and without TeaCache right now, so I'm open to any direction on how to handle it in this PR.

@alex-jw-brooks alex-jw-brooks changed the title from "[WIP] Add Teacache Support for LongCat Image" to "Add Teacache Support for LongCat Image" Feb 27, 2026
# Explicitly use inference mode to avoid gradients since we
# are not creating the pipeline through the model runner
with torch.inference_mode():
self.pipeline.forward(req)
Contributor Author


A few small fixes were needed in this script: avoiding OOMs in my env from gradient tracking, and handling bf16, since bf16 tensors can't be converted with .numpy() directly.

Collaborator

@lishunyang12 lishunyang12 left a comment


Left a couple comments. The extractor mostly mirrors the model forward correctly, but the first block runs twice on non-cached steps which seems unintentional.

_, hs = first_block(
hidden_states=hidden_states,
encoder_hidden_states=encoder_hidden_states,
temb=temb,
Collaborator


This runs first_block(...) to get the modulated input, but then run_transformer_blocks() below iterates over all module.transformer_blocks (including [0]) again. So block 0 gets executed twice on every non-cached step.

The other extractors (e.g., qwen) avoid this by extracting the modulated input with just the lightweight norm call (block.img_mod(temb) + block.img_norm1(...)) without running the full block forward. Could you do something similar here, or at least start run_transformer_blocks from module.transformer_blocks[1:]?
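Sketch of the cheaper extraction being suggested (module and attribute names loosely follow the Qwen extractor mentioned above, but the block here is a simplified toy, not the model's code):

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Simplified transformer block front end: modulation + norm only."""
    def __init__(self, dim):
        super().__init__()
        self.img_mod = nn.Linear(dim, 2 * dim)  # (shift, scale) from temb
        self.img_norm1 = nn.LayerNorm(dim, elementwise_affine=False)

def modulated_input(block, hidden_states, temb):
    # Only the lightweight modulation + norm runs -- the full block forward
    # (attention + MLP) is skipped, so block 0 is not executed twice
    shift, scale = block.img_mod(temb).chunk(2, dim=-1)
    return block.img_norm1(hidden_states) * (1 + scale) + shift

block = ToyBlock(8)
x = torch.randn(2, 4, 8)      # (batch, seq, hidden)
temb = torch.randn(2, 1, 8)   # timestep embedding
m = modulated_input(block, x, temb)
```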

Contributor Author


Good catch 😬 Thanks! Fixed the modulated input and reran the coefficient calculations

pipeline.to(device)
return pipeline


Collaborator


Should this also be wrapped with set_default_torch_dtype(od_config.dtype) like BagelAdapter.load_pipeline was updated to do above?

Contributor Author

@alex-jw-brooks alex-jw-brooks Mar 15, 2026


I had actually added a set_default_torch_dtype around the call to load the pipeline on the adapter, instead of just putting it around the one line 🙂 The better way to do this is:

        loader = DiffusersPipelineLoader(LoadConfig(), od_config=od_config)
        return loader.load_model(od_config=od_config, load_device=device)

because load_model will handle the device placement, put the model in eval mode, and handle the dtypes from the diffusion config. I updated both to avoid managing default dtypes manually and made sure the Bagel one still runs.

Comment thread vllm_omni/diffusion/cache/teacache/extractors.py
@hsliuustc0106
Collaborator

Hi @alex-jw-brooks 👋

Checking in on the Teacache support for LongCat Image PR — 12 days since last update. Any progress?

Thanks!

@lishunyang12
Collaborator

Hey @alex-jw-brooks — following up on the open threads from 2 weeks ago. The main concern is still the block 0 double execution + modulated input extraction.

Looking more carefully: first_block.norm1(hs, emb=temb)[0] extracts the modulated input from hs (the post-block-0 output), but it should be from the pre-block hidden_states. The Qwen extractor does this correctly — it calls block.img_norm1(hidden_states, img_mod1) on the original hidden states without running the full block. This means the cache decisions here are based on the wrong signal, and the coefficients were estimated with that bug.

Could you take a look?
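For reference, the skip decision that signal feeds into is roughly the following (a sketch with hypothetical names; the real logic lives in the teacache backend). Because the polynomial coefficients map the relative L1 distance of modulated inputs to an estimated output change, computing that distance from the wrong tensor skews every skip decision:

```python
import numpy as np

def teacache_should_skip(mod_inp, prev_mod_inp, accumulated, coeffs, rel_l1_thresh):
    """TeaCache-style rule (sketch): accumulate the polynomial-rescaled
    relative L1 distance between consecutive modulated inputs; skip the
    transformer blocks (reusing the cached residual) while the accumulated
    distance stays under the threshold."""
    rel_l1 = np.abs(mod_inp - prev_mod_inp).mean() / np.abs(prev_mod_inp).mean()
    accumulated += np.polyval(coeffs, rel_l1)  # model-specific coefficients
    if accumulated < rel_l1_thresh:
        return True, accumulated   # skip: reuse cached output
    return False, 0.0              # recompute and reset the accumulator

# identical inputs -> zero distance -> skip (with a zero-constant polynomial)
skip, acc = teacache_should_skip(np.ones(4), np.ones(4), 0.0, [1.0, 0.0], 0.2)
```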

@alex-jw-brooks
Contributor Author

Hey @hsliuustc0106 @lishunyang12, haven't forgotten about this PR, just paused it for a bit while fixing the sequence parallelism for this model to avoid copying things over here. I'll get back to it this afternoon and work on the comments, thanks for your patience 🙂

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 14, 2026
@alex-jw-brooks alex-jw-brooks force-pushed the longcat_teacache branch 2 times, most recently from 08396d7 to 53f39f2 on March 15, 2026 00:18
| **LongCat-Image** | `meituan-longcat/LongCat-Image` | | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| **LongCat-Image-Edit** | `meituan-longcat/LongCat-Image-Edit` | | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| **LongCat-Image** | `meituan-longcat/LongCat-Image` | | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| **LongCat-Image-Edit** | `meituan-longcat/LongCat-Image-Edit` | | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
Contributor Author


Columns for sequence parallel were out of date for LongCat; it does support ring attention and Ulysses SP.

I tested that TeaCache works with both types of SP as part of this PR.

@alex-jw-brooks
Contributor Author

alex-jw-brooks commented Mar 15, 2026

Hey @lishunyang12 @hsliuustc0106 thanks for the reviews - took another pass at this and added some additional info since I hadn't tested image edit yet originally. Ready for another look when you've got the bandwidth 🙂

@lishunyang12
Collaborator

Thanks for the update. Will re-review this week.

Collaborator

@lishunyang12 lishunyang12 left a comment


The extractor looks correct now — norm1 is called on the pre-block hidden_states, and all blocks run in run_transformer_blocks(). Previous concern is addressed.

Comment thread vllm_omni/diffusion/cache/teacache/extractors.py
@alex-jw-brooks alex-jw-brooks force-pushed the longcat_teacache branch 2 times, most recently from adbe997 to 9c0fe26 on March 26, 2026 23:19
@alex-jw-brooks
Contributor Author

@wtomin @hsliuustc0106 could you please review this PR?

I've refactored a bit to share some of the estimator code, since the LongCat and the new Stable Audio coefficient estimators were basically the same, but it is ready for a look when you have a moment.

@alex-jw-brooks alex-jw-brooks force-pushed the longcat_teacache branch 3 times, most recently from 3e83fe8 to 22d1403 on April 6, 2026 06:36
@alex-jw-brooks
Contributor Author

Hi @lishunyang12 @hsliuustc0106 @wtomin, could you please take a look at this one when you have the chance?

@Gaohan123 Gaohan123 modified the milestones: v0.18.0, v0.20.0 Apr 14, 2026
@hsliuustc0106
Collaborator

Please resolve conflicts; I will add the ready label for this PR.

Signed-off-by: Alex Brooks <albrooks@redhat.com>
@alex-jw-brooks
Contributor Author

Thanks @hsliuustc0106! Just rebased 😄

@lishunyang12 lishunyang12 added the ready label (triggers buildkite CI) Apr 17, 2026
@hsliuustc0106 hsliuustc0106 merged commit f346f2f into vllm-project:main Apr 17, 2026
8 checks passed
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026