[Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on transformers 5.X and vllm 0.22 by yuanheng-zhao · Pull Request #4080 · vllm-project/vllm-omni

yuanheng-zhao · 2026-06-02T16:22:27Z


Purpose

This PR fixes transformers and vllm version compatibility issues in the existing Ming-flash-omni-2.0 flows.

Test Plan

Offline and online tests (*online tests require commenting out _SKIP_NEED_4_H100_NOT_CI)
Running the Omni-speech and TTS recipes
Image-gen

pytest -s tests/e2e/offline_inference/test_ming_flash_omni_expansion.py tests/e2e/online_serving/test_ming_flash_omni_expansion.py

Test Result

vllm                                     0.22.0
transformers                             5.5.4

Unit test output:

=============================================================================================== 13 passed, 18 warnings in 517.86s (0:08:37)

Following the Ming-flash-omni-2.0 recipe, Omni-speech and TTS now work well.
For the image-gen example, please refer to #4080 (comment).


Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

yuanheng-zhao · 2026-06-02T16:58:05Z

Hey @akshatvishu, please cherry-pick or copy-paste these commits into your PR. I've tested your latest commit 6a7203c plus these commits, and they pass the Ming e2e unit tests (CUDA devices).

Adapt the Ming Flash Omni talker compatibility fixes suggested in PR vllm-project#4080. Suggested-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu · 2026-06-02T18:00:57Z

Thanks for putting this up! I just pushed the same commit on my side. All tests are passing now!

root@7:/app/vllm-omni#   pytest -q -ra \
    tests/e2e/offline_inference/test_ming_flash_omni_expansion.py \
    tests/e2e/online_serving/test_ming_flash_omni_expansion.py
.......ssssss                                                                                                                                                                            [100%]
=================================================================================== short test summary info ====================================================================================
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:70: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:96: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:122: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:149: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:176: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:203: Requires 4x H100 GPUs; skipped in CI for now.
7 passed, 6 skipped, 18 warnings in 278.51s (0:04:38)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
root@7:/app/vllm-omni#   pytest -q -ra \
    tests/e2e/online_serving/test_ming_flash_omni_expansion.py
......                                                                                                                                                                                   [100%]

--- Running Summary
6 passed, 17 warnings in 106.81s (0:01:46)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

yuanheng-zhao · 2026-06-03T03:25:21Z

@akshatvishu Btw, have you added the changes from vllm_omni/model_executor/models/ming_flash_omni/modeling_bailing_moe_v2.py to your PR? I encountered an error without those changes on my side.

    def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.word_embeddings(input_ids)

akshatvishu · 2026-06-03T09:41:26Z

> @akshatvishu Btw, have you added changes from vllm_omni/model_executor/models/ming_flash_omni/modeling_bailing_moe_v2.py into your PR? As I encountered error without those changes on my side.
>     def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor:
>         return self.word_embeddings(input_ids)

Hi @yuanheng-zhao ! I haven’t encountered this while running the e2e tests. Could you share a bit more detail on what’s triggering it on your side?

yuanheng-zhao · 2026-06-03T09:55:42Z

>> @akshatvishu Btw, have you added changes from vllm_omni/model_executor/models/ming_flash_omni/modeling_bailing_moe_v2.py into your PR? As I encountered error without those changes on my side.
>>     def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor:
>>         return self.word_embeddings(input_ids)
> Hi @yuanheng-zhao ! I haven’t encountered this while running the e2e tests. Could you share a bit more detail on what’s triggering it on your side?

I just recalled that these two embed_input_ids methods are needed for transformers 4.X compatibility with the current upstream vllm and vllm-omni versions.

yuanheng-zhao · 2026-06-03T09:57:28Z

@akshatvishu Actually, would you like to push your commit 149b0c0, which fixed part of the transformers compatibility issues, into this PR, so that we can fix the prior Ming models on transformers 4.X and 5.X together?

I think it's better to consolidate the transformers compatibility fixes into a single PR for later reference.

akshatvishu · 2026-06-03T10:00:01Z

@yuanheng-zhao Good catch! Since get_language_model() returns the causal-LM wrapper, vLLM expects embed_input_ids() to be exposed there. This is consistent with other wrappers (e.g., Qwen MoE/Qwen2), so I've pushed a commit with the suggested change.
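To illustrate the delegation pattern described above, here is a minimal plain-Python sketch. It is not the vllm-omni code: ToyEmbedding, InnerModel and CausalLMWrapper are hypothetical stand-ins for the word embedding layer, BailingMoeV2Model and BailingMoeV2ForCausalLM, and lists replace torch tensors so the example runs without torch.

```python
from typing import Dict, List, Sequence


class ToyEmbedding:
    """Hypothetical stand-in for an embedding layer: looks up one vector per token id."""

    def __init__(self, table: Dict[int, List[float]]) -> None:
        self.table = table

    def __call__(self, input_ids: Sequence[int]) -> List[List[float]]:
        return [self.table[i] for i in input_ids]


class InnerModel:
    """Plays the role of BailingMoeV2Model, which owns word_embeddings."""

    def __init__(self, word_embeddings: ToyEmbedding) -> None:
        self.word_embeddings = word_embeddings

    def embed_input_ids(self, input_ids: Sequence[int]) -> List[List[float]]:
        return self.word_embeddings(input_ids)


class CausalLMWrapper:
    """Plays the role of BailingMoeV2ForCausalLM, the object get_language_model() returns."""

    def __init__(self, model: InnerModel) -> None:
        self.model = model

    # Required because callers look up embed_input_ids on the object returned by
    # get_language_model(); the wrapper simply delegates to the inner model.
    def embed_input_ids(self, input_ids: Sequence[int]) -> List[List[float]]:
        return self.model.embed_input_ids(input_ids)


lm = CausalLMWrapper(InnerModel(ToyEmbedding({0: [0.0, 0.1], 1: [1.0, 1.1]})))
print(lm.embed_input_ids([1, 0]))  # [[1.0, 1.1], [0.0, 0.1]]
```

A one-line "required by" comment on the wrapper method, as in the sketch, also keeps a later cleanup from stripping it as dead code.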

Signed-off-by: akshatvishu <akshatnayak197@gmail.com> (cherry picked from commit 149b0c0)

akshatvishu · 2026-06-03T10:10:15Z

> @akshatvishu Actually, would you like to push your commits 149b0c0 which fixed part of the transformers compatibility issues into this PR, so that we could fix prior Ming models on transformers 4.X and 5.X together?
>
> I think it's better to split the transformers compatibility fixes into a single PR for later references.

Sure! Can you add me as a collaborator with write access to this fork branch?

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

ZhengWG · 2026-06-03T11:27:09Z

@yuanheng-zhao Tests passed with image_gen.

yuanheng-zhao · 2026-06-03T11:30:24Z

Thanks @ZhengWG for verifying Image-gen!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bcaf3e9c25


chatgpt-codex-connector · 2026-06-03T11:36:15Z

+        if _HAS_VIDEO_PROCESSOR:
+            processor_kwargs["video_processor"] = video_processor


Avoid registering a missing video processor as required

When AutoVideoProcessor exists, this always passes video_processor into ProcessorMixin even if the caller/checkpoint did not provide one. ProcessorMixin type-checks every advertised processor attribute during construction, so existing Ming processor constructions that only have the image processor, audio processor, and tokenizer now fail before any request is served, including text/image/audio-only uses. Please either require/load a real video processor before adding it here, or keep it out of the required processor attributes when it is absent.
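One way to address this, sketched under assumptions: build the processor kwargs so that video_processor is added only when video support exists and a real processor object was actually supplied. build_processor_kwargs is a hypothetical helper, and its has_video_support argument stands in for the _HAS_VIDEO_PROCESSOR flag from the diff.

```python
from typing import Any, Dict, Optional


def build_processor_kwargs(
    image_processor: Any,
    audio_processor: Any,
    tokenizer: Any,
    video_processor: Optional[Any],
    has_video_support: bool,
) -> Dict[str, Any]:
    """Hypothetical helper: never register a video processor that does not exist."""
    kwargs: Dict[str, Any] = {
        "image_processor": image_processor,
        "audio_processor": audio_processor,
        "tokenizer": tokenizer,
    }
    # Both conditions matter: the transformers install must support video
    # processors AND the caller/checkpoint must have provided one.
    if has_video_support and video_processor is not None:
        kwargs["video_processor"] = video_processor
    return kwargs
```

With this shape, text/image/audio-only constructions keep passing exactly the three processors they had before.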


Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

yuanheng-zhao · 2026-06-03T13:04:52Z

@akshatvishu This commit 9fbc63b introduces a new error:

ValueError: This processor requires 4 arguments: image_processor, video_processor, audio_processor, tokenizer. Got 0 arguments instead.

akshatvishu · 2026-06-03T13:19:26Z

> @akshatvishu This commit 9fbc63b imports new error:
>
> ValueError: This processor requires 4 arguments: image_processor, video_processor, audio_processor, tokenizer. Got 0 arguments instead.

Sorry! It was because my local testing env was still using old transformers (<5.x)! It should be resolved by the new commit.

yuanheng-zhao · 2026-06-03T13:38:04Z

>> @akshatvishu This commit 9fbc63b imports new error:
>> ValueError: This processor requires 4 arguments: image_processor, video_processor, audio_processor, tokenizer. Got 0 arguments instead.
> Sorry! It was because my local testing env was still using the old transformer <5.x ! It should be resolved with the new commit!

Btw, which transformers version are you using?

akshatvishu · 2026-06-03T13:41:15Z

> Btw, which transformers version are you using

On the SSH machine it was 5.8.1 (official ROCm Docker image); locally it was 4.57, which I've now upgraded to 5.8.1.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

yuanheng-zhao · 2026-06-03T16:12:49Z

I re-ran the unit tests as well as the recipe on 696b955, and all tests passed.

yuanheng-zhao · 2026-06-04T01:45:22Z

PTAL @linyueqian , @LHXuuu

linyueqian

Thanks for the compat fix. Direction looks right and matches how other models migrated to vllm 0.22 (logits_processor(self.lm_head, hidden_states), embed_input_ids contract, ignore_keys_at_rope_validation). 13 e2e green is a solid signal.

The two majors below are real correctness gaps in non-test paths, but they are not blockers for the documented serve flow. The minors are cleanup that can land in this PR or a follow-up, whichever is easier.

linyueqian · 2026-06-04T05:17:43Z

@@ -156,6 +163,8 @@ class MingFlashOmniProcessor(ProcessorMixin):

    attributes = ["image_processor", "audio_processor", "tokenizer"]


[major] video_processor is set after super().__init__() and is not in attributes, so ProcessorMixin.save_pretrained() will not persist it. A user who does processor.save_pretrained('/tmp/x') then loads from /tmp/x will silently lose the video processor (the new from_pretrained override only rescues loads from the original HF repo path, where the video config exists). Suggest gating "video_processor" into attributes on _HAS_VIDEO_PROCESSOR so the round-trip is symmetric. Otherwise this is a regression on save+reload workflows.

linyueqian · 2026-06-04T05:17:43Z

+                    *args,
+                    **kwargs,
+                )
+            except OSError:


[major] The narrow except OSError only catches missing-file cases. A malformed or partial video_preprocessor_config.json (e.g. unknown processor class, schema mismatch) raises ValueError/KeyError and will crash the whole processor load, defeating the fallback. Consider broadening to (OSError, ValueError, KeyError) or Exception with a logger.warning(...) and video_processor = None, matching the spirit of the try/except TypeError fallback in __call__ below.
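The broadened fallback suggested above could look roughly like this. load_optional_video_processor is a hypothetical helper that takes the loader as a callable; in the real processor the loader would presumably be the video processor's from_pretrained call, which is an assumption here.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def load_optional_video_processor(
    loader: Callable[..., Any], *args: Any, **kwargs: Any
) -> Optional[Any]:
    """Call `loader`; return None if the video config is missing or malformed."""
    try:
        return loader(*args, **kwargs)
    except (OSError, ValueError, KeyError) as exc:
        # OSError: config file missing. ValueError/KeyError: unknown processor
        # class or schema mismatch in a malformed config.
        logger.warning("Video processor unavailable, continuing without it: %s", exc)
        return None
```

Listing the three exception types explicitly, rather than catching Exception, keeps unrelated bugs (for example a RuntimeError) visible instead of silently disabling video support.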

linyueqian · 2026-06-04T05:17:43Z

    def get_input_embeddings(self):
        return self.word_embeddings

+    def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor:


[minor] MingFlashOmniThinkerForConditionalGeneration.embed_input_ids in ming_flash_omni_thinker.py:930 already calls self.language_model.model.word_embeddings(input_ids), so these two new wrappers (BailingMoeV2Model here and BailingMoeV2ForCausalLM at L800) duplicate the path. If they exist because vllm 0.22 now expects embed_input_ids on the inner LM directly, worth a one-line # required by <caller> comment so a later sweep does not strip them as dead code. If nothing currently calls them, drop them.

linyueqian · 2026-06-04T05:17:43Z

+                        videos=videos,
+                        return_tensors="pt",
+                    )
+                except TypeError as exc:


[minor] On transformers>=5.0, image_processor no longer accepts videos= at all, so this try block raises TypeError unconditionally on the version this PR targets. Consider gating on _HAS_VIDEO_PROCESSOR from processors/ming.py (or a local equivalent) and raising the ValueError directly without the dead call. Cleaner traceback, same UX.
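A sketch of the suggested simplification, under assumptions: process_videos is a hypothetical helper, and passing video_processor=None stands in for the _HAS_VIDEO_PROCESSOR check. Instead of calling image_processor(videos=...) and converting the resulting TypeError, it raises the ValueError up front.

```python
from typing import Any, Callable, Dict, Optional


def process_videos(
    videos: Optional[Any],
    video_processor: Optional[Callable[..., Dict[str, Any]]],
) -> Dict[str, Any]:
    """Route video inputs explicitly instead of probing the image processor."""
    if videos is None:
        return {}
    if video_processor is None:
        # Fail fast with a clear message; no doomed image_processor(videos=...) call.
        raise ValueError(
            "Video inputs require a video processor, which is unavailable here."
        )
    return video_processor(videos=videos, return_tensors="pt")
```

The traceback then points directly at the missing capability rather than at a TypeError from an unrelated call.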

linyueqian · 2026-06-04T05:17:43Z

-        if isinstance(self.llm_config, dict):
-            return PretrainedConfig.from_dict(self.llm_config)
-        return self.llm_config
+        llm_config = getattr(self, "llm_config", None)


[nit] self.llm_config is set unconditionally in __init__, so getattr(self, "llm_config", None) defaulting to None is paranoia. Not wrong, just non-load-bearing; a direct self.llm_config is fine.

This is still required, as explained in the comments I added.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

yuanheng-zhao · 2026-06-04T15:27:27Z

@linyueqian , @akshatvishu Tested on 3dec4ce

Offline and online unit tests (*online tests were run by commenting out _SKIP_NEED_4_H100_NOT_CI):

=================================================================== 13 passed, 18 warnings in 656.17s (0:10:56)

Recipe examples:
I ran the thinker+talker and talker TTS examples, and all passed.

linyueqian

lgtm!

akshatvishu · 2026-06-04T16:26:54Z

@linyueqian

A small question: for MingFlashOmniConfig, I think super().__init__(**kwargs) should move to the end, after thinker_config, image_gen_config, and talker_config are assigned.

Transformers >= v5 runs validation inside PretrainedConfig.__init__(), and that validation can call get_text_config(). In MingFlashOmniConfig, get_text_config() depends on self.thinker_config already existing.

I don’t think we should re-add getattr(self, "llm_config", None) in BailingMM2Config, because BailingMM2Config already assigns self.llm_config before super().__init__(). Adding that fallback would hide ordering bugs instead of fixing them.

We can check this from the init order:

BailingMM2Config:

    self.llm_config = ...
    self.mlp_depth = mlp_depth
    super().__init__(**kwargs)

So BailingMM2Config.get_text_config() can safely return self.llm_config during Transformers validation.

MingFlashOmniConfig was the broken one:

    super().__init__(**kwargs)
    self.thinker_config = ...

But MingFlashOmniConfig.get_text_config() calls:

    return self.thinker_config.get_text_config()

So Transformers validation can hit get_text_config() before thinker_config exists. That is where I think the actual parity issue is.

The same issue exists in vllm_omni/transformers_utils/configs/mammoth_moda2.py, in class Mammothmoda2Config(PretrainedConfig).
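The ordering problem described above can be reproduced with a toy model. This is a sketch, not transformers code: BaseConfig is a hypothetical stand-in for PretrainedConfig that, per the comment above, runs validation inside __init__ and calls get_text_config(); ThinkerConfig, BrokenOmniConfig and FixedOmniConfig are likewise hypothetical.

```python
class BaseConfig:
    """Hypothetical stand-in for PretrainedConfig that validates inside __init__."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)
        self.get_text_config()  # validation hook that needs sub-configs to exist


class ThinkerConfig(BaseConfig):
    def get_text_config(self):
        return self


class BrokenOmniConfig(BaseConfig):
    def __init__(self, thinker_config=None, **kwargs):
        super().__init__(**kwargs)  # validation runs before thinker_config exists
        self.thinker_config = thinker_config or ThinkerConfig()

    def get_text_config(self):
        return self.thinker_config.get_text_config()


class FixedOmniConfig(BaseConfig):
    def __init__(self, thinker_config=None, **kwargs):
        self.thinker_config = thinker_config or ThinkerConfig()
        super().__init__(**kwargs)  # sub-configs are assigned first, so validation works

    def get_text_config(self):
        return self.thinker_config.get_text_config()
```

BrokenOmniConfig() raises AttributeError during construction, while FixedOmniConfig() succeeds. A getattr(..., None) fallback would only mask the AttributeError instead of fixing the ordering.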

yuanheng-zhao added 2 commits June 2, 2026 16:11

add embed_input_ids for Ming moe

b6f3728

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

fix for transformers 5.X

44a15ab

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

yuanheng-zhao changed the title ~~[Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on both transformers 4.X and 5.X~~ [Do-not-merge][Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on both transformers 4.X and 5.X Jun 2, 2026

Fix Ming Flash Omni transformer compatibility

3d7c0ac

Signed-off-by: akshatvishu <akshatnayak197@gmail.com> (cherry picked from commit 149b0c0)

fix ming recipe

bcaf3e9

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

yuanheng-zhao changed the title ~~[Do-not-merge][Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on both transformers 4.X and 5.X~~ [Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on transformers 5.X and vllm 0.22 Jun 3, 2026

yuanheng-zhao marked this pull request as ready for review June 3, 2026 11:31

yuanheng-zhao requested review from Gaohan123, ZeldaHuang, david6666666, hsliuustc0106, linyueqian, princepride and tzhouam as code owners June 3, 2026 11:31

chatgpt-codex-connector Bot reviewed Jun 3, 2026


Fix optional Ming video processor registration

9fbc63b

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

Hide optional Ming video processor from required attrs

696b955

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu force-pushed the fix/ming-flash-omni-transformers-5.x branch from 1105754 to 696b955 June 3, 2026 14:21

linyueqian added the ready label to trigger buildkite CI label Jun 3, 2026

linyueqian reviewed Jun 4, 2026


akshatvishu and others added 2 commits June 4, 2026 18:40

Address Ming Flash Omni processor review

6fe0081

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

fix isues imported again

3dec4ce

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

linyueqian approved these changes Jun 4, 2026


yuanheng-zhao merged commit 95d56cf into vllm-project:main Jun 5, 2026
8 checks passed
