[Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on transformers 5.X and vllm 0.22#4080

Merged
yuanheng-zhao merged 8 commits into vllm-project:main from yuanheng-zhao:fix/ming-flash-omni-transformers-5.x on Jun 5, 2026

@yuanheng-zhao yuanheng-zhao commented Jun 2, 2026

Collaborator

Purpose

This PR fixes transformers and vllm version compatibility issues in the existing Ming-flash-omni-2.0 flows.

cc @akshatvishu , @ZhengWG , @LHXuuu

Test Plan

  1. Offline and online tests (comment out _SKIP_NEED_4_H100_NOT_CI to run the online tests)
  2. Omni-speech and TTS recipe runs
  3. Image-gen

pytest -s tests/e2e/offline_inference/test_ming_flash_omni_expansion.py tests/e2e/online_serving/test_ming_flash_omni_expansion.py

Test Result

vllm                                     0.22.0
transformers                             5.5.4

unit tests output:

=============================================================================================== 13 passed, 18 warnings in 517.86s (0:08:37)

Following the Ming-flash-omni-2.0 recipe, Omni-speech and TTS now work well.
For the image-gen example, please refer to #4080 (comment)


Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
@yuanheng-zhao changed the title from "[Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on both transformers 4.X and 5.X" to "[Do-not-merge][Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on both transformers 4.X and 5.X" on Jun 2, 2026
@yuanheng-zhao

yuanheng-zhao commented Jun 2, 2026

Collaborator Author

Hey @akshatvishu, please cherry-pick or copy these commits into your PR. I tested them on top of your latest commit 6a7203c, and the Ming e2e unit tests pass (CUDA devices).

akshatvishu added a commit to akshatvishu/vllm-omni that referenced this pull request Jun 2, 2026
Adapt the Ming Flash Omni talker compatibility fixes suggested in PR vllm-project#4080.

Suggested-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
@akshatvishu

akshatvishu commented Jun 2, 2026

Contributor

Thanks for putting this up! I just pushed a commit on my side with the same changes. All tests are passing now!

root@7:/app/vllm-omni#   pytest -q -ra \
    tests/e2e/offline_inference/test_ming_flash_omni_expansion.py \
    tests/e2e/online_serving/test_ming_flash_omni_expansion.py
.......ssssss                                                                                                                                                                            [100%]
=================================================================================== short test summary info ====================================================================================
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:70: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:96: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:122: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:149: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:176: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:203: Requires 4x H100 GPUs; skipped in CI for now.
7 passed, 6 skipped, 18 warnings in 278.51s (0:04:38)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
root@7:/app/vllm-omni#   pytest -q -ra \
    tests/e2e/online_serving/test_ming_flash_omni_expansion.py
......                                                                                                                                                                                   [100%]

--- Running Summary
6 passed, 17 warnings in 106.81s (0:01:46)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

@yuanheng-zhao

Collaborator Author

@akshatvishu Btw, have you added the changes from vllm_omni/model_executor/models/ming_flash_omni/modeling_bailing_moe_v2.py to your PR? I encountered an error without them on my side.

    def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.word_embeddings(input_ids)

@akshatvishu

Contributor

Hi @yuanheng-zhao! I haven't encountered this while running the e2e tests. Could you share a bit more detail on what's triggering it on your side?

@yuanheng-zhao

Collaborator Author

I just recalled that these two embed_input_ids methods are for transformers 4.X compatibility with the current upstream vllm and vllm-omni versions.

@yuanheng-zhao

Collaborator Author

@akshatvishu Actually, would you like to push your commit 149b0c0, which fixed part of the transformers compatibility issues, into this PR so that we can fix the prior Ming models on transformers 4.X and 5.X together?

I think it's better to split the transformers compatibility fixes into a single, separate PR for later reference.

@akshatvishu

Contributor

@yuanheng-zhao Good catch! Since get_language_model() returns the causal-LM wrapper, vLLM expects embed_input_ids() to be exposed there. This is consistent with other wrappers (e.g., Qwen MoE/Qwen2), so I've pushed a commit with the suggested change.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
(cherry picked from commit 149b0c0)
@akshatvishu

Contributor

Sure! Can you add me as a collaborator with write access to this fork branch?

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
@ZhengWG

ZhengWG commented Jun 3, 2026

Contributor

@yuanheng-zhao Test passed with image_gen:
[image: ming_cat]

@yuanheng-zhao changed the title from "[Do-not-merge][Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on both transformers 4.X and 5.X" to "[Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on transformers 5.X and vllm 0.22" on Jun 3, 2026
@yuanheng-zhao

Collaborator Author

Thanks @ZhengWG for verifying Image-gen!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bcaf3e9c25

Comment on lines +200 to +201
    if _HAS_VIDEO_PROCESSOR:
        processor_kwargs["video_processor"] = video_processor

[P2] Avoid registering a missing video processor as required

When AutoVideoProcessor exists, this always passes video_processor into ProcessorMixin even if the caller/checkpoint did not provide one. ProcessorMixin type-checks every advertised processor attribute during construction, so existing Ming processor constructions that only have the image processor, audio processor, and tokenizer now fail before any request is served, including text/image/audio-only uses. Please either require/load a real video processor before adding it here, or keep it out of the required processor attributes when it is absent.
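
One way to read the second option is sketched below: only advertise a video processor when one was actually supplied. This is a minimal illustration, not code from this PR; `build_processor_kwargs` is a hypothetical helper, and `_HAS_VIDEO_PROCESSOR` is hard-coded here as an assumption standing in for "AutoVideoProcessor is importable".

```python
from typing import Any, Optional

# Assumption: stands in for "transformers exposes AutoVideoProcessor".
_HAS_VIDEO_PROCESSOR = True


def build_processor_kwargs(
    image_processor: Any,
    audio_processor: Any,
    tokenizer: Any,
    video_processor: Optional[Any] = None,
) -> dict:
    """Collect constructor kwargs, adding video_processor only when supplied."""
    processor_kwargs = {
        "image_processor": image_processor,
        "audio_processor": audio_processor,
        "tokenizer": tokenizer,
    }
    # Library support alone is not enough: the caller or checkpoint must have
    # actually provided a video processor before it is passed along.
    if _HAS_VIDEO_PROCESSOR and video_processor is not None:
        processor_kwargs["video_processor"] = video_processor
    return processor_kwargs
```

With this shape, text/image/audio-only constructions never hand a `None` video processor to the base class, so its per-attribute checks have nothing extra to reject.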

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
@yuanheng-zhao

Collaborator Author

@akshatvishu Commit 9fbc63b introduces a new error:

ValueError: This processor requires 4 arguments: image_processor, video_processor, audio_processor, tokenizer. Got 0 arguments instead.

@akshatvishu

Contributor

Sorry! My local testing env was still using an old transformers version (<5.x). It should be resolved with the new commit!

@yuanheng-zhao

Collaborator Author

Btw, which transformers version are you using?

@akshatvishu

akshatvishu commented Jun 3, 2026

Contributor

On the SSH machine it was 5.8.1 (official Docker ROCm image); locally it was 4.57, which I've now upgraded to 5.8.1.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
@akshatvishu force-pushed the fix/ming-flash-omni-transformers-5.x branch from 1105754 to 696b955 on June 3, 2026 14:21
@yuanheng-zhao

Collaborator Author

I re-ran the unit tests and the recipe on 696b955, and all tests passed.

@linyueqian added the "ready" label (label to trigger buildkite CI) on Jun 3, 2026
@yuanheng-zhao

Collaborator Author

PTAL @linyueqian , @LHXuuu

@linyueqian linyueqian left a comment

Collaborator

Thanks for the compat fix. Direction looks right and matches how other models migrated to vllm 0.22 (logits_processor(self.lm_head, hidden_states), embed_input_ids contract, ignore_keys_at_rope_validation). 13 e2e green is a solid signal.

Two majors below are real correctness gaps in non-test paths but not blockers for the documented serve flow. The minors are cleanup that can land in this PR or a follow-up, whichever is easier.

@@ -156,6 +163,8 @@ class MingFlashOmniProcessor(ProcessorMixin):

attributes = ["image_processor", "audio_processor", "tokenizer"]

Collaborator

[major] video_processor is set after super().__init__() and is not in attributes, so ProcessorMixin.save_pretrained() will not persist it. A user who does processor.save_pretrained('/tmp/x') then loads from /tmp/x will silently lose the video processor (the new from_pretrained override only rescues loads from the original HF repo path, where the video config exists). Suggest gating "video_processor" into attributes on _HAS_VIDEO_PROCESSOR so the round-trip is symmetric. Otherwise this is a regression on save+reload workflows.
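
The asymmetry can be shown with a toy model of the save path. This is only a sketch under stated assumptions: `ToyProcessorBase` is a hypothetical stand-in for a base class that persists only the names listed in `attributes`, the two subclasses are invented for illustration, and `_HAS_VIDEO_PROCESSOR` is hard-coded rather than detected.

```python
# Assumption: stands in for the PR's "AutoVideoProcessor is importable" flag.
_HAS_VIDEO_PROCESSOR = True


class ToyProcessorBase:
    """Hypothetical stand-in for a base class that saves only advertised attributes."""

    attributes: list = []

    def save(self) -> dict:
        # Only names listed in `attributes` are persisted.
        return {name: getattr(self, name) for name in self.attributes}


class UngatedProcessor(ToyProcessorBase):
    attributes = ["image_processor", "audio_processor", "tokenizer"]

    def __init__(self, image_processor, audio_processor, tokenizer, video_processor=None):
        self.image_processor = image_processor
        self.audio_processor = audio_processor
        self.tokenizer = tokenizer
        # Set on the instance but never advertised, so save() drops it.
        self.video_processor = video_processor


class GatedProcessor(UngatedProcessor):
    # Advertise video_processor only when the library supports it.
    attributes = UngatedProcessor.attributes + (
        ["video_processor"] if _HAS_VIDEO_PROCESSOR else []
    )
```

In the toy, `UngatedProcessor(...).save()` omits the video processor while `GatedProcessor(...).save()` keeps it. A real fix would also need to account for the case where the attribute is advertised but no video processor was provided.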

*args,
**kwargs,
)
except OSError:

Collaborator

[major] Narrow except OSError only catches missing-file cases. A malformed or partial video_preprocessor_config.json (e.g. unknown processor class, schema mismatch) raises ValueError/KeyError and will crash the whole processor load, defeating the fallback. Consider broadening to (OSError, ValueError, KeyError) or Exception with a logger.warning(...) and video_processor = None, matching the spirit of the try/except TypeError fallback in __call__ below.
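
A minimal sketch of the broadened fallback, assuming the load is wrapped in a helper. `load_optional_video_processor` and its `loader` callable are hypothetical names used only for illustration; the real code would call whatever loading function the PR already uses.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def load_optional_video_processor(
    loader: Callable[[str], Any], path: str
) -> Optional[Any]:
    """Try to load a video processor; fall back to None if the config is unusable."""
    try:
        return loader(path)
    except (OSError, ValueError, KeyError) as exc:
        # OSError: config file missing.
        # ValueError / KeyError: malformed or partial config.
        logger.warning("Could not load video processor from %s: %s", path, exc)
        return None
```

Listing the exception types keeps unrelated bugs (for example a TypeError in the loader) visible, which a bare `except Exception` would hide.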

    def get_input_embeddings(self):
        return self.word_embeddings

    def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor:

@linyueqian linyueqian Jun 4, 2026

Collaborator

[minor] MingFlashOmniThinkerForConditionalGeneration.embed_input_ids in ming_flash_omni_thinker.py:930 already calls self.language_model.model.word_embeddings(input_ids), so these two new wrappers (BailingMoeV2Model here and BailingMoeV2ForCausalLM at L800) duplicate the path. If they exist because vllm 0.22 now expects embed_input_ids on the inner LM directly, worth a one-line # required by <caller> comment so a later sweep does not strip them as dead code. If nothing currently calls them, drop them.
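
For illustration, the delegation pattern with the suggested marker comment might look like the sketch below. All class names are hypothetical stand-ins (no torch dependency), and `<caller>` is left as the placeholder from the comment above because the actual caller is not named in this thread.

```python
class ToyEmbedding:
    """Hypothetical stand-in for an embedding layer."""

    def __init__(self, table):
        self.table = table

    def __call__(self, input_ids):
        return [self.table[i] for i in input_ids]


class ToyInnerModel:
    def __init__(self, table):
        self.word_embeddings = ToyEmbedding(table)

    def get_input_embeddings(self):
        return self.word_embeddings

    # required by <caller>: name the real caller here so a later sweep keeps it.
    def embed_input_ids(self, input_ids):
        return self.word_embeddings(input_ids)


class ToyCausalLM:
    def __init__(self, table):
        self.model = ToyInnerModel(table)

    # required by <caller>: the wrapper delegates to the inner model.
    def embed_input_ids(self, input_ids):
        return self.model.embed_input_ids(input_ids)
```

Both levels return the same embeddings, so the wrappers add no behavior of their own; the comment is what records why they exist.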

videos=videos,
return_tensors="pt",
)
except TypeError as exc:

Collaborator

[minor] On transformers>=5.0, image_processor no longer accepts videos= at all, so this try block raises TypeError unconditionally on the version this PR targets. Consider gating on _HAS_VIDEO_PROCESSOR from processors/ming.py (or a local equivalent) and raising the ValueError directly without the dead call. Cleaner traceback, same UX.
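
A rough sketch of that gating, assuming the flag is available at the call site. `preprocess_videos`, its `has_video_processor` parameter, and the error message are hypothetical; only the `videos=` and `return_tensors="pt"` arguments come from the snippet above.

```python
from typing import Any, Callable

# Assumption: stands in for the flag defined in processors/ming.py.
_HAS_VIDEO_PROCESSOR = True


def preprocess_videos(
    image_processor: Callable[..., Any],
    videos: Any,
    has_video_processor: bool = _HAS_VIDEO_PROCESSOR,
) -> Any:
    """Fail up front on newer transformers instead of catching a TypeError."""
    if has_video_processor:
        # The image processor no longer accepts videos=, so skip the dead call.
        raise ValueError(
            "Video inputs are not supported through the image processor "
            "on this transformers version."
        )
    # Legacy path: the image processor still accepts videos=.
    return image_processor(videos=videos, return_tensors="pt")
```

The user sees the same ValueError either way, but the traceback no longer carries a chained TypeError from a call that could never succeed.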

Comment thread vllm_omni/model_executor/models/ming_flash_omni/talker_module.py
if isinstance(self.llm_config, dict):
return PretrainedConfig.from_dict(self.llm_config)
return self.llm_config
llm_config = getattr(self, "llm_config", None)

Collaborator

[nit] self.llm_config is set unconditionally in __init__, so getattr(self, "llm_config", None) defaulting to None is paranoia. Not wrong, just non-load-bearing; a direct self.llm_config is fine.

Collaborator Author

This is still required; see the added comments.

akshatvishu and others added 2 commits June 4, 2026 18:40
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
@yuanheng-zhao

yuanheng-zhao commented Jun 4, 2026

Collaborator Author

@linyueqian , @akshatvishu Tested on 3dec4ce

Offline and online unit tests (with _SKIP_NEED_4_H100_NOT_CI commented out so the online tests run):

=================================================================== 13 passed, 18 warnings in 656.17s (0:10:56)

Recipe examples:
I ran the thinker+talker and talker TTS examples, and all passed.

@linyueqian linyueqian left a comment

Collaborator

lgtm!

@akshatvishu

Contributor

@linyueqian

A small doubt: for MingFlashOmniConfig, I think super().__init__(**kwargs) should move to the end, after thinker_config, image_gen_config, and talker_config are assigned.

Transformers >= v5 runs validation inside PretrainedConfig.__init__(), and that validation can call get_text_config(). In MingFlashOmniConfig, get_text_config() depends on self.thinker_config already existing.

I don’t think we should re-add getattr(self, "llm_config", None) in BailingMM2Config, because BailingMM2Config already assigns self.llm_config before super().__init__(). Adding that fallback would hide ordering bugs instead of fixing them.

We can check this from the init order:

BailingMM2Config:

self.llm_config = ...
self.mlp_depth = mlp_depth
super().__init__(**kwargs)

So BailingMM2Config.get_text_config() can safely return self.llm_config during Transformers validation.

MingFlashOmniConfig was the broken one:

super().__init__(**kwargs)
self.thinker_config = ...

But MingFlashOmniConfig.get_text_config() calls:

return self.thinker_config.get_text_config()

So Transformers validation can hit get_text_config() before thinker_config exists. That is where I think the actual parity issue is.
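
The ordering problem can be reproduced in isolation with a toy base class. This is a sketch, not the actual Transformers code: `ToyPretrainedConfig` is a hypothetical stand-in whose `__init__` calls `get_text_config()` during validation, and the two subclasses only mirror the init orders listed above.

```python
class ToyPretrainedConfig:
    """Hypothetical stand-in for a base config whose __init__ runs validation."""

    def __init__(self, **kwargs):
        # Validation may consult the text config before the subclass __init__ returns.
        self.get_text_config()

    def get_text_config(self):
        return self


class ThinkerConfig(ToyPretrainedConfig):
    pass


class BrokenOmniConfig(ToyPretrainedConfig):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # validation runs here, too early
        self.thinker_config = ThinkerConfig()

    def get_text_config(self):
        return self.thinker_config.get_text_config()


class FixedOmniConfig(ToyPretrainedConfig):
    def __init__(self, **kwargs):
        self.thinker_config = ThinkerConfig()  # assign sub-configs first
        super().__init__(**kwargs)  # validation now sees thinker_config

    def get_text_config(self):
        return self.thinker_config.get_text_config()
```

In this toy, constructing `BrokenOmniConfig()` raises AttributeError because `thinker_config` does not exist yet, while `FixedOmniConfig()` constructs cleanly. A `getattr(..., None)` fallback would silence the first case without fixing the order.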

Same issue with vllm_omni/transformers_utils/configs/mammoth_moda2.py under class Mammothmoda2Config(PretrainedConfig):

@yuanheng-zhao yuanheng-zhao merged commit 95d56cf into vllm-project:main Jun 5, 2026
8 checks passed
Labels

ready label to trigger buildkite CI

4 participants