[Bugfix] Fix Compatibility of Ming-flash-omni-2.0 on transformers 5.X and vllm 0.22#4080
Conversation
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
|
Hey @akshatvishu , please cherry-pick or copy-paste these commits into your PR. I've tested on your latest commit 6a7203c plus these commits and have passed the Ming e2e unit tests (cuda devices) |
Adapt the Ming Flash Omni talker compatibility fixes suggested in PR vllm-project#4080. Suggested-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
|
Thanks for putting this up! Just pushed the commit at my side with the same! All test are passing now! root@7:/app/vllm-omni# pytest -q -ra \
tests/e2e/offline_inference/test_ming_flash_omni_expansion.py \
tests/e2e/online_serving/test_ming_flash_omni_expansion.py
.......ssssss [100%]
=================================================================================== short test summary info ====================================================================================
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:70: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:96: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:122: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:149: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:176: Requires 4x H100 GPUs; skipped in CI for now.
SKIPPED [1] tests/e2e/online_serving/test_ming_flash_omni_expansion.py:203: Requires 4x H100 GPUs; skipped in CI for now.
7 passed, 6 skipped, 18 warnings in 278.51s (0:04:38)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
root@7:/app/vllm-omni# pytest -q -ra \
tests/e2e/online_serving/test_ming_flash_omni_expansion.py
...... [100%]
--- Running Summary
6 passed, 17 warnings in 106.81s (0:01:46)
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute |
|
@akshatvishu Btw, have you added changes from def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor:
return self.word_embeddings(input_ids) |
Hi @yuanheng-zhao ! I haven’t encountered this while running the e2e tests. Could you share a bit more detail on what’s triggering it on your side? |
I just recalled that these two |
|
@akshatvishu Actually, would you like to push your commits 149b0c0 which fixed part of the transformers compatibility issues into this PR, so that we could fix prior Ming models on transformers 4.X and 5.X together? I think it's better to split the transformers compatibility fixes into a single PR for later references. |
|
@yuanheng-zhao Good catch! Since |
Signed-off-by: akshatvishu <akshatnayak197@gmail.com> (cherry picked from commit 149b0c0)
Sure ! Can you add me as a collaborator with write access to this fork branch? |
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
|
@yuanheng-zhao Test passed with image_gen: |
|
Thanks @ZhengWG for verifying Image-gen! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bcaf3e9c25
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if _HAS_VIDEO_PROCESSOR: | ||
| processor_kwargs["video_processor"] = video_processor |
There was a problem hiding this comment.
Avoid registering a missing video processor as required
When AutoVideoProcessor exists, this always passes video_processor into ProcessorMixin even if the caller/checkpoint did not provide one. ProcessorMixin type-checks every advertised processor attribute during construction, so existing Ming processor constructions that only have the image processor, audio processor, and tokenizer now fail before any request is served, including text/image/audio-only uses. Please either require/load a real video processor before adding it here, or keep it out of the required processor attributes when it is absent.
Useful? React with 👍 / 👎.
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
|
@akshatvishu This commit 9fbc63b imports new error: |
Sorry! It was because my local testing env was still using the old transformer <5.x ! It should be resolved with the new commit! |
Btw, which transformers version are you using |
At ssh it was |
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
1105754 to
696b955
Compare
|
I re-run unit tests as well as recipe and all tests passed, on 696b955 |
|
PTAL @linyueqian , @LHXuuu |
linyueqian
left a comment
There was a problem hiding this comment.
Thanks for the compat fix. Direction looks right and matches how other models migrated to vllm 0.22 (logits_processor(self.lm_head, hidden_states), embed_input_ids contract, ignore_keys_at_rope_validation). 13 e2e green is a solid signal.
Two majors below are real correctness gaps in non-test paths but not blockers for the documented serve flow. The minors are cleanup that can land in this PR or a follow-up, whichever is easier.
| @@ -156,6 +163,8 @@ class MingFlashOmniProcessor(ProcessorMixin): | |||
|
|
|||
| attributes = ["image_processor", "audio_processor", "tokenizer"] | |||
There was a problem hiding this comment.
[major] video_processor is set after super().__init__() and is not in attributes, so ProcessorMixin.save_pretrained() will not persist it. A user who does processor.save_pretrained('/tmp/x') then loads from /tmp/x will silently lose the video processor (the new from_pretrained override only rescues loads from the original HF repo path, where the video config exists). Suggest gating "video_processor" into attributes on _HAS_VIDEO_PROCESSOR so the round-trip is symmetric. Otherwise this is a regression on save+reload workflows.
| *args, | ||
| **kwargs, | ||
| ) | ||
| except OSError: |
There was a problem hiding this comment.
[major] Narrow except OSError only catches missing-file cases. A malformed or partial video_preprocessor_config.json (e.g. unknown processor class, schema mismatch) raises ValueError/KeyError and will crash the whole processor load, defeating the fallback. Consider broadening to (OSError, ValueError, KeyError) or Exception with a logger.warning(...) and video_processor = None, matching the spirit of the try/except TypeError fallback in __call__ below.
| def get_input_embeddings(self): | ||
| return self.word_embeddings | ||
|
|
||
| def embed_input_ids(self, input_ids: torch.Tensor) -> torch.Tensor: |
There was a problem hiding this comment.
[minor] MingFlashOmniThinkerForConditionalGeneration.embed_input_ids in ming_flash_omni_thinker.py:930 already calls self.language_model.model.word_embeddings(input_ids), so these two new wrappers (BailingMoeV2Model here and BailingMoeV2ForCausalLM at L800) duplicate the path. If they exist because vllm 0.22 now expects embed_input_ids on the inner LM directly, worth a one-line # required by <caller> comment so a later sweep does not strip them as dead code. If nothing currently calls them, drop them.
| videos=videos, | ||
| return_tensors="pt", | ||
| ) | ||
| except TypeError as exc: |
There was a problem hiding this comment.
[minor] On transformers>=5.0, image_processor no longer accepts videos= at all, so this try block raises TypeError unconditionally on the version this PR targets. Consider gating on _HAS_VIDEO_PROCESSOR from processors/ming.py (or a local equivalent) and raising the ValueError directly without the dead call. Cleaner traceback, same UX.
| if isinstance(self.llm_config, dict): | ||
| return PretrainedConfig.from_dict(self.llm_config) | ||
| return self.llm_config | ||
| llm_config = getattr(self, "llm_config", None) |
There was a problem hiding this comment.
[nit] self.llm_config is set unconditionally in __init__, so getattr(self, "llm_config", None) defaulting to None is paranoia. Not wrong, just non-load-bearing; a direct self.llm_config is fine.
There was a problem hiding this comment.
This is still required, as comments added
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
|
@linyueqian , @akshatvishu Tested on 3dec4ce Offline and online unit tests (*with commenting out _SKIP_NEED_4_H100_NOT_CI to run online tests) Recipe examples: |
|
A small doubt: for Transformers >= v5 runs validation inside I don’t think we should re-add We can check this from the init order:
self.llm_config = ...
self.mlp_depth = mlp_depth
super().__init__(**kwargs)So
super().__init__(**kwargs)
self.thinker_config = ...But return self.thinker_config.get_text_config()So Transformers validation can hit Same issue with |

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
This PR fixed
transformersandvllmversion compatibility issues on existing Ming-flash-omni-2.0 flows.cc @akshatvishu , @ZhengWG , @LHXuuu
Test Plan
_SKIP_NEED_4_H100_NOT_CIto run online tests)Test Result
unit tests output:
Following Ming-flash-omni-2.0 recipe, Omni-speech and TTS work well now.
For image-gen example, please refer to #4080 (comment)
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)