[Model] Support MiniCPM-o 4.5 #3642
Signed-off-by: tc-mb <tianchi_cai@icloud.com> Co-authored-by: GKangaroo <1095103651@qq.com>
Signed-off-by: tc-mb <tianchi_cai@icloud.com> Co-authored-by: GKangaroo <gqx24@mails.tsinghua.edu.cn>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74b5e5fd67
Thanks for your contribution. Can we split it into 2 PRs?
74b5e5f to 8fd1276
No problem. I'll keep this PR focused on MiniCPM-o 4.5 and then open a separate PR for o2.6. Does that work for you? Opening two PRs at the same time might mean going back and forth on merge conventions in both. Merging one first will let me learn vllm-omni's merge requirements and save you from reviewing the same issues twice.
We can focus on MiniCPM-o 4.5 first. You can scope this PR down so that we can speed up the review.
…rch fallback (avoids 2.6 collision) Signed-off-by: tc-mb <tianchi_cai@icloud.com>
…ilently returning empty audio Signed-off-by: tc-mb <tianchi_cai@icloud.com>
…tra instead of doc-only Signed-off-by: tc-mb <tianchi_cai@icloud.com>
…e info delivery and OmniOutput packaging Signed-off-by: tc-mb <tianchi_cai@icloud.com>
fdc5c79 to fb6abc2
Please add UT cases if necessary.
OK, added a unit-test suite for the MiniCPM-o 4.5 path in tests/model_executor/models/minicpmo_4_5.
Could we have E2E tests covering offline and online inference?
I suggest adding them to the nightly tests; please follow the corresponding .md guide.
Signed-off-by: tc-mb <tianchi_cai@icloud.com> Co-authored-by: GKangaroo <1095103651@qq.com> Co-authored-by: GKangaroo <gqx24@mails.tsinghua.edu.cn> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Purpose
Hi team, I'm from the MiniCPM-V / MiniCPM-o model team and have been maintaining the integration of the V and O series into vLLM. We have been following vllm-omni since day one, and this PR brings both MiniCPM-o 2.6 and MiniCPM-o 4.5 into the project.

Apologies for the delay: over the past weeks the team has been focused on shipping the MiniCPM-o technical report (arXiv 2604.27393) and the MiniCPM-V 4.6 release. We hope this PR makes it easier for the community to serve the MiniCPM-o family on vllm-omni, and we look forward to deeper collaboration on omni-modal serving.
What's added
Models (vllm_omni/model_executor/models/):

- minicpmo_4_5/: full omni pipeline for MiniCPM-o 4.5
  - minicpmo_4_5_omni.py: top-level conditional generation wrapper
  - minicpmo_4_5_omni_llm.py: thinker (LLM) stage
  - minicpmo_4_5_omni_tts.py: talker (TTS) stage
  - minicpmo_4_5_omni_t2w.py: token-to-waveform stage
- minicpmo_2_6/: full omni pipeline for MiniCPM-o 2.6 (same 4-file layout as 4.5)
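As a rough mental model of how the three stage files fit together, the sketch below chains a thinker, a talker, and a token-to-waveform stage as plain Python functions. Everything in it (StageOutput, the stage functions, the fake token and sample values) is hypothetical and only mirrors the stage roles listed above; the real stages are configured through the stage YAMLs and connected by the stage input processors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StageOutput:
    # Hypothetical payload handed from one stage to the next.
    text: str = ""
    audio_tokens: List[int] = field(default_factory=list)
    waveform: List[float] = field(default_factory=list)


def thinker(prompt: str) -> StageOutput:
    # Stand-in for the LLM stage: produce a text reply.
    return StageOutput(text=f"echo: {prompt}")


def talker(prev: StageOutput) -> StageOutput:
    # Stand-in for the TTS stage: map the reply to discrete audio tokens
    # (here, one fake token per character).
    tokens = [ord(ch) % 256 for ch in prev.text]
    return StageOutput(text=prev.text, audio_tokens=tokens)


def token2wav(prev: StageOutput) -> StageOutput:
    # Stand-in for the token-to-waveform stage: scale each token into [-1, 1].
    samples = [tok / 127.5 - 1.0 for tok in prev.audio_tokens]
    return StageOutput(text=prev.text, audio_tokens=prev.audio_tokens, waveform=samples)


def run_pipeline(prompt: str) -> StageOutput:
    # Each stage consumes the previous stage's output.
    return token2wav(talker(thinker(prompt)))


result = run_pipeline("hi")
print(result.text, len(result.audio_tokens), len(result.waveform))
```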
Registry entries in model_executor/models/registry.py for all 8 new architectures (MiniCPMO{26,45}Omni{,LLM,TTS,T2W}ForConditionalGeneration).

Stage input processors (vllm_omni/model_executor/stage_input_processors/): minicpmo_2_6_omni.py and minicpmo_4_5_omni.py provide the llm2tts / tts2t2w adapters wired into the stage YAMLs below.

Default stage configs (vllm_omni/model_executor/stage_configs/):

- minicpmo.yaml: MiniCPM-o 2.6 default
- minicpmo_8x4090.yaml: MiniCPM-o 2.6 on an 8×4090 host
- minicpmo45_2gpu.yaml: MiniCPM-o 4.5, 2-GPU layout
- minicpmo45_3gpu.yaml: MiniCPM-o 4.5, 3-GPU layout (thinker TP=2)
- minicpmo45_8x4090.yaml: MiniCPM-o 4.5 on an 8×4090 host

Online serving example (examples/online_serving/minicpmo/): gradio_demo.py, run_gradio_demo.sh, and README.md provide a single Gradio UI that drives both the 2.6 and 4.5 endpoints over the OpenAI-compatible API.
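The architecture pattern in the registry entry uses shell-style brace notation. To check that it really names eight classes, here is a small helper (hypothetical, written only for this description, not part of the PR) that expands the braces:

```python
import itertools
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    # Split into literal text and {a,b,...} groups; an empty alternative
    # (as in "{,LLM}") expands to the empty string.
    parts = re.split(r"(\{[^{}]*\})", pattern)
    choices = [p[1:-1].split(",") if p.startswith("{") else [p] for p in parts]
    return ["".join(combo) for combo in itertools.product(*choices)]


names = expand_braces("MiniCPMO{26,45}Omni{,LLM,TTS,T2W}ForConditionalGeneration")
# 2 model versions x 4 stage suffixes = 8 names, starting with
# MiniCPMO26OmniForConditionalGeneration.
for name in names:
    print(name)
```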
API server (vllm_omni/entrypoints/openai/api_server.py): performs a pre-load with trust_remote_code=True (with GPU visibility temporarily hidden) so that HuggingFace transformers_modules is registered in the API server process. This is required for ZMQ pickle deserialization of MiniCPM-o stage outputs that reference dynamic modules. If the pre-load fails, it falls through cleanly, so models that do not use trust_remote_code are unaffected.
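Why the pre-load matters: pickle stores classes by module path and name, so a receiving process can only rebuild an object if that module is importable there. The sketch below reproduces the failure with a throwaway in-memory module; demo_dynamic_module is a made-up name standing in for an entry under transformers_modules, and no real MiniCPM-o output or ZMQ socket is involved.

```python
import pickle
import sys
import types

# An in-memory module standing in for one registered under transformers_modules.
dynamic = types.ModuleType("demo_dynamic_module")


class StageOutput:
    def __init__(self, n: int) -> None:
        self.n = n


# Make the class look as if it was defined inside the dynamic module.
StageOutput.__module__ = "demo_dynamic_module"
dynamic.StageOutput = StageOutput
sys.modules["demo_dynamic_module"] = dynamic

# pickle records "demo_dynamic_module.StageOutput" by reference, not the class body.
payload = pickle.dumps(StageOutput(3))

# Simulate a process that never imported the dynamic module.
del sys.modules["demo_dynamic_module"]
try:
    pickle.loads(payload)
    loaded_without_module = True
except ImportError:  # ModuleNotFoundError subclasses ImportError
    loaded_without_module = False

# Registering the module first (the job of the pre-load) fixes deserialization.
sys.modules["demo_dynamic_module"] = dynamic
restored = pickle.loads(payload)
print(loaded_without_module, restored.n)
```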
Notes
This PR was merged with the latest main (clean fast-forward from fdb0efea). MiniCPM-o-specific code lives entirely under the paths listed above; changes outside those paths are limited to the registry entry and the API-server pre-load described above.
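The API-server pre-load can be sketched roughly as follows. This is an assumption-laden illustration, not the code in api_server.py: it hides GPUs by setting CUDA_VISIBLE_DEVICES to an empty string for the duration of the call (which only has an effect if CUDA has not yet been initialized in the process), takes the loader as a parameter, and swallows any exception so that models without remote code are unaffected. In a real server the loader might be something like transformers.AutoConfig.from_pretrained(model, trust_remote_code=True).

```python
import os
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def gpus_hidden() -> Iterator[None]:
    # Temporarily make no CUDA devices visible, then restore the previous value.
    previous = os.environ.get("CUDA_VISIBLE_DEVICES")
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop("CUDA_VISIBLE_DEVICES", None)
        else:
            os.environ["CUDA_VISIBLE_DEVICES"] = previous


def preload_remote_code(model: str, loader: Callable[[str], object]) -> bool:
    # Hypothetical best-effort pre-load: True on success, False on any failure,
    # so models that do not need trust_remote_code fall through unaffected.
    try:
        with gpus_hidden():
            loader(model)
        return True
    except Exception:
        return False
```

Injecting the loader keeps the GPU-hiding and fall-through logic testable without downloading any model.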
Test Plan
We validate both models via the OpenAI-compatible server and the Gradio demo shipped in examples/online_serving/minicpmo/.

1. Launch a backend server