
[WIP][Model] Add Ming-flash-omni-2.0 Image Generation (Diffusion) Stage#2875

Open
ZhengWG wants to merge 14 commits into vllm-project:main from ZhengWG:py/ming-omni-dev

Conversation


@ZhengWG ZhengWG commented Apr 17, 2026

Purpose

This PR extends #1822 (Ming-flash-omni-2.0 Thinker stage) by adding the image generation (diffusion) stage for inclusionAI/Ming-flash-omni-2.0 (https://huggingface.co/inclusionAI/Ming-flash-omni-2.0), enabling end-to-end text-to-image generation.

Modified HF model repo to use: https://huggingface.co/Jonathan1909/Ming-flash-omni-2.0

cc @yuanheng-zhao

Usage

Start the server:

vllm serve /home/admin/model/ --omni \
--stage-configs-path vllm_omni/model_executor/stage_configs/ming_flash_omni_dual.yaml  \
--trust-remote-code --log-stats --port 8188 --host 0.0.0.0

Test request:

curl http://127.0.0.1:8188/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/home/admin/model/",
    "messages": [
      {"role":"user","content":"Please draw a cute cat."}
    ],
    "modalities": ["image"]
  }' -o /tmp/ming_response.json


python -c "
import base64, json
r = json.load(open('/tmp/ming_response.json'))
url = r['choices'][0]['message']['content'][0]['image_url']['url']
png = base64.b64decode(url.split(',')[1])
open('/tmp/ming_cat.png', 'wb').write(png)
print('PNG bytes:', len(png))
"
(Generated sample image: ming_cat)

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and test commands, or state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@ZhengWG ZhengWG requested a review from hsliuustc0106 as a code owner April 17, 2026 08:17
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please ask the admins of this repo to increase the limits by adding credits; credits are required to enable repository-wide code reviews.

@hsliuustc0106
Collaborator

This PR is marked [WIP]. Ready for full review when work-in-progress status is removed.

Missing PR body evidence (required):

Before review can proceed, please add the following to the PR description:

  • vLLM-Omni generation script (offline Omni or online vllm serve)
  • Generated sample output (image)
  • vLLM-Omni e2e latency (hardware: GPU model, count; resolution; steps)
  • vLLM-Omni peak VRAM usage (GB)

See Diffusion Model Requirements for details.

Preliminary scan available on request once the above evidence is provided.

@hsliuustc0106
Collaborator

Ready for full review once the WIP status is removed. Preliminary scan available on request.

When ready for review, please ensure PR description includes:

  • Generation script (offline or online)
  • Sample outputs (you already have an image - good)
  • End-to-end latency (hardware specs, resolution, steps)
  • Peak VRAM usage

Also consider adding a diffusers baseline comparison for performance numbers.

@hsliuustc0106
Collaborator

FYI, #1822 is merged now; please resolve conflicts.

@yuanheng-zhao
Contributor

The thinker stage has been merged to main; let's rebase onto main, cutting off the thinker-stage changes.
For example, git rebase --onto main the-thinker-branch your-current-branch
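The suggested `git rebase --onto` can be tried safely in a throwaway repo first. A minimal sketch with hypothetical branch names (`thinker` and `feature` stand in for the real thinker branch and this PR's branch):

```shell
# Demonstrate `git rebase --onto`: replay only the commits unique to the
# feature branch (those after the thinker branch tip) onto main.
# Branch names are hypothetical stand-ins; requires git >= 2.28 for `init -b`.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email you@example.com
git config user.name you
echo base > base.txt; git add .; git commit -qm "base"
git checkout -qb thinker
echo thinker > thinker.txt; git add .; git commit -qm "thinker work"
git checkout -qb feature
echo diffusion > diffusion.txt; git add .; git commit -qm "diffusion work"
git checkout -q main
echo merged > merged.txt; git add .; git commit -qm "thinker merged into main"
# Move only thinker..feature (here, just "diffusion work") onto main:
git rebase -q --onto main thinker feature
git log --format=%s   # diffusion work / thinker merged into main / base
```

After the rebase, `feature` no longer carries its own copy of the thinker commits, which avoids the conflicts that a plain `git rebase main` would replay.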

@ZhengWG
Author

ZhengWG commented Apr 17, 2026

The thinker stage has been merged to main; let's rebase onto main, cutting off the thinker-stage changes.

For example, git rebase --onto main the-thinker-branch your-current-branch

OK, I will do it ASAP.

ZhengWG added 11 commits April 19, 2026 16:08
Signed-off-by: ZhengWG <zwg0606@gmail.com>
@yuanheng-zhao
Contributor

Ming-flash-omni-2.0 talker (TTS & Omni-Speech) #2890 has been merged to main. Shall we rebase on/merge from main please? Thanks! @ZhengWG

@ZhengWG
Author

ZhengWG commented Apr 23, 2026

Ming-flash-omni-2.0 talker (TTS & Omni-Speech) #2890 has been merged to main. Shall we rebase on/merge from main please? Thanks! @ZhengWG

Ok, I will do it ASAP.

