Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
72 commits
Select commit Hold shift + click to select a range
c126213
add configs and stage yaml
yuanheng-zhao Feb 21, 2026
b672354
draft: add BailingMoeV2Model
yuanheng-zhao Feb 23, 2026
5d2e529
upd
yuanheng-zhao Feb 23, 2026
301a3d5
draft: add thinker stage audio, vision encoders, connectors
yuanheng-zhao Feb 24, 2026
69332ae
upd audio, vision encoders, connectors
yuanheng-zhao Feb 24, 2026
de162a5
draft: add thinker and omni gen cls
yuanheng-zhao Feb 24, 2026
45608c2
draft: add processor
yuanheng-zhao Feb 24, 2026
aa03a2f
draft: upd processor
yuanheng-zhao Feb 25, 2026
f27e94e
draft: upd processor
yuanheng-zhao Feb 25, 2026
d966861
upd and fix
yuanheng-zhao Feb 28, 2026
6901d8e
add to model registry
yuanheng-zhao Mar 1, 2026
09d9cfe
hack to fix file not in hf repo
yuanheng-zhao Mar 1, 2026
7a85029
fix name word_embeddings
yuanheng-zhao Mar 3, 2026
5c3273b
fix and register temp processor
yuanheng-zhao Mar 4, 2026
daf6798
fix thinker stage weight loading
yuanheng-zhao Mar 4, 2026
3b5b0be
adapt to vllm layer Attention
yuanheng-zhao Mar 4, 2026
39aa66f
refine the temp hack in arg_util
yuanheng-zhao Mar 5, 2026
d57c1ae
make ming thinker dummy inputs builder ret all modalities
yuanheng-zhao Mar 6, 2026
06cb1fb
upd
yuanheng-zhao Mar 8, 2026
67957d3
use vllm FusedMoE
yuanheng-zhao Mar 9, 2026
264dbb3
Adapt ming configs to transformer_utils configs
yuanheng-zhao Mar 11, 2026
84ea1e5
clean ming configs
yuanheng-zhao Mar 11, 2026
b85aa2a
trivial revert
yuanheng-zhao Mar 11, 2026
da03cf2
trivial upd
yuanheng-zhao Mar 11, 2026
16efd7f
register omni custom configs to vllm configs registry
yuanheng-zhao Mar 12, 2026
f8cc6b7
register tokenizer for custom config
yuanheng-zhao Mar 12, 2026
e7c0d1c
upd rotary embeddding impl, reuse vllm layers
yuanheng-zhao Mar 16, 2026
7956f2e
upd ming's processors (must upload modified ver preprocessor_config)
yuanheng-zhao Mar 17, 2026
ef3da97
upd data processors & fix audio encoder
yuanheng-zhao Mar 17, 2026
4ad80e8
fix placeholder_audio_loc_lens prep
yuanheng-zhao Mar 18, 2026
1476099
upd and clean code
yuanheng-zhao Mar 24, 2026
b6b77fb
rm debug logs
yuanheng-zhao Mar 24, 2026
926c749
add e2e offline test
yuanheng-zhao Mar 26, 2026
97242e8
add e2e example
yuanheng-zhao Mar 26, 2026
34db46d
fix rebase
yuanheng-zhao Mar 26, 2026
581de44
upd ming e2e example
yuanheng-zhao Mar 28, 2026
54cfd08
upd ming e2e offline tests
yuanheng-zhao Mar 28, 2026
6efbad9
upd and cleanup
yuanheng-zhao Mar 28, 2026
de44539
clean up ming processor
yuanheng-zhao Mar 29, 2026
155ccc7
cleanup weigths loading and utils
yuanheng-zhao Mar 29, 2026
7df2488
flatten ming components
yuanheng-zhao Mar 29, 2026
1f9947b
inherit from Qwen2VLProcessingInfo
yuanheng-zhao Mar 29, 2026
0f85a42
upd and cleanup
yuanheng-zhao Mar 29, 2026
c0c3576
improve audio proj path
yuanheng-zhao Mar 29, 2026
030f7ba
upd and cleanup
yuanheng-zhao Mar 30, 2026
0019bda
add e2e offline example readme
yuanheng-zhao Mar 30, 2026
9342b3f
fix vision/audio mask
yuanheng-zhao Mar 30, 2026
68f4a9d
upd e2e example: use processr chat template, verify reasoning mode
yuanheng-zhao Mar 31, 2026
3221748
upd BailingMoeV2.load_weights
yuanheng-zhao Mar 31, 2026
1db2801
fix PP
yuanheng-zhao Mar 31, 2026
da9d746
upd mrope handling
yuanheng-zhao Apr 2, 2026
dceb27a
canonicalize thinker; fix image mask; fix image prompt update
yuanheng-zhao Apr 2, 2026
108a4a2
use shared fused moe
yuanheng-zhao Apr 4, 2026
8d406df
enable torch compile
yuanheng-zhao Apr 4, 2026
ab5ef47
add e2e online test
yuanheng-zhao Apr 5, 2026
80653e9
trivial cleanup
yuanheng-zhao Apr 5, 2026
966f42b
Merge branch 'main' into model/ming-omni
yuanheng-zhao Apr 5, 2026
6ef71bb
upd imports and config modality tokens
yuanheng-zhao Apr 5, 2026
809f5c0
rm redundant config patch
yuanheng-zhao Apr 7, 2026
3cfe105
extract whisper utils
yuanheng-zhao Apr 7, 2026
0cc613c
trivial: add license header to test
yuanheng-zhao Apr 7, 2026
6cb13e5
fix ming processor modality token counts
yuanheng-zhao Apr 7, 2026
32a775c
upd cu_seqlens as required
yuanheng-zhao Apr 7, 2026
fe7ad5c
Add online serving example and doc (thinker)
yuanheng-zhao Apr 7, 2026
b0f91fa
trivial upd readme
yuanheng-zhao Apr 8, 2026
a4c5ed2
trivial upd
yuanheng-zhao Apr 9, 2026
aeae057
Merge from main
yuanheng-zhao Apr 9, 2026
d0f7b96
Merge from main
yuanheng-zhao Apr 13, 2026
300b56c
cleanup unused config cls
yuanheng-zhao Apr 13, 2026
cfd949c
enforce all sub processors (audio,image/video towers) exist
yuanheng-zhao Apr 14, 2026
121dd76
trivial cleanup
yuanheng-zhao Apr 14, 2026
d11cf28
Merge from main
yuanheng-zhao Apr 17, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 76 additions & 0 deletions examples/offline_inference/ming_flash_omni/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# Ming-flash-omni 2.0

[Ming-flash-omni-2.0](https://github.com/inclusionAI/Ming) is an omni-modal model supporting text, image, video, and audio understanding, with outputs in text, image, and audio. For now, Ming-flash-omni-2.0 in vLLM-Omni is supported with thinker stage (multi-modal understanding).

## Setup

Please refer to the [stage configuration documentation](https://docs.vllm.ai/projects/vllm-omni/en/latest/configuration/stage_configs/) to configure memory allocation appropriately for your hardware setup.

## Run examples

### Text-only
```bash
python examples/offline_inference/ming_flash_omni/end2end.py --query-type text
```

#### Reasoning (Thinking Mode)

Reasoning (Thinking) mode is enabled via applying "detailed thinking on" when building the system prompt template (in `apply_chat_template`).

In the end2end example, a default problem for thinking mode is provided, as referred to the example usage of Ming's cookbook;
To utilize it, you have to download the example figure from https://github.com/inclusionAI/Ming/blob/3954fcb880ff5e61ff128bcf7f1ec344d46a6fe3/figures/cases/3_0.png

```bash
python examples/offline_inference/ming_flash_omni/end2end.py -q reasoning --image-path ./3_0.png
```

### Image understanding
```bash
python examples/offline_inference/ming_flash_omni/end2end.py --query-type use_image

# With a local image
python examples/offline_inference/ming_flash_omni/end2end.py --query-type use_image --image-path /path/to/image.jpg
```

### Audio understanding
```bash
python examples/offline_inference/ming_flash_omni/end2end.py --query-type use_audio

# With a local audio file
python examples/offline_inference/ming_flash_omni/end2end.py --query-type use_audio --audio-path /path/to/audio.wav
```

### Video understanding
```bash
python examples/offline_inference/ming_flash_omni/end2end.py --query-type use_video

# With a local video and custom frame count
python examples/offline_inference/ming_flash_omni/end2end.py --query-type use_video --video-path /path/to/video.mp4 --num-frames 16
```

### Mixed modalities (image + audio)
```bash
python examples/offline_inference/ming_flash_omni/end2end.py --query-type use_mixed_modalities \
--image-path /path/to/image.jpg \
--audio-path /path/to/audio.wav
```

If media file paths are not provided, the script uses built-in default assets.

### Modality control
To control output modalities (e.g. text-only output):
```bash
python examples/offline_inference/ming_flash_omni/end2end.py --query-type use_audio --modalities text
```

*For now, only text output is supported*

### Custom stage config
```bash
python examples/offline_inference/ming_flash_omni/end2end.py --query-type use_image \
--stage-configs-path /path/to/your_config.yaml
```

## Online serving

For online serving via the OpenAI-compatible API, see [examples/online_serving/ming_flash_omni/README.md](../../online_serving/ming_flash_omni/README.md).
Loading
Loading