Support funaudiochat s2s#1748
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5d2f3e1f43
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if hf_config_path is not None or model_arch not in _COSYVOICE3_MODEL_ARCHES: | ||
| return hf_config_path | ||
|
|
||
| return str(Path(__file__).resolve().parent.parent / "model_executor" / "models" / "cosyvoice3" / "hf_config") |
There was a problem hiding this comment.
Avoid defaulting hf_config_path to a nonexistent bundle
This fallback always returns .../model_executor/models/cosyvoice3/hf_config when hf_config_path is unset, but that directory is not present in this repo/package, so default stage configs (for example funaudiochat_s2s.yaml, which does not set hf_config_path) will pass an invalid local path into model config creation and fail before loading the model unless users manually override hf_config_path.
Useful? React with 👍 / 👎.
| valid_rows = audio_token_ids.any(dim=-1) | ||
| audio_token_ids = audio_token_ids[valid_rows] |
There was a problem hiding this comment.
Keep all-zero codec rows when flattening stage-0 tokens
Using audio_token_ids.any(dim=-1) drops any 2D row made entirely of zeros, but token 0 is treated as valid elsewhere in this same function (filtered >= 0) and in tests, so a legitimate all-zero codec group from stage-0 would be silently removed before code2wav, shortening or corrupting synthesized audio for that request.
Useful? React with 👍 / 👎.
5910a38 to
3807c1b
Compare
|
resolve conflicts please |
|
Tried to run this locally but wasn't able to start the server. There are several API incompatibilities with the current
It looks like this PR was developed against a different version of vllm-omni. Could you rebase onto the latest |
1768dd8 to
b4d8620
Compare
|
|
sorry wait, seems still have a mismatch problem |
4921519 to
b6ae819
Compare
|
@ramos please let me know if it is ready to be reviewed again. thanks! |
ok! When it's done, I will notify you asap. |
0a7d6df to
fe924de
Compare
Thanks for waiting. I have checked this branch and I believe it's ready right now @linyueqian |
|
@claude can you also help review this again |
|
@codex please also review this? |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 813dc0e16f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| flat = prompt_token_ids[0] if isinstance(prompt_token_ids[0], list) else prompt_token_ids | ||
| if len(flat) == 0: |
There was a problem hiding this comment.
Split prompt_token_ids by request before decoding
This helper collapses batched prompt_token_ids to a single 1-D sequence (prompt_token_ids[0] for list inputs), so when stage-1 runs with more than one request in a batch, requests after index 0 will decode from the wrong codec stream. This corrupts per-request audio output as soon as max_batch_size is increased beyond 1 for throughput.
Useful? React with 👍 / 👎.
| del positions, intermediate_tensors, inputs_embeds | ||
|
|
||
| sampling_metadata = kwargs.get("sampling_metadata") | ||
| token, is_dummy_profile = self._build_decode_tokens(input_ids, sampling_metadata) |
There was a problem hiding this comment.
Decode batched code2wav inputs per request
The forward path builds a single token tensor and emits one concatenated waveform, without using per-request boundaries, so batched requests are merged into one utterance instead of producing isolated outputs. If operators raise stage-1 batching (runtime.max_batch_size > 1), this mixes users’ codec tokens and returns incorrect audio for all requests in that batch.
Useful? React with 👍 / 👎.
Signed-off-by: ramos.ma <wyrmyf@gmail.com> Signed-off-by: mayufeng <mayufeng@asr-h100>
Signed-off-by: ramos.ma <wyrmyf@gmail.com>
Signed-off-by: ramos.ma <wyrmyf@gmail.com>
Signed-off-by: mayufeng <mayufeng@asr-h100>
Signed-off-by: mayufeng <mayufeng@asr-h100>
Signed-off-by: mayufeng <mayufeng@asr-h100>
Signed-off-by: ramos.ma <wyrmyf@gmail.com> Signed-off-by: mayufeng <mayufeng@asr-h100>
fdb750d to
567e6d5
Compare
|
Tested E2E with local checkpoints ( Bugs found1.
|
Signed-off-by: mayufeng <mayufeng@asr-h100>
Signed-off-by: ramos.ma <wyrmyf@gmail.com>
Thanks for your thorough review. @codex please review this again follow the rule of vllm-omni-contrib. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 600f45bd90
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| local_path = Path(model_path) | ||
| if local_path.exists(): |
There was a problem hiding this comment.
Expand user path before probing local checkpoint
_resolve_model_path tests Path(model_path).exists() without expanding ~, so a valid local override like ~/ckpt/Fun-CosyVoice3-0.5B-2512 is treated as missing and the code falls through to snapshot_download(model_path). In stage YAML/CLI workflows this causes local-checkpoint runs to fail with a repo/path resolution error even though the directory exists; expanding user/env components before the existence check would keep local S2S setups working.
Useful? React with 👍 / 👎.
|
resolve conflicts please. |
|
Could you also add the corresponding buildkite CI entries (test-merge.yml / test-nightly.yml) so these tests actually run on L4? See #1911 for reference on how the Qwen3-TTS tests are wired up. |
Signed-off-by: mayufeng <mayufeng@example.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Signed-off-by: ramos.ma <wyrmyf@gmail.com>
0013f49 to
1076195
Compare
|
@nemoramo please rebase fix conflicts |
PR Description
Summary
This PR adds Speech-to-Speech (S2S) support for
FunAudioLLM/Fun-Audio-Chat-8Bin vLLM-Omni.Following the vLLM-Omni contributing guide, this PR keeps the scope limited to the upstream-ready integration itself:
Changes
Model integration
Runtime support
OmniRequestOutputhandling when pipelinerequest_outputis a listTests and docs
OmniRequestOutputregression coveragedocs/models/supported_models.mdNotes
FUN_AUDIO_CHAT_HOMEchange minimal and upstream-ready
Testing
Ran the following checks locally: