[Core] Add prompt_preprocess_func to stage config for server-side pro…#2796
[Core] Add prompt_preprocess_func to stage config for server-side pro…#2796adamkbaranowski wants to merge 4 commits into
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
…mpt formatting Signed-off-by: adamkbaranowski <adam.baranowski@protonmail.com>
377a7b7 to
87d29f7
Compare
lishunyang12
left a comment
There was a problem hiding this comment.
Review: LGTM with minor suggestions
The PR cleanly adds prompt_preprocess_func following the established patterns for prompt_expand_func and cfg_kv_collect_func. The bug fix for custom_process_input_func is correct and well-motivated.
What looks good
- Consistent pattern: The importlib loading, null-safe
getattr, and "last stage wins" collection all mirror the existing func hooks exactly. - Bug fix: Replacing
hasattr+ direct attribute access withgetattr(..., None)+ truthiness check forcustom_process_input_funcprevents theNone.rsplit()crash. Nice catch. - Tests: Good coverage of the loading path (dotted path resolution, None when missing, None when not configured) and the engine-level collection.
- Security:
importlibloading from server-side YAML configs is the same trust model as the other func hooks — acceptable since these are operator-deployed configs, not user input. - Docs: Clear explanation with a concrete use case (GLM-Image chat template).
Minor observations (non-blocking)
-
Diffusion branch omits
prompt_preprocess_func: Inextract_stage_metadata, the diffusion return path (around the original line 212) does not explicitly passprompt_preprocess_func. This works because the dataclass default isNone, but it means if someone addsprompt_preprocess_functo a diffusion stage config, it would be silently ignored. This is the same asprompt_expand_funcso it's consistent, but worth being aware of. -
No input/output type validation on the preprocessor: The preprocessor receives and returns a prompt, but there's no validation that the return type matches the input type (dict vs. str). A preprocessor that accidentally returns
Noneor changes the type could cause confusing downstream errors. Consider adding a lightweight assertion or at least a log warning — but this is a "nice to have" for a follow-up, not a blocker. -
Test uses
copy.copyas the dotted path: This is pragmatic for testing importlib resolution, but a one-line comment explaining whycopy.copywas chosen (avoids depending on model code) would help future readers. The docstring on_identity_preprocesssuggests it was meant to be used but wasn't.
Overall this is a clean, well-scoped addition. Approved.
|
@amy-why-3459 PTAL |


Purpose
Add a new stage config field
prompt_preprocess_functhat transforms the raw user prompt before Stage 0 tokenization. This enables server-side chat template application and prompt formatting without requiring client-side changes.The mechanism follows the same pattern as
prompt_expand_funcandcfg_kv_collect_func: a dotted Python path in the YAML config loaded viaimportlib.Also fixes pre-existing bug:
custom_process_input_funcusedhasattrwithout null check, causingNone.rsplit()crash when the attribute is explicitlyNone.Changes:
vllm_omni/engine/stage_init_utils.py: Addedprompt_preprocess_funcfield toStageMetadata, importlib loading, null guard fix forcustom_process_input_funcvllm_omni/engine/async_omni_engine.py: Collects, stores, and callsprompt_preprocess_funcbefore tokenizationtests/engine/test_prompt_preprocess_func.py: 4 unit testsdocs/configuration/stage_configs.md: Documentationdocs/contributing/model/adding_omni_model.md: DocumentationTest Plan
pytest -s -v tests/engine/test_prompt_preprocess_func.py -m "core_model and cpu"Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)