[Core] Add prompt_preprocess_func to stage config for server-side pro…#2796

Open: adamkbaranowski wants to merge 4 commits into vllm-project:main from adamkbaranowski:prompt-preprocess-func
Conversation

@adamkbaranowski

Purpose

Add a new stage config field prompt_preprocess_func that transforms the raw user prompt before Stage 0 tokenization. This enables server-side chat template application and prompt formatting without requiring client-side changes.

The mechanism follows the same pattern as prompt_expand_func and cfg_kv_collect_func: a dotted Python path in the YAML config loaded via importlib.
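As a rough illustration of the dotted-path mechanism described above, a loader could look like the sketch below. The helper name `load_func_from_dotted_path` is hypothetical and is not taken from the PR; only `importlib.import_module` and `getattr` are real standard-library calls. `copy.copy` is used as the target because the tests in this PR also resolve it.

```python
import copy
import importlib
from typing import Any, Callable, Optional


def load_func_from_dotted_path(path: Optional[str]) -> Optional[Callable[..., Any]]:
    """Resolve a 'package.module.func' string to a callable.

    Hypothetical helper: returns None when the config field is unset or empty.
    """
    if not path:
        return None
    # Split "pkg.module.func" into the module part and the attribute name.
    module_name, _, func_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


# "copy.copy" resolves to the standard-library function of the same name.
loaded = load_func_from_dotted_path("copy.copy")
print(loaded is copy.copy)
```

Returning `None` for an unset field keeps the feature opt-in: callers only need a single `is not None` check.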

Also fixes a pre-existing bug: custom_process_input_func was checked with hasattr but without a None check, so a config that sets the attribute explicitly to None crashed on None.rsplit().
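The failure mode can be reproduced in isolation. The sketch below is not the code from `stage_init_utils.py`; the two function names are hypothetical and `SimpleNamespace` stands in for a stage config object. It shows why `hasattr` alone is insufficient when the attribute exists but is `None`.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def split_dotted_path_unsafe(stage_cfg: Any) -> Optional[Tuple[str, str]]:
    # hasattr() is True even when the attribute is explicitly None,
    # so the rsplit() call below raises AttributeError on None.
    if hasattr(stage_cfg, "custom_process_input_func"):
        module_path, func_name = stage_cfg.custom_process_input_func.rsplit(".", 1)
        return module_path, func_name
    return None


def split_dotted_path_safe(stage_cfg: Any) -> Optional[Tuple[str, str]]:
    # getattr() with a default plus a truthiness check covers both
    # "attribute missing" and "attribute set to None".
    path = getattr(stage_cfg, "custom_process_input_func", None)
    if not path:
        return None
    module_path, func_name = path.rsplit(".", 1)
    return module_path, func_name


cfg_with_none = SimpleNamespace(custom_process_input_func=None)
print(split_dotted_path_safe(cfg_with_none))
```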

Changes:

  • vllm_omni/engine/stage_init_utils.py: Added prompt_preprocess_func field to StageMetadata, importlib loading, null guard fix for custom_process_input_func
  • vllm_omni/engine/async_omni_engine.py: Collects, stores, and calls prompt_preprocess_func before tokenization
  • tests/engine/test_prompt_preprocess_func.py: 4 unit tests
  • docs/configuration/stage_configs.md: Documentation
  • docs/contributing/model/adding_omni_model.md: Documentation
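The ordering described in the changes above (apply the hook, then tokenize) can be sketched as follows. This is a toy model, not the engine code: `run_stage0_input`, `toy_tokenize`, and `add_prefix` are hypothetical names, and whitespace splitting stands in for the real tokenizer.

```python
from typing import Callable, List, Optional


def toy_tokenize(text: str) -> List[str]:
    # Stand-in for the real tokenizer: split on whitespace.
    return text.split()


def run_stage0_input(
    prompt: str,
    prompt_preprocess_func: Optional[Callable[[str], str]] = None,
) -> List[str]:
    # The hook is optional; when it is not configured the prompt passes through unchanged.
    if prompt_preprocess_func is not None:
        prompt = prompt_preprocess_func(prompt)
    return toy_tokenize(prompt)


def add_prefix(prompt: str) -> str:
    # Hypothetical preprocessor used only for this illustration.
    return f"draw: {prompt}"


print(run_stage0_input("a dog"))
print(run_stage0_input("a dog", add_prefix))
```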

Test Plan

pytest -s -v tests/engine/test_prompt_preprocess_func.py -m "core_model and cpu"

Test Result

test_prompt_preprocess_func_loaded_from_config PASSED
test_prompt_preprocess_func_none_when_not_configured PASSED
test_prompt_preprocess_func_none_when_attr_missing PASSED
test_initialize_stages_collects_prompt_preprocess_func PASSED
======================== 4 passed, 18 warnings in 8.59s ========================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. Not applicable - new opt-in feature.


…mpt formatting

Signed-off-by: adamkbaranowski <adam.baranowski@protonmail.com>
adamkbaranowski force-pushed the prompt-preprocess-func branch from 377a7b7 to 87d29f7 (April 14, 2026 12:35)
@adamkbaranowski
Author

Tested by adding a system prompt to GLM as an example.

Prompt "a dog" without prompt rewriting:
[image: output_vanilla]
The same prompt with the prompt template from the diff below:
[image: output_preprocess]
Diff:

diff --git a/vllm_omni/model_executor/stage_configs/glm_image.yaml b/vllm_omni/model_executor/stage_configs/glm_image.yaml
index 3cc23e1e..ae986026 100644
--- a/vllm_omni/model_executor/stage_configs/glm_image.yaml
+++ b/vllm_omni/model_executor/stage_configs/glm_image.yaml
@@ -8,6 +8,7 @@ stage_args:
   # for conditioning the diffusion process.
   - stage_id: 0
     stage_type: llm
+    prompt_preprocess_func: vllm_omni.model_executor.stage_input_processors.glm_image.preprocess_prompt_for_glm
     runtime:
       process: true
       devices: "0"

diff --git a/vllm_omni/model_executor/stage_input_processors/glm_image.py b/vllm_omni/model_executor/stage_input_processors/glm_image.py
index 3063620b..b5a16290 100644
--- a/vllm_omni/model_executor/stage_input_processors/glm_image.py
+++ b/vllm_omni/model_executor/stage_input_processors/glm_image.py
@@ -4,6 +4,29 @@
 
 from typing import Any
 
+from vllm.logger import init_logger as _init_logger
+
+_logger = _init_logger(__name__)
+
+
+def preprocess_prompt_for_glm(prompt: Any) -> Any:
+    """Apply GLM-Image chat template before Stage 0 tokenization.
+
+    This preprocessor wraps the raw user prompt in the chat format expected
+    by the GLM-Image AR model so that clients can send plain text via
+    ``/v1/images/generations`` without knowing the internal template.
+
+    NOTE: verify the exact template against the model's tokenizer_config.json.
+    """
+    SYSTEM_PROMPT = "You are an image generation assistant. Take user's prompt and rewrite it as if it was a task from a professional cartoon studio."
+    if isinstance(prompt, dict) and "prompt" in prompt:
+        raw_text = prompt["prompt"]
+        prompt["prompt"] = f"[gMASK]<sop><|system|>\n{SYSTEM_PROMPT}<|user|>\n{raw_text}<|assistant|>\n"
+    elif isinstance(prompt, str):
+        prompt = f"[gMASK]<sop><|system|>\n{SYSTEM_PROMPT}<|user|>\n{prompt}<|assistant|>\n"
+    return prompt
+
+
 import torch
 from vllm.inputs import TextPrompt
 from vllm.logger import init_logger
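One property of the test preprocessor in the diff above is that it rewrites `prompt["prompt"]` in place, so the caller's dict is modified. A non-mutating variant might look like the sketch below. It is an assumption-laden illustration and not part of the PR: the function names are hypothetical, the system prompt is a shortened placeholder, and the template tokens are copied from the diff, which itself notes they should be verified against the model's tokenizer_config.json.

```python
from typing import Any

# Shortened placeholder; the diff above uses a longer system prompt.
SYSTEM_PROMPT = "You are an image generation assistant."


def apply_template(text: str) -> str:
    # Template tokens copied from the diff above; unverified against the model config.
    return f"[gMASK]<sop><|system|>\n{SYSTEM_PROMPT}<|user|>\n{text}<|assistant|>\n"


def preprocess_prompt_copying(prompt: Any) -> Any:
    """Return a templated prompt without modifying the caller's object."""
    if isinstance(prompt, dict) and "prompt" in prompt:
        # Build a shallow copy with only the "prompt" key replaced.
        return {**prompt, "prompt": apply_template(prompt["prompt"])}
    if isinstance(prompt, str):
        return apply_template(prompt)
    # Unknown prompt types are passed through untouched.
    return prompt


original = {"prompt": "a dog", "seed": 0}
templated = preprocess_prompt_copying(original)
print(original["prompt"])
```

Whether in-place mutation matters depends on whether the engine reuses the original prompt object after preprocessing, which this sketch does not assume either way.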

Collaborator

@lishunyang12 lishunyang12 left a comment

Review: LGTM with minor suggestions

The PR cleanly adds prompt_preprocess_func following the established patterns for prompt_expand_func and cfg_kv_collect_func. The bug fix for custom_process_input_func is correct and well-motivated.

What looks good

  • Consistent pattern: The importlib loading, null-safe getattr, and "last stage wins" collection all mirror the existing func hooks exactly.
  • Bug fix: Replacing hasattr + direct attribute access with getattr(..., None) + truthiness check for custom_process_input_func prevents the None.rsplit() crash. Nice catch.
  • Tests: Good coverage of the loading path (dotted path resolution, None when missing, None when not configured) and the engine-level collection.
  • Security: importlib loading from server-side YAML configs is the same trust model as the other func hooks — acceptable since these are operator-deployed configs, not user input.
  • Docs: Clear explanation with a concrete use case (GLM-Image chat template).
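For readers unfamiliar with the "last stage wins" collection mentioned above, one plausible reading is sketched here. The assumption, not confirmed by the PR text, is that the engine iterates over stages in order and keeps the hook from the last stage that defines one. `StageMetadataSketch` and `collect_prompt_preprocess_func` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class StageMetadataSketch:
    # Hypothetical stand-in for the real StageMetadata dataclass.
    stage_id: int
    prompt_preprocess_func: Optional[Callable[[Any], Any]] = None


def collect_prompt_preprocess_func(
    stages: Iterable[StageMetadataSketch],
) -> Optional[Callable[[Any], Any]]:
    selected = None
    for meta in stages:
        # A later stage that defines the hook overrides an earlier one.
        if meta.prompt_preprocess_func is not None:
            selected = meta.prompt_preprocess_func
    return selected


def hook_a(prompt: Any) -> Any:
    return prompt


def hook_b(prompt: Any) -> Any:
    return prompt


stages = [
    StageMetadataSketch(0, hook_a),
    StageMetadataSketch(1, None),
    StageMetadataSketch(2, hook_b),
]
print(collect_prompt_preprocess_func(stages) is hook_b)
```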

Minor observations (non-blocking)

  1. Diffusion branch omits prompt_preprocess_func: In extract_stage_metadata, the diffusion return path (around the original line 212) does not explicitly pass prompt_preprocess_func. This works because the dataclass default is None, but it means if someone adds prompt_preprocess_func to a diffusion stage config, it would be silently ignored. This is the same as prompt_expand_func so it's consistent, but worth being aware of.

  2. No input/output type validation on the preprocessor: The preprocessor receives and returns a prompt, but there's no validation that the return type matches the input type (dict vs. str). A preprocessor that accidentally returns None or changes the type could cause confusing downstream errors. Consider adding a lightweight assertion or at least a log warning — but this is a "nice to have" for a follow-up, not a blocker.

  3. Test uses copy.copy as the dotted path: This is pragmatic for testing importlib resolution, but a one-line comment explaining why copy.copy was chosen (avoids depending on model code) would help future readers. The docstring on _identity_preprocess suggests it was meant to be used but wasn't.
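The lightweight validation suggested in observation 2 could take roughly the following shape. This is a sketch for a possible follow-up, not code from the PR: `checked_preprocess` is a hypothetical wrapper, and the choice to raise on `None` but only warn on a type change is an assumption about what would be least disruptive.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def checked_preprocess(func: Callable[[Any], Any], prompt: Any) -> Any:
    """Call a preprocessor and sanity-check what it returns."""
    result = func(prompt)
    if result is None:
        # A forgotten `return` is the most likely cause; fail early with a clear message.
        name = getattr(func, "__name__", repr(func))
        raise TypeError(f"prompt preprocessor {name} returned None")
    if type(result) is not type(prompt):
        # A dict-to-str (or str-to-dict) change may be intentional, so only warn.
        logger.warning(
            "prompt preprocessor changed prompt type from %s to %s",
            type(prompt).__name__,
            type(result).__name__,
        )
    return result


def forgets_return(prompt: Any) -> Any:
    # Deliberately broken preprocessor used to exercise the None check.
    prompt = f"templated: {prompt}"


print(checked_preprocess(str.upper, "a dog"))
```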

Overall this is a clean, well-scoped addition. Approved.

@lishunyang12
Collaborator

@amy-why-3459 PTAL
