Add filtering for chat template kwargs #25794
DarkLight1337 merged 7 commits into vllm-project:main
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Code Review
This pull request addresses a critical security vulnerability (GHSA-6fvq-23cw-5628) related to arbitrary code execution via malicious chat templates. The core of the fix is the new resolve_chat_template_kwargs function, which filters the keyword arguments passed to the Jinja2 template renderer. It ensures that only arguments both supported by the apply_chat_template function and explicitly declared as variables within the template are passed, which blocks the injection of a malicious chat_template. The changes also introduce a --trust-request-chat-template flag for defense in depth, so that user-supplied templates are not used unless explicitly allowed. The implementation is robust, and the accompanying tests correctly validate the fix. Overall, this is a solid and important security patch.
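The filtering described in this summary can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper name `filter_chat_template_kwargs`, the `fake_apply` stand-in for the tokenizer method, and the choice to accept a key when either the function signature or the template names it are all assumptions made for the example.

```python
import inspect
from typing import Any, Callable

from jinja2 import meta
from jinja2.sandbox import ImmutableSandboxedEnvironment


def filter_chat_template_kwargs(
    apply_fn: Callable[..., Any],
    chat_template: str,
    chat_template_kwargs: dict[str, Any],
) -> dict[str, Any]:
    """Hypothetical helper: drop kwargs that neither apply_fn nor the template names."""
    # Explicit parameters of the rendering function (ignore *args / **kwargs).
    fn_params = {
        name
        for name, p in inspect.signature(apply_fn).parameters.items()
        if p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    }
    # Variables the template reads but does not define itself.
    # env.parse() only builds the AST; it does not compile or render the template.
    env = ImmutableSandboxedEnvironment()
    template_vars = meta.find_undeclared_variables(env.parse(chat_template))
    # Assumption: a request must never be able to swap the template via kwargs.
    blocked = {"chat_template"}
    allowed = (fn_params | template_vars) - blocked
    return {k: v for k, v in chat_template_kwargs.items() if k in allowed}


def fake_apply(conversation, tools=None, add_generation_prompt=False, **kwargs):
    """Stand-in for a tokenizer's apply_chat_template (illustration only)."""


template = "{% if enable_thinking %}think{% endif %}{{ messages }}"
request_kwargs = {
    "enable_thinking": True,
    "chat_template": "{{ evil }}",
    "unknown": 1,
    "tools": [],
}
filtered = filter_chat_template_kwargs(fake_apply, template, request_kwargs)
```

Here `filtered` keeps `enable_thinking` (referenced by the template) and `tools` (a named parameter of `fake_apply`), while `chat_template` and `unknown` are dropped.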
@Isotr0py PTAL at the V1 test failure.
Let's also reject the request if it passes an untrusted chat template, to avoid silent regressions.
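A rough sketch of what rejecting such a request could look like is below. The exception class, the function name, and the `trust_request_chat_template` parameter are hypothetical names chosen for the example; they are not taken from the merged code.

```python
from typing import Any, Optional


class UntrustedChatTemplateError(ValueError):
    """Hypothetical error for a request-supplied template the server does not trust."""


def check_request_chat_template(
    request_chat_template: Optional[str],
    request_kwargs: Optional[dict[str, Any]],
    trust_request_chat_template: bool = False,
) -> None:
    """Raise instead of silently ignoring a template supplied by the request."""
    if trust_request_chat_template:
        return
    # A template can arrive as a top-level field or be smuggled in via kwargs.
    in_kwargs = "chat_template" in (request_kwargs or {})
    if request_chat_template is not None or in_kwargs:
        raise UntrustedChatTemplateError(
            "chat_template was passed in the request, but request chat "
            "templates are not trusted by this server"
        )
```

Failing loudly, rather than dropping the template and rendering with the default one, means a client relying on a custom template finds out immediately instead of getting differently formatted prompts.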
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: yewentao256 <zhyanwentao@126.com>
    resolved_kwargs = resolve_chat_template_kwargs(
        tokenizer=tokenizer,
        chat_template=hf_chat_template,
        chat_template_kwargs=kwargs,
    )
Is it called on each request? Do we need to cache it?
Yea, it's called on each request. This function won't compile the chat template, so I don't think it will introduce much overhead.
Given that kwargs can vary depending on the request's extra body, caching the result isn't really applicable.
On an NPU (Ascend 910C) it costs about 10-20 ms, so it would be great to cache it.
It seems this change might be linked to a ~10x latency jump we saw in our benchmarks on 0.11.0 (where CPU hit 100%).
The caching in 0.11.1 appears to solve it, but upgrading is a bit tricky since it requires CUDA ≥ 12.9.
You can still use older CUDA versions if you install vLLM from source.
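One way to reconcile the two points in this thread (kwargs differ per request, yet the per-request cost is noticeable) is to cache only the part that depends on the template string. The sketch below shows that split with `functools.lru_cache`. It is not the caching that shipped in 0.11.1; the function names are hypothetical, and the regex is a deliberately naive stand-in for a real Jinja2 parse so the example stays dependency-free.

```python
import re
from functools import lru_cache
from typing import Any

# Naive stand-in: only finds the first identifier inside `{{ ... }}`.
# A real implementation would parse the template with Jinja2 instead.
_VAR_PATTERN = re.compile(r"\{\{\s*([A-Za-z_]\w*)")


@lru_cache(maxsize=32)
def cached_template_vars(chat_template: str) -> frozenset[str]:
    """Expensive, template-only step: computed once per distinct template."""
    return frozenset(_VAR_PATTERN.findall(chat_template))


def filter_kwargs_cached(
    chat_template: str, chat_template_kwargs: dict[str, Any]
) -> dict[str, Any]:
    """Cheap per-request step: a set lookup against the cached variable names."""
    allowed = cached_template_vars(chat_template) - {"chat_template"}
    return {k: v for k, v in chat_template_kwargs.items() if k in allowed}
```

Because the cache key is the template string alone, requests with different kwargs still share the cached result, and `cached_template_vars.cache_info()` can be used to confirm hits and misses.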
In which cases do we ever use a chat template from users? I thought we should just disallow it.
We may also want a similar