
add "enable_prefix_caching" args for vllm engine. #2939

Merged
Jintao-Huang merged 3 commits into modelscope:main from Leoyzen:feature/vllm-prefix-caching on Jan 23, 2025

Conversation

Leoyzen
Contributor

@Leoyzen Leoyzen commented Jan 20, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Automatic Prefix Caching is very useful for model serving.
This PR adds an argument (--enable_prefix_caching) for both the infer and deploy modes. Prefix caching is disabled by default; a usage sketch is shown below.
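A minimal sketch of how such a flag could be wired through to the vLLM engine (the CLI shape and helper below are illustrative assumptions, not the PR's actual code; `enable_prefix_caching` itself is a standard vLLM engine argument):

```python
# Illustrative sketch only: forward an --enable_prefix_caching CLI flag to vLLM.
# The argument parsing here is hypothetical; the PR wires the flag through
# swift's own infer/deploy argument classes.
import argparse

from vllm import LLM, SamplingParams


def build_engine(args: argparse.Namespace) -> LLM:
    # enable_prefix_caching is a vLLM engine argument; when True, KV-cache
    # blocks for shared prompt prefixes (e.g. a common system prompt) are
    # reused across requests instead of being recomputed.
    return LLM(model=args.model, enable_prefix_caching=args.enable_prefix_caching)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    # Disabled by default, matching the PR description.
    parser.add_argument("--enable_prefix_caching", action="store_true", default=False)
    args = parser.parse_args()

    llm = build_engine(args)
    outputs = llm.generate(["Hello, world!"], SamplingParams(max_tokens=16))
    print(outputs[0].outputs[0].text)
```

With the PR applied, the flag would be passed on the swift command line alongside the other vLLM engine options, e.g. something like `swift deploy --infer_backend vllm --enable_prefix_caching true ...` (exact invocation assumed here for illustration).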

Experiment results

@Leoyzen Leoyzen force-pushed the feature/vllm-prefix-caching branch from 180f9a2 to 0b19a64 on January 22, 2025 14:34
@Leoyzen Leoyzen changed the title from [WIP]add "enable_prefix_caching" args for vllm engine. to add "enable_prefix_caching" args for vllm engine. on Jan 22, 2025
@Jintao-Huang
Collaborator

thanks!

@Jintao-Huang Jintao-Huang merged commit e02ebfd into modelscope:main Jan 23, 2025
2 checks passed
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request Jan 23, 2025
…-qwen-prm

* commit '6524bcc5caf7b63307f458fe45356ad18bf8f3b1': (21 commits)
  Fix vllm docs link & fix web-ui (modelscope#2970)
  add "enable_prefix_caching" args for vllm engine. (modelscope#2939)
  fix install_all.sh
  ppo compat transformers>=4.47.* (modelscope#2964)
  fix seq_cls patcher (modelscope#2963)
  fix max_length error print (modelscope#2960)
  update quant_mllm shell (modelscope#2959)
  update web-ui images (modelscope#2958)
  update requirements (modelscope#2957)
  fix bugs (modelscope#2954)
  fix citest (modelscope#2953)
  fix infer_stream (modelscope#2952)
  fix demo_hf (modelscope#2951)
  support deepseek_r1_distill (modelscope#2946)
  Fix mllm seq cls (modelscope#2945)
  Support minimax (modelscope#2943)
  Fix quant template (modelscope#2942)
  support deepseek-ai/DeepSeek-R1 (modelscope#2940)
  fix bugs (modelscope#2938)
  Support mllm seq_cls/rm (modelscope#2934)
  ...