
add "enable_prefix_caching" args for vllm engine. #2939

Merged
Jintao-Huang merged 3 commits into modelscope:main from Leoyzen:feature/vllm-prefix-caching on Jan 23, 2025

Conversation

Leoyzen
Contributor

@Leoyzen Leoyzen commented Jan 20, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Automatic Prefix Caching is very useful for model serving.
This PR adds an argument (--enable_prefix_caching) for both the infer and deploy modes. Prefix caching is disabled by default; a usage sketch is shown below.
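A minimal sketch of how such a flag could be wired through to the vLLM engine (the CLI shape and helper below are illustrative assumptions, not the PR's actual code; `enable_prefix_caching` itself is a standard vLLM engine argument):

```python
# Illustrative sketch only: forward an --enable_prefix_caching CLI flag to vLLM.
# The argument parsing here is hypothetical; the PR wires the flag through
# swift's own infer/deploy argument classes.
import argparse

from vllm import LLM, SamplingParams


def build_engine(args: argparse.Namespace) -> LLM:
    # enable_prefix_caching is a vLLM engine argument; when True, KV-cache
    # blocks for shared prompt prefixes (e.g. a common system prompt) are
    # reused across requests instead of being recomputed.
    return LLM(model=args.model, enable_prefix_caching=args.enable_prefix_caching)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    # Disabled by default, matching the PR description.
    parser.add_argument("--enable_prefix_caching", action="store_true", default=False)
    args = parser.parse_args()

    llm = build_engine(args)
    outputs = llm.generate(["Hello, world!"], SamplingParams(max_tokens=16))
    print(outputs[0].outputs[0].text)
```

With the PR applied, the flag would be passed on the swift command line alongside the other vLLM engine options, e.g. something like `swift deploy --infer_backend vllm --enable_prefix_caching true ...` (exact invocation assumed here for illustration).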

Experiment results

@Leoyzen Leoyzen force-pushed the feature/vllm-prefix-caching branch from 180f9a2 to 0b19a64 on January 22, 2025 14:34
@Leoyzen Leoyzen changed the title from [WIP]add "enable_prefix_caching" args for vllm engine. to add "enable_prefix_caching" args for vllm engine. on Jan 22, 2025
@Jintao-Huang
Collaborator

thanks!

@Jintao-Huang Jintao-Huang merged commit e02ebfd into modelscope:main Jan 23, 2025
2 checks passed
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request Jan 23, 2025
…-qwen-prm

* commit '6524bcc5caf7b63307f458fe45356ad18bf8f3b1': (21 commits)
  Fix vllm docs link & fix web-ui (modelscope#2970)
  add "enable_prefix_caching" args for vllm engine. (modelscope#2939)
  fix install_all.sh
  ppo compat transformers>=4.47.* (modelscope#2964)
  fix seq_cls patcher (modelscope#2963)
  fix max_length error print (modelscope#2960)
  update quant_mllm shell (modelscope#2959)
  update web-ui images (modelscope#2958)
  update requirements (modelscope#2957)
  fix bugs (modelscope#2954)
  fix citest (modelscope#2953)
  fix infer_stream (modelscope#2952)
  fix demo_hf (modelscope#2951)
  support deepseek_r1_distill (modelscope#2946)
  Fix mllm seq cls (modelscope#2945)
  Support minimax (modelscope#2943)
  Fix quant template (modelscope#2942)
  support deepseek-ai/DeepSeek-R1 (modelscope#2940)
  fix bugs (modelscope#2938)
  Support mllm seq_cls/rm (modelscope#2934)
  ...