[Core] Parse vLLM engine required fields from hf_config to model_arch_config#28454
heheda12345 merged 34 commits into vllm-project:main from
Conversation
Documentation preview: https://vllm--28454.org.readthedocs.build/en/28454/
Code Review
This pull request introduces a model_arch_config as a prototype to refactor model configuration normalization. This is a significant and positive change towards better code structure and maintainability. The changes involve creating new configuration classes and parsers, and migrating the Llama model to use this new configuration structure. The overall approach is solid. I've found one critical issue that would cause a runtime error, which I've detailed in a specific comment. Once that is addressed, this PR will be in great shape.
💡 Codex Review
Here are some automated review suggestions for this pull request.
A possible solution could be something like:

```python
class AllConfig(PretrainedConfig):
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    use_deepseek_mla: bool
    head_dim: int
    vocab_size: int
    num_key_value_heads: int
    num_experts: int
```

which would contain the union of all possible config fields using their standard names. Would that be an acceptable solution?
zhuohan123
left a comment
Thanks for the great work! Left some comments.
vllm/config/model_arch.py (outdated)

```python
use_deepseek_mla: bool
head_dim: int
vocab_size: int
num_key_value_heads: int
```
A model can have mixed different attention layers. Should we make some of the fields here per-layer?
@hmellor The main issue with this inheritance approach is that we want an explicit error if anybody tries to access a field that is not in the defined config list. That way, when people implement something in the engine, they won't accidentally access a field that only exists for some models.
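For illustration, the "explicit error" behavior described above can be shown with a plain dataclass; this is a minimal sketch, not vLLM code, and the class name and fields here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class StrictArchConfig:
    # Only the explicitly declared, standardized fields exist on instances.
    hidden_size: int
    num_hidden_layers: int

cfg = StrictArchConfig(hidden_size=4096, num_hidden_layers=32)
print(cfg.hidden_size)  # declared field: fine

try:
    _ = cfg.num_experts  # not in the defined config list
except AttributeError:
    # Accessing an undeclared field fails loudly instead of silently
    # falling back to an inherited PretrainedConfig default.
    print("AttributeError: num_experts is not a declared field")
```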
Something like the suggested AllConfig could work. I appreciate that this doesn't resolve the inheritance issue, though.
Got it, thanks for explaining!
This pull request has merge conflicts that must be resolved before it can be merged.
Based on the previous discussion, charlotte12l#2 (merged into the current branch) has made the following modifications:
WIP:
Hi @charlotte12l, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Updated branch due to the v1-entrypoint timeout failure that is already fixed on main.
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
…assmethod Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>

Purpose
See #24384 for more context.
Use llama3 as prototype
Design
- `model_arch_config` explicitly specifies all standardized fields required by the vLLM runtime.
- The model arch parser reads from `config.json`/`params.json`/etc. and performs the standardization process. The goal is to eventually remove most of the standardization logic from `config/model.py` once the migration to the new parser workflow is complete.
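As a rough sketch of this two-piece design (the class and function names below are illustrative, not the actual vLLM interfaces), the parser maps raw fields onto the explicit, standardized config:

```python
from dataclasses import dataclass

@dataclass
class ModelArchConfig:
    # Standardized fields the runtime needs, regardless of how the
    # original config.json/params.json spells them.
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int

def parse_model_arch_config(raw: dict) -> ModelArchConfig:
    # Standardization lives in the parser rather than scattered through
    # config/model.py: e.g. fall back to MHA when num_key_value_heads
    # is absent from the raw config.
    return ModelArchConfig(
        hidden_size=raw["hidden_size"],
        num_hidden_layers=raw["num_hidden_layers"],
        num_attention_heads=raw["num_attention_heads"],
        num_key_value_heads=raw.get(
            "num_key_value_heads", raw["num_attention_heads"]
        ),
    )

arch = parse_model_arch_config(
    {"hidden_size": 4096, "num_hidden_layers": 32, "num_attention_heads": 32}
)
print(arch.num_key_value_heads)  # absent in raw, defaults to num_attention_heads
```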
For the hf-model-arch-parser, if the model is not in `_CONFIG_REGISTRY`, we still call `AutoConfig.from_pretrained`. This allows us to leverage the normalization already implemented in Hugging Face's `PretrainedConfig`. A more standardized `PretrainedConfig` will enable a thinner, simpler parser layer.

vllm/vllm/transformers_utils/config.py, lines 79 to 102 in f0359ff
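The registry-with-fallback logic might be sketched as follows; the registry contents and return values below are stand-ins for illustration, not the real vLLM registry:

```python
# Model types for which vLLM ships its own config class (stand-in values).
_CONFIG_REGISTRY = {"chatglm", "dbrx"}

def get_config_source(model_type: str) -> str:
    if model_type in _CONFIG_REGISTRY:
        # Use the vLLM-specific config class for registered architectures.
        return "vllm_registry"
    # Otherwise defer to Hugging Face, whose PretrainedConfig already
    # normalizes many fields (attribute aliases, defaults):
    #   from transformers import AutoConfig
    #   hf_config = AutoConfig.from_pretrained(model_name_or_path)
    return "hf_autoconfig"

print(get_config_source("llama"))  # not registered -> "hf_autoconfig"
```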
The parser will inspect the model cls and get `per_layer_attn_cls`. This information is useful for the KV cache manager.

TODOs
Test Plan
Besides, I ran generate for llama3, and the outputs before/after this PR are the same.
Test Result
Passed.
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.