
[Core] Parse vLLM engine required fields from hf_config to model_arch_config#28454

Merged
heheda12345 merged 34 commits into vllm-project:main from charlotte12l:model_arch_cfg
Jan 2, 2026

Conversation

@charlotte12l
Contributor

@charlotte12l charlotte12l commented Nov 11, 2025

Purpose

See #24384 for more context.

Use llama3 as prototype

Design

  • model_arch_config explicitly specifies all standardized fields required by the vLLM runtime

  • The model arch parser reads from config.json/params.json/etc. and performs the standardization. The goal is to eventually remove most of the standardization logic from config/model.py once the migration to the new parser workflow is complete

  • For hf-model-arch-parser, if the model is not in _CONFIG_REGISTRY, we still call AutoConfig.from_pretrained. This allows us to leverage the normalization already implemented in HuggingFace’s PretrainedConfig. A more standardized PretrainedConfig will enable a thinner, simpler parser layer.

    _CONFIG_REGISTRY: dict[str, type[PretrainedConfig]] = LazyConfigDict(
        chatglm="ChatGLMConfig",
        deepseek_vl_v2="DeepseekVLV2Config",
        deepseek_v32="DeepseekV3Config",
        flex_olmo="FlexOlmoConfig",
        kimi_linear="KimiLinearConfig",
        kimi_vl="KimiVLConfig",
        RefinedWeb="RWConfig",  # For tiiuae/falcon-40b(-instruct)
        RefinedWebModel="RWConfig",  # For tiiuae/falcon-7b(-instruct)
        jais="JAISConfig",
        mlp_speculator="MLPSpeculatorConfig",
        medusa="MedusaConfig",
        midashenglm="MiDashengLMConfig",
        eagle="EAGLEConfig",
        speculators="SpeculatorsConfig",
        nemotron="NemotronConfig",
        olmo3="Olmo3Config",
        ovis="OvisConfig",
        ultravox="UltravoxConfig",
        step3_vl="Step3VLConfig",
        step3_text="Step3TextConfig",
        qwen3_next="Qwen3NextConfig",
        lfm2_moe="Lfm2MoeConfig",
    )

  • The parser will inspect the model class and derive per_layer_attn_cls. This information is useful for the KV cache manager
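The fallback behaviour described above (use a registered custom config class if available, otherwise AutoConfig) can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: `load_hf_config` is a hypothetical helper, and a stub stands in for transformers' `AutoConfig` so the sketch runs without the library.

```python
from typing import Any


class _AutoConfigStub:
    """Stand-in for transformers.AutoConfig, used only so this sketch is
    self-contained; the real parser calls AutoConfig.from_pretrained."""

    @classmethod
    def from_pretrained(cls, model: str) -> dict[str, Any]:
        return {"model": model, "source": "AutoConfig"}


# Hypothetical registry mapping model_type to a custom config class.
_CONFIG_REGISTRY: dict[str, Any] = {}


def load_hf_config(model: str, model_type: str) -> dict[str, Any]:
    # Prefer a registered custom config class; otherwise fall back to
    # AutoConfig so we inherit HuggingFace's own normalization.
    config_cls = _CONFIG_REGISTRY.get(model_type, _AutoConfigStub)
    return config_cls.from_pretrained(model)
```

The fallback path is what lets the parser layer stay thin: any normalization that HuggingFace already performs in `PretrainedConfig` does not need to be duplicated here.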

TODOs

  • [Should probably implement & merge it before this PR] Refactor out the existing normalizations in config/model.py so they can be shared by the model-arch-parser and hf_config during the migration
  • Some fields are not normalized yet: max_model_len
  • Mistral parser
  • Make get_kv_cache_spec use model_arch_config.per_layer_attn_cls, and add tests
  • Have a clear separation between model_info and model_arch_config, per [RFC]: Decoupling vLLM Configuration from Hugging Face #24384 (comment)
  • Gradually migrate other models (note that some models have their configuration class re-implemented in vLLM) and tests
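To make the per_layer_attn_cls idea from the design and TODOs above concrete, here is a rough sketch. The field and class names (`ModelArchConfig`, `FullAttention`, `SlidingWindowAttention`) are hypothetical, not the PR's final API:

```python
from dataclasses import dataclass, field


@dataclass
class ModelArchConfig:
    """Sketch only: per_layer_attn_cls maps each decoder layer to the
    attention implementation it uses, which lets the KV cache manager
    allocate per-layer caches (e.g. sliding-window vs. full attention)."""

    num_hidden_layers: int
    per_layer_attn_cls: list[str] = field(default_factory=list)


# A hypothetical hybrid model: every 4th layer uses full attention,
# the rest use sliding-window attention.
cfg = ModelArchConfig(
    num_hidden_layers=8,
    per_layer_attn_cls=[
        "FullAttention" if i % 4 == 3 else "SlidingWindowAttention"
        for i in range(8)
    ],
)
```

A per-layer list (rather than a single model-wide flag) matters precisely for the mixed-attention models raised in review: the KV cache manager can then size each layer's cache independently.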

Test Plan

python -m pytest tests/test_config.py::test_model_arch_config_loading

In addition, I ran generation for llama3; the outputs before and after this PR are the same.

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0)


def main():
    # Create an LLM.
    llm = LLM(model="/data/local/models/oss/Llama-3.1-8B-Instruct")
    # Generate texts from the prompts.
    # The output is a list of RequestOutput objects
    # that contain the prompt, generated text, and other information.
    outputs = llm.generate(prompts, sampling_params)
    # Print the outputs.
    print("\nGenerated Outputs:\n" + "-" * 60)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt:    {prompt!r}")
        print(f"Output:    {generated_text!r}")
        print("-" * 60)

if __name__ == "__main__":
    main()


------------------------------------------------------------
Prompt:    'Hello, my name is'
Output:    ' Emily and I am a 25-year-old freelance writer and editor. I have'
------------------------------------------------------------
Prompt:    'The president of the United States is'
Output:    ' the head of state and head of government of the United States. The president serves'
------------------------------------------------------------
Prompt:    'The capital of France is'
Output:    ' a city of romance, art, fashion, and cuisine. Paris is a must'
------------------------------------------------------------
Prompt:    'The future of AI is'
Output:    ' bright, but it also raises concerns about bias, accountability, and the impact on'
------------------------------------------------------------

Test Result

passed

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • [ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • [ ] (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify

mergify bot commented Nov 11, 2025

Documentation preview: https://vllm--28454.org.readthedocs.build/en/28454/

@mergify mergify bot added documentation Improvements or additions to documentation llama Related to Llama models labels Nov 11, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a model_arch_config as a prototype to refactor model configuration normalization. This is a significant and positive change towards better code structure and maintainability. The changes involve creating new configuration classes and parsers, and migrating the Llama model to use this new configuration structure. The overall approach is solid. I've found one critical issue that would cause a runtime error, which I've detailed in a specific comment. Once that is addressed, this PR will be in a great shape.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@charlotte12l charlotte12l changed the title [Feature] model_arch_config prototype [Feature] model_arch_config prototype with llama3 Nov 11, 2025
@charlotte12l charlotte12l changed the title [Feature] model_arch_config prototype with llama3 [RFC] model_arch_config prototype with llama3 Nov 11, 2025
@hmellor
Copy link
Member

hmellor commented Nov 11, 2025

A more standardized PretrainedConfig will enable a thinner, simpler parser layer

PretrainedConfig cannot contain the union of all possible fields. It is a base class. We can't include all possible fields because not all models use them, and having them present for every model would cause unnecessary confusion.


A possible solution could be something like:

class AllConfig(PretrainedConfig):
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    use_deepseek_mla: bool
    head_dim: int
    vocab_size: int
    num_key_value_heads: int
    num_experts: int

Which would contain the union of all possible config fields using their standard names. Would that be an acceptable solution?

Member

@zhuohan123 zhuohan123 left a comment


Thanks for the great work! Left some comments.

use_deepseek_mla: bool
head_dim: int
vocab_size: int
num_key_value_heads: int
Member

A model can have mixed different attention layers. Should we make some of the fields here per-layer?

@zhuohan123
Member

class AllConfig(PretrainedConfig):
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    use_deepseek_mla: bool
    head_dim: int
    vocab_size: int
    num_key_value_heads: int
    num_experts: int

Which would contain the union of all possible config fields using their standard names. Would that be an acceptable solution?

@hmellor The main issue with this inheritance approach is that we want an explicit error if anybody tries to access a field that is not in the defined config list. So when people implement something in the engine, they won't accidentally access a field that only exists for some models.
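The explicit-error behaviour described here falls out naturally from a plain dataclass that does not inherit from PretrainedConfig. A minimal sketch (not the PR's implementation; the field set is illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArchConfig:
    # Only fields the engine actually needs are declared; accessing
    # anything else raises AttributeError instead of silently working
    # for some models and breaking for others.
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int


cfg = ModelArchConfig(
    hidden_size=4096,
    num_hidden_layers=32,
    num_attention_heads=32,
)
```

Because ModelArchConfig does not subclass PretrainedConfig, none of the base-class attributes leak through, so `cfg.rope_theta` (for example) would raise an AttributeError at runtime rather than returning a model-specific value.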

@hmellor
Member

hmellor commented Nov 11, 2025

The main issue with this inheritance approach is that we want an explicit error if anybody tries to access a field that is not in the defined config list.

The suggested AllConfig would error if a non-existent field was accessed. I'm not sure I understand your point.

So when people implement something in the engine, they won't accidentally access a field that only exists for some models

Something like the suggested AllConfig allows for this. The idea is that we would type/cast the config classes to AllConfig when they enter vLLM and then we would have the standard interface you want.


I appreciate that this doesn't resolve the inheritance issue though.
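For reference, one reading of the cast suggestion above, as a sketch (not code from this PR; `_SomeHFConfig` is a made-up stand-in for a concrete PretrainedConfig subclass):

```python
from typing import cast


class AllConfig:
    """Annotation-only interface: the union of standardized field names.
    Casting to it gives static type checking against these names."""

    hidden_size: int
    num_hidden_layers: int
    head_dim: int
    vocab_size: int


class _SomeHFConfig:
    # Stand-in for a concrete config class loaded from HuggingFace.
    hidden_size = 4096
    num_hidden_layers = 32
    head_dim = 128
    vocab_size = 128256


# At the vLLM boundary, treat the loaded config as the standard
# interface; a type checker then flags access to undeclared fields.
hf_config = cast(AllConfig, _SomeHFConfig())
```

Note this gives static checking only: the extra attributes inherited from PretrainedConfig would still be reachable at runtime, which is the limitation discussed in this thread.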

@zhuohan123
Member

The suggested AllConfig would error if a non-existent field was accessed. I'm not sure I understand your point.

In your suggestion, AllConfig inherits from PretrainedConfig, which includes many things that vLLM does not need:
[screenshot: PretrainedConfig attribute list]

We want to explicitly error out when we access those fields.

@hmellor
Member

hmellor commented Nov 12, 2025

inherits from PretrainedConfig, which includes many things that vLLM does not need

Got it, thanks for explaining!

@mergify

mergify bot commented Nov 27, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @charlotte12l.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 27, 2025
@charlotte12l
Contributor Author

charlotte12l commented Nov 27, 2025

Based on the previous discussion, charlotte12l#2 (merged into the current branch) makes the following modifications:

  • model_arch_config only contains the minimal fields
  • model_arch_config parse from hf_config
  • model_executor still uses hf_config
  • normalize quantization_config
  • avoid duplicated functionality by fully utilizing model_arch_config in config/model.py (because of this, we still keep architectures as List[str])

WIP:

  • Still adding more tests for each converter. My plan: for all the models, run before this PR and store the results (e.g. model_config.get_hidden_size) as files; then add tests comparing the stored files with model_arch_config
  • per_layer_attn_cls for kv_cache_spec will be in the next PR, as we need to refactor sliding-window attention out of attention. In the next PR, is_deepseek_mla will be removed per discussion.
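The golden-file testing plan above could look roughly like this. All helper names, paths, and fields here are hypothetical, sketched only to illustrate the store-then-compare workflow:

```python
import json
import tempfile
from pathlib import Path


def dump_reference(model_name: str, values: dict, out_dir: Path) -> Path:
    """Store pre-PR values (e.g. get_hidden_size results) as a JSON file."""
    path = out_dir / f"{model_name}.json"
    path.write_text(json.dumps(values, indent=2, sort_keys=True))
    return path


def matches_reference(model_name: str, current: dict, ref_dir: Path) -> bool:
    """Compare post-PR model_arch_config values against the stored file."""
    reference = json.loads((ref_dir / f"{model_name}.json").read_text())
    return reference == current


# Usage sketch: store pre-PR values, then compare post-PR values.
ref_dir = Path(tempfile.mkdtemp())
dump_reference(
    "llama3", {"hidden_size": 4096, "num_hidden_layers": 32}, ref_dir
)
ok = matches_reference(
    "llama3", {"hidden_size": 4096, "num_hidden_layers": 32}, ref_dir
)
```

Storing the pre-PR values as files means the comparison does not require re-running the old code path once the migration lands.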

@mergify mergify bot removed the needs-rebase label Nov 27, 2025
@charlotte12l charlotte12l changed the title [RFC] model_arch_config prototype with llama3 [Still Testing]Parse vLLM runtime required fields from hf_config to model_arch_config Nov 30, 2025
@mergify

mergify bot commented Dec 16, 2025

Hi @charlotte12l, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
@heheda12345
Collaborator

FYI, two new models have different head_size_k and head_size_v: #30836, #28775.
Should we consider this case in ModelArchitectureConfig?

@heheda12345
Collaborator

Updated the branch due to the v1-entrypoint timeout failure that is already fixed on main.

charlotte12l and others added 2 commits December 22, 2025 22:20
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

…assmethod

Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
charlotte12l and others added 2 commits December 23, 2025 15:59
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

charlotte12l and others added 5 commits December 23, 2025 16:11
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>

Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

Labels

ci/build deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend gpt-oss Related to GPT-OSS models kv-connector llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models nvidia performance Performance-related issues qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm speculative-decoding structured-output tool-calling v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants