
[Core] Parse vLLM engine required fields from hf_config to model_arch_config#28454

Merged
heheda12345 merged 34 commits into vllm-project:main from charlotte12l:model_arch_cfg
Jan 2, 2026

Conversation

@charlotte12l
Contributor

@charlotte12l charlotte12l commented Nov 11, 2025

Purpose

See #24384 for more context.

Use llama3 as prototype

Design

  • model_arch_config explicitly specifies all standardized fields required by the vLLM runtime

  • The model arch parser reads from config.json/params.json/etc. and performs the standardization. The goal is to eventually remove most of the standardization logic from config/model.py once the migration to the new parser workflow is complete

  • For hf-model-arch-parser, if the model is not in _CONFIG_REGISTRY, we still call AutoConfig.from_pretrained. This allows us to leverage the normalization already implemented in HuggingFace’s PretrainedConfig. A more standardized PretrainedConfig will enable a thinner, simpler parser layer.

    _CONFIG_REGISTRY: dict[str, type[PretrainedConfig]] = LazyConfigDict(
        chatglm="ChatGLMConfig",
        deepseek_vl_v2="DeepseekVLV2Config",
        deepseek_v32="DeepseekV3Config",
        flex_olmo="FlexOlmoConfig",
        kimi_linear="KimiLinearConfig",
        kimi_vl="KimiVLConfig",
        RefinedWeb="RWConfig",  # For tiiuae/falcon-40b(-instruct)
        RefinedWebModel="RWConfig",  # For tiiuae/falcon-7b(-instruct)
        jais="JAISConfig",
        mlp_speculator="MLPSpeculatorConfig",
        medusa="MedusaConfig",
        midashenglm="MiDashengLMConfig",
        eagle="EAGLEConfig",
        speculators="SpeculatorsConfig",
        nemotron="NemotronConfig",
        olmo3="Olmo3Config",
        ovis="OvisConfig",
        ultravox="UltravoxConfig",
        step3_vl="Step3VLConfig",
        step3_text="Step3TextConfig",
        qwen3_next="Qwen3NextConfig",
        lfm2_moe="Lfm2MoeConfig",
    )

  • The parser will inspect the model class and derive per_layer_attn_cls. This information is useful for the KV cache manager
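The fallback behaviour described above (use a registered custom config class if available, otherwise AutoConfig) can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: `load_hf_config` is a hypothetical helper, and a stub stands in for transformers' `AutoConfig` so the sketch runs without the library.

```python
from typing import Any


class _AutoConfigStub:
    """Stand-in for transformers.AutoConfig, used only so this sketch is
    self-contained; the real parser calls AutoConfig.from_pretrained."""

    @classmethod
    def from_pretrained(cls, model: str) -> dict[str, Any]:
        return {"model": model, "source": "AutoConfig"}


# Hypothetical registry mapping model_type to a custom config class.
_CONFIG_REGISTRY: dict[str, Any] = {}


def load_hf_config(model: str, model_type: str) -> dict[str, Any]:
    # Prefer a registered custom config class; otherwise fall back to
    # AutoConfig so we inherit HuggingFace's own normalization.
    config_cls = _CONFIG_REGISTRY.get(model_type, _AutoConfigStub)
    return config_cls.from_pretrained(model)
```

The fallback path is what lets the parser layer stay thin: any normalization that HuggingFace already performs in `PretrainedConfig` does not need to be duplicated here.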

TODOs

  • [Should probably implement & merge it before this PR] Refactor out the existing normalizations in config/model.py so they can be shared by the model-arch-parser and hf_config during the migration
  • Some fields are not normalized yet: max_model_len
  • Mistral parser
  • Make get_kv_cache_spec use model_arch_config.per_layer_attn_cls, and add tests
  • Have a clear separation between model_info and model_arch_config, per [RFC]: Decoupling vLLM Configuration from Hugging Face #24384 (comment)
  • Gradually migrate other models (note that some models have their configuration class re-implemented in vLLM) and tests
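To make the per_layer_attn_cls idea from the design and TODOs above concrete, here is a rough sketch. The field and class names (`ModelArchConfig`, `FullAttention`, `SlidingWindowAttention`) are hypothetical, not the PR's final API:

```python
from dataclasses import dataclass, field


@dataclass
class ModelArchConfig:
    """Sketch only: per_layer_attn_cls maps each decoder layer to the
    attention implementation it uses, which lets the KV cache manager
    allocate per-layer caches (e.g. sliding-window vs. full attention)."""

    num_hidden_layers: int
    per_layer_attn_cls: list[str] = field(default_factory=list)


# A hypothetical hybrid model: every 4th layer uses full attention,
# the rest use sliding-window attention.
cfg = ModelArchConfig(
    num_hidden_layers=8,
    per_layer_attn_cls=[
        "FullAttention" if i % 4 == 3 else "SlidingWindowAttention"
        for i in range(8)
    ],
)
```

A per-layer list (rather than a single model-wide flag) matters precisely for the mixed-attention models raised in review: the KV cache manager can then size each layer's cache independently.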

Test Plan

python -m pytest tests/test_config.py::test_model_arch_config_loading

In addition, I ran generation for llama3; the outputs before and after this PR are the same.

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0)


def main():
    # Create an LLM.
    llm = LLM(model="/data/local/models/oss/Llama-3.1-8B-Instruct")
    # Generate texts from the prompts.
    # The output is a list of RequestOutput objects
    # that contain the prompt, generated text, and other information.
    outputs = llm.generate(prompts, sampling_params)
    # Print the outputs.
    print("\nGenerated Outputs:\n" + "-" * 60)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt:    {prompt!r}")
        print(f"Output:    {generated_text!r}")
        print("-" * 60)

if __name__ == "__main__":
    main()


------------------------------------------------------------
Prompt:    'Hello, my name is'
Output:    ' Emily and I am a 25-year-old freelance writer and editor. I have'
------------------------------------------------------------
Prompt:    'The president of the United States is'
Output:    ' the head of state and head of government of the United States. The president serves'
------------------------------------------------------------
Prompt:    'The capital of France is'
Output:    ' a city of romance, art, fashion, and cuisine. Paris is a must'
------------------------------------------------------------
Prompt:    'The future of AI is'
Output:    ' bright, but it also raises concerns about bias, accountability, and the impact on'
------------------------------------------------------------

Test Result

passed

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • [ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • [ ] (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify

mergify bot commented Nov 11, 2025

Documentation preview: https://vllm--28454.org.readthedocs.build/en/28454/

@mergify mergify bot added documentation Improvements or additions to documentation llama Related to Llama models labels Nov 11, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a model_arch_config as a prototype to refactor model configuration normalization. This is a significant and positive change towards better code structure and maintainability. The changes involve creating new configuration classes and parsers, and migrating the Llama model to use this new configuration structure. The overall approach is solid. I've found one critical issue that would cause a runtime error, which I've detailed in a specific comment. Once that is addressed, this PR will be in a great shape.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@charlotte12l charlotte12l changed the title [Feature] model_arch_config prototype [Feature] model_arch_config prototype with llama3 Nov 11, 2025
@charlotte12l charlotte12l changed the title [Feature] model_arch_config prototype with llama3 [RFC] model_arch_config prototype with llama3 Nov 11, 2025
@hmellor
Copy link
Member

hmellor commented Nov 11, 2025

A more standardized PretrainedConfig will enable a thinner, simpler parser layer

PretrainedConfig cannot contain the union of all possible fields. It is a base class. We can't include all possible fields because not all models use them, and having them present for every model would cause unnecessary confusion.


A possible solution could be something like:

class AllConfig(PretrainedConfig):
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    use_deepseek_mla: bool
    head_dim: int
    vocab_size: int
    num_key_value_heads: int
    num_experts: int

Which would contain the union of all possible config fields using their standard names. Would that be an acceptable solution?

Member

@zhuohan123 zhuohan123 left a comment


Thanks for the great work! Left some comments.

use_deepseek_mla: bool
head_dim: int
vocab_size: int
num_key_value_heads: int
Member

A model can have mixed different attention layers. Should we make some of the fields here per-layer?

@zhuohan123
Member

class AllConfig(PretrainedConfig):
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    use_deepseek_mla: bool
    head_dim: int
    vocab_size: int
    num_key_value_heads: int
    num_experts: int

Which would contain the union of all possible config fields using their standard names. Would that be an acceptable solution?

@hmellor The main issue with this inheritance approach is that we want an explicit error if anybody tries to access a field that is not in the defined config list. So when people implement something in the engine, they won't accidentally access a field that only exists for some models.
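The explicit-error behaviour described here falls out naturally from a plain dataclass that does not inherit from PretrainedConfig. A minimal sketch (not the PR's implementation; the field set is illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArchConfig:
    # Only fields the engine actually needs are declared; accessing
    # anything else raises AttributeError instead of silently working
    # for some models and breaking for others.
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int


cfg = ModelArchConfig(
    hidden_size=4096,
    num_hidden_layers=32,
    num_attention_heads=32,
)
```

Because ModelArchConfig does not subclass PretrainedConfig, none of the base-class attributes leak through, so `cfg.rope_theta` (for example) would raise an AttributeError at runtime rather than returning a model-specific value.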

@hmellor
Member

hmellor commented Nov 11, 2025

The main issue with this inheritance approach is that we want an explicit error if anybody tries to access a field that is not in the defined config list.

The suggested AllConfig would error if a non-existent field was accessed. I'm not sure I understand your point.

So when people implement something in the engine, they won't accidentally access a field that only exists for some models

Something like the suggested AllConfig allows for this. The idea is that we would type/cast the config classes to AllConfig when they enter vLLM and then we would have the standard interface you want.


I appreciate that this doesn't resolve the inheritance issue though.
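For reference, one reading of the cast suggestion above, as a sketch (not code from this PR; `_SomeHFConfig` is a made-up stand-in for a concrete PretrainedConfig subclass):

```python
from typing import cast


class AllConfig:
    """Annotation-only interface: the union of standardized field names.
    Casting to it gives static type checking against these names."""

    hidden_size: int
    num_hidden_layers: int
    head_dim: int
    vocab_size: int


class _SomeHFConfig:
    # Stand-in for a concrete config class loaded from HuggingFace.
    hidden_size = 4096
    num_hidden_layers = 32
    head_dim = 128
    vocab_size = 128256


# At the vLLM boundary, treat the loaded config as the standard
# interface; a type checker then flags access to undeclared fields.
hf_config = cast(AllConfig, _SomeHFConfig())
```

Note this gives static checking only: the extra attributes inherited from PretrainedConfig would still be reachable at runtime, which is the limitation discussed in this thread.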

@zhuohan123
Member

The suggested AllConfig would error if a non-existent field was accessed. I'm not sure I understand your point.

In your suggestion, AllConfig inherits from PretrainedConfig, which includes many things that vLLM does not need:
[screenshot: PretrainedConfig attribute list]

We want to explicitly error out when we access those fields.

@hmellor
Member

hmellor commented Nov 12, 2025

inherits from PretrainedConfig, which includes many things that vLLM does not need

Got it, thanks for explaining!

@mergify

mergify bot commented Nov 27, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @charlotte12l.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 27, 2025
@charlotte12l
Contributor Author

charlotte12l commented Nov 27, 2025

Based on the previous discussion, charlotte12l#2 (merged into the current branch) makes the following modifications:

  • model_arch_config only contains the minimal fields
  • model_arch_config parse from hf_config
  • model_executor still uses hf_config
  • normalize quantization_config
  • avoid duplicated functionality by fully utilizing model_arch_config in config/model.py (because of this, we still keep architectures as List[str])

WIP:

  • Still adding more tests for each converter. My plan: for all the models, run before this PR and store the results (e.g. model_config.get_hidden_size) as files; then add tests comparing the stored files with model_arch_config
  • per_layer_attn_cls for kv_cache_spec will be in the next PR, as we need to refactor sliding-window attention out of attention. In the next PR, is_deepseek_mla will be removed per discussion.
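The golden-file testing plan above could look roughly like this. All helper names, paths, and fields here are hypothetical, sketched only to illustrate the store-then-compare workflow:

```python
import json
import tempfile
from pathlib import Path


def dump_reference(model_name: str, values: dict, out_dir: Path) -> Path:
    """Store pre-PR values (e.g. get_hidden_size results) as a JSON file."""
    path = out_dir / f"{model_name}.json"
    path.write_text(json.dumps(values, indent=2, sort_keys=True))
    return path


def matches_reference(model_name: str, current: dict, ref_dir: Path) -> bool:
    """Compare post-PR model_arch_config values against the stored file."""
    reference = json.loads((ref_dir / f"{model_name}.json").read_text())
    return reference == current


# Usage sketch: store pre-PR values, then compare post-PR values.
ref_dir = Path(tempfile.mkdtemp())
dump_reference(
    "llama3", {"hidden_size": 4096, "num_hidden_layers": 32}, ref_dir
)
ok = matches_reference(
    "llama3", {"hidden_size": 4096, "num_hidden_layers": 32}, ref_dir
)
```

Storing the pre-PR values as files means the comparison does not require re-running the old code path once the migration lands.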

@mergify mergify bot removed the needs-rebase label Nov 27, 2025
@charlotte12l charlotte12l changed the title [RFC] model_arch_config prototype with llama3 [Still Testing]Parse vLLM runtime required fields from hf_config to model_arch_config Nov 30, 2025
@mergify

mergify bot commented Dec 16, 2025

Hi @charlotte12l, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
@heheda12345
Collaborator

FYI, two new models have different head_size_k and head_size_v: #30836, #28775.
Should we consider this case in ModelArchitectureConfig?

@heheda12345
Collaborator

Updated the branch due to the v1-entrypoint timeout failure that is already fixed on main.

charlotte12l and others added 2 commits December 22, 2025 22:20
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

…assmethod

Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
charlotte12l and others added 2 commits December 23, 2025 15:59
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

charlotte12l and others added 5 commits December 23, 2025 16:11
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>

Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

Labels

ci/build deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend gpt-oss Related to GPT-OSS models kv-connector llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models nvidia performance Performance-related issues qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm speculative-decoding structured-output tool-calling v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants