Add default 'auto' MODEL_IMPL_TYPE that resolves based on architecture #1255
kyuyeunk merged 3 commits into vllm-project:main
Conversation
@kyuyeunk Please review.
kyuyeunk left a comment
Wouldn't it be possible to move 'auto' into the 'match/case' as well?
It is possible to move it into the match-case, but then the code would be duplicated, including get_vllm_model, get_flax_model, and the fallback check. I think resolving first and then using the same code path is cleaner.
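The "resolve first, then dispatch" approach described above could look roughly like the sketch below. The `_VLLM_REQUIRED_ARCHITECTURES` name and the resolution rules come from this PR's description, but the function name and signature here are hypothetical, not the merged code:

```python
import os

# Architectures that require the vLLM implementation; per the PR
# description, this frozenset lives in model_loader.py.
_VLLM_REQUIRED_ARCHITECTURES = frozenset({"GptOssForCausalLM"})


def resolve_model_impl_type(architecture: str) -> str:
    """Resolve MODEL_IMPL_TYPE, expanding 'auto' to a concrete value.

    Hypothetical helper: reads the env var, and when it is 'auto',
    picks 'vllm' only for architectures that need it; everything
    else falls back to the flax_nnx implementation.
    """
    impl = os.environ.get("MODEL_IMPL_TYPE", "auto").lower()
    if impl != "auto":
        return impl
    if architecture in _VLLM_REQUIRED_ARCHITECTURES:
        return "vllm"
    return "flax_nnx"
```

With the value resolved up front, the rest of get_model() can use a single dispatch path for both explicit and 'auto' settings.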
I don't see any updates. Can you verify that the changes were pushed?
kyuyeunk left a comment
LGTM. Thank you for working on it!
Seems like CI is failing. Have you rebased the branch to HEAD?
I believe so, let me look into it.
Looks like tests/layers/vllm/test_awq.py is failing on the main branch as well.
Seems like it's due to an upstream change. Let me create a quick fix for this.
Please wait until this is merged: #1284
The PR has been merged. Please update the branch and try again.
- Add 'auto' as default value for MODEL_IMPL_TYPE env var
- For GptOssForCausalLM, 'auto' resolves to 'vllm' for better performance
- For all other architectures, 'auto' resolves to 'flax_nnx'
- Add _VLLM_REQUIRED_ARCHITECTURES frozenset in model_loader.py
- Use match/case pattern in get_model() for implementation selection
- Add tests for 'auto' resolution behavior

Signed-off-by: Xing Liu <xingliu14@gmail.com>
Please fix pre-commit failure.
Thank you so much for making this feature!
Description
- `auto` as default value for MODEL_IMPL_TYPE env var
- For GptOssForCausalLM, `auto` resolves to `vllm` for better performance
- For all other architectures, `auto` resolves to `flax_nnx`

Tests
pytest
Checklist
Before submitting this PR, please make sure: