
[CI] Skip Phi-MoE test due to old API util#31632

Merged
noooop merged 12 commits into vllm-project:main from ROCm:akaratza_lang_ext_gen
Jan 5, 2026
Conversation

@AndreasKaratzas
Collaborator

@AndreasKaratzas AndreasKaratzas commented Jan 2, 2026

This PR addresses test failures in the Language Models Test (Extended Generation) test group on ROCm by:

  1. Skipping test_phimoe.py due to a known upstream issue
  2. Documenting a dependency on an upstream mamba-ssm fix for ROCm 7.0+
  3. Fixing NomicBert max_model_len validation regression from [Core] Parse vLLM engine required fields from hf_config to model_arch_config #28454

Changes

1. Skip PhiMoE Tests

Skipped all tests in tests/models/language/generation/test_phimoe.py due to a known issue where AttributeError: 'DynamicCache' object has no attribute 'seen_tokens' is raised.

2. Mamba Installation on ROCm 7.0+ (Awaiting Upstream Fix)

Mamba-ssm currently fails to build and run correctly on ROCm 7.0+. We are awaiting the merge of an upstream fix:

Once that fix is merged, we will also update the pinned version of the package.

3. Fix NomicBert max_model_len Validation

After #28454, cached derived_max_model_len_and_key wasn't updated when NomicBertModelConfig restricted max_position_embeddings, causing validation to use stale values. Affects both ROCm and CUDA.

  • Test: pytest -s -v models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a CI failure by skipping a failing test for Phi-MoE models. The change to add -rs to the pytest command is a good improvement for visibility into skipped tests. However, the implementation for skipping the test is too broad. It disables all tests in the test_phimoe.py file, while only the test_models function seems to be affected by the upstream issue. To maintain test coverage, the skip should be applied more narrowly to only the failing test.
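As context for the -rs change: pytest's -r flag selects which outcome categories appear in the end-of-run short summary, and the s character adds skipped tests together with their skip reasons, which is what makes skips visible in CI logs. A minimal, self-contained demo (the throwaway file path is just for illustration), assuming pytest is installed:

```shell
# Write a throwaway test file with one skipped test, then run it with -rs
# so the skip reason appears in pytest's short summary.
cat > /tmp/test_rs_demo.py <<'EOF'
import pytest

@pytest.mark.skip(reason="demo skip reason")
def test_skipped():
    pass
EOF
pytest -rs /tmp/test_rs_demo.py
```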

Comment on lines +11 to +24
# There is a known issue that triggers `AttributeError: 'DynamicCache'
# object has no attribute 'seen_tokens'` when running:
# `tests/models/language/generation/test_phimoe.py::test_models
# [5-64-bfloat16-microsoft/Phi-3.5-MoE-instruct]`
# This issue is being investigated and tracked in:
# https://huggingface.co/microsoft/Phi-3.5-MoE-instruct/discussions/58
# It is platform-agnostic. Therefore, we skip this test on all platforms for now.
pytest.skip(
"Skipping due to known issue: "
"'DynamicCache' object has no attribute 'seen_tokens'. See: "
"https://huggingface.co/microsoft/Phi-3.5-MoE-instruct/discussions/58 "
"for details.",
allow_module_level=True,
)
Contributor
high

This module-level pytest.skip disables all tests in this file, including test_phimoe_routing_function. Based on the issue description and the comment, only the test_models function is affected by the 'DynamicCache' object has no attribute 'seen_tokens' error. The test_phimoe_routing_function appears to be a simple unit test that does not involve model generation and is likely still passing.

To avoid unnecessarily reducing test coverage, it would be better to apply the skip only to the failing test. This can be done by removing this module-level skip and adding a @pytest.mark.skip(reason=...) decorator directly to the test_models function.
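A minimal sketch of the narrower skip the bot suggests; the test bodies below are placeholders, not the actual code in test_phimoe.py:

```python
import pytest


# Hypothetical sketch: skip only the affected test instead of the whole
# module, so unrelated unit tests in the same file keep running.
@pytest.mark.skip(
    reason="Known issue: 'DynamicCache' object has no attribute "
    "'seen_tokens'. See: "
    "https://huggingface.co/microsoft/Phi-3.5-MoE-instruct/discussions/58"
)
def test_models():
    ...  # placeholder for the real generation test


def test_phimoe_routing_function():
    ...  # placeholder; this unit test is unaffected and still runs
```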

Collaborator Author
Done :)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
- uv pip install --system --no-build-isolation 'git+https://github.com/state-spaces/mamba@v2.2.5'
- uv pip install --system --no-build-isolation 'git+https://github.com/Dao-AILab/causal-conv1d@v1.5.2'
- pytest -v -s models/language/generation -m '(not core_model) and (not hybrid_model)'
- pytest -v -rs models/language/generation -m '(not core_model) and (not hybrid_model)'
Member
Revert this? I think you only used this for debugging

Collaborator Author
@DarkLight1337 done :)

@AndreasKaratzas
Collaborator Author

I also modified the mamba package installation path so that the test turns green until the upstream PR gets merged. Green CI is critical for AMD; I will monitor my PR in the mamba upstream and revert this change as soon as it is merged, as I did with the xgrammar upstream.

@AndreasKaratzas
Collaborator Author

@charlotte12l @hmellor @heheda12345

Could you please review my last commit?

After #28454, _get_and_verify_max_len uses a cached derived_max_model_len_and_key value from model_arch_config. However, NomicBertModelConfig.verify_and_update_config updates hf_text_config.max_position_embeddings to max_trained_positions (2048) without updating the cached value (which remains 8192 from the original HF config).

This causes max_model_len=4096 to incorrectly pass validation because 4096 < 8192, when it should fail because 4096 > 2048. My last commit updates model_arch_config.derived_max_model_len_and_key when NomicBertModelConfig restricts the max position embeddings to max_trained_positions. To test, run:

pytest -s -v models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal

This test now passes; before, it was failing on both ROCm and CUDA.
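The stale-cache bug and the fix can be illustrated with a self-contained sketch. The classes below are hypothetical stand-ins, not vLLM's actual config objects; only the field names (derived_max_model_len_and_key, max_position_embeddings) and the numbers (8192, 2048, 4096) come from the description above:

```python
# Self-contained sketch of the stale-cache bug described above.
# These classes are hypothetical stand-ins, not vLLM's actual config API.
from dataclasses import dataclass


@dataclass
class ModelArchConfig:
    # Cached (value, source-key) pair derived from the original HF config.
    derived_max_model_len_and_key: tuple


@dataclass
class HfTextConfig:
    max_position_embeddings: int


def verify_and_update_config(hf, arch, max_trained_positions):
    # NomicBert restricts the usable context to max_trained_positions.
    hf.max_position_embeddings = max_trained_positions
    # The fix: refresh the cached value so later validation sees 2048,
    # not the stale 8192 from the original HF config.
    arch.derived_max_model_len_and_key = (
        max_trained_positions, "max_position_embeddings")


def validate_max_model_len(max_model_len, arch):
    derived, key = arch.derived_max_model_len_and_key
    if max_model_len > derived:
        raise ValueError(
            f"max_model_len={max_model_len} exceeds {key}={derived}")


hf = HfTextConfig(max_position_embeddings=8192)
arch = ModelArchConfig(
    derived_max_model_len_and_key=(8192, "max_position_embeddings"))
verify_and_update_config(hf, arch, max_trained_positions=2048)
try:
    validate_max_model_len(4096, arch)  # should now fail: 4096 > 2048
    rejected = False
except ValueError:
    rejected = True
```

Without the refresh inside verify_and_update_config, the cached 8192 would let max_model_len=4096 slip through, which is exactly the regression the commit fixes.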

@AndreasKaratzas AndreasKaratzas changed the title from [CI] Skip Phi-MoE test due to old API util to [CI] Skip Phi-MoE test due to old API util and fix NomicBert max_model_len validation Jan 3, 2026
@AndreasKaratzas AndreasKaratzas changed the title from [CI] Skip Phi-MoE test due to old API util and fix NomicBert max_model_len validation to [Core][CI] Skip Phi-MoE test due to old API util and fix NomicBert max_model_len validation Jan 3, 2026
@charlotte12l
Contributor

charlotte12l commented Jan 3, 2026

@AndreasKaratzas Thanks for the fix! Could you create a convertor for NomicBert models in MODEL_ARCH_CONFIG_CONVERTORS, and put the max_trained_positions logic into the convertor? We hope to use the convertor to consolidate all configuration update/read logic.

Besides, I saw NomicBertModelConfig also performs the updates below; we should also move that logic into the convertor. Other models in vllm/model_executor/models/config.py are fine.

        config.hidden_size = config.n_embd
        config.num_hidden_layers = config.n_layer
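A rough sketch of what such a convertor registration might look like. The registry name MODEL_ARCH_CONFIG_CONVERTORS comes from the comment above, but the dict shape, function signature, and field names here are assumptions for illustration, not vLLM's actual interface:

```python
# Hypothetical sketch of a per-model convertor registry, mirroring the
# MODEL_ARCH_CONFIG_CONVERTORS idea; names and signatures are assumptions.
from types import SimpleNamespace


def nomic_bert_convertor(hf_config):
    """Translate NomicBert-specific HF field names into generic arch fields."""
    return {
        # NomicBert spells these fields n_embd / n_layer in its HF config.
        "hidden_size": hf_config.n_embd,
        "num_hidden_layers": hf_config.n_layer,
        # Restrict the context length to what the model was trained on.
        "max_model_len": hf_config.max_trained_positions,
    }


MODEL_ARCH_CONFIG_CONVERTORS = {
    "NomicBertModel": nomic_bert_convertor,
}


def build_arch_config(arch_name, hf_config):
    # Look up a model-specific convertor; fall back to no overrides.
    convertor = MODEL_ARCH_CONFIG_CONVERTORS.get(arch_name)
    return convertor(hf_config) if convertor else {}


hf = SimpleNamespace(n_embd=768, n_layer=12, max_trained_positions=2048)
arch = build_arch_config("NomicBertModel", hf)
```

The appeal of this shape is that every read/rename of a model-specific HF field happens in one registered function instead of being scattered across config-verification hooks.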

@AndreasKaratzas
Collaborator Author

@AndreasKaratzas Thanks for the fix! Could you create a convertor for NomicBert models in MODEL_ARCH_CONFIG_CONVERTORS, and put the max_trained_positions logic into the convertor? We hope to use the convertor to consolidate all configuration update/read logic.

@charlotte12l Let me see what I can do. I might need help with that tbh 😅 But I'll ping you in that case.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas
Collaborator Author

@charlotte12l I think the last commit resolves what you are suggesting. I tested:
pytest -s -v tests/models/language/pooling_mteb_test/test_nomic.py and pytest -s -v models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal, and they are still passing. But please take a look yourself as well to check whether my new modifications are sound.

@noooop
Collaborator

noooop commented Jan 4, 2026

Could you please let me fix tests/models/language/pooling_mteb_test/test_nomic.py? PTAL #31662

@AndreasKaratzas
Collaborator Author

Could you please let me fix tests/models/language/pooling_mteb_test/test_nomic.py? PTAL #31662

I checked your PR and I did not see my changes, so for now I think I'm gonna keep them here, but @charlotte12l can probably comment on that matter too.

@noooop
Collaborator

noooop commented Jan 4, 2026

@AndreasKaratzas Thanks for the fix! Could you create a convertor for NomicBert models in MODEL_ARCH_CONFIG_CONVERTORS, and put the max_trained_positions logic into the convertor? We hope to use the convertor to consolidate all configuration update/read logic.
Besides, I saw NomicBertModelConfig also performs the updates below; we should also move that logic into the convertor. Other models in vllm/model_executor/models/config.py are fine.

        config.hidden_size = config.n_embd
        config.num_hidden_layers = config.n_layer

The odd logic for NomicBertModelConfig was added by me in #18755.
It's so specific that I didn't want this logic split across different locations (model_arch_config_convertor).
At the same time, I also want to remove VllmConfig.recalculate_max_model_len.


Let me and @charlotte12l discuss this logic further so that your other fixes can be merged quickly.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas
Collaborator Author

@noooop I just removed the logic for NomicBert. But please, if you can, get the PR you are working on merged quickly, because it is important for the AMD CI. We would like to have this test green. Let me know if I can help in any way :)

Collaborator

@noooop noooop left a comment

Thanks for the fix.

@noooop noooop changed the title from [Core][CI] Skip Phi-MoE test due to old API util and fix NomicBert max_model_len validation to [CI] Skip Phi-MoE test due to old API util Jan 4, 2026
@noooop noooop enabled auto-merge (squash) January 4, 2026 04:07
@charlotte12l
Contributor

charlotte12l commented Jan 4, 2026

@noooop I'm okay with keeping that logic in vllm/model_executor/models/config.py for now, but in that case we still need to update model_arch_config.hidden_size and model_arch_config.num_hidden_layers.

I agree with avoiding splitting it across different locations. I propose to consolidate it into the convertor in the future, if you are okay with that.

@noooop
Collaborator

noooop commented Jan 4, 2026

@noooop I'm okay with keeping that logic in vllm/model_executor/models/config.py for now, but in that case we still need to update model_arch_config.hidden_size and model_arch_config.num_hidden_layers.

I agree with avoiding splitting it across different locations. I propose to consolidate it into the convertor in the future, if you are okay with that.

@noooop noooop closed this Jan 4, 2026
auto-merge was automatically disabled January 4, 2026 04:12

Pull request was closed

@noooop noooop reopened this Jan 4, 2026
@noooop
Collaborator

noooop commented Jan 4, 2026

Sorry, I clicked close by mistake.

@noooop noooop enabled auto-merge (squash) January 4, 2026 04:13
@noooop
Collaborator

noooop commented Jan 4, 2026

@AndreasKaratzas

I'm very sorry. Please submit an empty commit or anything to restart the Read the Docs build.

@AndreasKaratzas
Collaborator Author

@AndreasKaratzas

I'm very sorry. Please submit an empty commit or anything to restart the Read the Docs build.

Give me some time for that, because I just switched off my PC 😅

@noooop
Collaborator

noooop commented Jan 4, 2026

Wait a second, it looks like the CI has already recovered automatically. You don't need to do anything.

@noooop noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 4, 2026
@noooop noooop changed the title [Core][CI] Skip Phi-MoE test due to old API util [CI] Skip Phi-MoE test due to old API util Jan 4, 2026
@noooop noooop disabled auto-merge January 4, 2026 05:16
@noooop noooop enabled auto-merge (squash) January 4, 2026 05:16
auto-merge was automatically disabled January 4, 2026 18:27

Head branch was pushed to by a user without write access

@noooop noooop merged commit 89f1f25 into vllm-project:main Jan 5, 2026
20 checks passed
@AndreasKaratzas AndreasKaratzas deleted the akaratza_lang_ext_gen branch January 5, 2026 16:01
LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed


4 participants