
[CI] Skip Phi-MoE test due to old API util#31632

Merged
noooop merged 12 commits into vllm-project:main from ROCm:akaratza_lang_ext_gen
Jan 5, 2026
Conversation

@AndreasKaratzas
Collaborator

@AndreasKaratzas AndreasKaratzas commented Jan 2, 2026

This PR addresses test failures in the Language Models Test (Extended Generation) test group on ROCm by:

  1. Skipping test_phimoe.py due to a known upstream issue
  2. Documenting a dependency on an upstream mamba-ssm fix for ROCm 7.0+
  3. Fixing NomicBert max_model_len validation regression from [Core] Parse vLLM engine required fields from hf_config to model_arch_config #28454

Changes

1. Skip PhiMoE Tests

Skipped all tests in tests/models/language/generation/test_phimoe.py due to a known issue where AttributeError: 'DynamicCache' object has no attribute 'seen_tokens' is raised.

2. Mamba Installation on ROCm 7.0+ (Awaiting Upstream Fix)

Mamba-ssm currently fails to build and run correctly on ROCm 7.0+. We are awaiting the merge of an upstream fix:

Once that fix is merged, we will also update the pinned version of the package.

3. Fix NomicBert max_model_len Validation

After #28454, cached derived_max_model_len_and_key wasn't updated when NomicBertModelConfig restricted max_position_embeddings, causing validation to use stale values. Affects both ROCm and CUDA.

  • Test: pytest -s -v models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a CI failure by skipping a failing test for Phi-MoE models. The change to add -rs to the pytest command is a good improvement for visibility into skipped tests. However, the implementation for skipping the test is too broad. It disables all tests in the test_phimoe.py file, while only the test_models function seems to be affected by the upstream issue. To maintain test coverage, the skip should be applied more narrowly to only the failing test.
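As context for the -rs change: pytest's -r flag selects which outcome categories appear in the end-of-run short summary, and the s character adds skipped tests together with their skip reasons, which is what makes skips visible in CI logs. A minimal, self-contained demo (the throwaway file path is just for illustration), assuming pytest is installed:

```shell
# Write a throwaway test file with one skipped test, then run it with -rs
# so the skip reason appears in pytest's short summary.
cat > /tmp/test_rs_demo.py <<'EOF'
import pytest

@pytest.mark.skip(reason="demo skip reason")
def test_skipped():
    pass
EOF
pytest -rs /tmp/test_rs_demo.py
```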

Comment on lines +11 to +24
# There is a known issue that triggers `AttributeError: 'DynamicCache'
# object has no attribute 'seen_tokens'` when running:
# `tests/models/language/generation/test_phimoe.py::test_models
# [5-64-bfloat16-microsoft/Phi-3.5-MoE-instruct]`
# This issue is being investigated and tracked in:
# https://huggingface.co/microsoft/Phi-3.5-MoE-instruct/discussions/58
# It is platform-agnostic. Therefore, we skip this test on all platforms for now.
pytest.skip(
"Skipping due to known issue: "
"'DynamicCache' object has no attribute 'seen_tokens'. See: "
"https://huggingface.co/microsoft/Phi-3.5-MoE-instruct/discussions/58 "
"for details.",
allow_module_level=True,
)
Contributor
high

This module-level pytest.skip disables all tests in this file, including test_phimoe_routing_function. Based on the issue description and the comment, only the test_models function is affected by the 'DynamicCache' object has no attribute 'seen_tokens' error. The test_phimoe_routing_function appears to be a simple unit test that does not involve model generation and is likely still passing.

To avoid unnecessarily reducing test coverage, it would be better to apply the skip only to the failing test. This can be done by removing this module-level skip and adding a @pytest.mark.skip(reason=...) decorator directly to the test_models function.
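A minimal sketch of the narrower skip the bot suggests; the test bodies below are placeholders, not the actual code in test_phimoe.py:

```python
import pytest


# Hypothetical sketch: skip only the affected test instead of the whole
# module, so unrelated unit tests in the same file keep running.
@pytest.mark.skip(
    reason="Known issue: 'DynamicCache' object has no attribute "
    "'seen_tokens'. See: "
    "https://huggingface.co/microsoft/Phi-3.5-MoE-instruct/discussions/58"
)
def test_models():
    ...  # placeholder for the real generation test


def test_phimoe_routing_function():
    ...  # placeholder; this unit test is unaffected and still runs
```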

Collaborator Author
Done :)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
- uv pip install --system --no-build-isolation 'git+https://github.com/state-spaces/mamba@v2.2.5'
- uv pip install --system --no-build-isolation 'git+https://github.com/Dao-AILab/causal-conv1d@v1.5.2'
- pytest -v -s models/language/generation -m '(not core_model) and (not hybrid_model)'
- pytest -v -rs models/language/generation -m '(not core_model) and (not hybrid_model)'
Member
Revert this? I think you only used this for debugging

Collaborator Author
@DarkLight1337 done :)

@AndreasKaratzas
Collaborator Author

I also modified the mamba package installation path so that the test turns green until the upstream PR gets merged. Green CI is critical for AMD; I will monitor my PR in the mamba upstream and revert this change as soon as it is merged, as I did with the xgrammar upstream.

@AndreasKaratzas
Collaborator Author

@charlotte12l @hmellor @heheda12345

Could you please review my last commit?

After #28454, _get_and_verify_max_len uses a cached derived_max_model_len_and_key value from model_arch_config. However, NomicBertModelConfig.verify_and_update_config updates hf_text_config.max_position_embeddings to max_trained_positions (2048) without updating the cached value (which remains 8192 from the original HF config).

This causes max_model_len=4096 to incorrectly pass validation because 4096 < 8192, when it should fail because 4096 > 2048. My last commit updates model_arch_config.derived_max_model_len_and_key when NomicBertModelConfig restricts the max position embeddings to max_trained_positions. To test, run:

pytest -s -v models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal

This test now passes; before, it was failing on both ROCm and CUDA.
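The stale-cache bug and the fix can be illustrated with a self-contained sketch. The classes below are hypothetical stand-ins, not vLLM's actual config objects; only the field names (derived_max_model_len_and_key, max_position_embeddings) and the numbers (8192, 2048, 4096) come from the description above:

```python
# Self-contained sketch of the stale-cache bug described above.
# These classes are hypothetical stand-ins, not vLLM's actual config API.
from dataclasses import dataclass


@dataclass
class ModelArchConfig:
    # Cached (value, source-key) pair derived from the original HF config.
    derived_max_model_len_and_key: tuple


@dataclass
class HfTextConfig:
    max_position_embeddings: int


def verify_and_update_config(hf, arch, max_trained_positions):
    # NomicBert restricts the usable context to max_trained_positions.
    hf.max_position_embeddings = max_trained_positions
    # The fix: refresh the cached value so later validation sees 2048,
    # not the stale 8192 from the original HF config.
    arch.derived_max_model_len_and_key = (
        max_trained_positions, "max_position_embeddings")


def validate_max_model_len(max_model_len, arch):
    derived, key = arch.derived_max_model_len_and_key
    if max_model_len > derived:
        raise ValueError(
            f"max_model_len={max_model_len} exceeds {key}={derived}")


hf = HfTextConfig(max_position_embeddings=8192)
arch = ModelArchConfig(
    derived_max_model_len_and_key=(8192, "max_position_embeddings"))
verify_and_update_config(hf, arch, max_trained_positions=2048)
try:
    validate_max_model_len(4096, arch)  # should now fail: 4096 > 2048
    rejected = False
except ValueError:
    rejected = True
```

Without the refresh inside verify_and_update_config, the cached 8192 would let max_model_len=4096 slip through, which is exactly the regression the commit fixes.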

@AndreasKaratzas AndreasKaratzas changed the title from [CI] Skip Phi-MoE test due to old API util to [CI] Skip Phi-MoE test due to old API util and fix NomicBert max_model_len validation Jan 3, 2026
@AndreasKaratzas AndreasKaratzas changed the title from [CI] Skip Phi-MoE test due to old API util and fix NomicBert max_model_len validation to [Core][CI] Skip Phi-MoE test due to old API util and fix NomicBert max_model_len validation Jan 3, 2026
@charlotte12l
Contributor

charlotte12l commented Jan 3, 2026

@AndreasKaratzas Thanks for the fix! Could you create a convertor for NomicBert models in MODEL_ARCH_CONFIG_CONVERTORS, and put the max_trained_positions logic into the convertor? We hope to use the convertor to consolidate all configuration update/read logic.

Besides, I saw NomicBertModelConfig also performs the updates below; we should also move that logic into the convertor. Other models in vllm/model_executor/models/config.py are fine.

        config.hidden_size = config.n_embd
        config.num_hidden_layers = config.n_layer
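A rough sketch of what such a convertor registration might look like. The registry name MODEL_ARCH_CONFIG_CONVERTORS comes from the comment above, but the dict shape, function signature, and field names here are assumptions for illustration, not vLLM's actual interface:

```python
# Hypothetical sketch of a per-model convertor registry, mirroring the
# MODEL_ARCH_CONFIG_CONVERTORS idea; names and signatures are assumptions.
from types import SimpleNamespace


def nomic_bert_convertor(hf_config):
    """Translate NomicBert-specific HF field names into generic arch fields."""
    return {
        # NomicBert spells these fields n_embd / n_layer in its HF config.
        "hidden_size": hf_config.n_embd,
        "num_hidden_layers": hf_config.n_layer,
        # Restrict the context length to what the model was trained on.
        "max_model_len": hf_config.max_trained_positions,
    }


MODEL_ARCH_CONFIG_CONVERTORS = {
    "NomicBertModel": nomic_bert_convertor,
}


def build_arch_config(arch_name, hf_config):
    # Look up a model-specific convertor; fall back to no overrides.
    convertor = MODEL_ARCH_CONFIG_CONVERTORS.get(arch_name)
    return convertor(hf_config) if convertor else {}


hf = SimpleNamespace(n_embd=768, n_layer=12, max_trained_positions=2048)
arch = build_arch_config("NomicBertModel", hf)
```

The appeal of this shape is that every read/rename of a model-specific HF field happens in one registered function instead of being scattered across config-verification hooks.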

@AndreasKaratzas
Collaborator Author

@AndreasKaratzas Thanks for the fix! Could you create a convertor for NomicBert models in MODEL_ARCH_CONFIG_CONVERTORS, and put the max_trained_positions logic into the convertor? We hope to use the convertor to consolidate all configuration update/read logic.

@charlotte12l Let me see what I can do. I might need help with that tbh 😅 But I'll ping you in that case.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas
Collaborator Author

@charlotte12l I think the last commit resolves what you are suggesting. I tested:
pytest -s -v tests/models/language/pooling_mteb_test/test_nomic.py and pytest -s -v models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal, and they are still passing. But please take a look yourself as well to check whether my new modifications are sound.

@noooop
Collaborator

noooop commented Jan 4, 2026

Could you please let me fix tests/models/language/pooling_mteb_test/test_nomic.py? PTAL #31662

@AndreasKaratzas
Collaborator Author

Could you please let me fix tests/models/language/pooling_mteb_test/test_nomic.py? PTAL #31662

I checked your PR and I did not see my changes, so for now I think I'm gonna keep them here, but @charlotte12l can probably comment on that matter too.

@noooop
Collaborator

noooop commented Jan 4, 2026

@AndreasKaratzas Thanks for the fix! Could you create a convertor for NomicBert models in MODEL_ARCH_CONFIG_CONVERTORS, and put the max_trained_positions logic into the convertor? We hope to use the convertor to consolidate all configuration update/read logic.
Besides, I saw NomicBertModelConfig also performs the updates below; we should also move that logic into the convertor. Other models in vllm/model_executor/models/config.py are fine.

        config.hidden_size = config.n_embd
        config.num_hidden_layers = config.n_layer

The odd logic for NomicBertModelConfig was added by me in #18755.
It's so specific that I didn't want this logic split across different locations (model_arch_config_convertor).
At the same time, I also want to remove VllmConfig.recalculate_max_model_len.


Let me and @charlotte12l discuss this logic further so that your other fixes can be merged quickly.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@AndreasKaratzas
Collaborator Author

@noooop I just removed the logic for NomicBert. But please, if you can, get the PR you are working on merged quickly, because it is important for the AMD CI. We would like to have this test green. Let me know if I can help in any way :)

Collaborator

@noooop noooop left a comment

Thanks for the fix.

@noooop noooop changed the title from [Core][CI] Skip Phi-MoE test due to old API util and fix NomicBert max_model_len validation to [CI] Skip Phi-MoE test due to old API util Jan 4, 2026
@noooop noooop enabled auto-merge (squash) January 4, 2026 04:07
@charlotte12l
Contributor

charlotte12l commented Jan 4, 2026

@noooop I'm okay with keeping that logic in vllm/model_executor/models/config.py for now, but in that case we still need to update model_arch_config.hidden_size and model_arch_config.num_hidden_layers.

I agree with avoiding splitting it across different locations. I propose to consolidate it into the convertor in the future, if you are okay with that.

@noooop
Collaborator

noooop commented Jan 4, 2026

@noooop I'm okay with keeping that logic in vllm/model_executor/models/config.py for now, but in that case we still need to update model_arch_config.hidden_size and model_arch_config.num_hidden_layers.

I agree with avoiding splitting it across different locations. I propose to consolidate it into the convertor in the future, if you are okay with that.

@noooop noooop closed this Jan 4, 2026
auto-merge was automatically disabled January 4, 2026 04:12

Pull request was closed

@noooop noooop reopened this Jan 4, 2026
@noooop
Collaborator

noooop commented Jan 4, 2026

Sorry, I clicked close by mistake.

@noooop noooop enabled auto-merge (squash) January 4, 2026 04:13
@noooop
Collaborator

noooop commented Jan 4, 2026

@AndreasKaratzas

I'm very sorry. Please submit an empty commit or anything to restart the Read the Docs build.

@AndreasKaratzas
Collaborator Author

@AndreasKaratzas

I'm very sorry. Please submit an empty commit or anything to restart the Read the Docs build.

Give me some time for that, because I just switched off my PC 😅

@noooop
Collaborator

noooop commented Jan 4, 2026

Wait a second, it looks like the CI has already recovered automatically. You don't need to do anything.

@noooop noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 4, 2026
@noooop noooop changed the title [Core][CI] Skip Phi-MoE test due to old API util [CI] Skip Phi-MoE test due to old API util Jan 4, 2026
@noooop noooop disabled auto-merge January 4, 2026 05:16
@noooop noooop enabled auto-merge (squash) January 4, 2026 05:16
auto-merge was automatically disabled January 4, 2026 18:27

Head branch was pushed to by a user without write access

@noooop noooop merged commit 89f1f25 into vllm-project:main Jan 5, 2026
20 checks passed
@AndreasKaratzas AndreasKaratzas deleted the akaratza_lang_ext_gen branch January 5, 2026 16:01
LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed


4 participants