
Remove incorrect tokenizer info test#33565

Merged
DarkLight1337 merged 1 commit into vllm-project:main from hmellor:remove-bad-test
Feb 2, 2026
Conversation

@hmellor (Member) commented Feb 2, 2026

#20575 added what is now called the /tokenizer_info endpoint.

It added the following serialisation method that is used before the tokenizer info is sent:

```python
def _make_json_serializable(self, obj):
    """Convert any non-JSON-serializable objects to serializable format."""
    if hasattr(obj, "content"):
        return obj.content
    elif isinstance(obj, dict):
        return {k: self._make_json_serializable(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [self._make_json_serializable(item) for item in obj]
    else:
        return obj
```

In this you can see that if an obj has the attribute content, only the content of that object is returned. The values of the added_tokens_decoder field are AddedToken objects, which do have the attribute content.
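To illustrate the effect, here is a minimal, self-contained sketch of that serialisation logic. The AddedToken class below is a stand-in mimicking the relevant attributes of the real tokenizers.AddedToken, and the token id used is an arbitrary example:

```python
from dataclasses import dataclass


# Stand-in for tokenizers' AddedToken; only the attributes relevant here.
@dataclass
class AddedToken:
    content: str
    special: bool = True


def make_json_serializable(obj):
    """Mirror of the endpoint's serialisation logic (module-level, no self)."""
    if hasattr(obj, "content"):
        return obj.content
    elif isinstance(obj, dict):
        return {k: make_json_serializable(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [make_json_serializable(item) for item in obj]
    return obj


# Example token id chosen arbitrarily for illustration.
added_tokens_decoder = {"151643": AddedToken("<|endoftext|>", special=True)}
print(make_json_serializable(added_tokens_decoder))
# → {'151643': '<|endoftext|>'}
```

Each AddedToken collapses to its content string; the special flag never survives serialisation, which is why the test below cannot pass once added_tokens_decoder is populated.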

This means that, by design, the tokenizer info endpoint was only meant to return the string value of the token. Unfortunately, this is directly contradicted by the test that was added in the same PR:

```python
@pytest.mark.asyncio
async def test_tokenizer_info_added_tokens_structure(
    server: RemoteOpenAIServer,
):
    """Test added_tokens_decoder structure if present."""
    response = requests.get(server.url_for("tokenizer_info"))
    response.raise_for_status()
    result = response.json()
    added_tokens = result.get("added_tokens_decoder")
    if added_tokens:
        for token_id, token_info in added_tokens.items():
            assert isinstance(token_id, str), "Token IDs should be strings"
            assert isinstance(token_info, dict), "Token info should be a dict"
            assert "content" in token_info, "Token info should have content"
            assert "special" in token_info, "Token info should have special flag"
            assert isinstance(token_info["special"], bool), (
                "Special flag should be boolean"
            )
```

This was not noticed because the model used in the test does not have an added_tokens_decoder for Transformers v4.

This PR opts to simply remove the test so that:

  • The existing API (whether or not it was intended) remains unchanged
  • The test does not fail for Transformers v5, where added_tokens_decoder is present

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor hmellor changed the title Remove incorrect test Remove incorrect tokenizer info test Feb 2, 2026
@gemini-code-assist (bot) left a comment

Code Review

This pull request removes an incorrect test for the /tokenizer_info endpoint. The test was asserting an incorrect structure for added_tokens_decoder. While removing the test is a valid fix, I've suggested replacing it with a corrected test that asserts the actual API behavior. This maintains test coverage for this feature.

tests/entrypoints/openai/test_tokenization.py (304-321), severity: high

You've correctly identified that this test is incorrect given the serialization logic. While removing it fixes the immediate issue of a failing test, it also removes test coverage for the added_tokens_decoder field.

A better approach would be to correct the test to reflect the actual behavior of the API. Based on your description, the token_info is serialized to just the string content. The test should be updated to assert this.

This ensures we still have a test for this part of the API response, and it correctly documents the expected behavior.

```python
@pytest.mark.asyncio
async def test_tokenizer_info_added_tokens_structure(
    server: RemoteOpenAIServer,
):
    """Test added_tokens_decoder structure if present."""
    response = requests.get(server.url_for("tokenizer_info"))
    response.raise_for_status()
    result = response.json()
    added_tokens = result.get("added_tokens_decoder")
    if added_tokens:
        for token_id, token_info in added_tokens.items():
            assert isinstance(token_id, str), "Token IDs should be strings"
            assert isinstance(token_info, str), "Token info should be a string"
```

@hmellor (Member, Author) commented Feb 2, 2026

I'd ignore Gemini's suggestion. There is little value in testing that a JSON response is a dict mapping str to str.

@DarkLight1337 (Member) left a comment
Thanks for the detailed explanation!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) February 2, 2026 14:58
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 2, 2026
@DarkLight1337 DarkLight1337 merged commit 6141ebe into vllm-project:main Feb 2, 2026
17 checks passed
@hmellor hmellor deleted the remove-bad-test branch February 3, 2026 07:41
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Pai <416932041@qq.com>
gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request Feb 5, 2026
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>