167 commits
71a59ec
[Olmo 3] olmo3 tool parser and tests
pdasigi Sep 26, 2025
e046165
[CI/Build] fix doc build warning: Failed to get 'name: description' p…
yitingdc Sep 26, 2025
6410745
fix: revert cast to cpu in `MsgpackEncoder._encode_tensor` to avoid h…
qthequartermasterman Sep 26, 2025
8bafbfe
perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds…
qthequartermasterman Sep 26, 2025
c6f880f
[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300…
xaguilar-amd Sep 26, 2025
95ecee9
fix: print outputt offline_inference/base/chat.py example (#25744)
Iceber Sep 26, 2025
4c31bde
[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and …
sighingnow Sep 26, 2025
e1b68e0
Remove cuda hard-code in compute_causal_conv1d_metadata (#25555)
wxsIcey Sep 26, 2025
6784078
[misc] refactor speculative config (#25657)
yyzxw Sep 26, 2025
9d1dbba
[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk…
SageMoore Sep 26, 2025
d357598
Support LongCat-Flash-Chat tool call (#24083)
Xu-Wenqing Sep 26, 2025
b75c0a3
[Doc] Update Batch-level DP docs (#25757)
DarkLight1337 Sep 26, 2025
38712fe
[Model] Mamba2 varlen refactor (#21467)
cyang49 Sep 26, 2025
9380098
[CI] Fix test_shared_storage_connector_hashes (#25748)
chaunceyjiang Sep 26, 2025
b13e5c7
[Bugfix] Properly abort pooling request. (#25734)
noooop Sep 26, 2025
b1c49ca
[CI/Build] Split up Distributed Tests (#25572)
DarkLight1337 Sep 26, 2025
e3f4572
[CI/Build] Fix some V1 tests not being run (#25569)
DarkLight1337 Sep 26, 2025
7f76f26
[Quantization] Add field to skip unquantized modules for GPTQ config …
Isotr0py Sep 26, 2025
af123cf
[BugFix] Fix using `dbo_decode_token_threshold` always (and ignoring …
LucasWilkinson Sep 26, 2025
dcfeea4
[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility i…
eicherseiji Sep 26, 2025
471c031
[Misc] fix unique_filepath (#25732)
ZJY0516 Sep 26, 2025
e2f1a4a
Eagle3 that supports the Minicpm3 model (#24243)
LDLINGLINGLING Sep 26, 2025
b2819f0
[Doc]: improve CPU(x86) build-wheel-from-source section (#25617)
brokedba Sep 26, 2025
aa5b385
[Bugfix] Improve GLM4 MoE Reasoning Parser's is_reasoning_end Conditi…
frankwang28 Sep 26, 2025
f74612d
[Docs] Add Toronto Meetup (#25773)
mgoin Sep 26, 2025
b05e292
[CI] Add E2E Blackwell Quantized MoE Test (#25723)
mgoin Sep 26, 2025
c3de86e
[V1] address post issues related to #20059 (part 1) (#23046)
fhl2000 Sep 26, 2025
d093b42
[CI] Fix FlashInfer AOT in release docker image (#25730)
mgoin Sep 26, 2025
55b306b
[spec decode] Consolidate speculative decode method name for MTP (#25…
zixi-qi Sep 26, 2025
e31766e
Reduce the Cuda Graph memory footprint when running with DBO (#25779)
SageMoore Sep 26, 2025
db06e13
Kernel-override Determinism [1/n] (#25603)
bwasti Sep 26, 2025
af2c6eb
[Bugfix] Optimize CpuGpuBuffer initialization (#25447)
namanlalitnyu Sep 27, 2025
f05b917
[Spec decode] automatically disable mm for text-only draft models (#2…
jmkuebler Sep 27, 2025
cbf0eca
[Core] Don't count preempted tokens in prefix cache hit rate (#25787)
zhuohan123 Sep 27, 2025
b018331
Add option to restrict media domains (#25783)
russellb Sep 27, 2025
3052fc5
Add flashinfer-build.sh and register precompiled cu128 wheel in Docke…
mgoin Sep 27, 2025
2d6c910
[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement…
david6666666 Sep 27, 2025
be7a248
[Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL (#25788)
yewentao256 Sep 27, 2025
0ebc207
[CI/Build] Consolidate model loader tests and requirements (#25765)
DarkLight1337 Sep 27, 2025
7f94748
[CI/Build] Add timing to Model Executor Test (#25799)
22quinn Sep 27, 2025
c11b2d8
[CI/Build] Reorganize root-level V1 tests (#25767)
DarkLight1337 Sep 27, 2025
fbb7895
[Misc] Fix codeowners override for v1 sample and attention (#25037)
22quinn Sep 27, 2025
1687706
[Misc] Update openai client example file for multimodal (#25795)
ywang96 Sep 27, 2025
1a97bb5
[Bugfix] Add missing `image_size` for phi4_multimodal (#25796)
Renovamen Sep 27, 2025
1031f21
[Bugfix] Merge MM embeddings by index instead of token IDs (#16229)
DarkLight1337 Sep 27, 2025
cdebe9e
Validate API tokens in constant time (#25781)
russellb Sep 27, 2025
04df893
Add filtering for chat template kwargs (#25794)
russellb Sep 27, 2025
d85bd46
Fix GPTQ model loading in Transformers backend (#25770)
hmellor Sep 27, 2025
2fcf6dc
[Bugfix] Fix triton import precommit failure (#25803)
tlrmchlsmth Sep 27, 2025
b106db1
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982)
tlrmchlsmth Sep 27, 2025
bc9b970
[docs] transcriptions API audio upload (#25446)
yyzxw Sep 27, 2025
c4b673c
[env] default nixl side port conflicts with kv-event zmq port (#25056)
panpan0000 Sep 27, 2025
66b2a31
[Core] Refactor self.model() to call a helper for subclassing. (#25084)
patrick-toulme Sep 27, 2025
eded855
[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable (#25651)
ZJY0516 Sep 27, 2025
0488225
[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location (#…
smarterclayton Sep 27, 2025
b81791e
[Core] GC Debug callback (#24829)
Jialin Sep 27, 2025
0c412f0
[Bugfix][NIXL] Fix Async Scheduler timeout issue (#25808)
NickLucche Sep 27, 2025
150982e
[MM] Optimize memory profiling for scattered multimodal embeddings (#…
ywang96 Sep 28, 2025
a0ccd47
[Bugfix] Fix Qwen3-VL regression from #24982 (#25814)
ywang96 Sep 28, 2025
b23faca
[VLM] Update Qwen3-VL max_num_video_tokens calculation for configurab…
Isotr0py Sep 28, 2025
453971a
Fix random dataset mismatched token length with config. (#24937)
weireweire Sep 28, 2025
58d2117
Update GLM-4.5 Doc transformers version (#25830)
zRzRzRzRzRzRzR Sep 28, 2025
a6a2b4d
[Bugfix] fix Qwen3VLMoe load when pp > 1 (#25838)
JJJYmmm Sep 28, 2025
2512056
Remove redundant cudagraph dispatcher warning (#25841)
mgoin Sep 28, 2025
7b326f7
[Misc] fix tests failure by using current_platform (#25825)
kingsmad Sep 29, 2025
ccbfaab
[P/D] NIXL Updates (#25844)
robertgshaw2-redhat Sep 29, 2025
29312d6
Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS (#25832)
tdoublep Sep 29, 2025
e37c80d
[XPU]Fix xpu spec decoding UTs, avoid using cuda graph (#25847)
jikunshang Sep 29, 2025
9e74829
[Bugfix] Fallback ViT attn backend to SDPA for blackwell (#25851)
ywang96 Sep 29, 2025
b948798
[V0 Deprecation][Models] Remove all V0 condition for mm embeddings me…
Isotr0py Sep 29, 2025
fd2fcf3
[Misc] Remove more `get_input_embeddings_v0` (#25857)
DarkLight1337 Sep 29, 2025
f44292b
update to latest deepgemm for dsv3.2 (#25871)
youkaichao Sep 29, 2025
e8a4abc
[Bugfix] Fix requirements paths in install instructions (#25827)
yingjun-mou Sep 29, 2025
864ba4d
[Model][Bugfix] Fix issues in MiDashengLM implementation for quantize…
zhoukezi Sep 29, 2025
68c275d
[torch.compile] serialize cudagraph_mode as its enum name instead of …
ZJY0516 Sep 29, 2025
23ea746
[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) (#24690)
chenxi-yang Sep 29, 2025
da931fc
[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue (…
rahul-tuli Sep 29, 2025
11f34dc
[CI/Build] Include Transformers backend test in nightly transformers …
Isotr0py Sep 29, 2025
9f91bc2
[Model] Remove MotifForCausalLM (#25866)
jeejeelee Sep 29, 2025
3f22bc9
[Bugfix] Use correct key "ignore" for config.json non-quantized layer…
leejnau Sep 29, 2025
9280e95
[BugFix][torch.compile] KV scale calculation issues with FP8 quantiza…
adabeyta Sep 29, 2025
2c2a771
[Doc] Add documentation for vLLM continuous benchmarking and profilin…
namanlalitnyu Sep 29, 2025
b67583d
[Bugfix][ROCm] Fixing trying to import non-existent symbols from libn…
gshtras Sep 29, 2025
4142c77
[Kernel] Chunk-aligned mamba2 (#24683)
tdoublep Sep 29, 2025
7375eee
[Doc] Polish example for torchrun dp (#25899)
zhuohan123 Sep 29, 2025
fbcf37f
[NIXL] Increase default KV block eviction timeout on P (#25897)
NickLucche Sep 29, 2025
75ed63e
[V0 Deprecation] Remove `vllm.worker` and update according imports (#…
aarnphm Sep 29, 2025
5b84d22
Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT…
qthequartermasterman Sep 30, 2025
a9b6fc9
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909)
yewentao256 Sep 30, 2025
dfb8e20
[Benchmark] Support benchmark throughput for external launcher DP (#2…
zhuohan123 Sep 30, 2025
4b1cf5e
Move`VllmConfig` from `config/__init__.py` to `config/vllm.py` (#25271)
hmellor Sep 30, 2025
9fe29b1
[BugFix] Fix DP/EP hang (#25906)
LucasWilkinson Sep 30, 2025
f7d6b13
[BugFix] Pass config_format via try_get_generation_config (#25912)
acisseJZhong Sep 30, 2025
e78363f
[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorr…
zhoukezi Sep 30, 2025
687f42d
[Bugfix]: Clean up chunked prefill logging when using whisper (#25075)
simondanielsson Sep 30, 2025
886bd12
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
zyongye Sep 30, 2025
049c6bc
[Doc] Add Cambricon MLU support (#25942)
a120092009 Sep 30, 2025
af1dec3
Updated TRL integration docs (#25684)
sergiopaniego Sep 30, 2025
3a89e8c
[Bugfix][Model]fix ernie45 moe gate&bias dtype to float32 (#25936)
CSWYF3634076 Sep 30, 2025
226073e
[Model] Move `vision_feature_select_strategy` into `resolve_visual_en…
DarkLight1337 Sep 30, 2025
1116b82
[perf] Use CPU tensor to reduce GPU->CPU sync (#25884)
lhtin Sep 30, 2025
beee5de
[NIXL] Add support for MLA caches with different latent dim (#25902)
NickLucche Sep 30, 2025
8291c88
[CI] Move applicable tests to CPU (#24080)
rzabarazesh Sep 30, 2025
2b16dad
[Fix] Improve CPU backend compatibility for RISC-V (#25816)
ihb2032 Sep 30, 2025
c13967c
[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 a…
Josephasafg Sep 30, 2025
caaa5a6
Add Hugging Face Inference Endpoints guide to Deployment docs (#25886)
sergiopaniego Sep 30, 2025
8721e48
[Bugfix][Model] Fix inference for Hunyuan dense models (#25354)
Anionex Sep 30, 2025
50a6b46
[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging (#2…
pavanimajety Sep 30, 2025
2ad110a
[Bugfix] Token type and position embeddings fail to be applied to `in…
DarkLight1337 Sep 30, 2025
9732cbd
[bugfix][deepseek] fix flashmla kernel selection (#25956)
youkaichao Sep 30, 2025
28793de
[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute…
yewentao256 Sep 30, 2025
eedc4b5
[Doc] Improve MM Pooling model documentation (#25966)
DarkLight1337 Sep 30, 2025
d945c6c
[Docs] Add moe kernel features doc (#25297)
bnellnm Sep 30, 2025
dc5fa2e
OffloadingConnector: Fix GPU block tracking bug (#25856)
orozery Sep 30, 2025
6a0a842
[Llama4] [multimodal] Fix misplaced dtype cast of `cos_sin_cache` in …
cjackal Sep 30, 2025
c395498
[Bench] Add DeepSeekV32 to MoE benchmark (#25962)
jeejeelee Sep 30, 2025
98775a8
[V1] [P/D] Add Support for KV Load Failure Recovery (#19330)
sdavidbd Sep 30, 2025
92e28e7
Add explicit pooling classes for the Transformers backend (#25322)
hmellor Sep 30, 2025
3b60bdc
[Docs] Remove API Reference from search index (#25949)
hmellor Sep 30, 2025
d51f36d
[gpt-oss] use vLLM instead of openai types for streaming (#25186)
qandrew Sep 30, 2025
645622b
[Misc] Make EP kernels install script support uv (#25785)
LucasWilkinson Sep 30, 2025
fad6c1a
[Model] MTP fallback to eager for DeepSeek v32 (#25982)
luccafong Oct 1, 2025
5b6a701
Update launch_bounds_utils.h for correct compile on Multiple Cuda Arc…
DrStone1971 Oct 1, 2025
d6862ab
[Log] Optimize Log for FP8MOE (#25709)
yewentao256 Oct 1, 2025
2d3e81d
Fix INT8 quantization error on Blackwell GPUs (SM100+) (#25935)
certainly-param Oct 1, 2025
35fd946
[MM] Add text-only mode for Qwen3-VL (#26000)
ywang96 Oct 1, 2025
3e0baef
[Bugfix] Fix `__syncwarp` on ROCM (#25996)
zhewenl Oct 1, 2025
bff7b71
[BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 (#25988)
LucasWilkinson Oct 1, 2025
3b00255
Update to Transformers `v4.56.2` (#24638)
hmellor Oct 1, 2025
31c973c
[Misc]allow disable pynccl (#25421)
luccafong Oct 1, 2025
20fecb7
[Doc] updating torch.compile doc link (#25989)
vnadathur Oct 1, 2025
f18b6ee
[BugFix][MM] Fix Nonetype error when video is cache in qwen2.5-omni-t…
wwl2755 Oct 1, 2025
b1f4d92
[Misc] Factor out common `_apply_feature_select_strategy` (#26003)
DarkLight1337 Oct 1, 2025
d4ac9ac
[CI] Only capture a single CUDA graph size in CI by default (#25951)
hmellor Oct 1, 2025
ac12bfe
[MISC] Fix misleading batch_size_capture_list when cuda_graph_sizes <…
billishyahao Oct 1, 2025
065f4b6
[Benchmark] Finish documented v0.11.0 deprecation of --endpoint-type …
natoscott Oct 1, 2025
8dad1a0
[Bugfix] Apply same sampling parameters for both `n=1` and `n>1` (#26…
kmaehashi Oct 1, 2025
4e65348
[NVIDIA] Blackwell Family (#24673)
johnnynunez Oct 1, 2025
83aa1f5
Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_in…
hl475 Oct 1, 2025
4b9738d
[CI] Tweaks to GPT-OSS Eval (Blackwell) for stability (#26030)
mgoin Oct 1, 2025
30ea5a5
[BugFix][DP/EP] Fix CUTLASS MLA hang under load (#26026)
LucasWilkinson Oct 1, 2025
559ff41
[ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series (#25908)
hyoon1 Oct 1, 2025
752d3d5
[Bug] Fix Negative Cuda Memory Usage (#25683)
yewentao256 Oct 1, 2025
4095627
[BugFix] ChunkedLocalAttention is currently not CG compatible (#26034)
LucasWilkinson Oct 1, 2025
815431c
Support RL online quantization with torchao (#23014)
jerryzh168 Oct 1, 2025
1609496
[ROCm][Bugfix] Add missing parameter to ROCm backend (#26029)
gshtras Oct 2, 2025
20e0d34
[Misc] Make handling of SamplingParams clearer in n>1 case (#26032)
njhill Oct 2, 2025
dac1ec3
Run:ai model streamer add GCS package support (#24909)
pwschuurman Oct 2, 2025
38bb882
Update base image to 22.04 (jammy) (#26065)
huydhn Oct 2, 2025
461797c
Change size of single CUDA graph for CI to 4 (#26089)
tdoublep Oct 2, 2025
6c56006
[FA/Chore] Bump vllm-flash-attention (#25537)
LucasWilkinson Oct 2, 2025
3c64522
[Model] Use `merge_by_field_config` for MM models (A-C) (#26073)
DarkLight1337 Oct 2, 2025
ae28032
[Model] Use `merge_by_field_config` for MM models (D-F) (#26076)
DarkLight1337 Oct 2, 2025
2614dde
[Platform][CI] Added OOT platform interface e2e test that running on …
leo-pony Oct 2, 2025
c48129c
[Qwen][ROCm] Flash Attention Rotary Embeddings (#24642)
vllmellm Oct 2, 2025
e7290aa
[CI] Add Blackwell DeepSeek FP8 FlashInfer MoE tests (#26040)
mgoin Oct 2, 2025
2e83ce3
[CI/Build] Replace `vllm.entrypoints.openai.api_server` entrypoint wi…
DarkLight1337 Oct 2, 2025
2047f0c
[BugFix] Fix FI accuracy issue when used for MLA prefill (#26063)
LucasWilkinson Oct 2, 2025
4937686
[Small] Prevent bypassing media domain restriction via HTTP redirects…
huachenheli Oct 2, 2025
b7e4aac
[Deepseek v3.2] Support indexer prefill chunking (#25999)
heheda12345 Oct 2, 2025
6e67ce2
EAGLE 3: Fix preamble so that measured speedup over Eagle 1 becomes 3…
ekagra-ranjan Oct 2, 2025
bccb2c0
[Mamba][KVCacheManager] Simplify kv cache manage logic for mamba + MT…
heheda12345 Oct 2, 2025
f36b3be
[Olmo 3] pre-commit fixes
pdasigi Oct 2, 2025
ad3fb42
[Olmo 3] safer xml tag removal in tool parser
pdasigi Oct 2, 2025
2d7cbf4
[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25…
ElizaWszola Oct 2, 2025
1f88f47
Fix MTP with deepep_low_latency (#25904)
MatthewBonanni Oct 2, 2025
62efc2e
Merge branch 'main' into olmo3_parser
pdasigi Oct 2, 2025
223 changes: 223 additions & 0 deletions tests/entrypoints/openai/tool_parsers/test_olmo3_tool_parser.py
@@ -0,0 +1,223 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

from unittest.mock import MagicMock, patch

import pytest

from tests.entrypoints.openai.tool_parsers.utils import (
    run_tool_extraction, run_tool_extraction_streaming)
from vllm.entrypoints.openai.protocol import FunctionCall
from vllm.entrypoints.openai.tool_parsers import ToolParser, ToolParserManager

# https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/text_prompt_format.md#model-response-format-1
SIMPLE_FUNCTION_OUTPUT = "get_weather(city='San Francisco', metric='celsius')"
SIMPLE_FUNCTION_CALL = FunctionCall(
    name="get_weather",
    arguments='{"city": "San Francisco", "metric": "celsius"}',
)
MORE_TYPES_FUNCTION_OUTPUT = (
    "register_user(name='John Doe', "
    "age=37, "
    "address={'city': 'San Francisco', 'state': 'CA'}, "
    "role=None, "
    "passed_test=True, "
    "aliases=['John', 'Johnny'])")
MORE_TYPES_FUNCTION_OUTPUT_JSON_LITERALS = (
    "register_user(name='John Doe', "
    "age=37, "
    "address={'city': 'San Francisco', 'state': 'CA'}, "
    "role=null, "
    "passed_test=true, "
    "aliases=['John', 'Johnny'])")
MORE_TYPES_FUNCTION_CALL = FunctionCall(
    name="register_user",
    arguments='{"name": "John Doe", '
    '"age": 37, '
    '"address": {"city": "San Francisco", "state": "CA"}, '
    '"role": null, '
    '"passed_test": true, '
    '"aliases": ["John", "Johnny"]}',
)
PARAMETERLESS_FUNCTION_OUTPUT = "get_weather()"
PARAMETERLESS_FUNCTION_CALL = FunctionCall(
    name="get_weather",
    arguments='{}',
)
EMPTY_DICT_FUNCTION_OUTPUT = "do_something_cool(additional_data={})"
EMPTY_DICT_FUNCTION_CALL = FunctionCall(
    name="do_something_cool",
    arguments='{"additional_data": {}}',
)
EMPTY_LIST_FUNCTION_OUTPUT = "do_something_cool(steps=[])"
EMPTY_LIST_FUNCTION_CALL = FunctionCall(
    name="do_something_cool",
    arguments='{"steps": []}',
)
ESCAPED_STRING_FUNCTION_OUTPUT = (
    r"get_weather(city='Martha\'s Vineyard', metric='\"cool units\"')")
ESCAPED_STRING_FUNCTION_CALL = FunctionCall(
    name="get_weather",
    arguments='{"city": "Martha\'s Vineyard", "metric": "\\"cool units\\""}',
)


@pytest.mark.parametrize("streaming", [True, False])
def test_no_tool_call(streaming: bool):
    mock_tokenizer = MagicMock()
    tool_parser: ToolParser = ToolParserManager.get_tool_parser("olmo3")(
        mock_tokenizer)
    model_output = "How can I help you today?"

    content, tool_calls = run_tool_extraction(tool_parser,
                                              model_output,
                                              streaming=streaming)

    assert content == model_output
    assert len(tool_calls) == 0


TEST_CASES = [
    pytest.param(True,
                 f"<function_calls>{SIMPLE_FUNCTION_OUTPUT}</function_calls>",
                 [SIMPLE_FUNCTION_CALL],
                 id="simple_streaming"),
    pytest.param(False,
                 f"<function_calls>{SIMPLE_FUNCTION_OUTPUT}</function_calls>",
                 [SIMPLE_FUNCTION_CALL],
                 id="simple_nonstreaming"),
    pytest.param(
        True,
        f"<function_calls>{MORE_TYPES_FUNCTION_OUTPUT}</function_calls>",
        [MORE_TYPES_FUNCTION_CALL],
        id="more_types_streaming"),
    pytest.param(
        False,
        f"<function_calls>{MORE_TYPES_FUNCTION_OUTPUT}</function_calls>",
        [MORE_TYPES_FUNCTION_CALL],
        id="more_types_nonstreaming"),
    pytest.param(
        True,
        f"<function_calls>{MORE_TYPES_FUNCTION_OUTPUT_JSON_LITERALS}</function_calls>",
        [MORE_TYPES_FUNCTION_CALL],
        id="more_types_streaming_json_literals"),
    pytest.param(
        False,
        f"<function_calls>{MORE_TYPES_FUNCTION_OUTPUT_JSON_LITERALS}</function_calls>",
        [MORE_TYPES_FUNCTION_CALL],
        id="more_types_nonstreaming_json_literals"),
    pytest.param(
        True,
        f"<function_calls>{PARAMETERLESS_FUNCTION_OUTPUT}</function_calls>",
        [PARAMETERLESS_FUNCTION_CALL],
        id="parameterless_streaming"),
    pytest.param(
        False,
        f"<function_calls>{PARAMETERLESS_FUNCTION_OUTPUT}</function_calls>",
        [PARAMETERLESS_FUNCTION_CALL],
        id="parameterless_nonstreaming"),
    pytest.param(
        True,
        f"<function_calls>{EMPTY_DICT_FUNCTION_OUTPUT}</function_calls>",
        [EMPTY_DICT_FUNCTION_CALL],
        id="empty_dict_streaming"),
    pytest.param(
        False,
        f"<function_calls>{EMPTY_DICT_FUNCTION_OUTPUT}</function_calls>",
        [EMPTY_DICT_FUNCTION_CALL],
        id="empty_dict_nonstreaming"),
    pytest.param(
        True,
        f"<function_calls>{EMPTY_LIST_FUNCTION_OUTPUT}</function_calls>",
        [EMPTY_LIST_FUNCTION_CALL],
        id="empty_list_streaming"),
    pytest.param(
        False,
        f"<function_calls>{EMPTY_LIST_FUNCTION_OUTPUT}</function_calls>",
        [EMPTY_LIST_FUNCTION_CALL],
        id="empty_list_nonstreaming"),
    pytest.param(
        True,
        f"<function_calls>{ESCAPED_STRING_FUNCTION_OUTPUT}</function_calls>",
        [ESCAPED_STRING_FUNCTION_CALL],
        id="escaped_string_streaming"),
    pytest.param(
        False,
        f"<function_calls>{ESCAPED_STRING_FUNCTION_OUTPUT}</function_calls>",
        [ESCAPED_STRING_FUNCTION_CALL],
        id="escaped_string_nonstreaming"),
    pytest.param(
        True,
        f"<function_calls>{SIMPLE_FUNCTION_OUTPUT}\n{MORE_TYPES_FUNCTION_OUTPUT}</function_calls>",
        [SIMPLE_FUNCTION_CALL, MORE_TYPES_FUNCTION_CALL],
        id="parallel_calls_streaming"),
    pytest.param(
        False,
        f"<function_calls>{SIMPLE_FUNCTION_OUTPUT}\n{MORE_TYPES_FUNCTION_OUTPUT}</function_calls>",
        [SIMPLE_FUNCTION_CALL, MORE_TYPES_FUNCTION_CALL],
        id="parallel_calls_nonstreaming"),
]


@pytest.mark.parametrize("streaming, model_output, expected_tool_calls",
                         TEST_CASES)
def test_tool_call(streaming: bool, model_output: str,
                   expected_tool_calls: list[FunctionCall]):
    mock_tokenizer = MagicMock()
    tool_parser: ToolParser = ToolParserManager.get_tool_parser("olmo3")(
        mock_tokenizer)

    content, tool_calls = run_tool_extraction(tool_parser,
                                              model_output,
                                              streaming=streaming)

    assert content is None
    assert len(tool_calls) == len(expected_tool_calls)
    for actual, expected in zip(tool_calls, expected_tool_calls):
        assert actual.type == "function"
        assert actual.function == expected


def test_streaming_tool_call_with_large_steps():
    mock_tokenizer = MagicMock()
    tool_parser: ToolParser = ToolParserManager.get_tool_parser("olmo3")(
        mock_tokenizer)
    model_output_deltas = [
        "<function_calls>get_weather(city='San",
        " Francisco', metric='celsius')\n"
        f"{PARAMETERLESS_FUNCTION_OUTPUT}\n"
        f"{EMPTY_LIST_FUNCTION_OUTPUT}</function_calls>",
    ]

    reconstructor = run_tool_extraction_streaming(
        tool_parser, model_output_deltas, assert_one_tool_per_delta=False)

    assert reconstructor.other_content == ""
    assert len(reconstructor.tool_calls) == 3
    assert reconstructor.tool_calls[0].function == SIMPLE_FUNCTION_CALL
    assert reconstructor.tool_calls[1].function == PARAMETERLESS_FUNCTION_CALL
    assert reconstructor.tool_calls[2].function == EMPTY_LIST_FUNCTION_CALL


@pytest.mark.parametrize("streaming", [False])
def test_regex_timeout_handling(streaming: bool):
    """Test that a regex timeout is handled gracefully."""
    mock_tokenizer = MagicMock()
    tool_parser: ToolParser = ToolParserManager.get_tool_parser("olmo3")(
        mock_tokenizer)

    fake_problematic_input = "hello world[A(A=" + "\t)A(A=,\t" * 2

    # create a mock regex that raises TimeoutError
    mock_regex = MagicMock()
    mock_regex.match.side_effect = TimeoutError("Regex timeout")

    with patch.object(tool_parser, 'TOOL_CALL_REGEX', mock_regex):
        content, tool_calls = run_tool_extraction(tool_parser,
                                                  fake_problematic_input,
                                                  streaming=streaming)

    # should treat as regular text when regex times out
    assert content == fake_problematic_input
    assert len(tool_calls) == 0
    mock_regex.match.assert_called_once()
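The tests above assert that a pythonic call string such as `get_weather(city='San Francisco', metric='celsius')` is converted into a `FunctionCall` with JSON-encoded arguments. A minimal sketch of that conversion, using only the standard library's `ast` module, looks like the following. This is an illustration of the technique, not the actual `Olmo3PythonicToolParser` implementation; `parse_pythonic_call` is a hypothetical helper name, and it handles only keyword arguments with Python literals (the JSON-literal variant with `null`/`true` needs extra handling, as the `MORE_TYPES_FUNCTION_OUTPUT_JSON_LITERALS` case shows).

```python
import ast
import json


def parse_pythonic_call(text: str) -> tuple[str, str]:
    """Convert "name(kw=literal, ...)" into (name, JSON-encoded arguments).

    Hypothetical helper for illustration; keyword arguments only.
    """
    # Parse the string as a single Python expression and require a plain call.
    tree = ast.parse(text, mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"not a simple function call: {text!r}")
    # literal_eval safely evaluates each keyword value (str, int, dict, ...).
    args = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.id, json.dumps(args)


name, arguments = parse_pythonic_call(
    "get_weather(city='San Francisco', metric='celsius')")
# name == "get_weather"
# arguments == '{"city": "San Francisco", "metric": "celsius"}'
```

This mirrors why the test expectations pair each pythonic output string with a JSON-arguments `FunctionCall`: the parser's job is essentially this literal-to-JSON mapping, wrapped in `<function_calls>` tag handling and streaming-delta bookkeeping.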
2 changes: 2 additions & 0 deletions vllm/entrypoints/openai/tool_parsers/__init__.py
@@ -17,6 +17,7 @@
from .longcat_tool_parser import LongcatFlashToolParser
from .minimax_tool_parser import MinimaxToolParser
from .mistral_tool_parser import MistralToolParser
from .olmo3_tool_parser import Olmo3PythonicToolParser
from .openai_tool_parser import OpenAIToolParser
from .phi4mini_tool_parser import Phi4MiniJsonToolParser
from .pythonic_tool_parser import PythonicToolParser
@@ -52,4 +53,5 @@
    "SeedOssToolParser",
    "Step3ToolParser",
    "OpenAIToolParser",
    "Olmo3PythonicToolParser",
]