Skip to content

[Bugfix]: Fix Gemma4ToolParser.__init__() missing tools parameter#38847

Merged
ywang96 merged 1 commit intovllm-project:mainfrom
hospedales:fix/gemma4-tool-parser-init
Apr 2, 2026
Merged

[Bugfix]: Fix Gemma4ToolParser.__init__() missing tools parameter#38847
ywang96 merged 1 commit intovllm-project:mainfrom
hospedales:fix/gemma4-tool-parser-init

Conversation

@hospedales
Copy link
Copy Markdown
Contributor

Purpose

Fix Gemma4ToolParser.__init__() to accept the tools parameter, matching the base ToolParser interface.

Without this fix, enabling tool calling with --tool-call-parser gemma4 results in:

400 Gemma4ToolParser.__init__() takes 2 positional arguments but 3 were given

The tools parameter was added to the base ToolParser class in #38029, but the Gemma4ToolParser introduced in #38826 was written against the old signature.

Fixes #38837

Test Plan

vllm serve nvidia/Gemma-4-31B-IT-NVFP4 \
    --enable-auto-tool-choice \
    --tool-call-parser gemma4

Send a chat completion request with tools specified — previously returned 400, now works.

Test Result

Tested on NVIDIA DGX Spark (GB10, SM 12.1) with nvidia/Gemma-4-31B-IT-NVFP4. Tool calls are processed correctly after the fix.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update.

Copilot AI review requested due to automatic review settings April 2, 2026 20:12
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the Gemma4ToolParser class to support an optional tools parameter in its constructor, which is then passed to the base ToolParser class. It also adds the necessary Tool import. I have no feedback to provide.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a constructor signature mismatch in the Gemma 4 tool-call parser so it conforms to the shared ToolParser interface and can be instantiated with tool definitions when tool calling is enabled.

Changes:

  • Update Gemma4ToolParser.__init__ to accept an optional tools parameter.
  • Pass tools through to the base ToolParser constructor.
  • Import the Tool type for the new constructor annotation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@hospedales hospedales changed the title [Bug Fix]: Fix Gemma4ToolParser.__init__() missing tools parameter [Bugfix]: Fix Gemma4ToolParser.__init__() missing tools parameter Apr 2, 2026
Copy link
Copy Markdown
Contributor

@sfeng33 sfeng33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please fix DCO.

The ToolParser base class now accepts (tokenizer, tools) from PR vllm-project#38029,
but Gemma4ToolParser only accepted (tokenizer). This caused a 400 error
when tool calls were attempted.

Signed-off-by: Michael Hospedales <hospedales@me.com>
@hospedales
Copy link
Copy Markdown
Contributor Author

Thanks! Please fix DCO.

DCO Fixed.

@hospedales hospedales force-pushed the fix/gemma4-tool-parser-init branch from fe5c554 to f06e02b Compare April 2, 2026 20:58
@sfeng33
Copy link
Copy Markdown
Contributor

sfeng33 commented Apr 2, 2026

@ywang96 @Isotr0py Can we merge this hot fix for gemma 4 please

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 2, 2026
@ywang96 ywang96 merged commit bb39382 into vllm-project:main Apr 2, 2026
7 of 8 checks passed
@ywang96 ywang96 added this to the v0.19.0 cherry picks milestone Apr 2, 2026
khluu pushed a commit that referenced this pull request Apr 2, 2026
…38847)

Signed-off-by: Michael Hospedales <hospedales@me.com>
(cherry picked from commit bb39382)
@Gregory-Pereira
Copy link
Copy Markdown
Contributor

Gregory-Pereira commented Apr 4, 2026

FYI not sure if this is related but I just my image off this commit trying to get gemma4 working and I got the following error:

(EngineCore pid=462) INFO 04-04 18:09:09 [core.py:283] init engine (profile, create kv cache, warmup model) took 33.24 seconds
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108] EngineCore failed to start.
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]   File "/opt/vllm-source/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]   File "/opt/vllm-source/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]   File "/opt/vllm-source/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]     super().__init__(
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]   File "/opt/vllm-source/vllm/v1/engine/core.py", line 125, in __init__
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]     self.structured_output_manager = StructuredOutputManager(vllm_config)
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]   File "/opt/vllm-source/vllm/v1/structured_output/__init__.py", line 91, in __init__
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]     self.reasoner = reasoner_cls(tokenizer=self.tokenizer)
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]   File "/opt/vllm-source/vllm/reasoning/gemma4_reasoning_parser.py", line 47, in __init__
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]     super().__init__(tokenizer, *args, **kwargs)
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]   File "/opt/vllm-source/vllm/reasoning/basic_parsers.py", line 57, in __init__
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108]     raise RuntimeError(
(EngineCore pid=462) ERROR 04-04 18:09:10 [core.py:1108] RuntimeError: Gemma4ReasoningParser reasoning parser could not locate think start/end tokens in the tokenizer!
(Worker_TP3 pid=664) WARNING 04-04 18:09:10 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP2 pid=663) WARNING 04-04 18:09:10 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP0 pid=661) WARNING 04-04 18:09:10 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP1 pid=662) WARNING 04-04 18:09:10 [multiproc_executor.py:871] WorkerProc was terminated
(EngineCore pid=462) ERROR 04-04 18:09:12 [multiproc_executor.py:277] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
(EngineCore pid=462) Process EngineCore:
(EngineCore pid=462) Traceback (most recent call last):
(EngineCore pid=462)   File "/usr/lib64/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=462)     self.run()
(EngineCore pid=462)   File "/usr/lib64/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=462)     self._target(*self._args, **self._kwargs)
(EngineCore pid=462)   File "/opt/vllm-source/vllm/v1/engine/core.py", line 1112, in run_engine_core
(EngineCore pid=462)     raise e
(EngineCore pid=462)   File "/opt/vllm-source/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=462)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=462)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=462)   File "/opt/vllm-source/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=462)     return func(*args, **kwargs)
(EngineCore pid=462)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=462)   File "/opt/vllm-source/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=462)     super().__init__(
(EngineCore pid=462)   File "/opt/vllm-source/vllm/v1/engine/core.py", line 125, in __init__
(EngineCore pid=462)     self.structured_output_manager = StructuredOutputManager(vllm_config)
(EngineCore pid=462)                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=462)   File "/opt/vllm-source/vllm/v1/structured_output/__init__.py", line 91, in __init__
(EngineCore pid=462)     self.reasoner = reasoner_cls(tokenizer=self.tokenizer)
(EngineCore pid=462)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=462)   File "/opt/vllm-source/vllm/reasoning/gemma4_reasoning_parser.py", line 47, in __init__
(EngineCore pid=462)     super().__init__(tokenizer, *args, **kwargs)
(EngineCore pid=462)   File "/opt/vllm-source/vllm/reasoning/basic_parsers.py", line 57, in __init__
(EngineCore pid=462)     raise RuntimeError(
(EngineCore pid=462) RuntimeError: Gemma4ReasoningParser reasoning parser could not locate think start/end tokens in the tokenizer!
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "<frozen runpy>", line 198, in _run_module_as_main
(APIServer pid=1)   File "<frozen runpy>", line 88, in _run_code
(APIServer pid=1)   File "/opt/vllm-source/vllm/entrypoints/openai/api_server.py", line 724, in <module>
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/opt/vllm/lib64/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib64/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/opt/vllm/lib64/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/opt/vllm-source/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/opt/vllm-source/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/opt/vllm-source/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/opt/vllm-source/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/opt/vllm-source/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/opt/vllm-source/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/opt/vllm-source/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/opt/vllm-source/vllm/v1/engine/core_client.py", line 129, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/opt/vllm-source/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/opt/vllm-source/vllm/v1/engine/core_client.py", line 872, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/opt/vllm-source/vllm/v1/engine/core_client.py", line 534, in __init__
(APIServer pid=1)     with launch_core_engines(
(APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib64/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/opt/vllm-source/vllm/v1/engine/utils.py", line 1073, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/opt/vllm-source/vllm/v1/engine/utils.py", line 1132, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib64/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

@ywang96
Copy link
Copy Markdown
Member

ywang96 commented Apr 4, 2026

@lucianommartins FYI

@lucianommartins
Copy link
Copy Markdown
Contributor

lucianommartins commented Apr 4, 2026

can you share the commands you ate using? ie. how are you starting the vllm server and how you are performing the inference attempt @Gregory-Pereira ?

@Gregory-Pereira
Copy link
Copy Markdown
Contributor

Container just fails to start:

cat sally-gemma4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-d-gemma-4
  namespace: greg
spec:
  containers:
  - args:
    - google/gemma-4-26B-A4B-it
    - --tensor-parallel-size
    - "4"
    - --distributed-executor-backend
    - mp
    - --max-model-len
    - "32768"
    - --gpu-memory-utilization
    - "0.90"
    - --enable-auto-tool-choice
    - --reasoning-parser
    - gemma4
    - --tool-call-parser
    - gemma4
    - --language-model-only
    - --host
    - 0.0.0.0
    - --port
    - "8000"
    env:
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          key: HF_TOKEN
          name: llm-d-hf-token
    - name: XDG_CACHE_HOME
      value: /.cache
    - name: HF_HOME
      value: /.cache/huggingface
    - name: TRITON_CACHE_DIR
      value: /.cache/triton
    - name: TORCHINDUCTOR_CACHE_DIR
      value: /.cache/torchinductor
    - name: HOME
      value: /.cache/home
    - name: OTEL_SERVICE_NAME
      value: gemma4-vllm-openai
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://otel-collector:4317
    - name: OTEL_EXPORTER_OTLP_PROTOCOL
      value: grpc
    - name: OTEL_TRACES_EXPORTER
      value: otlp
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: service.name=gemma4-vllm-openai,model.name=google/gemma-4-26B-A4B-it
    - name: VLLM_NO_USAGE_STATS
      value: "1"
    - name: DO_NOT_TRACK
      value: "1"
    image: ghcr.io/llm-d/llm-d-cuda-dev:pr-1082
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 6
      httpGet:
        path: /health
        port: http
        scheme: HTTP
      periodSeconds: 20
      successThreshold: 1
      timeoutSeconds: 5
    name: openai
    ports:
    - containerPort: 8000
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 6
      httpGet:
        path: /v1/models
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "16"
        memory: 64Gi
        nvidia.com/gpu: "4"
      requests:
        cpu: "16"
        memory: 64Gi
        nvidia.com/gpu: "4"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
    startupProbe:
      failureThreshold: 60
      httpGet:
        path: /v1/models
        port: http
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 5
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /dev/shm
      name: shm
    - mountPath: /.cache
      name: cache
  volumes:
  - emptyDir:
      medium: Memory
      sizeLimit: 16Gi
    name: shm
  - emptyDir: {}
    name: cache

Im building this image from my own ci here: llm-d/llm-d#1082. I previously was pointing at your commit but am now picking the RSNorm changes on main after that. Another thing to note - previously I as un-aware that I need to use transformers v5.5.0+ so I have included that change since then. Will keep you posted as I keep rebuilding and what I find

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed tool-calling

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: Gemma4ToolParser.__init__() missing tools parameter — 400 error on tool calls

6 participants