
Support DeepSeek-V3.1 tool call #9446

Merged
zhyncs merged 14 commits into sgl-project:main from Xu-Wenqing:add_deepseek_v31_chat_template
Aug 27, 2025

Conversation

Contributor

@Xu-Wenqing Xu-Wenqing commented Aug 21, 2025

Motivation

Support tool call for DeepSeek-V3.1

The tool call format of DeepSeek-V3.1 is different from DeepSeek-V3/R1:
DeepSeek-V3.1: <|tool▁calls▁begin|><|tool▁call▁begin|>tool_call_name<|tool▁sep|>tool_call_arguments<|tool▁call▁end|><|tool▁calls▁end|>
DeepSeek-R1/V3: <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>FUNCTION_NAME\n```json\n{"param1": "value1", "param2": "value2"}\n```<|tool▁call▁end|><|tool▁calls▁end|>

So we can't use --tool-call-parser deepseekv3 for DeepSeek-V3.1; we need a new tool call parser, "deepseekv31".
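For illustration, the V3.1 format above can be recognized with a small regex sketch. This is a hypothetical minimal parser, not the implementation added in this PR; it handles only the non-streaming case and ignores the outer <|tool▁calls▁begin|> wrapper:

```python
import re

# Hypothetical sketch of the DeepSeek-V3.1 tool-call grammar quoted above:
# <|tool▁call▁begin|>NAME<|tool▁sep|>ARGS<|tool▁call▁end|>
# The real parser added in this PR may differ in structure and edge cases.
TOOL_CALL_RE = re.compile(
    r"<\|tool▁call▁begin\|>(?P<name>.+?)<\|tool▁sep\|>(?P<args>.+?)<\|tool▁call▁end\|>",
    re.DOTALL,
)

def parse_v31_tool_calls(text: str) -> list[tuple[str, str]]:
    """Return every (tool_name, json_arguments) pair found in the model output."""
    return [(m.group("name"), m.group("args")) for m in TOOL_CALL_RE.finditer(text)]

sample = (
    "<|tool▁calls▁begin|><|tool▁call▁begin|>get_weather"
    '<|tool▁sep|>{"location": "Shanghai"}<|tool▁call▁end|><|tool▁calls▁end|>'
)
print(parse_v31_tool_calls(sample))
# -> [('get_weather', '{"location": "Shanghai"}')]
```

Note that the V3/R1 format would not match this pattern, which is exactly why a separate parser is needed.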

Modifications

  1. Add tool parser "deepseekv31"
  2. Add a new chat template

Accuracy Tests

Test Script (Streaming):

from openai import OpenAI

openai_api_base = ""
openai_api_key = ""

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base + "/v1",
)

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Get current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location"],
            },
            "strict": True
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get temperature at a location and date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "date": {
                        "type": "string",
                        "description": 'The date to get the temperature for, in the format "Year-Month-Day".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location", "date"],
            },
        },
    },
]


tool_calls_stream = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[
        {
            "role": "system",
            "content": "现在的日期是: 2024-09-30",
        },
        {
            "role": "user",
            "content": "北京今天的天气如何?明天呢?",
        },
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
    # extra_body={"chat_template_kwargs": {"thinking": True}},
    max_completion_tokens=8192
)

print("reasoning content(Blue) and content(Green):")
chunks = []
for chunk in tool_calls_stream:
    chunks.append(chunk)
    if hasattr(chunk.choices[0].delta, "reasoning_content"):
        reasoning_content = chunk.choices[0].delta.reasoning_content
        if reasoning_content:
            print(bcolors.OKBLUE + reasoning_content, end="", flush=True)
    elif hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content
        if content:
            print(bcolors.OKGREEN + content, end="", flush=True)

print(bcolors.ENDC + "\n### end of reasoning content and content. ###\n")

arguments = []
tool_call_idx = -1
for chunk in chunks:
    if chunk.choices[0].delta.tool_calls:
        tool_call = chunk.choices[0].delta.tool_calls[0]

        if tool_call.index != tool_call_idx:
            if tool_call_idx >= 0:
                print(f"streamed tool call arguments: {arguments[tool_call_idx]}")
            tool_call_idx = chunk.choices[0].delta.tool_calls[0].index
            arguments.append("")
        if tool_call.id:
            print(f"streamed tool call id: {tool_call.id} ")

        if tool_call.function:
            if tool_call.function.name:
                print(f"streamed tool call name: {tool_call.function.name}")

            if tool_call.function.arguments:
                arguments[tool_call_idx] += tool_call.function.arguments

if len(arguments):
    print(f"streamed tool call arguments: {arguments[-1]}")

Test Result (Streaming):

reasoning content(Blue) and content(Green):

### end of reasoning content and content. ###

streamed tool call id: call_27ae2109a9134855885ce1e9 
streamed tool call name: get_current_temperature
streamed tool call arguments: {"location": "Beijing, China", "unit": "celsius"}

Test Script (Non-Streaming):

from openai import OpenAI

openai_api_base = ""
openai_api_key = ""

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base + "/v1",
)


tools = [
    {
        "type": "function",
        "function": {
            "strict": True,
            "name": "get_current_temperature",
            "description": "Get current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get temperature at a location and date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "date": {
                        "type": "string",
                        "description": 'The date to get the temperature for, in the format "Year-Month-Day".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location", "date"],
            },
        },
    },
]


response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[
        {
            "role": "system",
            "content": "现在的日期是: 2024-09-30",
        },
        {
            "role": "user",
            "content": "北京今天的天气如何?明天呢?",
        },
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

print(response)
tool_calls = response.choices[0].message.tool_calls
for c in tool_calls:
    print(c.function.name, c.function.arguments)

Test Result (Non-Streaming):

ChatCompletion(id='44cbe02d71604830acf43b631e52ab29', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='我目前只能帮您查询温度信息,无法提供完整的天气情况(比如降水、风力、湿度等)。\n\n让我为您查询北京当前和明天的温度:', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_2b46341f9f0d48e5b64351c5', function=Function(arguments='{"location": "Beijing, China", "unit": "celsius"}', name='get_current_temperature'), type='function', index=None), ChatCompletionMessageToolCall(id='call_dc0769f36f05479bb9f59107', function=Function(arguments='{"location": "Beijing, China", "date": "2024-01-22", "unit": "celsius"}', name='get_temperature_date'), type='function', index=None)], reasoning_content=None), matched_stop=None)], created=1755842914, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=90, prompt_tokens=378, total_tokens=468, completion_tokens_details=None, prompt_tokens_details=None, reasoning_tokens=0), metadata={'weight_version': 'default'})
get_current_temperature {"location": "Beijing, China", "unit": "celsius"}
get_temperature_date {"location": "Beijing, China", "date": "2024-01-22", "unit": "celsius"}
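After the parser returns tool_calls like the ones above, a caller still has to dispatch them locally. A minimal sketch, assuming the two stub weather functions below (they are placeholders for illustration, not part of this PR):

```python
import json

# Hypothetical local implementations of the two tools declared in the test
# script above; real code would call an actual weather service.
def get_current_temperature(location, unit="celsius"):
    return {"location": location, "temperature": 25, "unit": unit}

def get_temperature_date(location, date, unit="celsius"):
    return {"location": location, "date": date, "temperature": 22, "unit": unit}

DISPATCH = {
    "get_current_temperature": get_current_temperature,
    "get_temperature_date": get_temperature_date,
}

def run_tool_call(name: str, arguments_json: str):
    """Decode the JSON arguments string and invoke the matching local tool."""
    args = json.loads(arguments_json)
    return DISPATCH[name](**args)

print(run_tool_call("get_current_temperature", '{"location": "Beijing, China"}'))
# -> {'location': 'Beijing, China', 'temperature': 25, 'unit': 'celsius'}
```

The result would then be appended to the history as a "tool" role message for the next turn.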

Benchmarking and Profiling

Checklist

Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
@jimmy-evo
Contributor

does it work?

@freedomkk-qfeng

does it work?

It doesn't work. Here's an example:

Request:

{
  "messages": [
    {
      "role": "user",
      "content": "Shanghai weather"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Useful when you want to query the weather of a specified city.",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city or district, e.g., Beijing, Hangzhou, Yuhang District."
            }
          }
        },
        "required": [
          "location"
        ]
      }
    }
  ],
  "stream": false,
  "model": "deepseek-r1"
}

Response:

{
  "id": "708ff5439c474fda8f73f2656005a18c",
  "object": "chat.completion",
  "created": 1755798766,
  "model": "deepseek-r1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'll check the weather in Shanghai for you.<|tool▁calls▁begin|><|tool▁call▁begin|>get_current_weather<|tool▁sep|>{\"location\": \"Shanghai\"}<|tool▁call▁end|><|tool▁calls▁end|>",
        "reasoning_content": null,
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "matched_stop": null
    }
  ],
  "usage": {
    "prompt_tokens": 176,
    "total_tokens": 203,
    "completion_tokens": 27,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0
  },
  "metadata": {
    "weight_version": "default"
  }
}

@Xu-Wenqing Xu-Wenqing changed the title [WIP] Add DeepSeek-V3.1 tool call chat template [WIP] Support DeepSeek-V3.1 tool call Aug 22, 2025
@Xu-Wenqing
Contributor Author

Xu-Wenqing commented Aug 22, 2025

@jinmingyi1998 @freedomkk-qfeng
The tool call format of DeepSeek-V3.1 is different from DeepSeek-V3/R1:
DeepSeek-V3.1: <|tool▁calls▁begin|><|tool▁call▁begin|>tool_call_name<|tool▁sep|>tool_call_arguments<|tool▁call▁end|><|tool▁calls▁end|>
DeepSeek-R1/V3: <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>FUNCTION_NAME\n```json\n{"param1": "value1", "param2": "value2"}\n```<|tool▁call▁end|><|tool▁calls▁end|>
So we can't use --tool-call-parser deepseekv3 for this DeepSeek-V3.1.

I will write a new tool parser to support DeepSeek-V3.1 CC @CatherineSue @zhyncs @JustinTong0323

@Xu-Wenqing Xu-Wenqing changed the title [WIP] Support DeepSeek-V3.1 tool call Support DeepSeek-V3.1 tool call Aug 22, 2025
@Xu-Wenqing Xu-Wenqing marked this pull request as ready for review August 22, 2025 06:06
@Xu-Wenqing
Contributor Author

@CatherineSue @JustinTong0323 @zhyncs This PR is ready to review. Added tests results in PR descriptions.

@freedomkk-qfeng

@jinmingyi1998 @freedomkk-qfeng The tool call format of DeepSeek-V3.1 is different from DeepSeek-V3/R1: DeepSeek-V3.1: <|tool▁calls▁begin|><|tool▁call▁begin|>tool_call_name<|tool▁sep|>tool_call_arguments<|tool▁call▁end|><|tool▁calls▁end|> DeepSeek-R1/V3: <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>FUNCTION_NAME\n```json\n{"param1": "value1", "param2": "value2"}\n```<|tool▁call▁end|><|tool▁calls▁end|> So we can't use --tool-call-parser deepseekv3 for this DeepSeek-V3.1.

I will write a new tool parser to support DeepSeek-V3.1 CC @CatherineSue @zhyncs @JustinTong0323

When thinking mode is disabled, the new parser works!
But when thinking mode is enabled, the parser currently doesn't work correctly; the tool call stays embedded in reasoning_content:

        "reasoning_content": "Hmm, the user is asking about the weather in Shanghai. This is a straightforward request that I can handle with the weather tool. \n\nI need to call the get_current_weather function with Shanghai as the location parameter. The function should return the current weather information for Shanghai.\n\nThe response should be simple and direct - just provide the weather data without unnecessary commentary since the user didn't ask for anything beyond the basic weather information.<|tool▁calls▁begin|><|tool▁call▁begin|>get_current_weather<|tool▁sep|>{\"location\": \"Shanghai\"}<|tool▁call▁end|><|tool▁calls▁end|>"

@Xu-Wenqing
Contributor Author

Xu-Wenqing commented Aug 25, 2025

@jinmingyi1998 @freedomkk-qfeng The tool call format of DeepSeek-V3.1 is different from DeepSeek-V3/R1: DeepSeek-V3.1: <|tool▁calls▁begin|><|tool▁call▁begin|>tool_call_name<|tool▁sep|>tool_call_arguments<|tool▁call▁end|><|tool▁calls▁end|> DeepSeek-R1/V3: <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>FUNCTION_NAME\n```json\n{"param1": "value1", "param2": "value2"}\n```<|tool▁call▁end|><|tool▁calls▁end|> So we can't use --tool-call-parser deepseekv3 for this DeepSeek-V3.1.
I will write a new tool parser to support DeepSeek-V3.1 CC @CatherineSue @zhyncs @JustinTong0323

When thinking mode is disabled, the new parser works! But when thinking mode is enabled, the parser currently doesn't work correctly

        "reasoning_content": "Hmm, the user is asking about the weather in Shanghai. This is a straightforward request that I can handle with the weather tool. \n\nI need to call the get_current_weather function with Shanghai as the location parameter. The function should return the current weather information for Shanghai.\n\nThe response should be simple and direct - just provide the weather data without unnecessary commentary since the user didn't ask for anything beyond the basic weather information.<|tool▁calls▁begin|><|tool▁call▁begin|>get_current_weather<|tool▁sep|>{\"location\": \"Shanghai\"}<|tool▁call▁end|><|tool▁calls▁end|>"

@freedomkk-qfeng
https://www.modelscope.cn/models/deepseek-ai/DeepSeek-V3.1 — according to the official doc, tool calls are only supported in non-thinking mode:
(screenshot of the official documentation)

@zhyncs zhyncs merged commit b9683be into sgl-project:main Aug 27, 2025
80 of 96 checks passed
@wwj-2017-1117

wwj-2017-1117 commented Aug 27, 2025

--reasoning-parser deepseek-r1 \
--tool-call-parser deepseekv31

curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseekv31",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco? Use the weather tool and return a valid JSON tool call."}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather by city",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto",
    "temperature": 0
  }'

tool_calls is still null.

@Kevin-XiongC
Contributor

--reasoning-parser deepseek-r1

Maybe use --reasoning-parser deepseek-v3?
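Putting the suggestion together, a launch sketch combining the flags discussed in this thread might look like the following (model path, TP size, and port are assumptions pieced together from the comments above, not a verified configuration):

```shell
# Assumed flag combination for DeepSeek-V3.1 tool calling with SGLang:
# the new deepseekv31 tool-call parser paired with the deepseek-v3
# reasoning parser, as suggested in this thread.
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3.1 \
  --tp-size 8 \
  --tool-call-parser deepseekv31 \
  --reasoning-parser deepseek-v3
```
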

@guodongrui-gdr

guodongrui-gdr commented Aug 27, 2025

An error occurs when calling a tool in a multi-turn conversation:
openai.InternalServerError: Error code: 500 - {'object': 'error', 'message': 'Internal server error: can only concatenate str (not "dict") to str', 'type': 'InternalServerError', 'param': None, 'code': 500}
Test code:

import json

from openai import OpenAI

client = OpenAI(
    api_key="",
    base_url="http://127.0.0.1:30000/v1" 
)
messages = [
    {
        "role": "user",
        "content": "北京今天的天气如何?",
    },
]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Get current temperature at a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get temperature at a location and date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": 'The location to get the temperature for, in the format "City, State, Country".',
                    },
                    "date": {
                        "type": "string",
                        "description": 'The date to get the temperature for, in the format "Year-Month-Day".',
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": 'The unit to return the temperature in. Defaults to "celsius".',
                    },
                },
                "required": ["location", "date"],
            },
        },
    },
]
response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=messages,
    tools=tools,
    stream=False,
    extra_body={"chat_template_kwargs": {"thinking": False}},
)

message = response.choices[0].message
messages.append(message)
if message.tool_calls:
    messages.append({
        'role': 'tool',
        'tool_call_id': message.tool_calls[0].id,
        'content': json.dumps({'response': "25°C"}),
    })

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=messages,
    tools=tools,
    stream=False,
    extra_body={"chat_template_kwargs": {"thinking": False}},
)
print(response.choices[0].message)
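Until a server-side fix lands, a possible client-side workaround is to append a plain dict to the history instead of the SDK message object, so the server's chat template never receives a pydantic object. This is a sketch under that assumption (to_history_entry is a hypothetical helper, untested against this exact setup):

```python
def to_history_entry(message) -> dict:
    """Convert an openai-python ChatCompletionMessage into a plain dict
    suitable for re-sending as conversation history."""
    entry = {"role": message.role, "content": message.content or ""}
    if message.tool_calls:
        entry["tool_calls"] = [
            {
                "id": tc.id,
                "type": "function",
                "function": {
                    "name": tc.function.name,
                    "arguments": tc.function.arguments,
                },
            }
            for tc in message.tool_calls
        ]
    return entry

# usage: messages.append(to_history_entry(response.choices[0].message))
```

The dict shape mirrors the OpenAI chat message schema, so it round-trips through the template where the raw SDK object may not.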

@taohui

taohui commented Aug 28, 2025

An error occurs when calling a tool in a multi-turn conversation: openai.InternalServerError: Error code: 500 - {'object': 'error', 'message': 'Internal server error: can only concatenate str (not "dict") to str', 'type': 'InternalServerError', 'param': None, 'code': 500} (test code quoted above)

I also encountered the same problem.

@dhh123

dhh123 commented Aug 31, 2025

ExecStart=/apps/miniforge3/envs/sglangpro/bin/python3.13 -m sglang.launch_server \
  --model-path /llms/DeepSeek-V3.1/ \
  --tool-call-parser deepseekv31 \
  --tokenizer-mode auto \
  --trust-remote-code \
  --mem-fraction-static 0.7 \
  --enable-torch-compile \
  --chunked-prefill-size 4096 \
  --max-running-requests 16 \
  --host 0.0.0.0 \
  --port 40000 \
  --allow-auto-truncate \
  --tp-size 8 \
  --reasoning-parser deepseek-v3 \
  --chat-template /llms/DeepSeek-V3.1/assets/chat_template.jinja

Running DeepSeek-V3.1 with this command, function calling does not work with the OpenAI SDK in non-thinking mode. @Xu-Wenqing

Also, tool_chat_template_deepseekv31.jinja raises a Jinja2 error.

Here is the captured request (Wireshark; non-ASCII content was lost in the capture):
{"chat_template_kwargs":{"thinking":false},"messages":[{"content":" ...........................","role":"user"}],"model":"deepseek-v3.1","tool_choice":"auto","tools":[{"function":{"description":"Get weather of a location, the user should supply a location first.","name":"get_weather","parameters":{"properties":{"location":{"description":"The city and state, e.g. San Francisco, CA","type":"string"}},"required":["location"],"type":"object"}},"type":"function"}]}

T 172.16.8.50:40000 -> 172.16.8.199:50494 [AP] #6
HTTP/1.1 200 OK.
date: Sun, 31 Aug 2025 17:18:01 GMT.
server: uvicorn.
content-length: 1780.
content-type: application/json.
.
{"id":"5f41f79aa53b45cc8a4ea8e78074aecd","object":"chat.completion","created":1756660685,"model":"deepseek-v3.1","choices":[{"index":0,"message":{"role":"assistant","content":"..................................................................AI.......................................................................................................................................................................................\n\n1. ..........................................\n.........................................................\n* ..................App...................................................App........................................................................\n* ..........................................................................................\n * ...............https://weather.cma.cn/\n* ...........................................................................................................................................................................\n\n2. ........................\n....................................................................................................................................................................................................................................\n\n.................................................................................................................................................................................","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":"stop","matched_stop":1}],"usage":{"prompt_tokens":11,"total_tokens":289,"completion_tokens":278,"prompt_tokens_details":null,"reasoning_tokens":0},"metadata":{"weight_version":"default"}}

@pezazc

pezazc commented Sep 2, 2025

Is the multi-round dialogue tool call available now?

@JustinTong0323
Collaborator

JustinTong0323 commented Sep 2, 2025

For those who encountered the multi-turn function-call issue, try this PR:

MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
