OpenAIChatCompletionClient.count_tokens undercounts due to missing Tool-schema fields and emits Not supported field warnings #6980

@seunggil1

What happened?

Describe the bug
OpenAIChatCompletionClient.count_tokens (from autogen_ext.models.openai) ignores several JSON Schema fields of Tools (e.g., anyOf, default, title), printing Not supported field ... warnings and producing consistent gaps between the pre-send estimate and usage.prompt_tokens.

  • In a multi-tool agent, we rely on pre-send token estimates to avoid context-window overruns
# The full code is shown below in To Reproduce.
token_estimate = self.model_client.count_tokens(messages=model_messages, tools=[get_current_time_tool])
create_result = await self.model_client.create(
    messages=model_messages,
    tools=[get_current_time_tool],
    cancellation_token=ctx.cancellation_token,
)
token_usage = create_result.usage.prompt_tokens

if token_usage != token_estimate:
    print(f"Token usage mismatch: estimated {token_estimate}, actual {token_usage}")
  • Result
Not supported field anyOf
Not supported field default
Not supported field title
Token usage mismatch: estimated 87, actual 99
Token usage mismatch: estimated 151, actual 83

To Reproduce
Environment

pip list | grep autogen
autogen-core   0.7.4
autogen-ext    0.7.4
  • Model: gpt-4o
  • Scenario: Agent makes a first call with tool-calls enabled, executes the tool, then makes a second call including the tool result.

Steps to Reproduce (Minimal)

  1. Import packages
import asyncio
import json
from datetime import datetime
from typing import Dict, Optional, List

import pytz
from autogen_core import (
    MessageContext,
    message_handler,
    RoutedAgent, SingleThreadedAgentRuntime, AgentId
)
from autogen_core.models import SystemMessage, UserMessage, FunctionExecutionResult, FunctionExecutionResultMessage, \
    AssistantMessage
from autogen_core.tools import FunctionTool
from autogen_ext.models.openai import OpenAIChatCompletionClient
from dotenv import load_dotenv
from typing_extensions import Annotated

load_dotenv()
  2. Define a simple function and convert it to an autogen_core.tools.FunctionTool
def get_current_time(
        timezone: Annotated[Optional[str], "Asia/Seoul"] = None
) -> str:
    """Get the current date and time in the specified timezone."""

    if timezone is None:
        timezone = "Asia/Seoul"

    tz = pytz.timezone(timezone)
    current_time = datetime.now(tz)
    return current_time.strftime("%Y-%m-%d %H:%M:%S")


get_current_time_tool = FunctionTool(
    get_current_time, description=get_current_time.__doc__
)
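For reference, the anyOf, default, and title fields discussed below come from the JSON Schema that FunctionTool generates for the Optional[str] parameter. A quick way to see this (assuming the tool exposes its generated schema via the schema attribute, as in autogen-core 0.7.x) is to dump it:

print(json.dumps(get_current_time_tool.schema, indent=2, default=str))
# The "timezone" property is emitted with "anyOf", "default", and "title",
# which are exactly the fields that count_tokens later warns about.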
  3. Define DatetimeAgent and register the FunctionTool
  • The agent uses gpt-4o with tool calls enabled.
  • The first call allows tool calls; if a tool_call request comes back, the agent executes the tool and sends a second request that includes the tool result.
  • Before each model call, count_tokens is computed and then compared with the usage.prompt_tokens returned by the API.
class DatetimeAgent(RoutedAgent):
    def __init__(self) -> None:
        super().__init__("DatetimeAgent")
        self.model_client = OpenAIChatCompletionClient(model="gpt-4o")

    @message_handler
    async def handle_user_message(
            self, message: UserMessage, ctx: MessageContext
    ) -> dict:
        model_messages = [
            SystemMessage(
                content="You are a helpful assistant.\nYou don't rely on your own knowledge, but utilize the tools provided to deliver accurate information."),
            message
        ]
        # First response generation (with tools)
        token_estimate = self.model_client.count_tokens(messages=model_messages, tools=[get_current_time_tool])
        create_result = await self.model_client.create(
            messages=model_messages,
            tools=[get_current_time_tool],
            cancellation_token=ctx.cancellation_token,
        )
        token_usage = create_result.usage.prompt_tokens

        if token_usage != token_estimate:
            print(f"Token usage mismatch: estimated {token_estimate}, actual {token_usage}")

        # Direct response
        if isinstance(create_result.content, str):
            return {"result": create_result.content}
        else:
            # Tool execution
            tool_results: List[FunctionExecutionResult] = []
            for tool_call in create_result.content:
                if tool_call.name == "get_current_time":
                    args: Dict = json.loads(tool_call.arguments)
                    timezone = args.get("timezone", "Asia/Seoul")
                    current_time = get_current_time(timezone)
                    tool_results.append(
                        FunctionExecutionResult(call_id=tool_call.id, content=current_time, is_error=False,
                                                name=tool_call.name))

            # Append model response and tool results to messages
            model_messages.append(AssistantMessage(content=create_result.content, source="assistant"))
            model_messages.append(FunctionExecutionResultMessage(content=tool_results))

            # Final response generation (no tools)
            token_estimate = self.model_client.count_tokens(messages=model_messages)
            create_result = await self.model_client.create(
                messages=model_messages,
                cancellation_token=ctx.cancellation_token,
            )
            token_usage = create_result.usage.prompt_tokens

            if token_usage != token_estimate:
                print(f"Token usage mismatch: estimated {token_estimate}, actual {token_usage}")

            return {"result": create_result.content}
  4. Run the agent
async def main():
    runtime = SingleThreadedAgentRuntime()
    await DatetimeAgent.register(
        runtime=runtime,
        type="datetime_agent",
        factory=lambda: DatetimeAgent()
    )
    runtime.start()

    user_message = UserMessage(content="What is the current time in Seoul?", source="user")
    response = await runtime.send_message(user_message, AgentId("datetime_agent", "default"))
    print("Response:", response)


if __name__ == '__main__':
    asyncio.run(main())
  5. Observed warnings and mismatches:
  • Both the first call (with tools) and the second call (including the tool results) show a persistent gap between the estimate and the actual usage.
  • Repeated Not supported field ... warnings appear for Tool-schema fields such as anyOf, default, and title.
Not supported field anyOf
Not supported field default
Not supported field title
Token usage mismatch: estimated 87, actual 99
Token usage mismatch: estimated 151, actual 83

Expected behavior

  • Token counting should include commonly used JSON Schema fields in Tool definitions without warnings and should closely approximate actual usage.prompt_tokens.

Root-cause hypotheses

  1. FunctionExecutionResultMessage path

    • The second call (with tool results appended) tends to drift more, suggesting the tokenization path for FunctionExecutionResultMessage might be incomplete or simplified.
    • I need to investigate further; if I find a more detailed cause, I will report it in this issue.
  2. Omitted Tool-schema fields

    • count_tokens_openai currently accounts only for the type, description, and enum fields of each property; every other field is skipped with a warning, which leads to an undercount.

    • In the example above, count_tokens_openai converts the get_current_time tool to the JSON shown below before counting. The anyOf, default, and title fields appear in that JSON but are excluded from the calculation, each triggering a warning.

  • get_current_time tool description

[ {
  "type" : "function",
  "function" : {
    "name" : "get_current_time",
    "description" : "Get the current date and time in the specified timezone.",
    "parameters" : {
      "type" : "object",
      "properties" : {
        "timezone" : {
          "anyOf" : [ {
            "type" : "string"
          }, {
            "type" : "null"
          } ],
          "default" : null,
          "description" : "Asia/Seoul",
          "title" : "Timezone"
        }
      },
      "required" : [ ],
      "additionalProperties" : false
    },
    "strict" : false
  }
} ]
  • The relevant excerpt from count_tokens_openai, where anyOf, default, and title fall through to the warning branch:
# Tool tokens.
oai_tools = convert_tools(tools)
for tool in oai_tools:
    function = tool["function"]
    tool_tokens = len(encoding.encode(function["name"]))
    if "description" in function:
        tool_tokens += len(encoding.encode(function["description"]))
    tool_tokens -= 2
    if "parameters" in function:
        parameters = function["parameters"]
        if "properties" in parameters:
            assert isinstance(parameters["properties"], dict)
            for propertiesKey in parameters["properties"]:  # pyright: ignore
                assert isinstance(propertiesKey, str)
                tool_tokens += len(encoding.encode(propertiesKey))
                v = parameters["properties"][propertiesKey]  # pyright: ignore
                for field in v:  # pyright: ignore
                    if field == "type":
                        tool_tokens += 2
                        tool_tokens += len(encoding.encode(v["type"]))  # pyright: ignore
                    elif field == "description":
                        tool_tokens += 2
                        tool_tokens += len(encoding.encode(v["description"]))  # pyright: ignore
                    elif field == "enum":
                        tool_tokens -= 3
                        for o in v["enum"]:  # pyright: ignore
                            tool_tokens += 3
                            tool_tokens += len(encoding.encode(o))  # pyright: ignore
                    else:
                        trace_logger.warning(f"Not supported field {field}")
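One possible direction (a sketch only, not a tested fix: the handling of the extra fields is my guess, and it assumes json is imported in that module) would be to fold the currently ignored fields into the estimate instead of dropping them, for example by encoding their JSON representation:

for field in v:  # pyright: ignore
    if field == "type":
        tool_tokens += 2
        tool_tokens += len(encoding.encode(v["type"]))  # pyright: ignore
    elif field == "description":
        tool_tokens += 2
        tool_tokens += len(encoding.encode(v["description"]))  # pyright: ignore
    elif field == "enum":
        tool_tokens -= 3
        for o in v["enum"]:  # pyright: ignore
            tool_tokens += 3
            tool_tokens += len(encoding.encode(o))  # pyright: ignore
    elif field in ("anyOf", "default", "title"):
        # Rough approximation: count the tokens of the field's JSON form
        # instead of skipping it entirely.
        tool_tokens += len(encoding.encode(json.dumps(v[field], default=str)))
    else:
        trace_logger.warning(f"Not supported field {field}")

This would not match usage.prompt_tokens exactly, but it should at least reduce the systematic undercount for tools whose parameters use Optional types.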

Additional context

  • I am opening an issue rather than a PR, in case this behavior is intentional and not a bug.
  • My English is not very good and I am relying on a translator, so some phrasing may be awkward. Thank you for your understanding.
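  • Until the counting is improved, a safety margin on the estimate can be used as a stopgap when guarding against context-window overruns. A minimal sketch inside handle_user_message (the margin value and the 128k gpt-4o context limit are illustrative, not something from the library):

margin = 1.15  # assume the estimate may undercount by roughly 15%
context_limit = 128_000  # gpt-4o context window

token_estimate = self.model_client.count_tokens(
    messages=model_messages, tools=[get_current_time_tool]
)
if token_estimate * margin > context_limit:
    raise RuntimeError("Projected prompt may exceed the context window")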

Which packages was the bug in?

Python Extensions (autogen-ext)

AutoGen library version.

Python 0.7.4

Other library version.

No response

Model used

gpt-4o

Model provider

OpenAI

Other model provider

No response

Python version

3.12

.NET version

None

Operating system

MacOS
