
Add JSON Schema Definition for gen_ai.tool.definitions#3378

Open
Cirilla-zmh wants to merge 5 commits into open-telemetry:main from Cirilla-zmh:minghui/tool_definitions

Conversation

@Cirilla-zmh
Member

Fixes #2721 #1835

Changes

This PR is a continuation of #2942 and #2793. My apologies for accidentally closing the previous one.

Add JSON schema definition for gen_ai.tool.definitions.

Important

Pull request acceptance is subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above may be automatically rejected and closed.

Merge requirement checklist

  • CONTRIBUTING.md guidelines followed.
  • Change log entry added, according to the guidelines in When to add a changelog entry.
    • If your PR does not need a change log, start the PR title with [chore]
  • Links to the prototypes or existing instrumentations (when adding or changing conventions)

Change-Id: I63f11ca8081f71b55b793ec88f35ef64bbace8e6
Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: Ib6133b3019195e4fa9a54d329ff4b891a281d208
Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: I5803322306a93b14f83b858b0deb632074e1d9c0
Co-developed-by: Cursor <noreply@cursor.com>
@Cirilla-zmh Cirilla-zmh requested review from a team as code owners February 3, 2026 06:36
@github-actions github-actions bot added the enhancement and area:gen-ai labels Feb 3, 2026
@Cirilla-zmh Cirilla-zmh moved this from Untriaged to Needs More Approval in Semantic Conventions Triage Feb 3, 2026
@Cirilla-zmh
Member Author

Cirilla-zmh commented Feb 3, 2026

I apologize that the previous PR (#2942) was closed due to an incorrect rebase. All comments from the original PR have been addressed, and I believe this PR is now ready to be merged.

Here is the change that I have made: 6838193
cc @lmolkova @DylanRussell @gyliu513 @alexmojaki


@KalleOlaviNiemitalo KalleOlaviNiemitalo left a comment


These are just my observations. I am not requesting any change.

@Cirilla-zmh Cirilla-zmh moved this from Needs More Approval to Ready to be Merged in Semantic Conventions Triage Feb 5, 2026
@lmolkova lmolkova moved this from Ready to be Merged to Needs More Approval in Semantic Conventions Triage Feb 5, 2026
@DylanRussell

A few thoughts:

https://github.com/open-telemetry/opentelemetry-python-contrib/pull/4142/changes shows how the GCP GenAI instrumentation currently records tool definitions. It seems we always have a name and description field. We didn't put the function params in; I'm not sure why. Should we add a name field to FunctionToolDefinition?

Should we just put tool definitions behind the content capture flag (OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT)? We did for the Google GenAI instrumentation. Then we probably don't need to put function params behind an additional flag. See https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/#full-buffered-content for context on this flag.

Should we have a schema specific to the Mcp.McpTool type?

@Cirilla-zmh
Member Author

> Should we add a name field to FunctionToolDefinition?

Actually, we've already done that. FunctionToolDefinition inherits from GenericToolDefinition, so it has a name field.
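
A minimal sketch of that inheritance, using standard-library dataclasses in place of the pydantic models from the PR. The optional fields and their defaults are assumptions for illustration, not the PR's exact definitions, and `get_weather` is a hypothetical tool name.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class GenericToolDefinition:
    # Both fields are required for every tool definition.
    type: str
    name: str


@dataclass
class FunctionToolDefinition(GenericToolDefinition):
    # Optional properties; the None defaults are illustrative assumptions.
    description: Optional[str] = None
    parameters: Optional[Any] = None


# The subclass inherits `type` and `name` from the generic definition.
tool = FunctionToolDefinition(type="function", name="get_weather")
print(asdict(tool))
```

Because `name` is declared once on the base class, every more specific definition picks it up without redeclaring it.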

> Should we just put tool definitions behind the content capture flag (OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT)? We did for the Google GenAI instrumentation. Then we probably don't need to put function params behind an additional flag.

I believe we should follow the content capture flag, but the behavior will differ somewhat from that for chat messages. Please see the description of gen_ai.tool.definitions: "Since this attribute could be large, it's NOT RECOMMENDED to populate non-required properties by default. Instrumentations MAY provide a way to enable populating optional properties."

That is to say, instrumentations should always capture type and name. Once content capture is enabled, other optional properties such as description and parameters should be captured as well.
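
A rough sketch of that behavior, assuming tool definitions are plain dicts. The helper names and the way the flag value is parsed (only a case-insensitive "true" enables capture) are assumptions for illustration, not something the conventions define.

```python
import os
from typing import Any, Dict, Mapping

REQUIRED_KEYS = ("type", "name")
CAPTURE_FLAG = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def content_capture_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    # Assumption: only a case-insensitive "true" turns content capture on.
    return environ.get(CAPTURE_FLAG, "").strip().lower() == "true"


def tool_definition_for_telemetry(
    tool: Dict[str, Any], capture_content: bool
) -> Dict[str, Any]:
    """Keep only the required properties unless content capture is enabled."""
    if capture_content:
        return dict(tool)
    return {key: tool[key] for key in REQUIRED_KEYS if key in tool}


# Hypothetical tool definition used only for this example.
tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
}

print(tool_definition_for_telemetry(tool, capture_content=False))
print(tool_definition_for_telemetry(tool, capture_content=content_capture_enabled()))
```

With capture disabled, only `type` and `name` reach the attribute, which keeps its size bounded by default.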

> Should we have a schema specific to the Mcp.McpTool type?

Of course! This is an initial PR, and I want to keep the scope limited. I'd rather not include additional definitions like Built-in or MCP Tools yet; we can follow up on those in a separate issue.

@DylanRussell

Ok mostly SGTM..

> That's to say, instrumentations should always capture type and name. Once content capture is enabled, other optional properties such as description, parameters should get captured.

Do we want to explicitly recommend that instrumentations reuse the content capture flag for this purpose? The language you have now doesn't really do that.

Also, do we want an optional response type on the FunctionToolDefinition schema? Gemini allows this: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/reference/rest/v1beta1/FunctionDeclaration

@Cirilla-zmh
Member Author

> Do we want to explicitly recommend that instrumentations reuse the content capture flag for this purpose? The language you have now doesn't really do that.

For the implementation, I believe we should reuse the content capture flag. Do you think we should add more relevant details to this description?

1. [Default] Don't record instructions, inputs, or outputs.
2. Record instructions, inputs, and outputs on the GenAI spans using corresponding
attributes (`gen_ai.system_instructions`, `gen_ai.input.messages`,
`gen_ai.output.messages`).
This approach is best suited for situations where telemetry volume is manageable
and either privacy regulations do not apply or the telemetry storage complies
with them, for example, in pre-production environments.
See [Recording content on attributes](#recording-content-on-attributes)
section for more details.
3. Store content externally and record references on the spans.
This pattern is recommended in production environments where telemetry volume
is a concern or sensitive data needs to be handled securely. Using external
storage enables separate access controls.
See [Uploading content to external storage](#uploading-content-to-external-storage)
section for more details.

> Also, do we want an optional response type on the FunctionToolDefinition schema? Gemini allows this: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/reference/rest/v1beta1/FunctionDeclaration

Good point. @lmolkova and I have discussed this before and we decided to ignore this field for now, considering that most providers other than Gemini do not offer this definition. See #2942 (comment)

},
"parameters": {
"anyOf": [
{},


What's the reason for the parameters field to accept anything rather than something more defined, like a dict?

If it is Any, how would one parse this field if there is nothing known about the structure of it?

Member

Following on from that:

  1. Are there any real-world systems that don't use JSON Schema for this? Maybe we can make it a MUST.
  2. If there are, can we have a mutually exclusive variant that is strictly required to be JSON Schema?

Member

I also think there should be a way to make pydantic reference the external schema http://json-schema.org/draft-07/schema# so that the generated schema is fully defined.
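
To illustrate the parsing concern raised in this thread, here is a minimal sketch of how a consumer might read `parameters` defensively while its type is unconstrained. The helper name and the fallback of returning an empty list are assumptions; the conventions do not prescribe this.

```python
from typing import Any, List


def parameter_names(parameters: Any) -> List[str]:
    """Return top-level parameter names when `parameters` looks like a JSON Schema object.

    Anything else (None, free-form text, a list, and so on) yields an empty
    list instead of raising, because the field may hold any value.
    """
    if not isinstance(parameters, dict):
        return []
    properties = parameters.get("properties")
    if not isinstance(properties, dict):
        return []
    return sorted(properties)


# Hypothetical JSON Schema payload for a weather tool.
schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "unit": {"type": "string"}},
    "required": ["city"],
}

print(parameter_names(schema))
print(parameter_names("free-form text"))
```

If the field were required to be JSON Schema, the type checks would be unnecessary, which is the trade-off under discussion here.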

" description=(\n",
" \"Schema that defines the parameters accepted by the tool. \"\n",
" \"The RECOMMENDED format is JSON Schema. \"\n",
" \"Since this attribute could be large, it's NOT RECOMMENDED to be populated by default.\"\n",


I think we should add something along the lines of:

If the instrumentation already uses the ContentCapture flag, then the instrumentation should use that flag to enable parameters in the telemetry.

Member

We don't (yet) define configuration options in semconv (including capture content). From an implementation standpoint, I agree that we should start with one flag, but if we want to start documenting config options, that should be a follow-up.


ok, NP

Member

Can we file a follow-up issue somewhere? We could at least document how the Python instrumentation behaves on opentelemetry.io.

"source": [
"class GenericToolDefinition(BaseModel):\n",
" \"\"\"\n",
" Represents a tool definition in any forms.\n",


nit: "in any form" instead of "in any forms"

" type: str = Field(description=\"The type of the tool.\")\n",
" name: str = Field(description=\"The name of the tool.\")\n",
"\n",
" class Config:\n",


What does this class Config extra=allow setting actually do? Does it mean I can add arbitrary key/value pairs for this type? It seems to be specified on every type.

Member

Yes, it allows additional properties. This keeps the conventions future-proof and also lets instrumentations add system-specific fields. All consumers should be ready to receive additional properties and should not fail on them.
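
On the consumer side, tolerating additional properties could look roughly like the sketch below. The set of known keys, the helper name, and the `vendor_hint` field are hypothetical and only serve the example.

```python
import json
from typing import Any, Dict, Tuple

# Assumption: the properties this particular consumer knows how to interpret.
KNOWN_KEYS = {"type", "name", "description", "parameters"}


def split_tool_definition(raw: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate understood properties from additional ones without failing on either."""
    known = {key: value for key, value in raw.items() if key in KNOWN_KEYS}
    extra = {key: value for key, value in raw.items() if key not in KNOWN_KEYS}
    return known, extra


# `vendor_hint` stands in for a system-specific field the consumer has never seen.
payload = json.loads(
    '[{"type": "function", "name": "get_weather", "vendor_hint": "x"}]'
)
known, extra = split_tool_definition(payload[0])
print(known)
print(extra)
```

The unknown field is carried along rather than rejected, so a newer producer does not break an older consumer.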


Labels

area:gen-ai, enhancement

Projects

Status: Needs More Approval

Development

Successfully merging this pull request may close these issues.

Define schema for Tool Definitions for Single and Multi-Agent Spans

6 participants