Add JSON Schema Definition for gen_ai.tool.definitions#3378
Add JSON Schema Definition for gen_ai.tool.definitions#3378Cirilla-zmh wants to merge 5 commits intoopen-telemetry:mainfrom
Conversation
Change-Id: Ib6133b3019195e4fa9a54d329ff4b891a281d208 Co-developed-by: Cursor <noreply@cursor.com>
|
I apologize that the previous PR (#2942) was closed due to an incorrect rebase. All comments from the original PR have been addressed, and I believe this PR is now ready to be merged. Here is the change that I have made: 6838193 |
KalleOlaviNiemitalo
left a comment
There was a problem hiding this comment.
These are just my observations. I am not requesting any change.
|
A few thoughts: https://github.com/open-telemetry/opentelemetry-python-contrib/pull/4142/changes -- this shows how the GCP GenAi instrumentation records tool definitions currently... Seems we always have a Should we just put tool definitions behind the content capture flag ( Should we have a schema specific to the |
Actually, we've already done that.
I believe we should follow the content capture flag, but the behaviors will be somewhat different with chat messages. Please see the description of That's to say, instrumentations should always capture
Of course! This is an initial PR, and I want to keep the scope limited. I'd rather not include additional definitions like Built-in or MCP Tools yet; we can follow up on those in a separate issue. |
|
Ok mostly SGTM..
Do we want to explicitly recommend instrumentations reuse the content capture flag for this purpose ? The language you have now doesn't really do that.. Also do we want a optional response type on the |
For the implementation, I believe we should reuse the content capture flag. Do you think we should add more relevant details to this description? semantic-conventions/docs/gen-ai/gen-ai-spans.md Lines 691 to 709 in d562045
Good point. @lmolkova and I have discussed this before and we decided to ignore this field for now, considering that most providers other than Gemini do not offer this definition. See #2942 (comment) |
| }, | ||
| "parameters": { | ||
| "anyOf": [ | ||
| {}, |
There was a problem hiding this comment.
What's the reason for the parameters field to be anything and not sth more defined like a dict?
If it is Any, how would one parse this field if there is nothing known about the structure of it?
There was a problem hiding this comment.
Following on from that
- Are there any real world systems that don't use JsonSchema for this? Maybe we can make it a MUST.
- If there are, can we have a mutually exclusive variant that is strictly required to be JsonSchema?
There was a problem hiding this comment.
And I think there should be a way to make pydantic reference the external schema http://json-schema.org/draft-07/schema# so that the generated schema is fully defined
| " description=(\n", | ||
| " \"Schema that defines the parameters accepted by the tool. \"\n", | ||
| " \"The RECOMMENDED format is JSON Schema.\"\n", | ||
| " \"Since this attribute could be large, it's NOT RECOMMENDED to be populated by default.\"\n", |
There was a problem hiding this comment.
I think we should add something along the lines of:
If the instrumentation already uses the ContentCapture flag, then the instrumentation should use that flag to enable parameters in the telemetry
There was a problem hiding this comment.
we don't (yet) define configuration options in semconv (including capture content). From implementation standpoint, I agree, we should start with one flag, but if we want to start documenting config options, this should rather be a follow up
There was a problem hiding this comment.
Can we file a follow up issue somewhere? We could at least document how python instrumentation behaves on opentelemetry.io
| "source": [ | ||
| "class GenericToolDefinition(BaseModel):\n", | ||
| " \"\"\"\n", | ||
| " Represents a tool definition in any forms.\n", |
There was a problem hiding this comment.
nit: in any form instead of forms
| " type: str = Field(description=\"The type of the tool.\")\n", | ||
| " name: str = Field(description=\"The name of the tool.\")\n", | ||
| "\n", | ||
| " class Config:\n", |
There was a problem hiding this comment.
What does this class config extra=allow thing actually do ? Does this mean I can add arbitrary key/value pairs for this type ? It seems to be specified on every type ?
There was a problem hiding this comment.
yes, it allows additional properties. This essentially keeps conventions future-proof and also add system-specific fields. All consumers should be ready to receive additional properties and should not fail on them.
Fixes #2721 #1835
Changes
This PR is a continuation of #2942 #2793. My apologies for accidentally closing the previous one.
Add JSON schema definition for gen_ai.tool.definitions.
Important
Pull requests acceptance are subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above, may be automatically rejected and closed.
Merge requirement checklist
[chore]