-
Notifications
You must be signed in to change notification settings - Fork 137
feat(weave): Refactor otel parsing to standardize fields #4143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=c61c2e8ae492c3eaa051af5426c61b96c1c8c70c |
|
|
||
| # Options: set | ||
| start_call = tsi.StartedCallSchemaForInsert( | ||
| project_id=project_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zacharyblasczyk : I came across this PR as I was implementing the pydantic_ai otel integration, one quick request I had was if it was possible to set the display_name: Optional[str] = None in StartedCallSchemaForInsert. We can map it to an attribute like weave.display_name from the attributes dict of the span. This can help with having nice display names for the calls in the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you meant @zbirenbaum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's possible but I think the better route would be to try and set display name based on the attribute and fall back to the field we currently use to determine the name if it isn't there. Otherwise we would end up with traces that have a perfectly fine name to use being unnamed if that attribute isn't set.
Would that satisfy your use case?
I reviewed it again and saw what you mean, got op_name and display_name confused. Display name isn't being set at all right now. I'll modify it to try and load the name from attributes, nice catch
|
Hey @zbirenbaum and @tssweeney when are we anticipating merging this PR and deploy the trace server with updates? |
It was ideally going to be merged friday but it got stuck in CI due to some change in dspy integration tests. It should be merged today |
| # System prompt/instructions | ||
| "system": [ | ||
| "gen_ai.system", # OpenTelemetry AI | ||
| "llm.system", # OpenInference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely not a big deal, but in our Input / Output keys OpenInference is before OT AI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll switch it real quick for consistency but it shouldn't actually make a difference here, the same information is recorded in each key so even if they have both it will work properly.
| for k in value.__dataclass_fields__ | ||
| } | ||
| else: | ||
| raise ValueError(f"Unsupported type for JSON serialization: {type(value)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how common any of these would be but I went ahead and asked ChatGPT for other common python native types we might encounter when attempting to serialize. I know in our wandb sdk things like Infinity or NaN would bite us from time to time.
import math
from datetime import datetime, date, time, timedelta
from uuid import UUID
from enum import Enum
from decimal import Decimal
import base64
def to_json_serializable(value: Any) -> Any:
"""
Transform common data types into JSON-serializable values.
"""
if value is None:
return None
elif isinstance(value, (str, bool)):
return value
elif isinstance(value, int):
return value
elif isinstance(value, float):
# Handle special floats: NaN, inf, -inf
if math.isnan(value) or math.isinf(value):
return str(value)
return value
elif isinstance(value, (list, tuple)):
return [to_json_serializable(item) for item in value]
elif isinstance(value, dict):
return {str(k): to_json_serializable(v) for k, v in value.items()}
elif isinstance(value, datetime):
return value.isoformat()
elif isinstance(value, date): # date without time
return value.isoformat()
elif isinstance(value, time): # time without date
return value.isoformat()
elif isinstance(value, timedelta):
return value.total_seconds()
elif isinstance(value, UUID):
return str(value)
elif isinstance(value, Enum):
return value.value
elif isinstance(value, Decimal):
# Convert Decimal to float or str, depending on requirements.
return float(value)
elif isinstance(value, (set, frozenset)):
return [to_json_serializable(item) for item in value]
elif isinstance(value, complex):
return {"real": value.real, "imag": value.imag}
elif isinstance(value, (bytes, bytearray)):
return base64.b64encode(value).decode('ascii')
elif hasattr(value, "__dataclass_fields__"):
return {k: to_json_serializable(getattr(value, k)) for k in value.__dataclass_fields__}
else:
raise ValueError(f"Unsupported type for JSON serialization: {type(value)}")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif isinstance(value, (bytes, bytearray)):
return base64.b64encode(value).decode('ascii')
Thanks for pointing those out, this should make it a bit more robust. Only note: is this the way we want to handle bytes values?
|
/cherry-pick server-release-0.68.0 |
|
/cherry-pick server-release-0.68.x |
1 similar comment
|
/cherry-pick server-release-0.68.x |
Description
This adjusts the parsing logic to make it simple to generate and keep docs in line with the parser code. This adjusts the parser such that a set of standard fields are created for each of input, output, summary and attributes. Each field is assigned a list of keys which are checked until a valid entry is discovered or the possible key entries are exhausted.
Also includes cost calculation fix
PRs into master which should be reviewed and merged before reviewing this:
PRs into this branch which should be reviewed and merged before reviewing this: