Skip to content

Conversation

@zbirenbaum
Copy link
Contributor

@zbirenbaum zbirenbaum commented Apr 16, 2025

Description

  • Fixes WB-24513
  • Fixes WB-24514

This adjusts the parsing logic to make it simple to generate and keep docs in line with the parser code. This adjusts the parser such that a set of standard fields are created for each of input, output, summary and attributes. Each field is assigned a list of keys which are checked until a valid entry is discovered or the possible key entries are exhausted.

Also includes cost calculation fix

PRs into master which should be reviewed and merged before reviewing this:

PRs into this branch which should be reviewed and merged before reviewing this:

@zbirenbaum zbirenbaum requested a review from a team as a code owner April 16, 2025 03:31
@circle-job-mirror
Copy link

circle-job-mirror bot commented Apr 16, 2025


# Options: set
start_call = tsi.StartedCallSchemaForInsert(
project_id=project_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zacharyblasczyk : I came across this PR as I was implementing the pydantic_ai otel integration, one quick request I had was if it was possible to set the display_name: Optional[str] = None in StartedCallSchemaForInsert. We can map it to an attribute like weave.display_name from the attributes dict of the span. This can help with having nice display names for the calls in the UI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you meant @zbirenbaum.

Copy link
Contributor Author

@zbirenbaum zbirenbaum Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parambharat

That's possible but I think the better route would be to try and set display name based on the attribute and fall back to the field we currently use to determine the name if it isn't there. Otherwise we would end up with traces that have a perfectly fine name to use being unnamed if that attribute isn't set.

Would that satisfy your use case?

I reviewed it again and saw what you mean, got op_name and display_name confused. Display name isn't being set at all right now. I'll modify it to try and load the name from attributes, nice catch

@ayulockin
Copy link
Member

Hey @zbirenbaum and @tssweeney when are we anticipating merging this PR and deploy the trace server with updates?

@zbirenbaum
Copy link
Contributor Author

Hey @zbirenbaum and @tssweeney when are we anticipating merging this PR and deploy the trace server with updates?

It was ideally going to be merged friday but it got stuck in CI due to some change in dspy integration tests. It should be merged today

# System prompt/instructions
"system": [
"gen_ai.system", # OpenTelemetry AI
"llm.system", # OpenInference
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely not a big deal, but in our Input / Output keys OpenInference is before OT AI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch it real quick for consistency but it shouldn't actually make a difference here, the same information is recorded in each key so even if they have both it will work properly.

for k in value.__dataclass_fields__
}
else:
raise ValueError(f"Unsupported type for JSON serialization: {type(value)}")
Copy link
Collaborator

@vanpelt vanpelt Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how common any of these would be but I went ahead and asked ChatGPT for other common python native types we might encounter when attempting to serialize. I know in our wandb sdk things like Infinity or NaN would bite us from time to time.

import math
from datetime import datetime, date, time, timedelta
from uuid import UUID
from enum import Enum
from decimal import Decimal
import base64

def to_json_serializable(value: Any) -> Any:
    """
    Transform common data types into JSON-serializable values.
    """
    if value is None:
        return None
    elif isinstance(value, (str, bool)):
        return value
    elif isinstance(value, int):
        return value
    elif isinstance(value, float):
        # Handle special floats: NaN, inf, -inf
        if math.isnan(value) or math.isinf(value):
            return str(value)
        return value
    elif isinstance(value, (list, tuple)):
        return [to_json_serializable(item) for item in value]
    elif isinstance(value, dict):
        return {str(k): to_json_serializable(v) for k, v in value.items()}
    elif isinstance(value, datetime):
        return value.isoformat()
    elif isinstance(value, date):  # date without time
        return value.isoformat()
    elif isinstance(value, time):  # time without date
        return value.isoformat()
    elif isinstance(value, timedelta):
        return value.total_seconds()
    elif isinstance(value, UUID):
        return str(value)
    elif isinstance(value, Enum):
        return value.value
    elif isinstance(value, Decimal):
        # Convert Decimal to float or str, depending on requirements.
        return float(value)
    elif isinstance(value, (set, frozenset)):
        return [to_json_serializable(item) for item in value]
    elif isinstance(value, complex):
        return {"real": value.real, "imag": value.imag}
    elif isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode('ascii')
    elif hasattr(value, "__dataclass_fields__"):
        return {k: to_json_serializable(getattr(value, k)) for k in value.__dataclass_fields__}
    else:
        raise ValueError(f"Unsupported type for JSON serialization: {type(value)}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    elif isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode('ascii')

Thanks for pointing those out, this should make it a bit more robust. Only note: is this the way we want to handle bytes values?

@zbirenbaum zbirenbaum merged commit c5774b2 into master Apr 21, 2025
140 checks passed
@zbirenbaum zbirenbaum deleted the zach/otel-parsing-rewrite branch April 21, 2025 19:43
@github-actions github-actions bot locked and limited conversation to collaborators Apr 21, 2025
@zbirenbaum
Copy link
Contributor Author

zbirenbaum commented Apr 21, 2025

/cherry-pick server-release-0.68.0

@zbirenbaum
Copy link
Contributor Author

/cherry-pick server-release-0.68.x

1 similar comment
@zbirenbaum
Copy link
Contributor Author

/cherry-pick server-release-0.68.x

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants