[Enhancement] Traces And Metrics #506

# Motivation
We want to implement and refine tracing and metrics using OpenTelemetry to ensure accurate analysis of key performance indicators.  

These metrics will help us monitor and better understand system performance, cost, and latency, identify bottlenecks, and guide ongoing improvements.

# Solution Proposal

## Step One – OpenTelemetry Implementation

Currently, we use the `Logger` and `ContextualCorrelator` classes internally (in-memory) to manage engine scope and propagate correlation IDs to generated events and logs.  
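
For readers unfamiliar with the pattern, here is a minimal sketch of scoped correlation-ID propagation built on `contextvars`. It is not Parlant's actual code: the class name `SketchCorrelator`, the exact `scope()` signature, the `::` separator, and the `<main>` fallback are all assumptions made for illustration.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator


class SketchCorrelator:
    """Hypothetical in-memory correlator; names and separator are illustrative."""

    def __init__(self) -> None:
        # A ContextVar keeps the value isolated per asyncio task / thread.
        self._scopes: contextvars.ContextVar[str] = contextvars.ContextVar(
            "sketch_correlator_scopes", default=""
        )

    @contextmanager
    def scope(self, scope_id: str) -> Iterator[None]:
        # Append the new scope to the current chain, then restore it on exit.
        current = self._scopes.get()
        token = self._scopes.set(f"{current}::{scope_id}" if current else scope_id)
        try:
            yield
        finally:
            self._scopes.reset(token)

    @property
    def correlation_id(self) -> str:
        # Outside of any scope, fall back to a placeholder ID.
        return self._scopes.get() or "<main>"


correlator = SketchCorrelator()
with correlator.scope("request-1"):
    with correlator.scope("engine"):
        inner = correlator.correlation_id  # nested scopes are chained
    outer = correlator.correlation_id  # the inner scope has been popped
```

Any log or event emitted inside a `with` block can read `correlation_id` without having it passed explicitly, which is the property the changes below need to preserve.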

We propose the following changes:

a. Extract `ContextualCorrelator` into an abstract base class (ABC).  
b. Create `BasicContextualCorrelator`, which replicates today’s functionality.  
c. Create a new class: `class OpenTelemetry(Logger, ContextualCorrelator, Meter)` (i.e., implementing three interfaces).  
d. In this class, implement `scope()` using OTel spans, and have all logger calls (`info()`, `debug()`, etc.) emit OTel events.  
e. Generate `tracing_id` ourselves, making it equal to the `correlation_id` for easier searching.  
f. Since most logs in Parlant are essentially events, and because OTel logs are less widely supported, it makes sense to implement all logs as events.  
g. For capturing metrics, introduce a `Meter` class as follows:
```python
import asyncio
from abc import ABC, abstractmethod
from contextlib import asynccontextmanager
from typing import AsyncGenerator, Mapping, Sequence, TypeAlias, Union

# Value types accepted as metric attributes.
AttributeValue: TypeAlias = Union[
    str,
    bool,
    int,
    float,
    Sequence[str],
    Sequence[bool],
    Sequence[int],
    Sequence[float],
]

class Meter(ABC):
    @abstractmethod
    async def record_counter(
        self,
        name: str,
        value: int = 1,
        attributes: Mapping[str, AttributeValue] | None = None,
    ) -> None: ...

    @abstractmethod
    async def record_histogram(
        self,
        name: str,
        value: float,
        attributes: Mapping[str, AttributeValue] | None = None,
    ) -> None: ...

    @asynccontextmanager
    async def measure_duration(
        self,
        name: str,
        attributes: Mapping[str, AttributeValue] | None = None,
    ) -> AsyncGenerator[None, None]:
        """
        Measure the duration (in seconds) of a block of code
        and record it as a histogram value.
        Usage:
            async with meter.measure_duration("my_duration"):
                ...  # Code to measure
        """
        # Use the running loop's monotonic clock.
        loop = asyncio.get_running_loop()
        start_time = loop.time()
        try:
            yield
        finally:
            # Recorded even if the measured block raises.
            duration = loop.time() - start_time
            await self.record_histogram(name, duration, attributes)
```
h. Implement `NoOpMeter` to preserve the current behavior when metrics are disabled.  
i. In `bin/server.py`, add a new flag `--open-telemetry`. When enabled, the following environment variables will be used:
    1. `OTEL_SERVICE_NAME` (optional, defaults to `"parlant"`)
    2. `OTEL_TRACES_EXPORTER` (required)
    3. `OTEL_METRICS_EXPORTER` (required)
    4. `OTEL_EXPORTER_OTLP_INSECURE` (optional, defaults to `False`)
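
Items h and i can be sketched roughly as follows. This is only an illustration under stated assumptions: `Meter` is a condensed copy of the interface above with attributes simplified to string values, and `OTelSettings`, `load_otel_settings`, and the rule that only the string `"true"` enables the insecure flag are hypothetical, not part of the proposal.

```python
import asyncio
from abc import ABC, abstractmethod
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import AsyncIterator, Mapping, Optional

Attributes = Optional[Mapping[str, str]]  # simplified stand-in for AttributeValue


class Meter(ABC):
    """Condensed copy of the proposed interface, repeated to keep this runnable."""

    @abstractmethod
    async def record_counter(
        self, name: str, value: int = 1, attributes: Attributes = None
    ) -> None: ...

    @abstractmethod
    async def record_histogram(
        self, name: str, value: float, attributes: Attributes = None
    ) -> None: ...

    @asynccontextmanager
    async def measure_duration(
        self, name: str, attributes: Attributes = None
    ) -> AsyncIterator[None]:
        loop = asyncio.get_running_loop()
        start = loop.time()
        try:
            yield
        finally:
            await self.record_histogram(name, loop.time() - start, attributes)


class NoOpMeter(Meter):
    """Accepts every call and discards it, so call sites need no 'enabled' checks."""

    async def record_counter(
        self, name: str, value: int = 1, attributes: Attributes = None
    ) -> None:
        return None

    async def record_histogram(
        self, name: str, value: float, attributes: Attributes = None
    ) -> None:
        return None


@dataclass(frozen=True)
class OTelSettings:  # hypothetical container, not part of the proposal
    service_name: str
    traces_exporter: str
    metrics_exporter: str
    insecure: bool


def load_otel_settings(env: Mapping[str, str]) -> OTelSettings:
    """Read the variables listed in item i; fail fast if a required one is missing."""
    required = ("OTEL_TRACES_EXPORTER", "OTEL_METRICS_EXPORTER")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError(f"--open-telemetry requires: {', '.join(missing)}")
    return OTelSettings(
        service_name=env.get("OTEL_SERVICE_NAME", "parlant"),
        traces_exporter=env["OTEL_TRACES_EXPORTER"],
        metrics_exporter=env["OTEL_METRICS_EXPORTER"],
        # Assumption: only the string "true" (any case) enables insecure mode.
        insecure=env.get("OTEL_EXPORTER_OTLP_INSECURE", "false").strip().lower() == "true",
    )
```

In the server, `load_otel_settings(os.environ)` would presumably be called only when `--open-telemetry` is passed; otherwise a `NoOpMeter` is injected and instrumented code runs unchanged.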

## Step Two – Add Metrics
TBD

# Discussion
@kichanyurd What are your thoughts?

