[Enhancement] Traces And Metrics #506

@mc-dorzo

Motivation

We want to implement and refine tracing and metrics using OpenTelemetry to ensure accurate analysis of key performance indicators.

These metrics will help us monitor and understand system performance, cost, and latency, identify bottlenecks, and guide ongoing improvements.

Solution Proposal

Step One – OpenTelemetry Implementation

Currently, we use the Logger and ContextualCorrelator classes internally (in-memory) to manage engine scope and propagate correlation IDs to generated events and logs.
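
To make the current behavior concrete, here is a minimal sketch of how an in-memory correlator can propagate nested scopes across asyncio tasks. It assumes a `contextvars`-based design; `SketchCorrelator`, `worker`, and the `::` separator are hypothetical names for illustration and are not taken from Parlant's actual code.

```python
import asyncio
import contextvars
from contextlib import contextmanager
from typing import Dict, Iterator


class SketchCorrelator:
    """Illustrative stand-in, not Parlant's ContextualCorrelator."""

    def __init__(self) -> None:
        # A ContextVar holds a separate value per asyncio task, so
        # concurrent requests do not see each other's scopes.
        self._scopes: contextvars.ContextVar = contextvars.ContextVar(
            "sketch_scopes", default=""
        )

    @contextmanager
    def scope(self, scope_id: str) -> Iterator[None]:
        current = self._scopes.get()
        # Nested scopes are joined into one correlation ID (assumed format).
        token = self._scopes.set(f"{current}::{scope_id}" if current else scope_id)
        try:
            yield
        finally:
            # Restore the enclosing scope on exit, even on error.
            self._scopes.reset(token)

    @property
    def correlation_id(self) -> str:
        return self._scopes.get() or "<main>"


async def worker(correlator: SketchCorrelator, name: str, seen: Dict[str, str]) -> None:
    with correlator.scope(name):
        await asyncio.sleep(0)  # yield control so the tasks interleave
        with correlator.scope("engine"):
            seen[name] = correlator.correlation_id


async def main() -> Dict[str, str]:
    correlator = SketchCorrelator()
    seen: Dict[str, str] = {}
    with correlator.scope("request-1"):
        # Each task created by gather() inherits a copy of the current context.
        await asyncio.gather(
            worker(correlator, "a", seen),
            worker(correlator, "b", seen),
        )
    return seen
```

Running `asyncio.run(main())` gives each worker its own nested ID under `request-1`, which is the property an OTel span-based `scope()` would need to preserve.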

We propose the following changes:

a. Extract ContextualCorrelator into an abstract base class (ABC).
b. Create BasicContextualCorrelator, which replicates today’s functionality.
c. Create a new class: class OpenTelemetry(Logger, ContextualCorrelator, Meter) (i.e., implementing three interfaces).
d. In this class, implement scope() using OTel spans, and have all logger calls (info(), debug(), etc.) emit OTel events.
e. Generate tracing_id ourselves, making it equal to the correlation_id for easier searching.
f. Since most logs in Parlant are essentially events—and because OTel logs are less widely supported—it makes sense to implement all logs as events.
g. For capturing metrics, introduce a Meter class as follows:

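import asyncio
from abc import ABC, abstractmethod
from contextlib import asynccontextmanager
from typing import AsyncGenerator, Mapping, Sequence, TypeAlias, Union

AttributeValue: TypeAlias = Union[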
    str,
    bool,
    int,
    float,
    Sequence[str],
    Sequence[bool],
    Sequence[int],
    Sequence[float],
]

class Meter(ABC):
    @abstractmethod
    async def record_counter(
        self,
        name: str,
        value: int = 1,
        attributes: Mapping[str, AttributeValue] | None = None,
    ) -> None: ...

    @abstractmethod
    async def record_histogram(
        self,
        name: str,
        value: float,
        attributes: Mapping[str, AttributeValue] | None = None,
    ) -> None: ...

    @asynccontextmanager
    async def measure_duration(
        self,
        name: str,
        attributes: Mapping[str, AttributeValue] | None = None,
    ) -> AsyncGenerator[None, None]:
        """
        Measure the duration of a block of code.
        Usage:
            async with meter.measure_duration("my_duration"):
                # Code to measure
        """
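        # get_running_loop() is the preferred call inside a coroutine;
        # loop.time() is a monotonic clock measured in seconds.
        loop = asyncio.get_running_loop()
        start_time = loop.time()
        try:
            yield
        finally:
            duration = loop.time() - start_time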
            await self.record_histogram(name, duration, attributes)

h. Implement NoOpMeter to preserve the current behavior when metrics are disabled.
i. In bin/server.py, add a new flag --open-telemetry. When enabled, the following environment variables will be used:
    1. OTEL_SERVICE_NAME (optional, defaults to "parlant")
    2. OTEL_TRACES_EXPORTER (required)
    3. OTEL_METRICS_EXPORTER (required)
    4. OTEL_EXPORTER_OTLP_INSECURE (optional, defaults to False)
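
The flag and environment handling from item i could be sketched as follows. This is only an illustration under stated assumptions: it uses `argparse` for brevity (bin/server.py may use a different CLI library), and `OtelSettings`, `parse_bool`, `load_otel_settings`, and `resolve_settings` are hypothetical names. The set of accepted truthy spellings is also an assumption.

```python
import argparse
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class OtelSettings:
    """Hypothetical container for the values read from the environment."""

    service_name: str
    traces_exporter: str
    metrics_exporter: str
    insecure: bool


def parse_bool(raw: str) -> bool:
    # Assumption: the usual truthy spellings count as True, all else is False.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_otel_settings(env: Mapping[str, str]) -> OtelSettings:
    # Fail early when a variable marked "required" in the proposal is absent.
    missing = [
        name
        for name in ("OTEL_TRACES_EXPORTER", "OTEL_METRICS_EXPORTER")
        if not env.get(name)
    ]
    if missing:
        raise ValueError(f"--open-telemetry requires: {', '.join(missing)}")

    return OtelSettings(
        service_name=env.get("OTEL_SERVICE_NAME", "parlant"),
        traces_exporter=env["OTEL_TRACES_EXPORTER"],
        metrics_exporter=env["OTEL_METRICS_EXPORTER"],
        insecure=parse_bool(env.get("OTEL_EXPORTER_OTLP_INSECURE", "false")),
    )


def resolve_settings(
    argv: Sequence[str], env: Mapping[str, str]
) -> Optional[OtelSettings]:
    parser = argparse.ArgumentParser()
    parser.add_argument("--open-telemetry", action="store_true")
    args, _unknown = parser.parse_known_args(argv)
    # Without the flag the environment is ignored and behavior is unchanged.
    return load_otel_settings(env) if args.open_telemetry else None
```

In a real entry point this would be called as `resolve_settings(sys.argv[1:], os.environ)`. Passing the environment in as a mapping keeps the function easy to test.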

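For items g and h, the sketch below shows how concrete meters might plug into the proposed interface. The `Meter` class is a condensed copy of the one above so the example is self-contained. `NoOpMeter` follows item h. `InMemoryMeter`, `handle_request`, and the metric names are hypothetical and exist only to show a call site and a test double.

```python
import asyncio
from abc import ABC, abstractmethod
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict, List, Mapping, Optional, Sequence, Union

AttributeValue = Union[
    str, bool, int, float,
    Sequence[str], Sequence[bool], Sequence[int], Sequence[float],
]
Attributes = Optional[Mapping[str, AttributeValue]]


class Meter(ABC):
    """Condensed copy of the interface proposed in item g."""

    @abstractmethod
    async def record_counter(
        self, name: str, value: int = 1, attributes: Attributes = None
    ) -> None: ...

    @abstractmethod
    async def record_histogram(
        self, name: str, value: float, attributes: Attributes = None
    ) -> None: ...

    @asynccontextmanager
    async def measure_duration(
        self, name: str, attributes: Attributes = None
    ) -> AsyncIterator[None]:
        loop = asyncio.get_running_loop()
        start = loop.time()
        try:
            yield
        finally:
            # Recorded in seconds, even if the measured block raises.
            await self.record_histogram(name, loop.time() - start, attributes)


class NoOpMeter(Meter):
    """Discards every measurement, so call sites need no enabled/disabled checks."""

    async def record_counter(
        self, name: str, value: int = 1, attributes: Attributes = None
    ) -> None:
        return None

    async def record_histogram(
        self, name: str, value: float, attributes: Attributes = None
    ) -> None:
        return None


class InMemoryMeter(Meter):
    """Hypothetical test double; not part of the proposal."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = {}
        self.histograms: Dict[str, List[float]] = {}

    async def record_counter(
        self, name: str, value: int = 1, attributes: Attributes = None
    ) -> None:
        self.counters[name] = self.counters.get(name, 0) + value

    async def record_histogram(
        self, name: str, value: float, attributes: Attributes = None
    ) -> None:
        self.histograms.setdefault(name, []).append(value)


async def handle_request(meter: Meter) -> None:
    # Hypothetical call site; the metric names are made up for this example.
    await meter.record_counter("requests", attributes={"route": "/sessions"})
    async with meter.measure_duration("request_duration"):
        await asyncio.sleep(0.01)
```

Because the call site depends only on `Meter`, swapping `NoOpMeter` for an OTel-backed implementation would not require changes to engine code.
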
Step Two – Add Metrics

TBD

Discussion

@kichanyurd What are your thoughts?
