Description
Motivation
We want to implement and refine tracing and metrics using OpenTelemetry so that key performance indicators can be analyzed accurately.
These metrics will help us monitor and better understand system performance, cost, and latency, identify bottlenecks, and guide ongoing improvements.
Solution Proposal
Step One – OpenTelemetry Implementation
Currently, we use the Logger and ContextualCorrelator classes internally (in-memory) to manage engine scope and propagate correlation IDs to generated events and logs.
We propose the following changes:
a. Extract ContextualCorrelator into an abstract base class (ABC).
b. Create BasicContextualCorrelator, which replicates today’s functionality.
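To make items a and b concrete, here is a minimal sketch of how the split might look. Only the names ContextualCorrelator, BasicContextualCorrelator, scope() and correlation_id come from this proposal; the "::" separator, the "<main>" fallback, and the use of contextvars are assumptions for illustration and may not match the current implementation.

```python
import contextvars
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import ContextManager, Iterator


class ContextualCorrelator(ABC):
    """Interface only: tracks nested scopes and exposes a correlation ID."""

    @abstractmethod
    def scope(self, scope_id: str) -> ContextManager[None]:
        """Enter a nested scope for the duration of a `with` block."""

    @property
    @abstractmethod
    def correlation_id(self) -> str:
        """Correlation ID of the currently active scope chain."""


class BasicContextualCorrelator(ContextualCorrelator):
    """In-memory variant (assumed behavior: scopes are joined with '::')."""

    def __init__(self) -> None:
        # A ContextVar keeps scopes isolated per asyncio task / thread.
        self._scopes: contextvars.ContextVar[str] = contextvars.ContextVar(
            "correlator_scopes", default=""
        )

    @contextmanager
    def scope(self, scope_id: str) -> Iterator[None]:
        current = self._scopes.get()
        token = self._scopes.set(f"{current}::{scope_id}" if current else scope_id)
        try:
            yield
        finally:
            # Restore the parent scope even if the block raised.
            self._scopes.reset(token)

    @property
    def correlation_id(self) -> str:
        # "<main>" is an assumed placeholder for "no active scope".
        return self._scopes.get() or "<main>"


correlator = BasicContextualCorrelator()
with correlator.scope("request-1"):
    with correlator.scope("process"):
        nested_id = correlator.correlation_id  # "request-1::process"
```

An OTel-backed implementation would then only need to satisfy the same two members, which keeps call sites unchanged.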
c. Create a new class: class OpenTelemetry(Logger, ContextualCorrelator, Meter) (i.e., implementing three interfaces).
d. In this class, implement scope() using OTel spans, and have all logger calls (info(), debug(), etc.) emit OTel events.
e. Generate tracing_id ourselves, making it equal to the correlation_id for easier searching.
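One thing to keep in mind for item e: an OTel trace ID is a non-zero 128-bit value, usually shown as 32 hex characters, so an arbitrary correlation ID string cannot always be used verbatim. The sketch below shows one possible deterministic mapping. The helper names and the choice of SHA-256 are assumptions, not part of the proposal.

```python
import hashlib


def trace_id_from_correlation_id(correlation_id: str) -> int:
    """Map an arbitrary correlation ID to a 128-bit trace ID (hypothetical helper)."""
    digest = hashlib.sha256(correlation_id.encode("utf-8")).digest()
    # Keep the first 16 bytes: trace IDs are 128 bits wide.
    trace_id = int.from_bytes(digest[:16], "big")
    # An all-zero trace ID is invalid, so fall back to 1 in that (unlikely) case.
    return trace_id or 1


def format_trace_id(trace_id: int) -> str:
    """Render the trace ID as 32 lowercase hex characters."""
    return format(trace_id, "032x")


example_hex = format_trace_id(trace_id_from_correlation_id("request-1::process"))
```

If correlation IDs were themselves generated as 32 hex characters, they could be used directly and the two values would be literally equal. With a hashed mapping like the one above, the raw correlation_id would probably also need to be attached to the span as an attribute to stay searchable.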
f. Since most logs in Parlant are essentially events—and because OTel logs are less widely supported—it makes sense to implement all logs as events.
g. For capturing metrics, introduce a Meter class as follows:
import asyncio
from abc import ABC, abstractmethod
from contextlib import asynccontextmanager
from typing import AsyncGenerator, Mapping, Sequence, TypeAlias, Union

# Value types allowed for metric attributes.
AttributeValue: TypeAlias = Union[
    str,
    bool,
    int,
    float,
    Sequence[str],
    Sequence[bool],
    Sequence[int],
    Sequence[float],
]


class Meter(ABC):
    @abstractmethod
    async def record_counter(
        self,
        name: str,
        value: int = 1,
        attributes: Mapping[str, AttributeValue] | None = None,
    ) -> None: ...

    @abstractmethod
    async def record_histogram(
        self,
        name: str,
        value: float,
        attributes: Mapping[str, AttributeValue] | None = None,
    ) -> None: ...

    @asynccontextmanager
    async def measure_duration(
        self,
        name: str,
        attributes: Mapping[str, AttributeValue] | None = None,
    ) -> AsyncGenerator[None, None]:
        """
        Measure the duration of a block of code.

        Usage:
            async with meter.measure_duration("my_duration"):
                # Code to measure
        """
        # The event loop clock is monotonic, so the difference is a duration in seconds.
        start_time = asyncio.get_running_loop().time()
        try:
            yield
        finally:
            # Record the duration even if the measured block raised.
            duration = asyncio.get_running_loop().time() - start_time
            await self.record_histogram(name, duration, attributes)
h. Implement NoOpMeter to preserve the current behavior when metrics are disabled.
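As a rough illustration of items g and h, the sketch below pairs a NoOpMeter with an in-memory meter that could be handy in tests. The Meter class here is a condensed copy of the proposed interface so the example runs on its own. InMemoryMeter, handle_request, and the metric names are hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Dict, List, Mapping, Optional

Attributes = Optional[Mapping[str, Any]]  # simplified stand-in for AttributeValue


class Meter(ABC):
    # Condensed copy of the interface proposed in item g.
    @abstractmethod
    async def record_counter(
        self, name: str, value: int = 1, attributes: Attributes = None
    ) -> None: ...

    @abstractmethod
    async def record_histogram(
        self, name: str, value: float, attributes: Attributes = None
    ) -> None: ...

    @asynccontextmanager
    async def measure_duration(
        self, name: str, attributes: Attributes = None
    ) -> AsyncIterator[None]:
        loop = asyncio.get_running_loop()
        start = loop.time()
        try:
            yield
        finally:
            await self.record_histogram(name, loop.time() - start, attributes)


class NoOpMeter(Meter):
    """Discards every measurement, so call sites need no 'if enabled' checks."""

    async def record_counter(
        self, name: str, value: int = 1, attributes: Attributes = None
    ) -> None:
        return None

    async def record_histogram(
        self, name: str, value: float, attributes: Attributes = None
    ) -> None:
        return None


class InMemoryMeter(Meter):
    """Hypothetical test double that keeps measurements in plain dicts."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = {}
        self.histograms: Dict[str, List[float]] = {}

    async def record_counter(
        self, name: str, value: int = 1, attributes: Attributes = None
    ) -> None:
        self.counters[name] = self.counters.get(name, 0) + value

    async def record_histogram(
        self, name: str, value: float, attributes: Attributes = None
    ) -> None:
        self.histograms.setdefault(name, []).append(value)


async def handle_request(meter: Meter) -> None:
    # Call sites look the same regardless of which Meter is injected.
    await meter.record_counter("requests")
    async with meter.measure_duration("request_duration"):
        await asyncio.sleep(0.01)


meter = InMemoryMeter()
asyncio.run(handle_request(meter))
asyncio.run(handle_request(NoOpMeter()))  # same code path, nothing recorded
```

Because measure_duration is implemented on the base class in terms of record_histogram, NoOpMeter gets a working (but silent) timer for free.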
i. In bin/server.py, add a new flag --open-telemetry. When enabled, the following environment variables will be used:
1. OTEL_SERVICE_NAME (optional, defaults to "parlant")
2. OTEL_TRACES_EXPORTER (required)
3. OTEL_METRICS_EXPORTER (required)
4. OTEL_EXPORTER_OTLP_INSECURE (optional, defaults to False)
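The environment handling in item i could be validated up front along these lines. This is only a sketch: OpenTelemetrySettings and load_otel_settings are hypothetical names, and treating only the string "true" (case-insensitive) as enabling insecure mode is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OpenTelemetrySettings:
    service_name: str
    traces_exporter: str
    metrics_exporter: str
    insecure: bool


def load_otel_settings(env: Mapping[str, str] = os.environ) -> OpenTelemetrySettings:
    """Read the variables listed in item i, failing fast on missing required ones."""
    required = ("OTEL_TRACES_EXPORTER", "OTEL_METRICS_EXPORTER")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError(
            "--open-telemetry requires these environment variables: "
            + ", ".join(missing)
        )

    # Assumption: only "true" (any casing) turns insecure mode on.
    insecure_raw = env.get("OTEL_EXPORTER_OTLP_INSECURE", "false")

    return OpenTelemetrySettings(
        service_name=env.get("OTEL_SERVICE_NAME", "parlant"),
        traces_exporter=env["OTEL_TRACES_EXPORTER"],
        metrics_exporter=env["OTEL_METRICS_EXPORTER"],
        insecure=insecure_raw.strip().lower() == "true",
    )


# Example with an explicit mapping instead of the real process environment.
settings = load_otel_settings(
    {"OTEL_TRACES_EXPORTER": "otlp", "OTEL_METRICS_EXPORTER": "otlp"}
)
```

Passing the environment in as a mapping keeps the function easy to test without touching os.environ.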
Step Two – Add metrics
TBD
Discussion
@kichanyurd What are your thoughts?