
feat: add spanRequestHeaderAttributes and ENV mapping for CLI #1269

Merged: mathetake merged 3 commits into envoyproxy:main from codefromthecrypt:otel-attributes on Oct 3, 2025

@codefromthecrypt (Contributor) commented Oct 3, 2025

Description

This uses the same header-to-attribute mapping approach for both OpenTelemetry spans and metrics, enabling session tracking and custom attribute propagation without requiring code instrumentation.

Specifically, this deprecates metricsRequestHeaderLabels, which was artificially Prometheus-specific, in favor of:

Kubernetes/extproc

  • spanRequestHeaderAttributes: "x-session-id:session.id,x-user-id:user.id"
  • metricsRequestHeaderAttributes: "x-team-id:team.id,x-user-id:user.id"
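A minimal sketch of how a `header:attribute` mapping string like the ones above could be parsed and applied to a request. It assumes comma-separated pairs split on the first colon and case-insensitive header names; `parse_header_mapping` and `attributes_from_headers` are hypothetical helpers for illustration, not aigw's implementation.

```python
def parse_header_mapping(spec: str) -> dict[str, str]:
    """Parse "x-session-id:session.id,x-user-id:user.id" into a dict.

    Header names are lower-cased (HTTP header names are case-insensitive);
    attribute names are kept exactly as written, dots included.
    """
    mapping: dict[str, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input or a trailing comma
        header, sep, attribute = pair.partition(":")
        if not sep or not header.strip() or not attribute.strip():
            raise ValueError(f"invalid mapping entry: {pair!r}")
        mapping[header.strip().lower()] = attribute.strip()
    return mapping


def attributes_from_headers(
    mapping: dict[str, str], headers: dict[str, str]
) -> dict[str, str]:
    """Return one attribute per mapped header that is present on the request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {attr: lowered[h] for h, attr in mapping.items() if h in lowered}


span_mapping = parse_header_mapping("x-session-id:session.id,x-user-id:user.id")
print(attributes_from_headers(span_mapping, {"X-Session-Id": "abc123"}))
# prints {'session.id': 'abc123'}
```

Headers that are not present on a request simply produce no attribute, so the same mapping can be applied to every request.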

Note: Previously we told people to use lower_snake_case label names, but that isn't necessary because the Prometheus exporter already does that conversion. aigw shouldn't make mapping decisions like this, as they interfere with non-Prometheus metrics systems. "session.id" highlights the problem: it is recognized as a session identifier (for example, for Phoenix session grouping), whereas "session_id" would not be, because only the former is an OTel convention.
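To illustrate why pre-converting names in aigw is unnecessary and lossy, the sketch below approximates the classic Prometheus label-name rule (`[a-zA-Z_][a-zA-Z0-9_]*`) by replacing every other character with `_`. `prometheus_label_name` is a hypothetical helper, not the exporter's actual code.

```python
import re

# Assumption: approximates classic Prometheus label-name sanitization by
# replacing every character outside [a-zA-Z0-9_] with "_".
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def prometheus_label_name(attribute: str) -> str:
    """Sketch of how an exporter might turn an OTel attribute name into a label name."""
    name = _INVALID.sub("_", attribute)
    if name[:1].isdigit():
        name = "_" + name  # label names must not start with a digit
    return name


# Both spellings collapse to the same Prometheus label...
print(prometheus_label_name("session.id"))  # session_id
print(prometheus_label_name("session_id"))  # session_id
# ...but only "session.id" reaches OTLP backends with the conventional name,
# so passing the configured name through unchanged loses nothing for Prometheus.
```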

CLI (aigw run)
To match the convention of the existing OTel configuration, use environment variables:

  • OTEL_AIGW_SPAN_REQUEST_HEADER_ATTRIBUTES="x-session-id:session.id,x-user-id:user.id"
  • OTEL_AIGW_METRICS_REQUEST_HEADER_ATTRIBUTES="x-team-id:team.id,x-user-id:user.id"
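A sketch of reading these variables, assuming an unset or empty variable means "no mapping". The variable names come from this PR; `header_attributes_from_env` is a hypothetical helper, and the environment is passed in as a plain dict so the example doesn't touch the real process environment.

```python
import os
from collections.abc import Mapping


def header_attributes_from_env(
    name: str, environ: Mapping[str, str] = os.environ
) -> dict[str, str]:
    """Read a "header:attribute,..." variable; unset or empty means no mapping.

    A malformed entry without ":" raises ValueError when unpacked.
    """
    raw = environ.get(name, "")
    pairs = (entry.split(":", 1) for entry in raw.split(",") if entry.strip())
    return {header.strip().lower(): attr.strip() for header, attr in pairs}


env = {
    "OTEL_AIGW_SPAN_REQUEST_HEADER_ATTRIBUTES": "x-session-id:session.id,x-user-id:user.id",
    "OTEL_AIGW_METRICS_REQUEST_HEADER_ATTRIBUTES": "x-team-id:team.id,x-user-id:user.id",
}
span_attrs = header_attributes_from_env("OTEL_AIGW_SPAN_REQUEST_HEADER_ATTRIBUTES", env)
metric_attrs = header_attributes_from_env("OTEL_AIGW_METRICS_REQUEST_HEADER_ATTRIBUTES", env)
print(sorted(span_attrs.values()))    # ['session.id', 'user.id']
print(sorted(metric_attrs.values()))  # ['team.id', 'user.id']
```

Spans and metrics get independent mappings, so a header such as `x-user-id` can feed both while others feed only one.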

Example App

I built everything from scratch like this:

```shell
make build.aigw GOOS_LIST=linux
cd cmd/aigw
COMPOSE_PROFILES=phoenix docker compose -f docker-compose-otel.yaml up --build --wait -d
```

Then I ran this script:

```python
# run like this: uv run --exact -q --env-file .env main.py
#
# # customizing .env like:
# OPENAI_BASE_URL=http://localhost:1975/v1
# OPENAI_API_KEY=unused
# CHAT_MODEL=qwen3:4b
#
# # Soon we can do this maybe...
# MCP_URL=https://localhost:1975/mcp
#
# /// script
# dependencies = [
#     "openai-agents~=0.3.2",
#     "httpx~=0.28.1",
#     "mcp~=1.15.0",
# ]
# ///
import asyncio
import os

import httpx
from openai import AsyncOpenAI

from agents import (
    Agent,
    OpenAIProvider,
    RunConfig,
    Runner,
    Tool,
    get_current_trace,
    set_trace_processors,
)
from agents.mcp import MCPServerStreamableHttp, MCPUtil

# Disable OpenAI Platform trace callbacks to avoid 401s
set_trace_processors([])


async def add_session_id_header(request: httpx.Request):
    """Event hook to add x-session-id header from current trace context"""
    trace = get_current_trace()
    if trace and trace.trace_id:
        request.headers["x-session-id"] = trace.trace_id


async def run_agent(tools: list[Tool]):
    model_name = os.getenv("CHAT_MODEL", "gpt-4o-mini")

    # Create custom HTTP client with session ID header injection
    http_client = httpx.AsyncClient(event_hooks={"request": [add_session_id_header]})

    openai_client = AsyncOpenAI(http_client=http_client)
    provider = OpenAIProvider(openai_client=openai_client, use_responses=False)
    model = provider.get_model(model_name)

    agent = Agent(
        name="code_agent",
        model=model,
        tools=tools,
    )

    result = await Runner.run(
        starting_agent=agent,
        input="Create envoy yaml that changes the admin port to 9999. use context7",
        run_config=RunConfig(workflow_name="postgres"),
    )
    print(result.final_output)


async def main():
    # Connect to an MCP server that has context7 tools registered
    mcp_url = os.getenv("MCP_URL", "https://mcp.context7.com/mcp")
    async with MCPServerStreamableHttp(
        {
            "url": mcp_url,
            "timeout": 30.0,
        },
        cache_tools_list=True,
    ) as server:
        tools = await server.list_tools()
        util = MCPUtil()
        tools = [util.to_function_tool(tool, server, False) for tool in tools]
        await run_agent(tools)


if __name__ == "__main__":
    asyncio.run(main())
```

Then my Phoenix session had the expected session ID, mapped to the OpenAI Agents trace ID:

[Screenshot 2025-10-03 at 7:27:12 PM: Phoenix session view]

The session includes all three LLM spans, which shows that you can group the whole conversation even when normal tracing isn't set up on the client.
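Conceptually, session grouping only needs the shared `session.id` attribute: even if each LLM call is exported without a common parent, the spans can be collected by that one value. The sketch below assumes simplified, made-up span records and a hypothetical `group_by_session` helper; it is not Phoenix's implementation.

```python
from collections import defaultdict

# Hypothetical, simplified span records. Only "session.id" (added by aigw from
# the x-session-id header) matters here; names and IDs are made up.
spans = [
    {"name": "llm-call-1", "attributes": {"session.id": "trace_abc"}},
    {"name": "llm-call-2", "attributes": {"session.id": "trace_abc"}},
    {"name": "unrelated", "attributes": {}},
    {"name": "llm-call-3", "attributes": {"session.id": "trace_abc"}},
]


def group_by_session(spans: list[dict]) -> dict[str, list[str]]:
    """Group span names by their session.id attribute, skipping spans without one."""
    sessions: dict[str, list[str]] = defaultdict(list)
    for span in spans:
        session_id = span["attributes"].get("session.id")
        if session_id is not None:
            sessions[session_id].append(span["name"])
    return dict(sessions)


print(group_by_session(spans))
# prints {'trace_abc': ['llm-call-1', 'llm-call-2', 'llm-call-3']}
```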

Related Issues/PRs

Fixes #1221

Special notes for reviewers

Backward compatibility for the previous metricsRequestHeaderLabels flag:

  • Old flag: --metricsRequestHeaderLabels / controller.metricsRequestHeaderLabels
  • New flag: --metricsRequestHeaderAttributes / controller.metricsRequestHeaderAttributes
  • Fallback logic: if the new flag is unset, the old flag's value is used
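A sketch of the fallback rule above, assuming "unset" is modeled as an empty string and that a deprecation warning is logged whenever the old flag is present. `resolve_metrics_header_attributes` and the logger name are hypothetical; the flag names come from this PR.

```python
import logging

logger = logging.getLogger("header-attributes-sketch")  # hypothetical logger name


def resolve_metrics_header_attributes(new_value: str, old_value: str) -> str:
    """Pick the effective value for metricsRequestHeaderAttributes.

    "Unset" is modeled as an empty string (an assumption for this sketch).
    """
    if old_value:
        # Warn whenever the deprecated flag is detected, even if the new one is set.
        logger.warning(
            "metricsRequestHeaderLabels is deprecated; use metricsRequestHeaderAttributes"
        )
    # The new flag wins; the old value is only a fallback.
    return new_value or old_value


print(resolve_metrics_header_attributes("", "x-team-id:team.id"))  # x-team-id:team.id
```

Warning even when both flags are set is a design choice in this sketch: it nudges users to delete the old flag rather than leaving a silently ignored value behind.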

**Description**

This adds header-to-attribute mapping for both OpenTelemetry spans and metrics, enabling session tracking and custom attribute propagation without requiring code instrumentation.

Specifically, this deprecates `metricsRequestHeaderLabels`, which was artificially
Prometheus-specific, in favor of:

**Kubernetes/extproc**
- `spanRequestHeaderAttributes`: "x-session-id:session.id,x-user-id:user.id"
- `metricsRequestHeaderAttributes`: "x-team-id:team.id,x-user-id:user.id"

**CLI (aigw run)**
To match the convention of the existing OTel configuration, use environment variables:

- OTEL_AIGW_SPAN_REQUEST_HEADER_ATTRIBUTES="x-session-id:session.id,x-user-id:user.id"
- OTEL_AIGW_METRICS_REQUEST_HEADER_ATTRIBUTES="x-team-id:team.id,x-user-id:user.id"

**Backward Compatibility**

Full backward compatibility for the previous metricsRequestHeaderLabels flag:
- Old flag: --metricsRequestHeaderLabels / controller.metricsRequestHeaderLabels
- New flag: --metricsRequestHeaderAttributes / controller.metricsRequestHeaderAttributes
- Fallback logic: if the new flag is unset, the old flag's value is used
- Deprecation warnings: Logged when old flag is detected
- Removal timeline: Deprecated flag will be removed after v0.4

**Documentation**

- site/docs/capabilities/observability/tracing.md#session-tracking
- site/docs/cli/run.md#header-mapping
- cmd/aigw/docker-compose-otel.yaml

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codecov-commenter commented Oct 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.64%. Comparing base (9425769) to head (57a9542).
⚠️ Report is 1 commit behind head on main.

❌ Your project status has failed because the head coverage (77.64%) is below the target coverage (86.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1269      +/-   ##
==========================================
+ Coverage   77.60%   77.64%   +0.03%     
==========================================
  Files         116      116              
  Lines       15192    15205      +13     
==========================================
+ Hits        11790    11806      +16     
+ Misses       2812     2808       -4     
- Partials      590      591       +1     
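The report's numbers can be checked by hand. A short sketch, assuming Codecov's default of counting partial lines as uncovered and rounding percentages down to two decimals; computing the change from the unrounded ratios explains why the diff shows +0.03% rather than 77.64 - 77.60 = 0.04.

```python
import math
from fractions import Fraction


def round_down_percent(ratio: Fraction) -> float:
    """Express a ratio as a percentage rounded down to two decimals."""
    return math.floor(ratio * 10_000) / 100


# hits / (hits + misses + partials); partial lines count as not covered.
base = Fraction(11790, 11790 + 2812 + 590)  # main (9425769)
head = Fraction(11806, 11806 + 2808 + 591)  # this PR (57a9542)

print(f"{round_down_percent(base):.2f}")         # 77.60
print(f"{round_down_percent(head):.2f}")         # 77.64
print(f"{round_down_percent(head - base):.2f}")  # 0.03
print(round_down_percent(head) < 86.00)          # True: below the 86% target
```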


Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake (Member) left a comment

Thanks for the cleanup & improvement as well on the existing header mapping!

@mathetake merged commit 6a2d917 into envoyproxy:main on Oct 3, 2025 (31 checks passed)
missBerg pushed a commit to missBerg/ai-gateway that referenced this pull request Dec 20, 2025
…roxy#1269)


Signed-off-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com>
Development

Successfully merging this pull request may close these issues.

Support emitting session.id on LLM spans in Envoy AI Gateway’s OpenTelemetry export for Phoenix session grouping

3 participants