feat: Initial commit of OpenTelemetry MCP Server #2

Merged
doronkopit5 merged 8 commits into main from dk/mcp-server
Nov 17, 2025

Conversation


doronkopit5 commented Nov 16, 2025

Fixes TLP-1126
Fixes TLP-1128

Summary by CodeRabbit

  • New Features

    • OpenTelemetry MCP Server: multi-backend support (Jaeger, Tempo, Traceloop), rich trace/span search with hybrid server/client multi-filtering, LLM-focused tools (model listing, usage/metrics, expensive/slow traces, tool-call discovery), CLI and HTTP runtimes, and health checks.
  • Documentation

    • Full project documentation and usage guide added; example environment file provided.
  • Chores

    • Container/Dockerfile, CI workflow, Python version pin, packaging config, and local start script added.
  • Tests

    • Extensive unit and integration suites with VCR cassettes and fixtures.


coderabbitai Bot commented Nov 16, 2025

Walkthrough

Adds a new OpenTelemetry MCP Server project: packaging, CI, Dockerfile, configuration, data models, backend abstraction and three concrete backends (Jaeger, Tempo, Traceloop), a client-side FilterEngine, server tools (FastMCP), extensive unit/integration tests with VCR cassettes, and documentation.

Changes

Cohort / File(s) Summary
Project infra & packaging
.dockerignore, .env.example, .python-version, pyproject.toml, Dockerfile, start_locally.sh
Add packaging and infra: dockerignore, example env, python version pin, pyproject with deps and scripts, multi-stage Dockerfile, and local start script.
CI workflow
.github/workflows/ci.yml
Add GitHub Actions workflow: lint, test, and matrix build-test (Python 3.12, ruff, mypy, pytest, docker/build-push-action).
Top-level package
src/opentelemetry_mcp/__init__.py
Package initializer with docstring and __version__ = "0.1.0".
Configuration & utils
src/opentelemetry_mcp/config.py, src/opentelemetry_mcp/utils.py
Add Pydantic BackendConfig/ServerConfig with env loading/validators and CLI overrides; ISO timestamp parsing helper.
Constants
src/opentelemetry_mcp/constants.py
Add GenAI/LLM/Traceloop/Service semantic attribute constants and aliases.
Data models & attributes
src/opentelemetry_mcp/models.py, src/opentelemetry_mcp/attributes.py
Add Filter types/operators, Trace/Span models, LLMSpanAttributes, UsageMetrics, summaries, TraceQuery/SpanQuery, and SpanAttributes/SpanEvent/HealthCheckResponse with helpers.
Backend core & engine
src/opentelemetry_mcp/backends/base.py, src/opentelemetry_mcp/backends/filter_engine.py, src/opentelemetry_mcp/backends/__init__.py
Add BaseBackend abstract class (async client lifecycle), FilterEngine (client-side filter evaluation with dotted-path extraction), and re-export BaseBackend.
Backend implementations
src/opentelemetry_mcp/backends/jaeger.py, .../tempo.py, .../traceloop.py
Implement JaegerBackend, TempoBackend, TraceloopBackend: header/auth, native vs client-side filter partitioning, trace/span retrieval and parsing, service discovery, health checks, and parsing helpers.
Server & MCP tools
src/opentelemetry_mcp/server.py, src/opentelemetry_mcp/tools/*, src/opentelemetry_mcp/tools/__init__.py
Add FastMCP server entrypoint, backend lazy-init and health-check, many tools (search_traces/search_spans/get_trace/get_llm_usage/list_models/model_stats/expensive_traces/slow_traces/list_llm_tools/find_errors/services, etc.), and CLI.
Documentation
CLAUDE.md
Add comprehensive project documentation covering architecture, conventions, tools, development workflow, and examples.
Tests — unit & fixtures
tests/__init__.py, tests/conftest.py, tests/test_models.py, tests/test_traceloop.py, tests/integration/conftest.py
Add unit tests, pytest fixtures, and integration VCR config for Jaeger/Tempo/Traceloop backends.
Integration tests & cassettes
tests/integration/*, tests/integration/cassettes/...
Add extensive integration suites and many VCR cassettes covering health, search, spans, operations, errors and LLM scenarios for Jaeger, Tempo, and Traceloop.
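The dotted-path attribute extraction mentioned for FilterEngine could work roughly like the sketch below; this is an illustration of the technique, not the project's actual implementation, and the function name `extract_path` is invented here.

```python
# Illustrative dotted-path lookup over nested dicts, in the spirit of the
# FilterEngine description above; the real implementation may differ.
def extract_path(data: dict, path: str):
    """Follow 'a.b.c' through nested dicts; return None when any hop is missing."""
    current: object = data
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return None
    return current
```

Returning `None` on a missing hop lets the caller treat absent attributes as non-matches rather than raising.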

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Server as FastMCP
    participant Backend as BaseBackend
    participant Remote as Tracing API

    Client->>Server: invoke tool (e.g. search_traces)
    Server->>Server: build TraceQuery / parse filters
    Server->>Backend: await search_traces(query)

    alt Backend supports native filters
        Backend->>Remote: request with native filters
        Remote-->>Backend: traces response
        Backend->>Backend: apply remaining client-side filters (FilterEngine)
    else Backend lacks native filters
        Backend->>Remote: broader request (fewer filters)
        Remote-->>Backend: traces response
        Backend->>Backend: apply all filters (FilterEngine)
    end

    Backend-->>Server: List[TraceData]
    Server->>Server: aggregate/format (summaries, LLM metrics)
    Server-->>Client: JSON result

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas needing extra attention:

  • src/opentelemetry_mcp/backends/{jaeger.py, tempo.py, traceloop.py} — parsing logic, hybrid filter partitioning, HTTP request/response shaping, error handling and edge cases.
  • src/opentelemetry_mcp/backends/filter_engine.py — operator semantics, dotted-path extraction, and positive vs negative operator behavior across traces/spans.
  • src/opentelemetry_mcp/models.py & attributes.py — Pydantic validation, legacy vs new attribute mappings, token attribution and aggregation correctness.
  • src/opentelemetry_mcp/server.py and src/opentelemetry_mcp/tools/* — input parsing, backend lifecycle, concurrency and consistent JSON error/response shapes.
  • tests/integration + VCR config — cassette matching rules, request/response scrubbing, and timestamp normalization.

Poem

🐇 I hopped through traces, nose in the code,
Jaeger, Tempo, Traceloop on the road.
Filters and tokens counted, tidy and neat,
A tiny server returns what it meets.
Hop, hop — queries answered, carrot treat.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and clearly describes the main change: an initial commit introducing the OpenTelemetry MCP Server.
Linked Issues check ✅ Passed The PR references TLP-1126 (Initial server setup) and TLP-1128 (Tests including recordings), and the changeset comprehensively addresses both with complete server implementation and test infrastructure with VCR cassettes.
Out of Scope Changes check ✅ Passed All changes are in-scope: configuration files, backend implementations, MCP tools/server, data models, utilities, tests, and documentation directly support the OpenTelemetry MCP Server objectives.
Docstring Coverage ✅ Passed Docstring coverage is 98.83% which is sufficient. The required threshold is 80.00%.

Comment @coderabbitai help to get the list of available commands and usage tips.


ellipsis-dev Bot commented Nov 16, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


coderabbitai Bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (21)
Dockerfile (2)

56-60: Consider validation or documentation for empty environment variable defaults.

BACKEND_TYPE and BACKEND_URL default to empty strings, which will likely cause the server to fail at startup if these are required configuration values. Consider either:

  • Setting sensible defaults if possible
  • Documenting that these must be provided at runtime
  • Adding validation at server startup to fail fast with a clear error message
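The fail-fast option could look like the following sketch. `BACKEND_TYPE` and `BACKEND_URL` come from the Dockerfile discussed above; the helper itself (`validate_required_env`) is a hypothetical illustration, not code from the PR.

```python
import os
import sys

# Variables the Dockerfile defaults to empty strings; the check itself is
# an illustrative sketch of fail-fast startup validation.
REQUIRED_ENV_VARS = ("BACKEND_TYPE", "BACKEND_URL")


def validate_required_env() -> None:
    """Exit with a clear message if required configuration is missing or empty."""
    missing = [name for name in REQUIRED_ENV_VARS if not os.environ.get(name)]
    if missing:
        print(
            f"Missing required environment variables: {', '.join(missing)}",
            file=sys.stderr,
        )
        sys.exit(1)
```

Calling this first thing in the entrypoint turns a confusing mid-startup failure into an immediate, actionable error.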

63-64: Healthcheck doesn't verify server functionality.

The current healthcheck only verifies that Python can execute, not that the MCP server is actually running or responding to requests. Consider implementing a proper HTTP health endpoint and updating the healthcheck to query it:

-HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-    CMD python -c "import sys; sys.exit(0)"
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1

Note: This assumes a /health endpoint exists or will be added. Alternatively, you could use a Python script that makes an HTTP request to verify the server is responsive.
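That Python-script alternative might look like the sketch below, using only the standard library so the image does not need curl. The `/health` path and port 8000 are assumptions carried over from the suggestion above.

```python
import sys
import urllib.request

# Illustrative container healthcheck: the /health endpoint and port 8000
# are assumptions; adjust to whatever the server actually exposes.
def check_health(url: str = "http://localhost:8000/health", timeout: float = 5.0) -> int:
    """Return 0 if the endpoint answers 200, else 1 (suitable for HEALTHCHECK)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except OSError:
        return 1


if __name__ == "__main__":
    sys.exit(check_health())
```

The Dockerfile line would then be something like `HEALTHCHECK ... CMD python /app/healthcheck.py`.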

src/opentelemetry_mcp/tools/slow_traces.py (1)

10-113: Avoid redundant get_trace calls and reuse TraceData from search_traces.

BaseBackend.search_traces already returns full TraceData with spans, so calling backend.get_trace(summary.trace_id) for every result adds up to limit * 10 extra network round-trips and duplicates token aggregation logic. You can work directly on the traces returned from search_traces and reuse TraceData.total_llm_tokens and len(trace.llm_spans).

A possible refactor:

-        # Search traces
-        trace_summaries = await backend.search_traces(query)
-
-        # Collect trace data with durations
-        slow_traces = []
-
-        for summary in trace_summaries:
-            # Get full trace to access LLM spans
-            trace = await backend.get_trace(summary.trace_id)
+        # Search traces (already returns full TraceData objects)
+        traces = await backend.search_traces(query)
+
+        # Collect trace data with durations
+        slow_traces: list[dict[str, object]] = []
+
+        for trace in traces:
             # Only include traces that have LLM spans
             if not trace.llm_spans:
                 continue
-
-            total_tokens = 0
-            models_used = set()
-
-            for span in trace.llm_spans:
-                llm_attrs = LLMSpanAttributes.from_span(span)
-                if not llm_attrs:
-                    continue
-
-                # Calculate total tokens
-                if llm_attrs.total_tokens:
-                    total_tokens += llm_attrs.total_tokens
-
-                # Track models
-                model = llm_attrs.response_model or llm_attrs.request_model
-                if model:
-                    models_used.add(model)
+            models_used: set[str] = set()
+            for span in trace.llm_spans:
+                llm_attrs = LLMSpanAttributes.from_span(span)
+                if not llm_attrs:
+                    continue
+                model = llm_attrs.response_model or llm_attrs.request_model
+                if model:
+                    models_used.add(model)
 
             slow_traces.append(
                 {
                     "trace_id": trace.trace_id,
                     "service_name": trace.service_name,
                     "operation_name": trace.root_operation,
                     "start_time": trace.start_time.isoformat(),
                     "duration_ms": round(trace.duration_ms, 2),
-                    "models": sorted(list(models_used)),
-                    "total_tokens": total_tokens,
-                    "llm_span_count": len(trace.llm_spans),
+                    "models": sorted(models_used),
+                    "total_tokens": trace.total_llm_tokens,
+                    "llm_span_count": len(trace.llm_spans),
                     "status": trace.status,
                     "has_errors": trace.has_errors,
                 }
             )

This keeps behavior the same while significantly reducing backend load and simplifying the aggregation.

Please run the existing slow-traces tests (and integration tests against Jaeger/Tempo/Traceloop) after this change to confirm the behavior and payload shape remain compatible.

tests/__init__.py (1)

1-1: Optional: Align capitalization with “OpenTelemetry”.

The docstring uses Opentelemetry, while other modules use OpenTelemetry. For consistency you might want:

"""Tests for OpenTelemetry MCP Server."""

Purely cosmetic; no functional impact.

tests/test_traceloop.py (1)

43-137: Consider testing the actual backend method instead of replicating logic.

This test manually replicates the filter building logic rather than calling the actual backend method. This creates maintenance burden if the implementation changes.

Consider refactoring to test the actual backend method:

from unittest.mock import AsyncMock, Mock, patch

from opentelemetry_mcp.backends.traceloop import TraceloopBackend
from opentelemetry_mcp.models import TraceQuery


async def test_build_filters_for_search() -> None:
    """Test filter building for search_traces."""
    backend = TraceloopBackend(
        url="https://api.traceloop.com/v2",
        api_key="test_key",
    )
    
    query = TraceQuery(
        service_name="my-service",
        operation_name="my-operation",
        gen_ai_system="openai",
        gen_ai_request_model="gpt-4",
        min_duration_ms=1000,
        max_duration_ms=5000,
        has_error=True,
        tags={"custom.tag": "value"},
        limit=50,
    )
    
    # Mock the HTTP client to capture the request (the client is async,
    # so the patched method must be an AsyncMock)
    with patch.object(backend.client, 'post', new_callable=AsyncMock) as mock_post:
        mock_post.return_value = Mock(json=lambda: {"data": []})
        await backend.search_traces(query)
        
        # Verify the filters in the actual API call
        call_args = mock_post.call_args
        filters = call_args[1]['json']['filters']
        
        # Assert expected filters are present
        assert len(filters) == 8
        # ... rest of assertions
src/opentelemetry_mcp/tools/usage.py (1)

36-44: Consider validating time range ordering

You already parse timestamps and surface format errors; as an extra guard, consider returning a clear error when start_dt and end_dt are both set but start_dt > end_dt, instead of letting backends handle an inverted range.

src/opentelemetry_mcp/tools/list_models.py (2)

41-61: Avoid redundant get_trace calls; reuse traces from search_traces

Here you treat search_traces results as summaries and re-fetch each trace via get_trace, but get_llm_usage (in usage.py) already consumes trace.llm_spans directly from backend.search_traces(query). That suggests backends are returning trace objects with spans, making the extra per-trace round trips unnecessary and potentially expensive for large limit values.

You can simplify and speed this up by iterating directly over the traces returned by search_traces, mirroring get_llm_usage:

-        # Build query to fetch traces
+        # Build query to fetch traces
         query = TraceQuery(
             start_time=start_dt,
             end_time=end_dt,
             service_name=service_name,
             gen_ai_system=gen_ai_system,
             limit=limit,
         )
 
-        # Search traces
-        trace_summaries = await backend.search_traces(query)
+        # Search traces (full trace objects with spans)
+        traces = await backend.search_traces(query)
@@
-        for summary in trace_summaries:
-            # Get full trace to access LLM spans
-            trace = await backend.get_trace(summary.trace_id)
-
-            for span in trace.llm_spans:
+        for trace in traces:
+            for span in trace.llm_spans:

This keeps network usage bounded to what search_traces already does and aligns the behavior with get_llm_usage.


32-40: Optional: validate time range consistency

As with get_llm_usage, you parse and validate timestamp formats but don’t guard against start_dt > end_dt. Adding a simple check and returning a clear JSON error would make this tool’s behavior more predictable for callers supplying bad ranges.

src/opentelemetry_mcp/tools/trace.py (1)

61-75: Avoid recomputing llm_spans multiple times

trace.llm_spans is evaluated three times (truthiness check, len, and the set comprehension), each re-scanning spans. Caching once keeps behavior the same and avoids extra work on large traces.

-        # Add LLM-specific summary
-        if trace.llm_spans:
-            result["llm_summary"] = {
-                "llm_span_count": len(trace.llm_spans),
-                "total_tokens": trace.total_llm_tokens,
-                "models_used": list(
-                    {
-                        span.attributes.gen_ai_request_model
-                        or span.attributes.gen_ai_response_model
-                        for span in trace.llm_spans
-                        if span.attributes.gen_ai_request_model
-                        or span.attributes.gen_ai_response_model
-                    }
-                ),
-            }
+        # Add LLM-specific summary
+        llm_spans = trace.llm_spans
+        if llm_spans:
+            result["llm_summary"] = {
+                "llm_span_count": len(llm_spans),
+                "total_tokens": trace.total_llm_tokens,
+                "models_used": list(
+                    {
+                        span.attributes.gen_ai_request_model
+                        or span.attributes.gen_ai_response_model
+                        for span in llm_spans
+                        if span.attributes.gen_ai_request_model
+                        or span.attributes.gen_ai_response_model
+                    }
+                ),
+            }
src/opentelemetry_mcp/tools/expensive_traces.py (1)

55-64: Avoid double-fetching traces in get_expensive_traces

search_traces already returns full TraceData objects with spans per the BaseBackend contract; calling get_trace again for each result adds unnecessary network calls and latency.

-        # Search traces
-        trace_summaries = await backend.search_traces(query)
-
-        # Collect trace data with token counts
-        expensive_traces = []
-
-        for summary in trace_summaries:
-            # Get full trace to access LLM spans
-            trace = await backend.get_trace(summary.trace_id)
+        # Search traces (returns full TraceData objects)
+        traces = await backend.search_traces(query)
+
+        # Collect trace data with token counts
+        expensive_traces = []
+
+        for trace in traces:
src/opentelemetry_mcp/tools/model_stats.py (1)

79-106: Avoid redundant get_trace calls; reuse full traces from search_traces

search_traces already returns full TraceData objects with spans (as exercised in tests/integration/test_jaeger_integration.py where trace.spans is asserted), so calling get_trace again for each summary introduces avoidable N+1 backend calls without adding data. You can simplify and improve performance by iterating the traces directly:

-        # Search traces
-        trace_summaries = await backend.search_traces(query)
+        # Search traces
+        traces = await backend.search_traces(query)
@@
-        for summary in trace_summaries:
-            # Get full trace to access LLM spans
-            trace = await backend.get_trace(summary.trace_id)
+        for trace in traces:
             for span in trace.llm_spans:
                 ...

This keeps the external behavior identical while removing extra network round-trips.

CLAUDE.md (1)

111-112: Minor doc polish: wording and fenced code language

Two small tweaks to align with style/linting:

  • Around Line 111: “FastMCP application, CLI interface, tool handlers” – the “I” in CLI already means interface; consider shortening to “FastMCP application, CLI, tool handlers”.
  • Around Lines 452–469: the fenced block showing the ValueError currently has no language; add one (e.g., ```text or ```bash) to satisfy markdownlint and improve rendering.

Purely cosmetic, so feel free to tackle later.

Also applies to: 452-469

tests/integration/test_jaeger_integration.py (1)

63-190: Strong Jaeger coverage; a couple of small cleanups possible

These integration tests do a nice job exercising:

  • Service and operation discovery,
  • Limits, duration/error filters, and generic filter behavior,
  • Span search (including LLM spans) and model filters.

Two optional nits you might consider:

  • The pattern:

    services = await jaeger_backend.list_services()
    service_name = services[0]

    appears in several tests; a small helper/fixture returning a “default test service” would DRY this up.

  • In test_search_traces_with_llm_model_filter, llm_spans is already [s for s in trace.spans if s.is_llm_span], so the extra if s.is_llm_span inside the list comprehension that builds model_names is redundant.

None of this affects correctness; tests look good as-is.

Also applies to: 228-367
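The shared "first listed service" pattern could be factored into a small async helper (or wrapped in a pytest fixture); `first_service` is an illustrative name, and the only assumption about the backend is the async `list_services()` method exercised in the tests above.

```python
# Hypothetical helper to DRY up the repeated pattern in the integration
# tests; `backend` is assumed to expose the async list_services() used above.
async def first_service(backend) -> str:
    """Return a service name known to exist in the recorded cassettes."""
    services = await backend.list_services()
    assert services, "backend returned no services"
    return services[0]
```

Each test could then start with `service_name = await first_service(jaeger_backend)`.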

src/opentelemetry_mcp/backends/base.py (1)

15-42: Backend abstraction and client lifecycle look good; tiny robustness tweak possible

The BaseBackend interface is well-shaped for the Jaeger/Tempo/Traceloop backends, and the lazy AsyncClient plus async context manager give you nice connection pooling and an easy async with story.

A small optional tweak in close would avoid calling aclose() on an already-closed client:

     async def close(self) -> None:
         """Close HTTP client connections."""
-        if self._client:
-            await self._client.aclose()
-            self._client = None
+        if self._client and not self._client.is_closed:
+            await self._client.aclose()
+        self._client = None

Functionally it should behave the same today, but this is a bit more defensive if subclasses ever close the client themselves.

Also applies to: 136-160
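The lazy-client plus async-context-manager lifecycle described here can be sketched as follows; `SketchBackend` is a placeholder, not the project's actual `BaseBackend`, and a plain `object()` stands in for the real `httpx.AsyncClient`.

```python
# Minimal sketch of the lifecycle the review describes: lazy client creation
# on first access, explicit close, and `async with` support. All names here
# are illustrative placeholders.
class SketchBackend:
    def __init__(self) -> None:
        self._client: object | None = None

    @property
    def client(self) -> object:
        # Created lazily on first use, then reused (connection pooling).
        if self._client is None:
            self._client = object()  # stands in for httpx.AsyncClient()
        return self._client

    async def close(self) -> None:
        self._client = None

    async def __aenter__(self) -> "SketchBackend":
        return self

    async def __aexit__(self, *exc: object) -> None:
        await self.close()
```

Usage is then simply `async with SketchBackend() as backend: ...`, with cleanup guaranteed on exit.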

src/opentelemetry_mcp/backends/filter_engine.py (2)

24-55: Simplify duplicated filtering loop in apply_filters

Both branches of apply_filters run the same loop; only type-checking differs. You can remove the duplication and rely on the overloads plus _matches_all_filters to handle trace vs span items.

Example refactor:

     @staticmethod
     def apply_filters(
         items: list[TraceData] | list[SpanData], filters: list[Filter]
     ) -> list[TraceData] | list[SpanData]:
@@
-        filtered_items: list[TraceData] | list[SpanData]
-        if isinstance(items[0] if items else None, TraceData):
-            filtered_items = []
-            for item in items:
-                if FilterEngine._matches_all_filters(item, filters):
-                    filtered_items.append(item)  # type: ignore[arg-type]
-        else:
-            filtered_items = []
-            for item in items:
-                if FilterEngine._matches_all_filters(item, filters):
-                    filtered_items.append(item)  # type: ignore[arg-type]
+        filtered_items: list[TraceData] | list[SpanData] = [
+            item for item in items if FilterEngine._matches_all_filters(item, filters)
+        ]

200-253: Tighten boolean coercion to avoid surprising matches

_compare_value currently does actual_bool = bool(actual) and expected_bool = bool(expected). For string inputs, "false" and "0" both evaluate to True, which makes filters like value_type=BOOLEAN, value="false" behave unexpectedly.

Consider normalizing booleans more explicitly (e.g., handling True/False and common string/int representations) and treating unparseable values as non-matches instead of relying on bare bool(...).

src/opentelemetry_mcp/backends/tempo.py (1)

463-475: Escape values in TraceQL CONTAINS/IN conditions to avoid malformed regex/queries

_filter_to_traceql interpolates raw values into TraceQL, e.g.:

  • CONTAINSfield =~ ".*{value}.*"
  • IN on strings → OR of field = "{v}"

If value or entries in values contain quotes or regex/meta characters, this can produce invalid or surprising TraceQL.

Consider escaping values before interpolation (e.g. via re.escape for CONTAINS and proper quoting/escaping for string literals) so that filter values are treated as data, not regex/query syntax.
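A minimal escaping sketch, assuming TraceQL string literals are double-quoted and `=~` takes a regex; the helper names are illustrative, not the backend's actual functions. `re.escape` neutralizes regex metacharacters, and the string-literal escaping is applied afterwards so the backslashes it introduces survive quoting.

```python
import re

# Sketch of value escaping before TraceQL interpolation; function names are
# illustrative, not the backend's actual helpers.
def escape_traceql_string(value: str) -> str:
    """Escape a value for use inside a double-quoted TraceQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def contains_condition(field: str, value: str) -> str:
    """Build a CONTAINS-style regex match, treating the value as literal text."""
    return f'{field} =~ ".*{escape_traceql_string(re.escape(value))}.*"'
```

The same literal escaping would apply to each string in an IN condition's OR-expansion.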

src/opentelemetry_mcp/backends/jaeger.py (1)

85-104: Clarify or remove unused native_filters parameter

_search_service_traces_with_filters accepts native_filters but does not use it; all server-side filtering is driven by query.to_backend_params(). This can be misleading given the hybrid-filtering comment above.

Either wire native_filters into the Jaeger request (if/when you add tag-based native filtering) or drop the parameter for now to keep the API surface honest.

src/opentelemetry_mcp/attributes.py (1)

60-83: Broaden attribute value types and simplify serialization using model_dump.

SpanAttributes can legitimately hold list-valued fields (e.g., gen_ai_response_finish_reasons, llm_response_finish_reasons) and arbitrary extras via extra="allow", but:

  • to_dict() is annotated as returning dict[str, str | int | float | bool] and initializes result with the same narrow value type.
  • get() / __getitem__() are also typed to return only scalar primitives.

At runtime these methods will happily return lists (and possibly other shapes from __pydantic_extra__), so the current signatures are misleading and will fight static analysis.

Consider widening the types and delegating to Pydantic’s own dumping rather than hand-rolling the alias/extra logic, e.g.:

-from typing import Literal
+from typing import Any, Literal
@@
-    def to_dict(self) -> dict[str, str | int | float | bool]:
+    def to_dict(self) -> dict[str, Any]:
@@
-        result: dict[str, str | int | float | bool] = {}
-
-        # Map field names back to their aliases for proper serialization
-        for field_name, field_info in self.__class__.model_fields.items():
-            value = getattr(self, field_name)
-            if value is not None:
-                # Use the alias if available, otherwise use field name
-                key = field_info.alias or field_name
-                result[key] = value
-
-        # Add extra fields that were allowed through ConfigDict(extra='allow')
-        if hasattr(self, "__pydantic_extra__") and self.__pydantic_extra__:
-            for key, value in self.__pydantic_extra__.items():
-                if value is not None:
-                    result[key] = value
-
-        return result
+        # Rely on Pydantic's dumping to handle aliases, extras and None filtering.
+        return self.model_dump(by_alias=True, exclude_none=True)
@@
-    def get(
-        self, key: str, default: str | int | float | bool | None = None
-    ) -> str | int | float | bool | None:
+    def get(self, key: str, default: Any = None) -> Any:
@@
-    def __getitem__(self, key: str) -> str | int | float | bool:
+    def __getitem__(self, key: str) -> Any:

This keeps runtime behavior the same but makes the API honest about the possible value types and reduces maintenance overhead.

Also applies to: 84-133

src/opentelemetry_mcp/server.py (2)

95-115: Tighten config check and consider behavior when backend health is not “healthy”.

_get_backend() currently checks configuration via if not _config: and logs health problems but always proceeds:

if not _config:
    raise RuntimeError("Server configuration not set")
...
if _backend is None:
    ...
    health = await _backend.health_check()
    ...
    if health.status != "healthy":
        logger.warning("Backend is not healthy, but continuing...")

Two suggestions:

  1. Use an explicit is None check to avoid any surprises if ServerConfig ever defines __bool__:
-    if not _config:
+    if _config is None:
         raise RuntimeError("Server configuration not set")
  2. Decide whether you want fail-fast semantics when health is not "healthy" (e.g., raising a RuntimeError or making this behavior configurable via an env/flag). Right now the server will start and tools will later fail at call time, which can be harder to debug.

These aren’t correctness blockers but will make behavior more predictable in misconfigured environments.


588-626: Reconsider default HTTP bind address for security.

The HTTP transport currently defaults to host="0.0.0.0", exposing the MCP endpoint on all interfaces:

@click.option(
    "--host",
    type=str,
    default="0.0.0.0",
    help="Host to bind HTTP server to (only for --transport http, default: 0.0.0.0)",
)

For local development and typical MCP usage, a safer default is 127.0.0.1, reducing accidental exposure on shared machines or CI agents. You can still allow 0.0.0.0 explicitly via the CLI when needed.

-    default="0.0.0.0",
-    help="Host to bind HTTP server to (only for --transport http, default: 0.0.0.0)",
+    default="127.0.0.1",
+    help="Host to bind HTTP server to (only for --transport http, default: 127.0.0.1)",

This is operational rather than functional, but worth considering before the default becomes part of the public behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b4e9d22 and f7b8071.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (70)
  • .dockerignore (1 hunks)
  • .env.example (1 hunks)
  • .github/workflows/ci.yml (1 hunks)
  • .python-version (1 hunks)
  • CLAUDE.md (1 hunks)
  • Dockerfile (1 hunks)
  • pyproject.toml (1 hunks)
  • src/opentelemetry_mcp/__init__.py (1 hunks)
  • src/opentelemetry_mcp/attributes.py (1 hunks)
  • src/opentelemetry_mcp/backends/__init__.py (1 hunks)
  • src/opentelemetry_mcp/backends/base.py (1 hunks)
  • src/opentelemetry_mcp/backends/filter_engine.py (1 hunks)
  • src/opentelemetry_mcp/backends/jaeger.py (1 hunks)
  • src/opentelemetry_mcp/backends/tempo.py (1 hunks)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/config.py (1 hunks)
  • src/opentelemetry_mcp/models.py (1 hunks)
  • src/opentelemetry_mcp/server.py (1 hunks)
  • src/opentelemetry_mcp/tools/__init__.py (1 hunks)
  • src/opentelemetry_mcp/tools/errors.py (1 hunks)
  • src/opentelemetry_mcp/tools/expensive_traces.py (1 hunks)
  • src/opentelemetry_mcp/tools/list_llm_tools.py (1 hunks)
  • src/opentelemetry_mcp/tools/list_models.py (1 hunks)
  • src/opentelemetry_mcp/tools/model_stats.py (1 hunks)
  • src/opentelemetry_mcp/tools/search.py (1 hunks)
  • src/opentelemetry_mcp/tools/search_spans.py (1 hunks)
  • src/opentelemetry_mcp/tools/services.py (1 hunks)
  • src/opentelemetry_mcp/tools/slow_traces.py (1 hunks)
  • src/opentelemetry_mcp/tools/trace.py (1 hunks)
  • src/opentelemetry_mcp/tools/usage.py (1 hunks)
  • src/opentelemetry_mcp/utils.py (1 hunks)
  • start_locally.sh (1 hunks)
  • tests/__init__.py (1 hunks)
  • tests/conftest.py (1 hunks)
  • tests/integration/__init__.py (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerBackendHealth.test_health_check_healthy.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerGetTrace.test_get_trace_by_id.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerGetTrace.test_get_trace_invalid_id.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerLLMSpans.test_search_llm_spans.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerLLMSpans.test_search_traces_with_llm_model_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerListServices.test_list_services.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchSpans.test_search_spans_basic.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchSpans.test_search_spans_with_generic_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchSpans.test_search_spans_with_limit.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchSpans.test_search_spans_with_operation.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_basic.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_duration_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_error_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_generic_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_limit.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_operation.yaml (1 hunks)
  • tests/integration/cassettes/test_jaeger_integration/TestJaegerServiceOperations.test_get_service_operations.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoBackendHealth.test_health_check_healthy.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoGetTrace.test_get_trace_by_id.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoGetTrace.test_get_trace_invalid_id.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoLLMSpans.test_search_traces_with_llm_model_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoListServices.test_list_services.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoSearchSpans.test_search_spans_with_generic_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoSearchSpans.test_search_spans_with_operation.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_error_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_generic_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_operation.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoServiceOperations.test_get_service_operations.yaml (1 hunks)
  • tests/integration/cassettes/test_tempo_integration/TestTempoTraceQL.test_search_with_multiple_filters.yaml (1 hunks)
  • tests/integration/conftest.py (1 hunks)
  • tests/integration/test_jaeger_integration.py (1 hunks)
  • tests/integration/test_tempo_integration.py (1 hunks)
  • tests/integration/test_traceloop_integration.py (1 hunks)
  • tests/test_models.py (1 hunks)
  • tests/test_traceloop.py (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes

Applied to files:

  • tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_error_filter.yaml
  • tests/integration/cassettes/test_tempo_integration/TestTempoLLMSpans.test_search_traces_with_llm_model_filter.yaml
  • tests/integration/__init__.py
  • tests/integration/conftest.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Re-record VCR cassettes when API interactions change

Applied to files:

  • tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_error_filter.yaml
  • tests/integration/__init__.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to **/cassettes/**/*.{yaml,yml,json} : Never commit secrets or PII in VCR cassettes; scrub sensitive data

Applied to files:

  • tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_error_filter.yaml
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.

Applied to files:

  • tests/__init__.py
  • pyproject.toml
  • src/opentelemetry_mcp/config.py
  • src/opentelemetry_mcp/__init__.py
  • src/opentelemetry_mcp/server.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/conftest.py : Use VCR filters (e.g., filter_headers, before_record) or framework equivalents to scrub secrets/PII during recording

Applied to files:

  • tests/integration/__init__.py
  • tests/integration/conftest.py
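The `filter_headers` / `before_record` guidance above can be sketched as a small scrubber callable. The request shape and header names below are illustrative assumptions, not the vcrpy API itself; adapt field access to the actual `Request` objects your VCR framework passes in:

```python
# Sketch of a VCR-style scrubber (assumed dict-shaped request; with vcrpy you
# would wire this up as vcr.VCR(before_record_request=scrub_request,
# filter_headers=["authorization"])). Redacts auth material before a
# cassette is written to disk.
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie"}


def scrub_request(request: dict) -> dict:
    """Return a copy of the request with sensitive header values replaced."""
    headers = dict(request.get("headers", {}))
    for name in list(headers):
        if name.lower() in SENSITIVE_HEADERS:
            headers[name] = "REDACTED"
    return {**request, "headers": headers}


if __name__ == "__main__":
    recorded = {
        "uri": "https://tempo.example/api/search",
        "headers": {"Authorization": "Bearer secret-token", "Accept": "application/json"},
    }
    clean = scrub_request(recorded)
    print(clean["headers"]["Authorization"])  # → REDACTED
```

The same idea applies on the response side (`before_record_response`) for cookies or tokens echoed back by the backend.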
🧬 Code graph analysis (19)
tests/conftest.py (3)
src/opentelemetry_mcp/attributes.py (1)
  • SpanAttributes (8-133)
src/opentelemetry_mcp/config.py (1)
  • BackendConfig (16-65)
src/opentelemetry_mcp/models.py (2)
  • SpanData (226-253)
  • TraceData (425-454)
src/opentelemetry_mcp/backends/__init__.py (1)
src/opentelemetry_mcp/backends/base.py (1)
  • BaseBackend (12-160)
src/opentelemetry_mcp/tools/expensive_traces.py (2)
src/opentelemetry_mcp/models.py (4)
  • LLMSpanAttributes (256-382)
  • TraceQuery (457-556)
  • from_span (283-382)
  • from_span (666-682)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/tools/search_spans.py (4)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-160)
  • search_spans (80-92)
src/opentelemetry_mcp/models.py (7)
  • Filter (45-80)
  • SpanQuery (590-647)
  • SpanSummary (650-682)
  • gen_ai_system (246-248)
  • has_error (251-253)
  • from_span (283-382)
  • from_span (666-682)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/backends/tempo.py (1)
  • search_spans (141-260)
src/opentelemetry_mcp/tools/model_stats.py (2)
src/opentelemetry_mcp/models.py (5)
  • LLMSpanAttributes (256-382)
  • TraceQuery (457-556)
  • from_span (283-382)
  • from_span (666-682)
  • has_error (251-253)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/attributes.py (1)
src/opentelemetry_mcp/models.py (1)
  • gen_ai_system (246-248)
src/opentelemetry_mcp/tools/trace.py (5)
src/opentelemetry_mcp/backends/base.py (1)
  • get_trace (95-107)
src/opentelemetry_mcp/models.py (6)
  • LLMSpanAttributes (256-382)
  • is_llm_span (241-243)
  • from_span (283-382)
  • from_span (666-682)
  • llm_spans (437-439)
  • total_llm_tokens (447-454)
src/opentelemetry_mcp/backends/jaeger.py (1)
  • get_trace (202-228)
src/opentelemetry_mcp/backends/tempo.py (1)
  • get_trace (262-271)
src/opentelemetry_mcp/server.py (1)
  • get_trace (200-217)
tests/integration/test_traceloop_integration.py (3)
src/opentelemetry_mcp/models.py (7)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanQuery (590-647)
  • TraceQuery (457-556)
  • has_error (251-253)
  • gen_ai_system (246-248)
tests/integration/conftest.py (1)
  • traceloop_backend (213-225)
src/opentelemetry_mcp/backends/base.py (6)
  • health_check (137-146)
  • list_services (110-119)
  • get_service_operations (122-134)
  • search_traces (65-77)
  • get_trace (95-107)
  • search_spans (80-92)
tests/integration/test_jaeger_integration.py (2)
src/opentelemetry_mcp/backends/jaeger.py (6)
  • health_check (268-294)
  • list_services (230-246)
  • get_service_operations (248-266)
  • search_traces (42-83)
  • get_trace (202-228)
  • search_spans (123-200)
src/opentelemetry_mcp/models.py (6)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanQuery (590-647)
  • TraceQuery (457-556)
  • has_error (251-253)
src/opentelemetry_mcp/backends/tempo.py (4)
src/opentelemetry_mcp/attributes.py (4)
  • HealthCheckResponse (155-167)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • get (84-115)
src/opentelemetry_mcp/backends/base.py (10)
  • BaseBackend (12-160)
  • _create_headers (45-51)
  • get_supported_operators (54-62)
  • search_traces (65-77)
  • client (29-42)
  • search_spans (80-92)
  • get_trace (95-107)
  • list_services (110-119)
  • get_service_operations (122-134)
  • health_check (137-146)
src/opentelemetry_mcp/backends/filter_engine.py (4)
  • FilterEngine (13-284)
  • apply_filters (18-18)
  • apply_filters (22-22)
  • apply_filters (25-55)
src/opentelemetry_mcp/models.py (10)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanData (226-253)
  • SpanQuery (590-647)
  • TraceData (425-454)
  • TraceQuery (457-556)
  • get_all_filters (497-514)
  • get_all_filters (630-647)
  • has_error (251-253)
src/opentelemetry_mcp/backends/base.py (2)
src/opentelemetry_mcp/attributes.py (1)
  • HealthCheckResponse (155-167)
src/opentelemetry_mcp/models.py (5)
  • FilterOperator (12-34)
  • SpanData (226-253)
  • SpanQuery (590-647)
  • TraceData (425-454)
  • TraceQuery (457-556)
src/opentelemetry_mcp/tools/slow_traces.py (6)
src/opentelemetry_mcp/backends/base.py (3)
  • BaseBackend (12-160)
  • search_traces (65-77)
  • get_trace (95-107)
src/opentelemetry_mcp/models.py (4)
  • LLMSpanAttributes (256-382)
  • TraceQuery (457-556)
  • from_span (283-382)
  • from_span (666-682)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/backends/jaeger.py (2)
  • search_traces (42-83)
  • get_trace (202-228)
src/opentelemetry_mcp/backends/tempo.py (2)
  • search_traces (58-139)
  • get_trace (262-271)
src/opentelemetry_mcp/backends/traceloop.py (2)
  • search_traces (85-174)
  • get_trace (299-332)
src/opentelemetry_mcp/backends/filter_engine.py (1)
src/opentelemetry_mcp/models.py (8)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanData (226-253)
  • TraceData (425-454)
  • llm_spans (437-439)
  • total_llm_tokens (447-454)
  • has_errors (442-444)
tests/test_models.py (3)
src/opentelemetry_mcp/attributes.py (1)
  • SpanAttributes (8-133)
src/opentelemetry_mcp/models.py (6)
  • LLMSpanAttributes (256-382)
  • SpanData (226-253)
  • TraceData (425-454)
  • from_span (283-382)
  • from_span (666-682)
  • llm_spans (437-439)
tests/conftest.py (1)
  • sample_trace_data (36-68)
src/opentelemetry_mcp/tools/usage.py (8)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-160)
  • search_traces (65-77)
src/opentelemetry_mcp/models.py (8)
  • LLMSpanAttributes (256-382)
  • TraceQuery (457-556)
  • UsageMetrics (385-422)
  • gen_ai_system (246-248)
  • llm_spans (437-439)
  • from_span (283-382)
  • from_span (666-682)
  • add_span (399-422)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/server.py (2)
  • get_llm_usage (221-261)
  • search_traces (119-196)
src/opentelemetry_mcp/backends/jaeger.py (1)
  • search_traces (42-83)
src/opentelemetry_mcp/backends/tempo.py (1)
  • search_traces (58-139)
src/opentelemetry_mcp/backends/traceloop.py (1)
  • search_traces (85-174)
src/opentelemetry_mcp/tools/search.py (1)
  • search_traces (13-117)
src/opentelemetry_mcp/tools/errors.py (8)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-160)
  • search_traces (65-77)
src/opentelemetry_mcp/models.py (6)
  • TraceQuery (457-556)
  • TraceSummary (559-587)
  • has_error (251-253)
  • from_trace (574-587)
  • is_llm_span (241-243)
  • gen_ai_system (246-248)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/server.py (2)
  • find_errors (281-312)
  • search_traces (119-196)
src/opentelemetry_mcp/backends/jaeger.py (1)
  • search_traces (42-83)
src/opentelemetry_mcp/backends/tempo.py (1)
  • search_traces (58-139)
src/opentelemetry_mcp/backends/traceloop.py (1)
  • search_traces (85-174)
src/opentelemetry_mcp/tools/search.py (1)
  • search_traces (13-117)
src/opentelemetry_mcp/models.py (1)
src/opentelemetry_mcp/attributes.py (4)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • to_dict (60-82)
  • get (84-115)
src/opentelemetry_mcp/tools/services.py (4)
src/opentelemetry_mcp/backends/base.py (3)
  • BaseBackend (12-160)
  • list_services (110-119)
  • get_service_operations (122-134)
src/opentelemetry_mcp/backends/jaeger.py (2)
  • list_services (230-246)
  • get_service_operations (248-266)
src/opentelemetry_mcp/backends/tempo.py (2)
  • list_services (273-312)
  • get_service_operations (314-344)
src/opentelemetry_mcp/server.py (1)
  • list_services (265-277)
src/opentelemetry_mcp/backends/traceloop.py (3)
src/opentelemetry_mcp/attributes.py (3)
  • HealthCheckResponse (155-167)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
src/opentelemetry_mcp/backends/filter_engine.py (4)
  • FilterEngine (13-284)
  • apply_filters (18-18)
  • apply_filters (22-22)
  • apply_filters (25-55)
src/opentelemetry_mcp/models.py (5)
  • Filter (45-80)
  • FilterOperator (12-34)
  • FilterType (37-42)
  • SpanData (226-253)
  • TraceData (425-454)
🪛 GitHub Actions: CI
pyproject.toml

[warning] 1-1: The tool.uv.dev-dependencies field (used in pyproject.toml) is deprecated and will be removed in a future release; use dependency-groups.dev instead

tests/integration/test_traceloop_integration.py

[error] 37-37: Health check failed: traceloop_backend.health_check() returned 'unhealthy' instead of 'healthy'.

tests/integration/test_tempo_integration.py

[error] 395-395: Argument "values" to "Filter" has incompatible type "list[str]"; expected "list[str | int | float | bool] | None" [arg-type].

tests/integration/conftest.py

[error] 140-140: Argument "url" to "BackendConfig" has incompatible type "str"; expected "HttpUrl" [arg-type].


[error] 144-144: The return type of an async generator function should be "AsyncGenerator" or one of its supertypes [misc].


[error] 169-169: Argument "url" to "BackendConfig" has incompatible type "str"; expected "HttpUrl" [arg-type].


[error] 173-173: The return type of an async generator function should be "AsyncGenerator" or one of its supertypes [misc].


[error] 209-209: Argument "url" to "BackendConfig" has incompatible type "str"; expected "HttpUrl" [arg-type].


[error] 213-213: The return type of an async generator function should be "AsyncGenerator" or one of its supertypes [misc].
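The repeated "return type of an async generator function should be AsyncGenerator" errors above are fixed by annotating the fixture with `AsyncGenerator[T, None]` rather than the yielded type. A minimal sketch with placeholder types (not the project's real backend classes):

```python
# mypy fix for an async-generator fixture: annotate the function's return type
# as AsyncGenerator[<yield type>, None]. FakeBackend stands in for the real
# backend class used in tests/integration/conftest.py.
from collections.abc import AsyncGenerator


class FakeBackend:
    async def close(self) -> None: ...


async def backend_fixture() -> AsyncGenerator[FakeBackend, None]:
    backend = FakeBackend()
    try:
        yield backend  # accepted by mypy once the return type is AsyncGenerator
    finally:
        await backend.close()
```

The `HttpUrl` errors in the same file can be resolved similarly at the type level, e.g. by constructing `HttpUrl("http://localhost:16686")` (assuming pydantic v2) instead of passing a plain string to `BackendConfig`.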

🪛 LanguageTool
CLAUDE.md

[style] ~111-~111: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ...y_mcp/server.py) - FastMCP application, CLI interface, tool handlers - [config.py](openteleme...

(ACRONYM_TAUTOLOGY)


[style] ~452-~452: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ueries - Querying all services would be very expensive (especially for spans) - Users can easi...

(EN_WEAK_ADJECTIVE)

🪛 markdownlint-cli2 (0.18.1)
CLAUDE.md

466-466: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Comment thread .env.example
Comment thread .github/workflows/ci.yml
Comment on lines +48 to +60
build-test:
needs: test
runs-on: ubuntu-latest
strategy:
matrix:
platform: [amd64, arm64]
include:
- platform: amd64
runs-on: ubuntu-latest
docker-platform: linux/amd64
- platform: arm64
runs-on: Linux-ARM64
docker-platform: linux/arm64

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Clarify matrix configuration for platform-specific runners.

The build-test job specifies runs-on: ubuntu-latest at the job level (line 50), but the matrix includes different runs-on values for each platform (lines 56, 59). The matrix's runs-on field won't override the job-level setting in this configuration.

To properly use different runners per platform, you need to reference the matrix value in the job's runs-on:

 build-test:
     needs: test
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.runs-on }}
     strategy:
         matrix:
             platform: [amd64, arm64]
             include:
                 - platform: amd64
                   runs-on: ubuntu-latest
                   docker-platform: linux/amd64
                 - platform: arm64
                   runs-on: Linux-ARM64
                   docker-platform: linux/arm64

Also note that "Linux-ARM64" runners are typically self-hosted and may not be available in all GitHub organizations.

Verify that the Linux-ARM64 runner is available in your GitHub organization, or consider using GitHub's hosted ARM runners if available in your plan.


I'll verify the review comment's claims about GitHub Actions matrix configuration and available runners.




🌐 Web query:

GitHub Actions matrix runs-on override job level configuration

💡 Result:

Short answer: GitHub Actions doesn't automatically override a job's runs-on from the matrix — you must explicitly use a matrix expression for runs-on (e.g. runs-on: ${{ matrix.os }}) or put runner labels into matrix include values and reference them. For self‑hosted runners some runners (e.g. CodeBuild-hosted) accept label-style overrides inside runs-on entries. [1][2][3]

How to do it:

  • Standard: define an os (or runner-label) in your matrix and set runs-on: ${{ matrix.os }}. This makes each matrix leg run on that runner. [2]
  • Use include/exclude to add or remove specific matrix combinations (and extra per-leg variables you can reference in runs-on). [2][3]
  • Self-hosted/third-party runners: some providers support label overrides inside the runs-on array (provider-specific syntax). Example: CodeBuild-hosted runner supports label overrides like image:${{ matrix.os }}. Check your runner provider docs. [4]

Example minimal snippet:

    runs-on: ${{ matrix.os }}  # required to override per matrix leg
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]

Sources: GitHub Docs (matrix usage) [2], community Q&A explaining runs-on behavior [1], matrix include/exclude examples [3], and provider-specific label overrides (CodeBuild example) [4].


🌐 Web query:

GitHub Actions available runners Linux ARM64

💡 Result:

Available GitHub-hosted Linux ARM64 runner labels (current GitHub docs):

  • ubuntu-24.04-arm. [1]
  • ubuntu-22.04-arm. [1]

Notes:

  • Arm64 standard runners are available for public repositories (and GitHub’s larger/private options exist — see docs for pricing/availability). [2][3]

Sources: [1] GitHub-hosted runners reference; [2] GitHub Changelog (arm64 runners GA); [3] actions/runner-images repo.


Matrix configuration prevents platform-specific runners from being used.

The job-level runs-on: ubuntu-latest (line 50) is not overridden by the matrix include values. The matrix entries for runs-on will be ignored, causing both amd64 and arm64 builds to run on ubuntu-latest.

To use different runners per platform, explicitly reference the matrix value:

 build-test:
     needs: test
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.runs-on }}
     strategy:
         matrix:
             platform: [amd64, arm64]
             include:
                 - platform: amd64
                   runs-on: ubuntu-latest
                   docker-platform: linux/amd64
                 - platform: arm64
                   runs-on: Linux-ARM64
                   docker-platform: linux/arm64

Note: "Linux-ARM64" is not a standard GitHub-hosted runner. Verify it exists as a self-hosted runner in your organization, or consider using GitHub's hosted ARM64 runners (ubuntu-24.04-arm or ubuntu-22.04-arm) if available in your plan.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

    build-test:
        needs: test
        runs-on: ubuntu-latest
        strategy:
            matrix:
                platform: [amd64, arm64]
                include:
                    - platform: amd64
                      runs-on: ubuntu-latest
                      docker-platform: linux/amd64
                    - platform: arm64
                      runs-on: Linux-ARM64
                      docker-platform: linux/arm64

After:

    build-test:
        needs: test
        runs-on: ${{ matrix.runs-on }}
        strategy:
            matrix:
                platform: [amd64, arm64]
                include:
                    - platform: amd64
                      runs-on: ubuntu-latest
                      docker-platform: linux/amd64
                    - platform: arm64
                      runs-on: Linux-ARM64
                      docker-platform: linux/arm64
🤖 Prompt for AI Agents
.github/workflows/ci.yml around lines 48 to 60: the job-level runs-on:
ubuntu-latest is overriding matrix include values so matrix-specific runners are
ignored; replace the static runs-on with the matrix value (e.g. set runs-on: ${{
matrix.runs-on }}), remove or adjust the top-level runs-on, and ensure the
matrix includes use valid runner labels (verify "Linux-ARM64" exists as a
self-hosted runner or use a GitHub-hosted ARM label like
ubuntu-24.04-arm/ubuntu-22.04-arm); keep docker-platform in the matrix and use
that value in the workflow steps when starting containers.
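If the self-hosted `Linux-ARM64` runner turns out not to be available, a variant of the fix using GitHub's hosted ARM labels (assuming they are enabled for this repository's plan) might look like:

```yaml
build-test:
    needs: test
    runs-on: ${{ matrix.runs-on }}
    strategy:
        matrix:
            platform: [amd64, arm64]
            include:
                - platform: amd64
                  runs-on: ubuntu-latest
                  docker-platform: linux/amd64
                - platform: arm64
                  runs-on: ubuntu-24.04-arm
                  docker-platform: linux/arm64
```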

Comment thread pyproject.toml
Comment on lines +621 to +696
def _convert_spans_to_trace(self, trace_id: str, spans_data: list[dict[str, Any]]) -> TraceData | None:
"""Convert Traceloop spans array to TraceData.

Args:
trace_id: Trace identifier
spans_data: List of span data from API

Returns:
Complete TraceData with all spans
"""
spans = []

for span_data in spans_data:
# Extract and transform span attributes from llm.* to gen_ai.* format
raw_attrs = span_data.get("span_attributes", {})
transformed_attrs = self._transform_llm_attributes_to_gen_ai(raw_attrs)

# Create strongly-typed SpanAttributes
span_attributes = SpanAttributes(**transformed_attrs)

# Transform events if present
events_data = span_data.get("events", [])
events = [
SpanEvent(
name=event.get("name", ""),
timestamp=event.get("timestamp", 0),
attributes=event.get("attributes", {}),
)
for event in events_data
]

span = SpanData(
trace_id=span_data["trace_id"],
span_id=span_data["span_id"],
parent_span_id=span_data.get("parent_span_id"),
operation_name=span_data["span_name"],
service_name=raw_attrs.get("service.name", ""),
start_time=datetime.fromtimestamp(span_data["timestamp"] / 1000), # Convert ms to seconds
duration_ms=float(span_data["duration"]),
status=self._status_code_to_status(span_data.get("status_code", "UNSET")),
attributes=span_attributes,
events=events,
)

spans.append(span)

# Find root span
root_spans = [s for s in spans if not s.parent_span_id]
root_span = root_spans[0] if root_spans else spans[0]

# Calculate trace duration
if spans:
start_times = [s.start_time for s in spans]
trace_start = min(start_times)
# Find the maximum end time
end_times = [datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000)) for s in spans]
trace_end = max(end_times)
trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
else:
trace_start = root_span.start_time
trace_duration_ms = root_span.duration_ms

# Determine status
trace_status = "OK"
if any(span.has_error for span in spans):
trace_status = "ERROR"

return TraceData(
trace_id=trace_id,
spans=spans,
start_time=trace_start,
duration_ms=trace_duration_ms,
service_name=root_span.service_name,
root_operation=root_span.operation_name,
status=self._status_code_to_status(trace_status),
)

⚠️ Potential issue | 🟠 Major

Guard _convert_spans_to_trace against empty parsed span list

If spans_data is non-empty but every span fails to parse (e.g., schema drift), spans ends up empty. The code then does spans[0] when computing root_span, which raises IndexError instead of failing gracefully.

Add an early check after the loop to bail out when no spans were parsed:

     def _convert_spans_to_trace(self, trace_id: str, spans_data: list[dict[str, Any]]) -> TraceData | None:
@@
-        spans = []
-
-        for span_data in spans_data:
+        spans: list[SpanData] = []
+
+        for span_data in spans_data:
@@
-            spans.append(span)
-
-        # Find root span
-        root_spans = [s for s in spans if not s.parent_span_id]
-        root_span = root_spans[0] if root_spans else spans[0]
+            spans.append(span)
+
+        if not spans:
+            logger.warning(f"No valid spans in trace {trace_id}")
+            return None
+
+        # Find root span
+        root_spans = [s for s in spans if not s.parent_span_id]
+        root_span = root_spans[0] if root_spans else spans[0]
@@
-        # Calculate trace duration
-        if spans:
-            start_times = [s.start_time for s in spans]
-            trace_start = min(start_times)
-            # Find the maximum end time
-            end_times = [datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000)) for s in spans]
-            trace_end = max(end_times)
-            trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
-        else:
-            trace_start = root_span.start_time
-            trace_duration_ms = root_span.duration_ms
+        # Calculate trace duration
+        start_times = [s.start_time for s in spans]
+        trace_start = min(start_times)
+        end_times = [
+            datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
+            for s in spans
+        ]
+        trace_end = max(end_times)
+        trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000

This keeps get_trace’s None-on-failure behavior consistent with the other backends.

🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/traceloop.py around lines 621-696, the
conversion loop can produce an empty spans list if all incoming spans fail
parsing, but the code later indexes spans[0] causing IndexError; add an early
guard immediately after the for-loop that checks if not spans and return None
(preserving get_trace’s None-on-failure behavior) so the function bails out
gracefully when no spans were parsed and avoids later references to spans[0].

Comment thread src/opentelemetry_mcp/models.py
Comment thread src/opentelemetry_mcp/tools/search_spans.py
Comment thread tests/integration/conftest.py
Comment thread tests/integration/conftest.py
Comment thread tests/integration/test_tempo_integration.py
Comment thread tests/integration/test_traceloop_integration.py
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/opentelemetry_mcp/models.py (1)

84-156: Use is not None for max_duration_ms to avoid silently dropping a valid 0 filter.

In _convert_legacy_params_to_filters, max_duration_ms is checked via truthiness:

if max_duration_ms:
    ...

Given the field is declared with ge=0 on the query models, 0 is a valid value but will be treated as “no filter” and skipped here, unlike min_duration_ms which explicitly checks is not None. To keep semantics predictable, mirror the min_duration_ms check:

-    if max_duration_ms:
+    if max_duration_ms is not None:
         all_filters.append(
             Filter(
                 field="duration",
                 operator=FilterOperator.LTE,
                 value=max_duration_ms,
                 value_type=FilterType.NUMBER,
             )
         )

This keeps legacy param conversion consistent and ensures max_duration_ms=0 is handled as an actual upper-bound filter.
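The truthiness-vs-`is not None` distinction is easy to demonstrate in isolation (function and filter names below are illustrative, not the project's actual helpers):

```python
# Why `if max_duration_ms:` drops a valid 0: in Python, 0 is falsy, so a
# 0-ms upper bound is mistaken for "no filter at all".
def filters_with_truthiness(max_duration_ms):
    filters = []
    if max_duration_ms:                 # buggy: silently skips 0
        filters.append(("duration", "<=", max_duration_ms))
    return filters


def filters_with_none_check(max_duration_ms):
    filters = []
    if max_duration_ms is not None:     # correct: only skips "unset"
        filters.append(("duration", "<=", max_duration_ms))
    return filters


print(filters_with_truthiness(0))    # → []
print(filters_with_none_check(0))    # → [('duration', '<=', 0)]
```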

src/opentelemetry_mcp/backends/traceloop.py (1)

633-715: Guard against empty spans list to prevent IndexError.

If all spans fail to parse in the loop (lines 647-681), the spans list remains empty, but line 685 attempts spans[0] when no root span exists, causing an IndexError. The if spans: check at line 688 comes too late to prevent this.

Add an early guard after the loop:

         for span_data in spans_data:
             # Extract and transform span attributes from llm.* to gen_ai.* format
             raw_attrs = span_data.get("span_attributes", {})
             transformed_attrs = self._transform_llm_attributes_to_gen_ai(raw_attrs)
 
             # Create strongly-typed SpanAttributes
             span_attributes = SpanAttributes(**transformed_attrs)
 
             # Transform events if present
             events_data = span_data.get("events", [])
             events = [
                 SpanEvent(
                     name=event.get("name", ""),
                     timestamp=event.get("timestamp", 0),
                     attributes=event.get("attributes", {}),
                 )
                 for event in events_data
             ]
 
             span = SpanData(
                 trace_id=span_data["trace_id"],
                 span_id=span_data["span_id"],
                 parent_span_id=span_data.get("parent_span_id"),
                 operation_name=span_data["span_name"],
                 service_name=raw_attrs.get("service.name", ""),
                 start_time=datetime.fromtimestamp(
                     span_data["timestamp"] / 1000
                 ),  # Convert ms to seconds
                 duration_ms=float(span_data["duration"]),
                 status=self._status_code_to_status(span_data.get("status_code", "UNSET")),
                 attributes=span_attributes,
                 events=events,
             )
 
             spans.append(span)
 
+        if not spans:
+            logger.warning(f"No valid spans parsed for trace {trace_id}")
+            return None
+
         # Find root span
         root_spans = [s for s in spans if not s.parent_span_id]
         root_span = root_spans[0] if root_spans else spans[0]
 
         # Calculate trace duration
-        if spans:
-            start_times = [s.start_time for s in spans]
-            trace_start = min(start_times)
-            # Find the maximum end time
-            end_times = [
-                datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
-                for s in spans
-            ]
-            trace_end = max(end_times)
-            trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
-        else:
-            trace_start = root_span.start_time
-            trace_duration_ms = root_span.duration_ms
+        start_times = [s.start_time for s in spans]
+        trace_start = min(start_times)
+        # Find the maximum end time
+        end_times = [
+            datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
+            for s in spans
+        ]
+        trace_end = max(end_times)
+        trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
tests/integration/test_traceloop_integration.py (1)

33-40: Make health check test resilient to unhealthy responses.

The test unconditionally asserts health.status == "healthy", which makes CI brittle when the backend (or VCR cassette) reports an unhealthy state. Other tests in this file use pytest.skip() for unavailable data; apply the same pattern here.

Add a skip before assertions:

     async def test_health_check_healthy(self, traceloop_backend: TraceloopBackend) -> None:
         """Test health check returns healthy status when backend is accessible."""
         health = await traceloop_backend.health_check()
+        if health.status != "healthy":
+            pytest.skip(f"Backend reported unhealthy status: {health.error or health.status}")
+
         assert health.status == "healthy"
         assert health.backend == "traceloop"
         assert health.url is not None
🧹 Nitpick comments (10)
tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_spans_with_llm_model_filter.yaml (1)

1-394: Very large cassette; consider trimming or normalizing if it becomes a maintenance burden.

This fixture captures rich, realistic LLM prompts/completions, which is great for exercising parsing and filtering. The tradeoff is cassette size and noisy diffs when payloads change. If test execution or PR noise becomes an issue, you might:

  • Trim prompt/completion content to shorter strings before recording.
  • Or normalize dynamic fields (timestamps, itineraries) via VCR matchers/custom serializers.

Not urgent, just something to keep in mind as the suite grows.

src/opentelemetry_mcp/models.py (1)

483-496: Consider aligning has_filters / backend-param checks with the explicit is not None semantics for duration fields.

TraceQuery.has_filters, SpanQuery.has_filters, and TraceQuery.to_backend_params all use truthiness checks for min_duration_ms / max_duration_ms, while the models declare these fields with ge=0 and _convert_legacy_params_to_filters already uses is not None for min_duration_ms.

This means 0 is effectively treated as “no filter” in these helpers, which is inconsistent with the type constraints and with the fix you’re applying in _convert_legacy_params_to_filters. If you care about strict semantics, consider:

  • Checking is not None in has_filters and to_backend_params for the duration fields, or
  • Documenting that 0 is treated as “unset” for those helpers.

Not blocking, but tightening this would make behavior more predictable.

Also applies to: 534-539, 616-629

src/opentelemetry_mcp/backends/tempo.py (1)

374-383: _build_traceql_from_filters doesn’t use its query argument.

The docstring mentions using query (for time range), but the implementation only walks filters and ignores query. Given time range is handled via start/end request params elsewhere, you might:

  • Drop the unused query parameter from _build_traceql_from_filters and its call sites, or
  • Actually incorporate information from query if you intend to use it here.

Not functionally wrong, but cleaning this up will make the API clearer.

tests/integration/test_jaeger_integration.py (7)

1-10: Module setup and VCR usage look good (minor redundancy only)

The module-level pytestmark = [pytest.mark.integration, pytest.mark.vcr] plus per-test @pytest.mark.vcr ensures everything is recorded/replayed correctly. The per-test VCR marks are slightly redundant given the module-level mark, but that’s purely stylistic and not worth changing unless you want to simplify.


66-87: Consider asserting that at least one trace is returned in the “basic” search test

test_search_traces_basic currently only asserts that traces is a list and then iterates it; if no traces are returned, the field assertions in the loop never run, so the test still passes. If the cassettes are guaranteed to include at least one trace, adding assert traces (or assert len(traces) > 0) would turn this into a stronger regression check.


135-165: Strengthen duration/error filter tests by asserting non-empty results

Both test_search_traces_with_duration_filter and test_search_traces_with_error_filter only assert conditions inside a for trace in traces loop. If filtering ever yielded zero traces, these tests would pass without actually validating filtering. Consider adding assert traces (or a minimum length) before the loop so the tests fail if the filters stop matching anything under the recorded fixture.
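The suggested pattern looks roughly like this (stand-in trace dicts; the real tests iterate `TraceData` objects):

```python
# Assert the result set is non-empty before iterating, so the per-item
# assertions cannot be skipped vacuously when the backend returns nothing.
traces = [{"duration_ms": 120.0}, {"duration_ms": 75.0}]  # stand-in for backend results

assert traces, "expected the recorded cassette to return at least one trace"
for trace in traces:
    assert trace["duration_ms"] > 50
```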


191-221: Narrow the expected exception type for invalid trace IDs

In test_get_trace_invalid_id, pytest.raises((ValueError, Exception)) effectively just asserts that some Exception is raised, since Exception already covers ValueError. If the backend has (or can have) a stable error type for this case (e.g., ValueError or a specific HTTPX error), it would be better to assert that type explicitly and optionally match part of the message.
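The core issue can be shown without pytest (hypothetical `get_trace` stub; with pytest you would write `pytest.raises(ValueError, match="invalid trace id")` instead of the try/except):

```python
def get_trace(trace_id: str) -> dict:
    # Hypothetical stand-in for the backend call under test.
    if len(trace_id) != 32:
        raise ValueError(f"invalid trace id: {trace_id!r}")
    return {"trace_id": trace_id}

# Catching (ValueError, Exception) is no stronger than catching Exception,
# because ValueError is a subclass of Exception:
assert issubclass(ValueError, Exception)

# Pin the concrete type (and, ideally, part of the message) instead:
try:
    get_trace("not-a-valid-id")
except ValueError as exc:
    assert "invalid trace id" in str(exc)
else:
    raise AssertionError("expected ValueError for an invalid trace id")
```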


227-247: Basic span search test is solid; consider asserting at least one span

test_search_spans_basic thoroughly checks required fields on each returned span, which is great. Similar to the basic trace test, adding a simple assert spans (assuming the cassette guarantees results) would prevent the test from passing when the backend unexpectedly returns an empty list.


317-338: LLM span search test is correct but could be made stricter

test_search_llm_spans nicely exercises the EXISTS operator on gen_ai.system and asserts that returned spans are LLM spans with gen_ai_system set. As with other filter tests, you might optionally assert that spans is non-empty (if your cassette guarantees LLM spans) so the test doesn’t silently succeed when no LLM spans are found.


340-361: LLM model filter test is flexible but may pass vacuously for some models

In test_search_traces_with_llm_model_filter, iterating over several model names and checking matches only inside the for trace in traces / if llm_spans blocks is pragmatic. However, if none of the models are present in the fixture, the test will pass without asserting anything. If you want stronger guarantees, consider:

  • Restricting to a model you know exists in the Jaeger data, and
  • Asserting that at least one trace and one LLM span are returned for that model before checking the gen_ai_request_model.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7b8071 and 340e238.

📒 Files selected for processing (15)
  • src/opentelemetry_mcp/backends/tempo.py (1 hunks)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/models.py (1 hunks)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopFilters.test_search_with_comparison_operators.yaml (1 hunks)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopGetTrace.test_get_trace_invalid_id.yaml (1 hunks)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchSpans.test_search_spans_with_limit.yaml (1 hunks)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_spans_with_llm_model_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_basic.yaml (1 hunks)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_duration_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_error_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_generic_filter.yaml (1 hunks)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_limit.yaml (1 hunks)
  • tests/integration/conftest.py (1 hunks)
  • tests/integration/test_jaeger_integration.py (1 hunks)
  • tests/integration/test_traceloop_integration.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_limit.yaml
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes

Applied to files:

  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopGetTrace.test_get_trace_invalid_id.yaml
  • tests/integration/conftest.py
  • tests/integration/test_traceloop_integration.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Re-record VCR cassettes when API interactions change

Applied to files:

  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopGetTrace.test_get_trace_invalid_id.yaml
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/conftest.py : Use VCR filters (e.g., filter_headers, before_record) or framework equivalents to scrub secrets/PII during recording

Applied to files:

  • tests/integration/conftest.py
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from traceloop/node-server-sdk

Applied to files:

  • tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_spans_with_llm_model_filter.yaml
🧬 Code graph analysis (6)
tests/integration/test_jaeger_integration.py (5)
src/opentelemetry_mcp/backends/jaeger.py (6)
  • health_check (268-294)
  • list_services (230-246)
  • get_service_operations (248-266)
  • search_traces (42-83)
  • get_trace (202-228)
  • search_spans (123-200)
src/opentelemetry_mcp/models.py (9)
  • Filter (46-81)
  • FilterOperator (13-35)
  • FilterType (38-43)
  • SpanQuery (591-648)
  • TraceQuery (458-557)
  • has_error (252-254)
  • is_llm_span (242-244)
  • gen_ai_system (247-249)
  • llm_spans (438-440)
src/opentelemetry_mcp/backends/tempo.py (6)
  • health_check (347-372)
  • list_services (274-313)
  • get_service_operations (315-345)
  • search_traces (58-139)
  • get_trace (263-272)
  • search_spans (141-261)
src/opentelemetry_mcp/tools/trace.py (1)
  • get_trace (10-80)
src/opentelemetry_mcp/tools/search_spans.py (1)
  • search_spans (13-116)
src/opentelemetry_mcp/models.py (1)
src/opentelemetry_mcp/attributes.py (4)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • to_dict (60-82)
  • get (84-115)
tests/integration/conftest.py (1)
src/opentelemetry_mcp/config.py (1)
  • BackendConfig (16-65)
src/opentelemetry_mcp/backends/traceloop.py (3)
src/opentelemetry_mcp/attributes.py (4)
  • HealthCheckResponse (155-167)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • get (84-115)
src/opentelemetry_mcp/backends/filter_engine.py (4)
  • FilterEngine (13-284)
  • apply_filters (18-18)
  • apply_filters (22-22)
  • apply_filters (25-55)
src/opentelemetry_mcp/models.py (9)
  • Filter (46-81)
  • FilterOperator (13-35)
  • FilterType (38-43)
  • SpanData (227-254)
  • TraceData (426-455)
  • TraceQuery (458-557)
  • get_all_filters (498-515)
  • get_all_filters (631-648)
  • has_error (252-254)
tests/integration/test_traceloop_integration.py (8)
src/opentelemetry_mcp/backends/traceloop.py (7)
  • TraceloopBackend (23-723)
  • health_check (420-449)
  • list_services (340-385)
  • get_service_operations (387-418)
  • search_traces (85-176)
  • search_spans (178-303)
  • get_trace (305-338)
src/opentelemetry_mcp/models.py (7)
  • Filter (46-81)
  • FilterOperator (13-35)
  • FilterType (38-43)
  • SpanQuery (591-648)
  • TraceQuery (458-557)
  • has_error (252-254)
  • gen_ai_system (247-249)
tests/integration/conftest.py (1)
  • traceloop_backend (265-279)
src/opentelemetry_mcp/backends/tempo.py (6)
  • health_check (347-372)
  • list_services (274-313)
  • get_service_operations (315-345)
  • search_traces (58-139)
  • search_spans (141-261)
  • get_trace (263-272)
src/opentelemetry_mcp/backends/base.py (6)
  • health_check (137-146)
  • list_services (110-119)
  • get_service_operations (122-134)
  • search_traces (65-77)
  • search_spans (80-92)
  • get_trace (95-107)
src/opentelemetry_mcp/tools/search.py (1)
  • search_traces (13-117)
src/opentelemetry_mcp/tools/search_spans.py (1)
  • search_spans (13-116)
src/opentelemetry_mcp/tools/trace.py (1)
  • get_trace (10-80)
src/opentelemetry_mcp/backends/tempo.py (3)
src/opentelemetry_mcp/attributes.py (4)
  • HealthCheckResponse (155-167)
  • SpanAttributes (8-133)
  • SpanEvent (136-152)
  • get (84-115)
src/opentelemetry_mcp/backends/filter_engine.py (4)
  • FilterEngine (13-284)
  • apply_filters (18-18)
  • apply_filters (22-22)
  • apply_filters (25-55)
src/opentelemetry_mcp/models.py (10)
  • Filter (46-81)
  • FilterOperator (13-35)
  • FilterType (38-43)
  • SpanData (227-254)
  • TraceData (426-455)
  • TraceQuery (458-557)
  • get_all_filters (498-515)
  • get_all_filters (631-648)
  • has_error (252-254)
  • gen_ai_system (247-249)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (16)
tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_basic.yaml (1)

1-40: Cassette structure and data look appropriate for integration testing.

Request/response payloads are well-formed, scoped to localhost, and don’t appear to leak credentials or sensitive runtime configuration. This is fine to keep as a stable fixture.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_duration_filter.yaml (1)

1-40: Duration-filter cassette is consistent and safe to commit.

The recorded request correctly exercises a numeric duration filter, and the response data is similar to the basic case without exposing secrets. Looks good as an integration fixture.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_error_filter.yaml (1)

1-40: Cassette structure and contents look consistent and safe

The recorded request/response for this Traceloop trace search is well‑formed, uses only localhost endpoints, and does not expose any obvious secrets or PII; it should be fine to keep as‑is for the error‑filter integration scenario.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_generic_filter.yaml (1)

1-40: Generic duration-filter cassette looks correct

The cassette cleanly captures a numeric duration filter request and a representative root_spans response against localhost, with no sensitive data embedded. It’s suitable for exercising the generic filter path in the Traceloop backend.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopGetTrace.test_get_trace_invalid_id.yaml (1)

1-32: Invalid trace ID cassette accurately reflects null-result behavior

This cassette clearly records the “invalid trace ID” scenario as a 200 with {"spans":null}, which is useful to exercise the Traceloop backend’s handling of non-existent traces without leaking any sensitive data.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopFilters.test_search_with_comparison_operators.yaml (1)

1-40: Comparison-operator cassette correctly captures a >= duration filter

The recorded interaction for greater_than_or_equal duration filtering is structurally sound and free of sensitive data, and should give solid coverage for Traceloop’s comparison-operator handling.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchSpans.test_search_spans_with_limit.yaml (1)

1-754: Rich span-limit cassette provides good LLM/tool coverage without leaking secrets

This cassette gives a detailed spans payload (including LLM/tool attributes and external HTTP calls) for a limit‑10 query, which is great for exercising span search and parsing logic. The data is synthetic/public enough to avoid obvious secret/PII exposure.

tests/integration/test_jaeger_integration.py (2)

167-186: Generic trace filter test aligns well with Filter/FilterOperator API

test_search_traces_with_generic_filter correctly constructs a Filter with field="duration_ms", operator=FilterOperator.GT, and value_type=FilterType.NUMBER, then verifies trace.duration_ms > 50. This is a good end‑to‑end check of the generic filter pipeline for traces.


293-311: Generic span filter test mirrors the trace filter test correctly

test_search_spans_with_generic_filter uses a Filter on duration_ms with FilterOperator.GT and FilterType.NUMBER, matching the models, and validates span.duration_ms > 10. This is a good integration check of the span-level filter path.

tests/integration/conftest.py (1)

1-279: LGTM! Past type issues have been resolved.

The integration test fixtures are well-structured with proper VCR configuration, timestamp filtering, and header sanitization. The previous mypy errors regarding BackendConfig.url type mismatches and async generator return types have been correctly addressed using TypeAdapter(HttpUrl).validate_python() and AsyncGenerator type annotations.

src/opentelemetry_mcp/backends/traceloop.py (3)

31-56: LGTM!

The initialization properly validates the required API key and sets sensible defaults for project_id and environments.


85-176: Well-implemented hybrid filtering strategy.

The method correctly separates API-supported filters from those requiring client-side filtering, with clear logging and sensible defaults for time ranges. The conversion and fallback logic is sound.


451-536: LGTM! Robust filter conversion with proper fallback.

The conversion logic correctly maps between internal filter representations and Traceloop's API format, including the important gen_ai.system → llm.vendor special case. Unsupported filters properly return None to trigger client-side filtering.

tests/integration/test_traceloop_integration.py (3)

48-61: LGTM!

The test gracefully handles empty service lists while properly validating structure, uniqueness, and sorting when data is available.


93-109: LGTM!

Basic trace search test with appropriate structure validation for trace data.


257-273: LGTM!

Basic span search test with appropriate structure validation.

Comment on lines +374 to +397
def _build_traceql_from_filters(self, filters: list[Filter], query: TraceQuery) -> str:
"""Build TraceQL query from Filter objects.

Args:
filters: List of Filter conditions
query: Original query (for time range)

Returns:
TraceQL query string
"""
conditions = []

for filter_obj in filters:
condition = self._filter_to_traceql(filter_obj)
if condition:
conditions.append(condition)

# If no filters, match all traces
if not conditions:
return "{}"

# Combine with AND logic
return "{ " + " && ".join(conditions) + " }"


⚠️ Potential issue | 🟠 Major

has_error=False / status NOT_EQUALS filters are currently ignored for Tempo (native but not translated).

In search_traces / search_spans, filters are split by operator+field into native_filters vs client_filters based on get_supported_operators and trace_level_fields. For has_error=False, _convert_legacy_params_to_filters produces:

  • field="status", operator=FilterOperator.NOT_EQUALS, value="ERROR".

Because NOT_EQUALS is in get_supported_operators and status is in trace_level_fields, this filter is treated as native and removed from client_filters. However, in _filter_to_traceql the status branch only handles EQUALS and returns None for other operators, so the filter is silently dropped and never applied on either side.

This means Tempo queries with has_error=False won’t actually exclude error traces/spans, which is a behavior bug rather than just an optimization issue.

You can fix this by handling NOT_EQUALS for status explicitly:

-        elif field == "status":
-            # Special handling for status
-            if operator == FilterOperator.EQUALS:
-                if value == "ERROR":
-                    return "status = error"
-                elif value == "OK":
-                    return "status = ok"
-            return None
+        elif field == "status":
+            # Special handling for status (OK/ERROR) with both equality and inequality
+            if operator in (FilterOperator.EQUALS, FilterOperator.NOT_EQUALS):
+                op = "=" if operator == FilterOperator.EQUALS else "!="
+                if value == "ERROR":
+                    return f"status {op} error"
+                if value == "OK":
+                    return f"status {op} ok"
+            return None

This preserves the “native” classification while ensuring both has_error=True and has_error=False are respected when querying Tempo.

Also applies to: 398-427, 438-485

🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/tempo.py around lines 374-397, the status
branch in _filter_to_traceql only emits TraceQL for EQUALS and returns None for
other operators, causing native NOT_EQUALS (used for has_error=False -> status
!= "ERROR") to be dropped; update _filter_to_traceql to explicitly handle
FilterOperator.NOT_EQUALS for the status field and emit the corresponding
TraceQL condition (e.g., status != "ERROR"), and apply the same NOT_EQUALS
handling to the parallel status-handling blocks at lines 398-427 and 438-485 so
has_error=True and has_error=False are both respected by Tempo queries.

Comment thread: tests/integration/test_traceloop_integration.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
tests/integration/conftest.py (1)

94-118: Robust VCR header scrubbing; consider case‑insensitive matching for response headers.

The VCR config and filter_sensitive_headers do a good job of keeping auth data and cookies out of cassettes, matching the project’s VCR/PII guidelines. As a small hardening, you could normalize header keys (e.g., iterate over response_headers.items() with .lower()) so both "set-cookie" and "Set-Cookie" variants are filtered consistently.
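The hardening could look like this as a hedged sketch (the dict shape mirrors vcrpy's recorded-response format; the `SENSITIVE` set and hook name are illustrative, not the project's actual config):

```python
# Case-insensitive scrubbing of sensitive response headers for a VCR
# before_record_response-style hook.
SENSITIVE = {"set-cookie", "authorization", "x-api-key"}

def scrub_response_headers(response: dict) -> dict:
    headers = response.get("headers", {})
    response["headers"] = {
        key: value
        for key, value in headers.items()
        if key.lower() not in SENSITIVE  # matches Set-Cookie and set-cookie alike
    }
    return response

scrubbed = scrub_response_headers(
    {"headers": {"Set-Cookie": ["secret"], "Content-Type": ["application/json"]}}
)
print(scrubbed)  # {'headers': {'Content-Type': ['application/json']}}
```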

src/opentelemetry_mcp/tools/errors.py (1)

13-121: find_errors implementation is consistent and well‑defended.

Timestamp parsing, TraceQuery validation, and the error‑span formatting all mirror patterns in other tools, and wrapping TraceQuery in a ValidationError handler keeps the contract of always returning JSON. If you want to be extra defensive, you could skip traces where error_spans ends up empty, but with has_error=True in the query this is mostly a safeguard.

src/opentelemetry_mcp/backends/traceloop.py (1)

31-56: Backend logic looks robust; just align a couple of docstrings/comments.

The Traceloop backend’s hybrid filtering, attribute transformation, and the new empty‑spans guard in _convert_spans_to_trace all look good and address the earlier failure mode. Two tiny nits you might clean up later:

  • __init__ docstring says the default environments are ["production"], but the code uses ["prd"].
  • The health_check docstring mentions using a “projects endpoint”, while the implementation probes "/v2/warehouse/spans".

These don’t affect behavior but updating them would avoid confusion for future readers.

Also applies to: 420-449, 633-720

tests/integration/test_traceloop_integration.py (1)

202-217: Move test to the appropriate class and fix terminology.

This test method is in TestTraceloopSearchTraces but tests span search functionality (uses SpanQuery, calls search_spans). It should be moved to the TestTraceloopSearchSpans class. Additionally, the docstring and comments incorrectly reference "traces" when they should say "spans".

Move this test to TestTraceloopSearchSpans class and update the terminology:

-class TestTraceloopSearchTraces:
-    """Test Traceloop trace search functionality."""
-
-    # ... other tests ...
-
-    @pytest.mark.vcr
-    async def test_search_spans_with_llm_model_filter(
-        self, traceloop_backend: TraceloopBackend
-    ) -> None:
-        """Test trace search filtered by LLM model."""
-        query = SpanQuery(gen_ai_request_model="gpt-4o-mini", limit=200)
-
-        spans = await traceloop_backend.search_spans(query)
-
-        # Skip if no matching traces found (test data may not have gpt-4 traces)
-        if len(spans) == 0:
-            pytest.skip("No traces with gpt-4o-mini model found in test data")
-
-        # Verify all returned traces have the correct model
-        for span in spans:
-            # Check if any span has the requested model
-            assert span.attributes.gen_ai_request_model == "gpt-4o-mini"

And add it to the TestTraceloopSearchSpans class with corrected terminology:

class TestTraceloopSearchSpans:
    """Test Traceloop span search functionality."""
    
    # ... other tests ...
    
    @pytest.mark.vcr
    async def test_search_spans_with_llm_model_filter(
        self, traceloop_backend: TraceloopBackend
    ) -> None:
        """Test span search filtered by LLM model."""
        query = SpanQuery(gen_ai_request_model="gpt-4o-mini", limit=200)

        spans = await traceloop_backend.search_spans(query)

        # Skip if no matching spans found (test data may not have gpt-4 spans)
        if len(spans) == 0:
            pytest.skip("No spans with gpt-4o-mini model found in test data")

        # Verify all returned spans have the correct model
        for span in spans:
            assert span.attributes.gen_ai_request_model == "gpt-4o-mini"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 340e238 and a19d316.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • pyproject.toml (1 hunks)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/server.py (1 hunks)
  • src/opentelemetry_mcp/tools/errors.py (1 hunks)
  • src/opentelemetry_mcp/tools/search_spans.py (1 hunks)
  • tests/integration/conftest.py (1 hunks)
  • tests/integration/test_traceloop_integration.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/opentelemetry_mcp/tools/search_spans.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.

Applied to files:

  • src/opentelemetry_mcp/server.py
  • pyproject.toml
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/conftest.py : Use VCR filters (e.g., filter_headers, before_record) or framework equivalents to scrub secrets/PII during recording

Applied to files:

  • tests/integration/conftest.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes

Applied to files:

  • tests/integration/conftest.py
🧬 Code graph analysis (4)
src/opentelemetry_mcp/tools/errors.py (3)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-160)
  • search_traces (65-77)
src/opentelemetry_mcp/models.py (6)
  • TraceQuery (458-557)
  • TraceSummary (560-588)
  • has_error (252-254)
  • from_trace (575-588)
  • is_llm_span (242-244)
  • gen_ai_system (247-249)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/server.py (8)
src/opentelemetry_mcp/tools/list_llm_tools.py (1)
  • list_llm_tools (24-135)
src/opentelemetry_mcp/tools/list_models.py (1)
  • list_models (11-106)
src/opentelemetry_mcp/tools/search_spans.py (1)
  • search_spans (13-119)
src/opentelemetry_mcp/tools/search.py (1)
  • search_traces (13-117)
src/opentelemetry_mcp/tools/trace.py (1)
  • get_trace (10-80)
src/opentelemetry_mcp/tools/usage.py (1)
  • get_llm_usage (11-114)
src/opentelemetry_mcp/tools/services.py (1)
  • list_services (8-25)
src/opentelemetry_mcp/tools/slow_traces.py (1)
  • get_slow_traces (10-114)
tests/integration/test_traceloop_integration.py (3)
src/opentelemetry_mcp/models.py (7)
  • Filter (46-81)
  • FilterOperator (13-35)
  • FilterType (38-43)
  • SpanQuery (591-648)
  • TraceQuery (458-557)
  • has_error (252-254)
  • gen_ai_system (247-249)
tests/integration/conftest.py (1)
  • traceloop_backend (269-283)
src/opentelemetry_mcp/backends/base.py (1)
  • health_check (137-146)
tests/integration/conftest.py (1)
src/opentelemetry_mcp/config.py (1)
  • BackendConfig (16-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (3)
pyproject.toml (1)

30-39: Dev dependency configuration looks correct and fixes the previous uv deprecation.

Using [dependency-groups].dev instead of tool.uv.dev-dependencies aligns with current uv semantics, and the dev stack (pytest, mypy, ruff, etc.) is coherent with the rest of the project.

src/opentelemetry_mcp/server.py (1)

37-363: Centralized tool error handling and MCP wrappers look solid.

_handle_tool_error fixes the earlier invalid‑JSON pattern and gives you consistent logging, and each @mcp.tool wrapper simply fetches the backend and delegates to the corresponding tool module, keeping the server thin and testable. The lazy _get_backend initialization with a soft‑fail health check also matches the “single FastMCP server per process” constraint the team has documented.

tests/integration/test_traceloop_integration.py (1)

1-26: Clear documentation and appropriate imports.

The module docstring provides excellent guidance on using VCR cassettes for recording and replay, making it easy for developers to work with these integration tests.

@linear

linear Bot commented Nov 16, 2025


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
src/opentelemetry_mcp/backends/traceloop.py (1)

634-707: Trace conversion now guards empty spans; simplify duration logic and status aggregation

The new guard on spans (Lines [684-687]) addresses the earlier IndexError risk when all spans fail to parse—nice improvement. However:

  • The subsequent if spans: ... else: duration block (Lines [693-706]) still contains an else that is now unreachable due to the early if not spans: return None. This can be simplified to just the non-empty branch.
  • trace_status is derived as "OK" if no span has has_error, and then passed through _status_code_to_status (Lines [709-721]). That means a trace whose spans are all "UNSET" is classified as "OK" rather than "UNSET".

If you want to preserve “UNSET” semantics, you could compute status along these lines:

-        trace_status = "OK"
-        if any(span.has_error for span in spans):
-            trace_status = "ERROR"
+        if any(span.has_error for span in spans):
+            trace_status = "ERROR"
+        elif all(span.status == "UNSET" for span in spans):
+            trace_status = "UNSET"
+        else:
+            trace_status = "OK"

and drop the unreachable else branch in the duration calculation.

Also applies to: 709-721
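The proposed aggregation is easy to restate as a tiny runnable function (a sketch of the review's suggested ERROR > UNSET > OK semantics, not the project's actual code):

```python
def aggregate_status(span_statuses: list[str], span_errors: list[bool]) -> str:
    """Aggregate per-span statuses into a trace status: ERROR > UNSET > OK."""
    if any(span_errors):
        return "ERROR"
    if all(s == "UNSET" for s in span_statuses):
        return "UNSET"
    return "OK"


print(aggregate_status(["UNSET", "UNSET"], [False, False]))  # UNSET
print(aggregate_status(["OK", "UNSET"], [False, False]))     # OK
print(aggregate_status(["OK"], [True]))                      # ERROR
```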

src/opentelemetry_mcp/models.py (1)

85-157: Handle min_duration_ms/max_duration_ms using is not None instead of truthiness

min_duration_ms/max_duration_ms are declared with ge=0, so 0 is a valid value. Currently:

  • _convert_legacy_params_to_filters checks if max_duration_ms: (Line [149]).
  • TraceQuery.has_filters and SpanQuery.has_filters treat min_duration_ms/max_duration_ms as truthy (Lines [490-492], [623-625]).
  • TraceQuery.to_backend_params only sends minDuration/maxDuration when the values are truthy (Lines [536-540]).

This means a caller passing 0 will have that filter silently ignored and omitted from backend params and filter lists, which is inconsistent with the field’s type constraints and with the min_duration_ms is not None check already used in _convert_legacy_params_to_filters.

You can fix this by switching to explicit is not None checks in all these places, for example:

-    if max_duration_ms:
+    if max_duration_ms is not None:
         all_filters.append(
             Filter(
                 field=Fields.DURATION,
                 operator=FilterOperator.LTE,
                 value=max_duration_ms,
                 value_type=FilterType.NUMBER,
             )
         )

and similarly:

-            or self.min_duration_ms
-            or self.max_duration_ms
+            or self.min_duration_ms is not None
+            or self.max_duration_ms is not None

and:

-        if self.min_duration_ms:
+        if self.min_duration_ms is not None:
             params["minDuration"] = f"{self.min_duration_ms}ms"

-        if self.max_duration_ms:
+        if self.max_duration_ms is not None:
             params["maxDuration"] = f"{self.max_duration_ms}ms"

The same pattern should be applied in SpanQuery.has_filters.

Also applies to: 463-498, 519-541, 593-631

🧹 Nitpick comments (5)
src/opentelemetry_mcp/attributes.py (2)

64-87: Broaden value types in SpanAttributes.to_dict/get to match actual data

SpanAttributes includes list-valued fields (for example gen_ai_response_finish_reasons, llm_response_finish_reasons) and allows arbitrary extras via extra="allow", but:

  • to_dict is annotated as returning dict[str, str | int | float | bool] while it will happily emit lists and possibly other types from __pydantic_extra__.
  • get/__getitem__ have similarly narrow return annotations, yet can return any type present in attributes/extras.

This is more of a typing mismatch than a runtime bug, but it will confuse type-checkers and future readers. Consider widening the value type (e.g. to Any or to include Sequence/Mapping) so the annotations reflect actual behavior.

Also applies to: 88-120


140-157: Consider allowing array-valued event attributes for better OTEL compatibility

SpanEvent.attributes is currently typed as dict[str, str | int | float | bool]. Many OpenTelemetry exporters and GenAI semantics use array-valued attributes on events (lists of strings/ints). With the strict union, any such event from a backend will raise a pydantic validation error and cause the whole span to be dropped in callers that wrap construction in a broad try/except.

If you expect to handle generic OTEL payloads, consider extending the union to include list types (for example list[str], list[int]) or loosening it to Any, and then normalizing where needed.

src/opentelemetry_mcp/constants.py (1)

31-57: Centralizing semantic conventions here looks good; consider pushing more raw strings into GenAIAttributes

This module nicely aligns with the pattern of keeping AI/LLM semantic conventions centralized (using semconv_ai.SpanAttributes plus your own GenAIAttributes/GenAI/LegacyLLM/Traceloop/Fields wrappers), which matches the team’s guidelines around OpenTelemetry GenAI semantics and avoiding scattered magic strings. Based on learnings

For consistency, you could also move the cache token and event-name strings currently hardcoded in GenAI (for example gen_ai.usage.cache_creation_input_tokens, gen_ai.content.prompt) into GenAIAttributes and reference them there, so all GenAI attribute keys live in a single namespace.

Also applies to: 74-112, 172-196

src/opentelemetry_mcp/models.py (1)

284-329: Be explicit about 0-token semantics in LLMSpanAttributes and summaries

LLMSpanAttributes.from_span and downstream usage currently treat 0 tokens as falsy:

  • prompt_tokens / completion_tokens / total_tokens are built with or chains and default to 0 (Lines [292-306]).
  • total_tokens is recomputed when if not total_tokens: (Line [314]), so an explicit 0 total is treated as “missing” and recomputed.
  • When constructing LLMSpanAttributes, fields are set as prompt_tokens=prompt_tokens if prompt_tokens else None and similarly for completion_tokens/total_tokens (Lines [380-382]), so an explicit 0 becomes None.

This is not a hard bug—aggregation via UsageMetrics.add_span still uses or 0 and will count None and 0 equivalently—but it does conflate “no data” with “explicitly zero tokens”, and can change how totals are derived when both aggregated and explicit totals are present.

If you care about distinguishing “0 tokens” from “no metric reported”, consider:

  • Using is not None checks instead of truthiness when deciding whether to recompute totals.
  • Passing prompt_tokens/completion_tokens/total_tokens through as-is (including 0) rather than converting them to None.

This would make the model’s behavior more predictable and match the ge=0 semantics used elsewhere.

Also applies to: 380-382, 388-425, 450-457, 669-685

src/opentelemetry_mcp/backends/traceloop.py (1)

388-417: Clarify or use service_name parameter in get_service_operations

get_service_operations accepts service_name but does not use it in the Traceloop request; it returns workflow names across all services for the last 7 days.

If the Traceloop API doesn’t support service-level filtering, consider:

  • Adjusting the docstring to say it returns operations across all services, or
  • Adding a comment explaining why service_name is intentionally unused, or
  • If the API supports it, adding a filter (for example via query params or filters) so the result is scoped to the given service.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a19d316 and 9c72563.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • pyproject.toml (1 hunks)
  • src/opentelemetry_mcp/attributes.py (1 hunks)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/constants.py (1 hunks)
  • src/opentelemetry_mcp/models.py (1 hunks)
  • src/opentelemetry_mcp/tools/list_llm_tools.py (1 hunks)
  • src/opentelemetry_mcp/tools/search_spans.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/opentelemetry_mcp/tools/search_spans.py
  • src/opentelemetry_mcp/tools/list_llm_tools.py
  • pyproject.toml
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings

Applied to files:

  • src/opentelemetry_mcp/constants.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification

Applied to files:

  • src/opentelemetry_mcp/constants.py
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/ai-semantic-conventions/src/SemanticAttributes.ts : Define all AI/LLM span attribute constants in packages/ai-semantic-conventions/src/SemanticAttributes.ts

Applied to files:

  • src/opentelemetry_mcp/constants.py
🧬 Code graph analysis (3)
src/opentelemetry_mcp/constants.py (1)
src/opentelemetry_mcp/attributes.py (1)
  • SpanAttributes (12-137)
src/opentelemetry_mcp/models.py (2)
src/opentelemetry_mcp/attributes.py (4)
  • SpanAttributes (12-137)
  • SpanEvent (140-157)
  • to_dict (64-86)
  • get (88-119)
src/opentelemetry_mcp/constants.py (3)
  • Fields (186-196)
  • GenAI (75-111)
  • Service (172-176)
src/opentelemetry_mcp/attributes.py (2)
src/opentelemetry_mcp/constants.py (4)
  • GenAI (75-111)
  • LegacyLLM (114-140)
  • Service (172-176)
  • Status (179-182)
src/opentelemetry_mcp/models.py (1)
  • gen_ai_system (248-250)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (2)
src/opentelemetry_mcp/backends/traceloop.py (2)

195-222: Hybrid filtering flow looks solid

The split into API-supported vs client-side filters, plus the fallback when _filter_to_traceloop returns None, provides a clear and predictable hybrid filtering path. Logging of client_filters also makes debugging easier when filters are silently downgraded to client-side.

Also applies to: 259-305


421-450: Health check implementation is straightforward and sufficient

Using a simple GET against /v2/warehouse/spans as a proxy health check and mapping the result into HealthCheckResponse with project_id and error fields gives a clear healthy/unhealthy signal without over-complicating things. Logging on failure is also appropriate.

Comment on lines +452 to +537
def _filter_to_traceloop(self, filter_obj: Filter) -> dict[str, Any] | None:
"""Convert a Filter object to Traceloop API filter format.

Args:
filter_obj: Filter to convert

Returns:
Traceloop filter dict or None if not convertible
"""
field = filter_obj.field
operator = filter_obj.operator
value = filter_obj.value

# Convert gen_ai.* to llm.* for Traceloop backend
# Traceloop uses legacy llm.* naming convention instead of gen_ai.*
if field.startswith("gen_ai."):
# Map gen_ai.system -> llm.vendor (special case)
if field == GenAI.SYSTEM:
field = LegacyLLM.VENDOR
else:
# General mapping: gen_ai.* -> llm.*
field = field.replace("gen_ai.", "llm.", 1)
logger.debug(
f"Converted filter field to Traceloop format: {filter_obj.field} -> {field}"
)

# Map field names to Traceloop API fields
if field == Service.NAME:
traceloop_field = Service.NAME
elif field == "name" or field == Fields.OPERATION_NAME:
traceloop_field = "span_name"
elif field == Fields.DURATION:
traceloop_field = Fields.DURATION
elif field == "duration_ms":
traceloop_field = Fields.DURATION
elif field == Fields.STATUS:
# Status filtering is not supported by Traceloop API - filter client-side
logger.debug("Status filter not supported by Traceloop, will apply client-side")
return None
elif field.startswith("traceloop."):
# Traceloop-specific attributes (like traceloop.span.kind) are supported directly
# without span_attributes. prefix
traceloop_field = field
elif field.startswith("llm."):
# LLM attributes in Traceloop don't need span_attributes. prefix
traceloop_field = field
else:
# Assume it's a span attribute - prefix with span_attributes.
traceloop_field = f"span_attributes.{field}"

# Map operators to Traceloop format
operator_map = {
FilterOperator.EQUALS: "equals",
FilterOperator.NOT_EQUALS: "not_equals",
FilterOperator.GT: "greater_than",
FilterOperator.LT: "less_than",
FilterOperator.GTE: "greater_than_or_equal",
FilterOperator.LTE: "less_than_or_equal",
}

traceloop_operator = operator_map.get(operator)
if not traceloop_operator:
logger.warning(
f"Operator {operator} not supported by Traceloop, will filter client-side"
)
return None

# Map value_type to Traceloop format
value_type_map = {
FilterType.STRING: "string",
FilterType.NUMBER: "number",
FilterType.BOOLEAN: "boolean",
}
traceloop_value_type = value_type_map.get(filter_obj.value_type, "string")

if traceloop_value_type == "boolean":
serialized_value = "true" if value else "false"
else:
serialized_value = str(value)

return {
"field": traceloop_field,
"operator": traceloop_operator,
"value": serialized_value,
"value_type": traceloop_value_type,
}

🛠️ Refactor suggestion | 🟠 Major

Tighten boolean handling in _filter_to_traceloop to avoid surprises

The mapping logic generally looks good, but the boolean serialization has a couple of edge cases:

  • For FilterType.BOOLEAN, serialized_value = "true" if value else "false" relies on truthiness. If a caller ever passes "false" (string) with value_type=BOOLEAN, it will serialize as "true".
  • There’s no explicit validation that value is actually a bool when value_type is BOOLEAN.

Since Filter is public and could be constructed manually, consider:

-        if traceloop_value_type == "boolean":
-            serialized_value = "true" if value else "false"
+        if traceloop_value_type == "boolean":
+            if isinstance(value, bool):
+                serialized_value = "true" if value else "false"
+            elif isinstance(value, str):
+                v = value.strip().lower()
+                serialized_value = "true" if v in ("1", "true", "yes") else "false"
+            else:
+                raise ValueError(f"Boolean filter value must be bool or 'true'/'false', got {type(value)}")

This keeps the API robust against mis-typed filters and avoids silently wrong queries.
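The surprise is reproducible in two lines; a minimal sketch of both the current behavior and the suggested normalization (function names are illustrative):

```python
def serialize_bool_truthy(value: object) -> str:
    # Current pattern: any non-empty string is truthy, including "false".
    return "true" if value else "false"


def serialize_bool_strict(value: object) -> str:
    # Sketch of the suggested normalization: validate instead of coercing.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return "true" if value.strip().lower() in ("1", "true", "yes") else "false"
    raise ValueError(f"Boolean filter value must be bool or 'true'/'false', got {type(value)}")


print(serialize_bool_truthy("false"))  # 'true' -- surprising
print(serialize_bool_strict("false"))  # 'false'
```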

🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/traceloop.py around lines 452-537, tighten
boolean handling: when filter_obj.value_type is FilterType.BOOLEAN, validate and
normalize the value instead of using truthiness. Specifically, if value is a
bool serialize to "true"/"false"; if value is a string accept only
"true"/"false" (case-insensitive) and normalize; otherwise log a warning about
invalid boolean value and return None so the filter will be applied client-side
(avoid silently coercing arbitrary truthy values).

@doronkopit5 doronkopit5 merged commit e12167b into main Nov 17, 2025
4 of 5 checks passed
@doronkopit5 doronkopit5 deleted the dk/mcp-server branch November 17, 2025 10:13

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
src/opentelemetry_mcp/models.py (1)

85-157: Treat 0 ms durations as valid filters across helpers and query models.

min_duration_ms / max_duration_ms are declared with ge=0, so 0 is a valid value, but several places treat 0 as “no filter” due to truthiness checks:

  • Line 149: _convert_params_to_filters uses if max_duration_ms:.
  • Lines 487–492, 620–625: TraceQuery.has_filters / SpanQuery.has_filters use or self.min_duration_ms / or self.max_duration_ms.
  • Lines 536–541: TraceQuery.to_backend_params uses if self.min_duration_ms: / if self.max_duration_ms: before emitting minDuration / maxDuration.

This means max_duration_ms=0 is silently dropped in filter conversion and backend params, and both queries’ has_filters() ignore 0, which is inconsistent with the type constraints and with the already-correct min_duration_ms is not None check in _convert_params_to_filters.

Consider tightening all these to explicit is not None checks so 0 is treated as an actual bound:

@@ def _convert_params_to_filters(
-    if max_duration_ms:
+    if max_duration_ms is not None:
@@ class TraceQuery(BaseModel):
-            or self.min_duration_ms
-            or self.max_duration_ms
+            or self.min_duration_ms is not None
+            or self.max_duration_ms is not None
@@     def to_backend_params(self) -> dict[str, str | int]:
-        if self.min_duration_ms:
+        if self.min_duration_ms is not None:
             params["minDuration"] = f"{self.min_duration_ms}ms"
 
-        if self.max_duration_ms:
+        if self.max_duration_ms is not None:
             params["maxDuration"] = f"{self.max_duration_ms}ms"
@@ class SpanQuery(BaseModel):
-            or self.min_duration_ms
-            or self.max_duration_ms
+            or self.min_duration_ms is not None
+            or self.max_duration_ms is not None

This extends the earlier feedback about max_duration_ms to all call sites that currently use truthiness for these fields.

Also applies to: 485-498, 536-541, 596-603, 618-631, 639-650

src/opentelemetry_mcp/backends/tempo.py (1)

58-76: NOT_EQUALS status filters (e.g. has_error=False) are silently dropped for Tempo.

In both search_traces (Lines 70–77) and search_spans (Lines 173–181), filters are split into:

  • native_filters: f.operator in get_supported_operators() (and trace-level field for spans),
  • client_filters: everything else.

Because FilterOperator.NOT_EQUALS is in get_supported_operators() and "status" is a trace-level field, a filter produced from has_error=False:

  • field="status", operator=FilterOperator.NOT_EQUALS, value="ERROR"

is classified as native and removed from client_filters.

However, _filter_to_traceql (Lines 418–427, 438–443) only handles status for FilterOperator.EQUALS and returns None for all other operators, including NOT_EQUALS. When None is returned, the filter is simply skipped and never re-applied client-side, so has_error=False queries do not actually exclude error traces/spans.

You can fix this by handling NOT_EQUALS in the status branch and emitting the corresponding TraceQL, e.g.:

-        elif field == "status":
-            # Special handling for status
-            if operator == FilterOperator.EQUALS:
-                if value == "ERROR":
-                    return "status = error"
-                elif value == "OK":
-                    return "status = ok"
-            return None
+        elif field == "status":
+            # Special handling for status (OK/ERROR) with both equality and inequality
+            if operator in (FilterOperator.EQUALS, FilterOperator.NOT_EQUALS):
+                op = "=" if operator == FilterOperator.EQUALS else "!="
+                if value == "ERROR":
+                    return f"status {op} error"
+                if value == "OK":
+                    return f"status {op} ok"
+            return None

This keeps the native classification while ensuring both has_error=True and has_error=False are respected for Tempo queries.

Also applies to: 141-181, 398-485

src/opentelemetry_mcp/backends/traceloop.py (1)

452-537: Tighten boolean value handling in _filter_to_traceloop to avoid truthiness surprises.

When value_type is FilterType.BOOLEAN, the current serialization:

if traceloop_value_type == "boolean":
    serialized_value = "true" if value else "false"

uses Python truthiness. If a caller constructs Filter(value="false", value_type=FilterType.BOOLEAN), this will serialize as "true", and non-bool, non-string values are silently coerced.

To make this more robust for public Filter usage, consider validating and normalizing booleans explicitly:

-        if traceloop_value_type == "boolean":
-            serialized_value = "true" if value else "false"
-        else:
-            serialized_value = str(value)
+        if traceloop_value_type == "boolean":
+            if isinstance(value, bool):
+                serialized_value = "true" if value else "false"
+            elif isinstance(value, str):
+                v = value.strip().lower()
+                serialized_value = "true" if v in ("1", "true", "yes") else "false"
+            else:
+                raise ValueError(
+                    f"Boolean filter value must be bool or string 'true'/'false', got {type(value)}"
+                )
+        else:
+            serialized_value = str(value)

This keeps the API predictable and avoids silently wrong Traceloop queries when boolean filters are built manually.

🧹 Nitpick comments (2)
src/opentelemetry_mcp/backends/traceloop.py (1)

634-707: Remove unreachable else branch after empty-spans guard in _convert_spans_to_trace.

After building spans, you guard against the empty case:

if not spans:
    logger.error(f"Failed to convert any spans for trace {trace_id}")
    return None

Given this early return, the subsequent if spans: ... else: ... block (Lines 694–706) will always take the if spans path; the else is unreachable.

You can simplify the duration calculation and avoid dead code:

-        # Calculate trace duration
-        if spans:
-            start_times = [s.start_time for s in spans]
-            trace_start = min(start_times)
-            # Find the maximum end time
-            end_times = [
-                datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
-                for s in spans
-            ]
-            trace_end = max(end_times)
-            trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
-        else:
-            trace_start = root_span.start_time
-            trace_duration_ms = root_span.duration_ms
+        # Calculate trace duration
+        start_times = [s.start_time for s in spans]
+        trace_start = min(start_times)
+        end_times = [
+            datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
+            for s in spans
+        ]
+        trace_end = max(end_times)
+        trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000

This makes the control flow clearer without changing behavior.

src/opentelemetry_mcp/server.py (1)

53-56: Consider explicit backend shutdown to close HTTP resources

_backend is created once and then kept for the process lifetime, but it’s never closed, even though BaseBackend.close() is available and will clean up underlying HTTP resources. This is fine for short‑lived CLIs, but for long‑running MCP servers it’s cleaner to hook shutdown into FastMCP’s lifecycle (if supported) or process teardown and call await _backend.close() once.

This is not a correctness bug today, just a lifecycle improvement to avoid relying on implicit GC/exit for connection cleanup.

Also applies to: 101-133
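Even without a framework-provided shutdown hook, the cleanup can be made deterministic in the entrypoint. A minimal sketch with a stub backend (`StubBackend` and `serve` are illustrative, not the project's API):

```python
import asyncio
import contextlib


class StubBackend:
    """Stand-in for BaseBackend; only close() matters for this sketch."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def serve(backend: StubBackend) -> None:
    try:
        await asyncio.sleep(0)  # placeholder for the long-running server loop
    finally:
        # Close explicitly rather than relying on GC/exit to release connections.
        with contextlib.suppress(Exception):
            await backend.close()


backend = StubBackend()
asyncio.run(serve(backend))
print(backend.closed)  # True
```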

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c72563 and 30eecb6.

📒 Files selected for processing (7)
  • src/opentelemetry_mcp/attributes.py (1 hunks)
  • src/opentelemetry_mcp/backends/jaeger.py (1 hunks)
  • src/opentelemetry_mcp/backends/tempo.py (1 hunks)
  • src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
  • src/opentelemetry_mcp/models.py (1 hunks)
  • src/opentelemetry_mcp/server.py (1 hunks)
  • src/opentelemetry_mcp/tools/search.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.

Applied to files:

  • src/opentelemetry_mcp/server.py
🧬 Code graph analysis (5)
src/opentelemetry_mcp/server.py (9)
src/opentelemetry_mcp/backends/base.py (6)
  • BaseBackend (12-160)
  • search_spans (80-92)
  • health_check (137-146)
  • search_traces (65-77)
  • get_trace (95-107)
  • list_services (110-119)
src/opentelemetry_mcp/tools/list_models.py (1)
  • list_models (11-106)
src/opentelemetry_mcp/tools/search_spans.py (1)
  • search_spans (13-119)
src/opentelemetry_mcp/tools/search.py (1)
  • search_traces (13-117)
src/opentelemetry_mcp/tools/trace.py (1)
  • get_trace (10-80)
src/opentelemetry_mcp/tools/usage.py (1)
  • get_llm_usage (11-114)
src/opentelemetry_mcp/tools/services.py (1)
  • list_services (8-25)
src/opentelemetry_mcp/tools/model_stats.py (1)
  • get_model_stats (43-168)
src/opentelemetry_mcp/tools/slow_traces.py (1)
  • get_slow_traces (10-114)
src/opentelemetry_mcp/attributes.py (2)
src/opentelemetry_mcp/constants.py (1)
  • GenAI (75-111)
src/opentelemetry_mcp/models.py (1)
  • gen_ai_system (248-250)
src/opentelemetry_mcp/models.py (2)
src/opentelemetry_mcp/attributes.py (4)
  • SpanAttributes (12-137)
  • SpanEvent (140-157)
  • to_dict (64-86)
  • get (88-119)
src/opentelemetry_mcp/constants.py (3)
  • Fields (186-196)
  • GenAI (75-111)
  • Service (172-176)
src/opentelemetry_mcp/tools/search.py (3)
src/opentelemetry_mcp/backends/base.py (2)
  • BaseBackend (12-160)
  • search_traces (65-77)
src/opentelemetry_mcp/models.py (6)
  • Filter (47-82)
  • TraceQuery (460-559)
  • TraceSummary (562-590)
  • gen_ai_system (248-250)
  • has_error (253-255)
  • from_trace (577-590)
src/opentelemetry_mcp/utils.py (1)
  • parse_iso_timestamp (6-23)
src/opentelemetry_mcp/backends/jaeger.py (3)
src/opentelemetry_mcp/attributes.py (4)
  • HealthCheckResponse (160-172)
  • SpanAttributes (12-137)
  • SpanEvent (140-157)
  • get (88-119)
src/opentelemetry_mcp/backends/filter_engine.py (4)
  • FilterEngine (13-284)
  • apply_filters (18-18)
  • apply_filters (22-22)
  • apply_filters (25-55)
src/opentelemetry_mcp/models.py (10)
  • FilterOperator (14-36)
  • SpanData (228-255)
  • SpanQuery (593-650)
  • TraceData (428-457)
  • TraceQuery (460-559)
  • get_all_filters (500-517)
  • get_all_filters (633-650)
  • to_backend_params (519-559)
  • has_error (253-255)
  • gen_ai_system (248-250)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test (arm64)
🔇 Additional comments (2)
src/opentelemetry_mcp/tools/search.py (1)

13-115: search_traces wrapper looks solid and idiomatic.

Nice layering from raw MCP params → validated TraceQuery → backend call → TraceSummary JSON, with clear error handling for timestamp/filter/query/backend failures. I don’t see changes needed here.

src/opentelemetry_mcp/backends/jaeger.py (1)

248-267: Follow-up verification contradicts the earlier concern about the Jaeger operations response format.

The review comment's assumption about the Jaeger operations response format is incorrect. VCR cassettes from the integration tests show the actual /api/services/{service}/operations endpoint returns {"data": [...]} where data contains bare strings, not objects:

{"data": ["/api/metrics/latencies", "/api/traces/{traceID}", "/api/services/{service}/operations", "GET", ...]}

The current implementation correctly handles this:

return [str(op) for op in data.get("data", [])]

Since data already contains strings, calling str() on each item returns it unchanged—the code works as intended. The proposed fix would introduce unnecessary complexity without addressing any actual bug, and the tests pass against recorded Jaeger responses confirming the current behavior is correct.

Likely an incorrect or invalid review comment.

Comment on lines +64 to +87
def to_dict(self) -> dict[str, str | int | float | bool]:
"""
Convert to dictionary representation with dotted keys.

Returns only non-None values with their original dotted notation (e.g., "gen_ai.system").
"""
result: dict[str, str | int | float | bool] = {}

# Map field names back to their aliases for proper serialization
for field_name, field_info in self.__class__.model_fields.items():
value = getattr(self, field_name)
if value is not None:
# Use the alias if available, otherwise use field name
key = field_info.alias or field_name
result[key] = value

# Add extra fields that were allowed through ConfigDict(extra='allow')
if hasattr(self, "__pydantic_extra__") and self.__pydantic_extra__:
for key, value in self.__pydantic_extra__.items():
if value is not None:
result[key] = value

return result


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Broaden attribute value types and avoid relying on pydantic internals

The helper methods assume attribute values are only str | int | float | bool, but the model defines list fields (e.g., gen_ai_response_finish_reasons, llm_response_finish_reasons). At runtime to_dict, get, and __getitem__ can all return lists, so the current type hints are inaccurate and will confuse static checking.

You’re also accessing __pydantic_extra__ directly, which is an internal pydantic detail. In pydantic v2 the public surface is model_extra; relying on the dunder name may break with upgrades.

Consider something along these lines:

```diff
-from typing import Literal
+from typing import Literal, TypeAlias
@@
-class SpanAttributes(BaseModel):
+SpanAttributeValue: TypeAlias = str | int | float | bool | list[str]
+
+
+class SpanAttributes(BaseModel):
@@
-    def to_dict(self) -> dict[str, str | int | float | bool]:
+    def to_dict(self) -> dict[str, SpanAttributeValue]:
@@
-        result: dict[str, str | int | float | bool] = {}
+        result: dict[str, SpanAttributeValue] = {}
@@
-        # Add extra fields that were allowed through ConfigDict(extra='allow')
-        if hasattr(self, "__pydantic_extra__") and self.__pydantic_extra__:
-            for key, value in self.__pydantic_extra__.items():
-                if value is not None:
-                    result[key] = value
+        # Add extra fields that were allowed through ConfigDict(extra='allow')
+        extra = getattr(self, "model_extra", None)
+        if extra:
+            for key, value in extra.items():
+                if value is not None:
+                    result[key] = value  # type: ignore[assignment]
@@
-    def get(
-        self, key: str, default: str | int | float | bool | None = None
-    ) -> str | int | float | bool | None:
+    def get(
+        self, key: str, default: SpanAttributeValue | None = None
+    ) -> SpanAttributeValue | None:
@@
-        # Try extra fields (dotted notation)
-        if (
-            hasattr(self, "__pydantic_extra__")
-            and self.__pydantic_extra__
-            and key in self.__pydantic_extra__
-        ):
-            return self.__pydantic_extra__[key]  # type: ignore[no-any-return]
+        # Try extra fields (dotted notation)
+        extra = getattr(self, "model_extra", None)
+        if extra and key in extra:
+            return extra[key]  # type: ignore[no-any-return]
@@
-    def __getitem__(self, key: str) -> str | int | float | bool:
+    def __getitem__(self, key: str) -> SpanAttributeValue:
```

This keeps behavior the same while aligning the types with the actual data and using pydantic’s public API for extras. Please double‑check the exact attribute name (model_extra) against your pinned pydantic version and adjust if it differs.

Also applies to: 88-137


🌐 Web query:

In pydantic v2, what is the documented public attribute or property for accessing extra fields on a `BaseModel` instance, and is `model_extra` the recommended way over using `__pydantic_extra__` directly?

💡 Result:

  • The documented public property is BaseModel.model_extra (a read-only property returning the extra fields). [1]
  • Yes — use model_extra (the public API). __pydantic_extra__ is the internal attribute that model_extra returns, so prefer model_extra instead of accessing __pydantic_extra__ directly. [1][2]

Sources:
[1] Pydantic BaseModel API — model_extra property.
[2] Pydantic Models docs — explanation that extra fields are stored on the instance (in __pydantic_extra__).
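A minimal check of that documented behavior, using a toy model (illustrative names, assuming pydantic v2 is installed):

```python
from pydantic import BaseModel, ConfigDict


class ExtraModel(BaseModel):
    """Toy model with extra='allow' to inspect where extras are stored."""

    model_config = ConfigDict(extra="allow")
    name: str


m = ExtraModel(name="span", custom_attr="x")

# model_extra is the public, read-only view of the extra fields...
print(m.model_extra)  # {'custom_attr': 'x'}
# ...and it returns the very dict stored internally in __pydantic_extra__.
print(m.model_extra is m.__pydantic_extra__)  # True
```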



I need to run scripts in the repository to verify the actual structure. Let me do that now:




---

🏁 Script executed:

```shell
#!/bin/bash
# Find the SpanAttributes model and inspect its field definitions
rg -n "finish_reasons|gen_ai_response_finish_reasons|llm_response_finish_reasons" src/opentelemetry_mcp/attributes.py
```

Length of output: 224


🏁 Script executed:

```shell
#!/bin/bash
# Inspect the attributes.py file, particularly around the specified lines and for list fields
head -150 src/opentelemetry_mcp/attributes.py | tail -100
```

Length of output: 3760


🏁 Script executed:

```shell
#!/bin/bash
# Search for all list-typed fields in SpanAttributes
grep -n "list\[" src/opentelemetry_mcp/attributes.py
```

Length of output: 168



---

🏁 Script executed:

```shell
#!/bin/bash
# Check the pydantic version requirement
grep -i pydantic pyproject.toml requirements.txt 2>/dev/null | head -5
```

Length of output: 38


Broaden attribute value types and use pydantic's public API for extra fields

The model defines list-typed fields (gen_ai_response_finish_reasons, llm_response_finish_reasons at lines 38, 52), but to_dict(), get(), and __getitem__() all have type hints claiming only str | int | float | bool. At runtime these methods can return lists, making the type signatures inaccurate for static checking.

Additionally, all three methods access __pydantic_extra__ directly, which is an internal pydantic detail. In pydantic v2 the public API is model_extra.

Apply the following pattern to to_dict() (lines 64–87), get() (lines 88–137), and __getitem__():

```diff
-from typing import Literal
+from typing import Literal, TypeAlias
@@
-class SpanAttributes(BaseModel):
+SpanAttributeValue: TypeAlias = str | int | float | bool | list[str]
+
+
+class SpanAttributes(BaseModel):
@@
-    def to_dict(self) -> dict[str, str | int | float | bool]:
+    def to_dict(self) -> dict[str, SpanAttributeValue]:
@@
-        result: dict[str, str | int | float | bool] = {}
+        result: dict[str, SpanAttributeValue] = {}
@@
-        # Add extra fields that were allowed through ConfigDict(extra='allow')
-        if hasattr(self, "__pydantic_extra__") and self.__pydantic_extra__:
-            for key, value in self.__pydantic_extra__.items():
-                if value is not None:
-                    result[key] = value
+        # Add extra fields that were allowed through ConfigDict(extra='allow')
+        extra = getattr(self, "model_extra", None)
+        if extra:
+            for key, value in extra.items():
+                if value is not None:
+                    result[key] = value  # type: ignore[assignment]
@@
-    def get(
-        self, key: str, default: str | int | float | bool | None = None
-    ) -> str | int | float | bool | None:
+    def get(
+        self, key: str, default: SpanAttributeValue | None = None
+    ) -> SpanAttributeValue | None:
@@
-        # Try extra fields (dotted notation)
-        if (
-            hasattr(self, "__pydantic_extra__")
-            and self.__pydantic_extra__
-            and key in self.__pydantic_extra__
-        ):
-            return self.__pydantic_extra__[key]  # type: ignore[no-any-return]
+        # Try extra fields (dotted notation)
+        extra = getattr(self, "model_extra", None)
+        if extra and key in extra:
+            return extra[key]  # type: ignore[no-any-return]
@@
-    def __getitem__(self, key: str) -> str | int | float | bool:
+    def __getitem__(self, key: str) -> SpanAttributeValue:
```

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +32 to +57
```python
    def __init__(
        self,
        url: str,
        api_key: str | None = None,
        timeout: float = 30.0,
        environments: list[str] | None = None,
    ):
        """Initialize Traceloop backend.

        Args:
            url: Traceloop API base URL (e.g., https://api.traceloop.com)
            api_key: API key for authentication (required, contains project info)
            timeout: Request timeout in seconds
            environments: List of environments to query (default: ["production"])
        """
        super().__init__(url, api_key, timeout)

        if not self.api_key:
            raise ValueError("Traceloop backend requires an API key")

        # Use "default" as project_id - Traceloop resolves actual project from API key
        self.project_id = "default"

        # Store environments for all API requests
        self.environments = environments if environments else ["prd"]
```


⚠️ Potential issue | 🟡 Minor

Align environments default in docstring with implementation.

The __init__ docstring says the default environments are ["production"], but the code actually uses ["prd"] when environments is None:

self.environments = environments if environments else ["prd"]

Either update the docstring to reflect ["prd"] or change the default value to ["production"] so behavior matches the documentation.

🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/traceloop.py around lines 32 to 57, the
docstring states the default environments is ["production"] but the code sets
self.environments to ["prd"] when environments is None; make them consistent by
either updating the docstring to indicate the default is ["prd"] or changing the
fallback in code to ["production"]. Pick one approach and apply it throughout
(docstring and code) so the documented default matches the actual runtime
default.

doronkopit5 added a commit that referenced this pull request Feb 16, 2026
Addresses Dependabot alert #2 (HIGH severity) for cryptography subgroup
attack vulnerability. Also upgrades fastmcp to latest to keep transitive
dependency chain current for diskcache alert #3 (no fix available yet).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>