feat: Initial commit of OpenTelemetry MCP Server #2
Walkthrough

Adds a new OpenTelemetry MCP Server project: packaging, CI, Dockerfile, configuration, data models, backend abstraction and three concrete backends (Jaeger, Tempo, Traceloop), a client-side FilterEngine, server tools (FastMCP), extensive unit/integration tests with VCR cassettes, and documentation.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant Server as FastMCP
    participant Backend as BaseBackend
    participant Remote as Tracing API
    Client->>Server: invoke tool (e.g. search_traces)
    Server->>Server: build TraceQuery / parse filters
    Server->>Backend: await search_traces(query)
    alt Backend supports native filters
        Backend->>Remote: request with native filters
        Remote-->>Backend: traces response
        Backend->>Backend: apply remaining client-side filters (FilterEngine)
    else Backend lacks native filters
        Backend->>Remote: broader request (fewer filters)
        Remote-->>Backend: traces response
        Backend->>Backend: apply all filters (FilterEngine)
    end
    Backend-->>Server: List[TraceData]
    Server->>Server: aggregate/format (summaries, LLM metrics)
    Server-->>Client: JSON result
```
Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas needing extra attention:
Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Actionable comments posted: 12
🧹 Nitpick comments (21)
Dockerfile (2)
56-60: Consider validation or documentation for empty environment variable defaults.
`BACKEND_TYPE` and `BACKEND_URL` default to empty strings, which will likely cause the server to fail at startup if these are required configuration values. Consider either:
- Setting sensible defaults if possible
- Documenting that these must be provided at runtime
- Adding validation at server startup to fail fast with a clear error message
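A fail-fast startup check along those lines can be quite small. The sketch below is illustrative (only the variable names `BACKEND_TYPE` and `BACKEND_URL` come from the Dockerfile; the helper names are invented, not part of the PR):

```python
import os

# Variables the Dockerfile defaults to empty strings (names from the Dockerfile).
REQUIRED_VARS = ("BACKEND_TYPE", "BACKEND_URL")


def validate_required_env(env: dict[str, str]) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def fail_fast() -> None:
    """Exit at startup with a clear message if required configuration is missing."""
    missing = validate_required_env(dict(os.environ))
    if missing:
        raise SystemExit(
            f"Missing required environment variables: {', '.join(missing)}"
        )
```

Calling something like `fail_fast()` before the server starts turns a confusing runtime failure into an immediate, readable error.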
63-64: Healthcheck doesn't verify server functionality.

The current healthcheck only verifies that Python can execute, not that the MCP server is actually running or responding to requests. Consider implementing a proper HTTP health endpoint and updating the healthcheck to query it:
```diff
-HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-    CMD python -c "import sys; sys.exit(0)"
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
```

Note: This assumes a `/health` endpoint exists or will be added. Alternatively, you could use a Python script that makes an HTTP request to verify the server is responsive.

src/opentelemetry_mcp/tools/slow_traces.py (1)
10-113: Avoid redundant `get_trace` calls and reuse `TraceData` from `search_traces`.

`BaseBackend.search_traces` already returns full `TraceData` with spans, so calling `backend.get_trace(summary.trace_id)` for every result adds up to `limit * 10` extra network round-trips and duplicates token aggregation logic. You can work directly on the traces returned from `search_traces` and reuse `TraceData.total_llm_tokens` and `len(trace.llm_spans)`.

A possible refactor:
```diff
-    # Search traces
-    trace_summaries = await backend.search_traces(query)
-
-    # Collect trace data with durations
-    slow_traces = []
-
-    for summary in trace_summaries:
-        # Get full trace to access LLM spans
-        trace = await backend.get_trace(summary.trace_id)
+    # Search traces (already returns full TraceData objects)
+    traces = await backend.search_traces(query)
+
+    # Collect trace data with durations
+    slow_traces: list[dict[str, object]] = []
+
+    for trace in traces:
         # Only include traces that have LLM spans
         if not trace.llm_spans:
             continue
-
-        total_tokens = 0
-        models_used = set()
-
-        for span in trace.llm_spans:
-            llm_attrs = LLMSpanAttributes.from_span(span)
-            if not llm_attrs:
-                continue
-
-            # Calculate total tokens
-            if llm_attrs.total_tokens:
-                total_tokens += llm_attrs.total_tokens
-
-            # Track models
-            model = llm_attrs.response_model or llm_attrs.request_model
-            if model:
-                models_used.add(model)
+        models_used: set[str] = set()
+        for span in trace.llm_spans:
+            llm_attrs = LLMSpanAttributes.from_span(span)
+            if not llm_attrs:
+                continue
+            model = llm_attrs.response_model or llm_attrs.request_model
+            if model:
+                models_used.add(model)
         slow_traces.append(
             {
                 "trace_id": trace.trace_id,
                 "service_name": trace.service_name,
                 "operation_name": trace.root_operation,
                 "start_time": trace.start_time.isoformat(),
                 "duration_ms": round(trace.duration_ms, 2),
-                "models": sorted(list(models_used)),
-                "total_tokens": total_tokens,
-                "llm_span_count": len(trace.llm_spans),
+                "models": sorted(models_used),
+                "total_tokens": trace.total_llm_tokens,
+                "llm_span_count": len(trace.llm_spans),
                 "status": trace.status,
                 "has_errors": trace.has_errors,
             }
         )
```

This keeps behavior the same while significantly reducing backend load and simplifying the aggregation.
Please run the existing slow-traces tests (and integration tests against Jaeger/Tempo/Traceloop) after this change to confirm the behavior and payload shape remain compatible.
tests/__init__.py (1)
1-1: Optional: Align capitalization with “OpenTelemetry”.

The docstring uses `Opentelemetry`, while other modules use `OpenTelemetry`. For consistency you might want:

```python
"""Tests for OpenTelemetry MCP Server."""
```

Purely cosmetic; no functional impact.
tests/test_traceloop.py (1)
43-137: Consider testing the actual backend method instead of replicating logic.

This test manually replicates the filter building logic rather than calling the actual backend method. This creates maintenance burden if the implementation changes.
Consider refactoring to test the actual backend method:
```python
async def test_build_filters_for_search() -> None:
    """Test filter building for search_traces."""
    backend = TraceloopBackend(
        url="https://api.traceloop.com/v2",
        api_key="test_key",
    )
    query = TraceQuery(
        service_name="my-service",
        operation_name="my-operation",
        gen_ai_system="openai",
        gen_ai_request_model="gpt-4",
        min_duration_ms=1000,
        max_duration_ms=5000,
        has_error=True,
        tags={"custom.tag": "value"},
        limit=50,
    )

    # Mock the HTTP client to capture the request
    with patch.object(backend.client, 'post') as mock_post:
        mock_post.return_value = Mock(json=lambda: {"data": []})
        await backend.search_traces(query)

        # Verify the filters in the actual API call
        call_args = mock_post.call_args
        filters = call_args[1]['json']['filters']

        # Assert expected filters are present
        assert len(filters) == 8
        # ... rest of assertions
```

src/opentelemetry_mcp/tools/usage.py (1)
36-44: Consider validating time range ordering

You already parse timestamps and surface format errors; as an extra guard, consider returning a clear error when `start_dt` and `end_dt` are both set but `start_dt > end_dt`, instead of letting backends handle an inverted range.

src/opentelemetry_mcp/tools/list_models.py (2)
41-61: Avoid redundant `get_trace` calls; reuse traces from `search_traces`

Here you treat `search_traces` results as summaries and re-fetch each trace via `get_trace`, but `get_llm_usage` (in `usage.py`) already consumes `trace.llm_spans` directly from `backend.search_traces(query)`. That suggests backends are returning trace objects with spans, making the extra per-trace round trips unnecessary and potentially expensive for large `limit` values.

You can simplify and speed this up by iterating directly over the traces returned by `search_traces`, mirroring `get_llm_usage`:

```diff
-    # Build query to fetch traces
+    # Build query to fetch traces
     query = TraceQuery(
         start_time=start_dt,
         end_time=end_dt,
         service_name=service_name,
         gen_ai_system=gen_ai_system,
         limit=limit,
     )

-    # Search traces
-    trace_summaries = await backend.search_traces(query)
+    # Search traces (full trace objects with spans)
+    traces = await backend.search_traces(query)
@@
-    for summary in trace_summaries:
-        # Get full trace to access LLM spans
-        trace = await backend.get_trace(summary.trace_id)
-
-        for span in trace.llm_spans:
+    for trace in traces:
+        for span in trace.llm_spans:
```

This keeps network usage bounded to what `search_traces` already does and aligns the behavior with `get_llm_usage`.
32-40: Optional: validate time range consistency

As with `get_llm_usage`, you parse and validate timestamp formats but don't guard against `start_dt > end_dt`. Adding a simple check and returning a clear JSON error would make this tool's behavior more predictable for callers supplying bad ranges.

src/opentelemetry_mcp/tools/trace.py (1)
61-75: Avoid recomputing `llm_spans` multiple times

`trace.llm_spans` is evaluated three times (truthiness check, `len`, and the set comprehension), each re-scanning spans. Caching once keeps behavior the same and avoids extra work on large traces.

```diff
-    # Add LLM-specific summary
-    if trace.llm_spans:
-        result["llm_summary"] = {
-            "llm_span_count": len(trace.llm_spans),
-            "total_tokens": trace.total_llm_tokens,
-            "models_used": list(
-                {
-                    span.attributes.gen_ai_request_model
-                    or span.attributes.gen_ai_response_model
-                    for span in trace.llm_spans
-                    if span.attributes.gen_ai_request_model
-                    or span.attributes.gen_ai_response_model
-                }
-            ),
-        }
+    # Add LLM-specific summary
+    llm_spans = trace.llm_spans
+    if llm_spans:
+        result["llm_summary"] = {
+            "llm_span_count": len(llm_spans),
+            "total_tokens": trace.total_llm_tokens,
+            "models_used": list(
+                {
+                    span.attributes.gen_ai_request_model
+                    or span.attributes.gen_ai_response_model
+                    for span in llm_spans
+                    if span.attributes.gen_ai_request_model
+                    or span.attributes.gen_ai_response_model
+                }
+            ),
+        }
```

src/opentelemetry_mcp/tools/expensive_traces.py (1)
55-64: Avoid double-fetching traces in `get_expensive_traces`

`search_traces` already returns full `TraceData` objects with spans per the `BaseBackend` contract; calling `get_trace` again for each result adds unnecessary network calls and latency.

```diff
-    # Search traces
-    trace_summaries = await backend.search_traces(query)
-
-    # Collect trace data with token counts
-    expensive_traces = []
-
-    for summary in trace_summaries:
-        # Get full trace to access LLM spans
-        trace = await backend.get_trace(summary.trace_id)
+    # Search traces (returns full TraceData objects)
+    traces = await backend.search_traces(query)
+
+    # Collect trace data with token counts
+    expensive_traces = []
+
+    for trace in traces:
```

src/opentelemetry_mcp/tools/model_stats.py (1)
79-106: Avoid redundant `get_trace` calls; reuse full traces from `search_traces`

`search_traces` already returns full `TraceData` objects with spans (as exercised in `tests/integration/test_jaeger_integration.py`, where `trace.spans` is asserted), so calling `get_trace` again for each summary introduces avoidable N+1 backend calls without adding data. You can simplify and improve performance by iterating the traces directly:

```diff
-    # Search traces
-    trace_summaries = await backend.search_traces(query)
+    # Search traces
+    traces = await backend.search_traces(query)
@@
-    for summary in trace_summaries:
-        # Get full trace to access LLM spans
-        trace = await backend.get_trace(summary.trace_id)
+    for trace in traces:
         for span in trace.llm_spans:
             ...
```

This keeps the external behavior identical while removing extra network round-trips.
CLAUDE.md (1)
111-112: Minor doc polish: wording and fenced code language

Two small tweaks to align with style/linting:
- Around Line 111: “FastMCP application, CLI interface, tool handlers” – the “I” in CLI already means interface; consider shortening to “FastMCP application, CLI, tool handlers”.
- Around Lines 452–469: the fenced block showing the `ValueError` currently has no language; add one (e.g., `text` or `bash`) to satisfy markdownlint and improve rendering.

Purely cosmetic, so feel free to tackle later.
Also applies to: 452-469
tests/integration/test_jaeger_integration.py (1)
63-190: Strong Jaeger coverage; a couple of small cleanups possible

These integration tests do a nice job exercising:
- Service and operation discovery,
- Limits, duration/error filters, and generic filter behavior,
- Span search (including LLM spans) and model filters.
Two optional nits you might consider:
The pattern:

```python
services = await jaeger_backend.list_services()
service_name = services[0]
```

appears in several tests; a small helper/fixture returning a “default test service” would DRY this up.
In `test_search_traces_with_llm_model_filter`, `llm_spans` is already `[s for s in trace.spans if s.is_llm_span]`, so the extra `if s.is_llm_span` inside the list comprehension that builds `model_names` is redundant.

None of this affects correctness; tests look good as-is.
Also applies to: 228-367
src/opentelemetry_mcp/backends/base.py (1)
15-42: Backend abstraction and client lifecycle look good; tiny robustness tweak possible

The `BaseBackend` interface is well-shaped for the Jaeger/Tempo/Traceloop backends, and the lazy `AsyncClient` plus async context manager give you nice connection pooling and an easy `async with` story.

A small optional tweak in `close` would avoid calling `aclose()` on an already-closed client:

```diff
 async def close(self) -> None:
     """Close HTTP client connections."""
-    if self._client:
-        await self._client.aclose()
-        self._client = None
+    if self._client and not self._client.is_closed:
+        await self._client.aclose()
+        self._client = None
```

Functionally it should behave the same today, but this is a bit more defensive if subclasses ever close the client themselves.
Also applies to: 136-160
src/opentelemetry_mcp/backends/filter_engine.py (2)
24-55: Simplify duplicated filtering loop in `apply_filters`

Both branches of `apply_filters` run the same loop; only type-checking differs. You can remove the duplication and rely on the overloads plus `_matches_all_filters` to handle trace vs span items.

Example refactor:

```diff
 @staticmethod
 def apply_filters(
     items: list[TraceData] | list[SpanData], filters: list[Filter]
 ) -> list[TraceData] | list[SpanData]:
@@
-    filtered_items: list[TraceData] | list[SpanData]
-    if isinstance(items[0] if items else None, TraceData):
-        filtered_items = []
-        for item in items:
-            if FilterEngine._matches_all_filters(item, filters):
-                filtered_items.append(item)  # type: ignore[arg-type]
-    else:
-        filtered_items = []
-        for item in items:
-            if FilterEngine._matches_all_filters(item, filters):
-                filtered_items.append(item)  # type: ignore[arg-type]
+    filtered_items: list[TraceData] | list[SpanData] = [
+        item for item in items if FilterEngine._matches_all_filters(item, filters)
+    ]
```
200-253: Tighten boolean coercion to avoid surprising matches
`_compare_value` currently does `actual_bool = bool(actual)` and `expected_bool = bool(expected)`. For string inputs, `"false"` and `"0"` both evaluate to `True`, which makes filters like `value_type=BOOLEAN, value="false"` behave unexpectedly.

Consider normalizing booleans more explicitly (e.g., handling `True`/`False` and common string/int representations) and treating unparseable values as non-matches instead of relying on bare `bool(...)`.

src/opentelemetry_mcp/backends/tempo.py (1)
463-475: Escape values in TraceQL `CONTAINS`/`IN` conditions to avoid malformed regex/queries

`_filter_to_traceql` interpolates raw values into TraceQL, e.g.:

- `CONTAINS` → `field =~ ".*{value}.*"`
- `IN` on strings → OR of `field = "{v}"`

If `value` or entries in `values` contain quotes or regex/meta characters, this can produce invalid or surprising TraceQL.

Consider escaping values before interpolation (e.g. via `re.escape` for `CONTAINS` and proper quoting/escaping for string literals) so that filter values are treated as data, not regex/query syntax.

src/opentelemetry_mcp/backends/jaeger.py (1)
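For `_filter_to_traceql`, the escaping suggested above could be centralized in small helpers like these (a sketch; the helper names are illustrative, and TraceQL's exact string-literal escaping rules should be confirmed against Tempo's documentation):

```python
import re


def quote_string(value: str) -> str:
    """Quote a string literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def contains_condition(field: str, value: str) -> str:
    """Render CONTAINS as a regex match, escaping the value so it matches literally."""
    return f'{field} =~ {quote_string(".*" + re.escape(value) + ".*")}'


def in_condition(field: str, values: list[str]) -> str:
    """Render IN as an OR of quoted equality checks."""
    return "(" + " || ".join(f"{field} = {quote_string(v)}" for v in values) + ")"
```

A value like `he said "hi"` then produces a well-formed quoted literal instead of terminating the TraceQL string early.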
85-104: Clarify or remove unused `native_filters` parameter

`_search_service_traces_with_filters` accepts `native_filters` but does not use it; all server-side filtering is driven by `query.to_backend_params()`. This can be misleading given the hybrid-filtering comment above.

Either wire `native_filters` into the Jaeger request (if/when you add tag-based native filtering) or drop the parameter for now to keep the API surface honest.

src/opentelemetry_mcp/attributes.py (1)
60-83: Broaden attribute value types and simplify serialization using `model_dump`.

`SpanAttributes` can legitimately hold list-valued fields (e.g., `gen_ai_response_finish_reasons`, `llm_response_finish_reasons`) and arbitrary extras via `extra="allow"`, but:

- `to_dict()` is annotated as returning `dict[str, str | int | float | bool]` and initializes `result` with the same narrow value type.
- `get()`/`__getitem__()` are also typed to return only scalar primitives.

At runtime these methods will happily return lists (and possibly other shapes from `__pydantic_extra__`), so the current signatures are misleading and will fight static analysis.

Consider widening the types and delegating to Pydantic's own dumping rather than hand-rolling the alias/extra logic, e.g.:

```diff
-from typing import Literal
+from typing import Any, Literal
@@
-    def to_dict(self) -> dict[str, str | int | float | bool]:
+    def to_dict(self) -> dict[str, Any]:
@@
-        result: dict[str, str | int | float | bool] = {}
-
-        # Map field names back to their aliases for proper serialization
-        for field_name, field_info in self.__class__.model_fields.items():
-            value = getattr(self, field_name)
-            if value is not None:
-                # Use the alias if available, otherwise use field name
-                key = field_info.alias or field_name
-                result[key] = value
-
-        # Add extra fields that were allowed through ConfigDict(extra='allow')
-        if hasattr(self, "__pydantic_extra__") and self.__pydantic_extra__:
-            for key, value in self.__pydantic_extra__.items():
-                if value is not None:
-                    result[key] = value
-
-        return result
+        # Rely on Pydantic's dumping to handle aliases, extras and None filtering.
+        return self.model_dump(by_alias=True, exclude_none=True)
@@
-    def get(
-        self, key: str, default: str | int | float | bool | None = None
-    ) -> str | int | float | bool | None:
+    def get(self, key: str, default: Any = None) -> Any:
@@
-    def __getitem__(self, key: str) -> str | int | float | bool:
+    def __getitem__(self, key: str) -> Any:
```

This keeps runtime behavior the same but makes the API honest about the possible value types and reduces maintenance overhead.
Also applies to: 84-133
src/opentelemetry_mcp/server.py (2)
95-115: Tighten config check and consider behavior when backend health is not “healthy”.

`_get_backend()` currently checks configuration via `if not _config:` and logs health problems but always proceeds:

```python
if not _config:
    raise RuntimeError("Server configuration not set")
...
if _backend is None:
    ...
    health = await _backend.health_check()
    ...
    if health.status != "healthy":
        logger.warning("Backend is not healthy, but continuing...")
```

Two suggestions:
- Use an explicit `is None` check to avoid any surprises if `ServerConfig` ever defines `__bool__`:

```diff
-    if not _config:
+    if _config is None:
         raise RuntimeError("Server configuration not set")
```
- Decide whether you want fail-fast semantics when health is not `"healthy"` (e.g., raising a `RuntimeError` or making this behavior configurable via an env/flag). Right now the server will start and tools will later fail at call time, which can be harder to debug.

These aren't correctness blockers but will make behavior more predictable in misconfigured environments.
588-626: Reconsider default HTTP bind address for security.

The HTTP transport currently defaults to `host="0.0.0.0"`, exposing the MCP endpoint on all interfaces:

```python
@click.option(
    "--host",
    type=str,
    default="0.0.0.0",
    help="Host to bind HTTP server to (only for --transport http, default: 0.0.0.0)",
)
```

For local development and typical MCP usage, a safer default is `127.0.0.1`, reducing accidental exposure on shared machines or CI agents. You can still allow `0.0.0.0` explicitly via the CLI when needed.

```diff
-    default="0.0.0.0",
-    help="Host to bind HTTP server to (only for --transport http, default: 0.0.0.0)",
+    default="127.0.0.1",
+    help="Host to bind HTTP server to (only for --transport http, default: 127.0.0.1)",
```

This is operational rather than functional, but worth considering before the default becomes part of the public behavior.
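On the earlier point about fail-fast semantics in `_get_backend()`: the configurable variant could be as small as this sketch (the `MCP_FAIL_ON_UNHEALTHY_BACKEND` flag name is invented for illustration):

```python
def fail_fast_enabled(env: dict[str, str]) -> bool:
    """Opt-in flag (illustrative name) controlling whether an unhealthy backend aborts startup."""
    value = env.get("MCP_FAIL_ON_UNHEALTHY_BACKEND", "")
    return value.strip().lower() in ("1", "true", "yes")


def enforce_backend_health(status: str, env: dict[str, str]) -> None:
    """Raise at startup when the backend is unhealthy and fail-fast is enabled."""
    if status != "healthy" and fail_fast_enabled(env):
        raise RuntimeError(
            f"Backend health check returned {status!r}; refusing to start"
        )
```

With the flag off, the current log-and-continue behavior is preserved; with it on, misconfiguration surfaces at startup instead of at the first tool call.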
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (70)
.dockerignore (1 hunks)
.env.example (1 hunks)
.github/workflows/ci.yml (1 hunks)
.python-version (1 hunks)
CLAUDE.md (1 hunks)
Dockerfile (1 hunks)
pyproject.toml (1 hunks)
src/opentelemetry_mcp/__init__.py (1 hunks)
src/opentelemetry_mcp/attributes.py (1 hunks)
src/opentelemetry_mcp/backends/__init__.py (1 hunks)
src/opentelemetry_mcp/backends/base.py (1 hunks)
src/opentelemetry_mcp/backends/filter_engine.py (1 hunks)
src/opentelemetry_mcp/backends/jaeger.py (1 hunks)
src/opentelemetry_mcp/backends/tempo.py (1 hunks)
src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
src/opentelemetry_mcp/config.py (1 hunks)
src/opentelemetry_mcp/models.py (1 hunks)
src/opentelemetry_mcp/server.py (1 hunks)
src/opentelemetry_mcp/tools/__init__.py (1 hunks)
src/opentelemetry_mcp/tools/errors.py (1 hunks)
src/opentelemetry_mcp/tools/expensive_traces.py (1 hunks)
src/opentelemetry_mcp/tools/list_llm_tools.py (1 hunks)
src/opentelemetry_mcp/tools/list_models.py (1 hunks)
src/opentelemetry_mcp/tools/model_stats.py (1 hunks)
src/opentelemetry_mcp/tools/search.py (1 hunks)
src/opentelemetry_mcp/tools/search_spans.py (1 hunks)
src/opentelemetry_mcp/tools/services.py (1 hunks)
src/opentelemetry_mcp/tools/slow_traces.py (1 hunks)
src/opentelemetry_mcp/tools/trace.py (1 hunks)
src/opentelemetry_mcp/tools/usage.py (1 hunks)
src/opentelemetry_mcp/utils.py (1 hunks)
start_locally.sh (1 hunks)
tests/__init__.py (1 hunks)
tests/conftest.py (1 hunks)
tests/integration/__init__.py (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerBackendHealth.test_health_check_healthy.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerGetTrace.test_get_trace_by_id.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerGetTrace.test_get_trace_invalid_id.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerLLMSpans.test_search_llm_spans.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerLLMSpans.test_search_traces_with_llm_model_filter.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerListServices.test_list_services.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchSpans.test_search_spans_basic.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchSpans.test_search_spans_with_generic_filter.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchSpans.test_search_spans_with_limit.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchSpans.test_search_spans_with_operation.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_basic.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_duration_filter.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_error_filter.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_generic_filter.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_limit.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerSearchTraces.test_search_traces_with_operation.yaml (1 hunks)
tests/integration/cassettes/test_jaeger_integration/TestJaegerServiceOperations.test_get_service_operations.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoBackendHealth.test_health_check_healthy.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoGetTrace.test_get_trace_by_id.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoGetTrace.test_get_trace_invalid_id.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoLLMSpans.test_search_traces_with_llm_model_filter.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoListServices.test_list_services.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoSearchSpans.test_search_spans_with_generic_filter.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoSearchSpans.test_search_spans_with_operation.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_error_filter.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_generic_filter.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_operation.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoServiceOperations.test_get_service_operations.yaml (1 hunks)
tests/integration/cassettes/test_tempo_integration/TestTempoTraceQL.test_search_with_multiple_filters.yaml (1 hunks)
tests/integration/conftest.py (1 hunks)
tests/integration/test_jaeger_integration.py (1 hunks)
tests/integration/test_tempo_integration.py (1 hunks)
tests/integration/test_traceloop_integration.py (1 hunks)
tests/test_models.py (1 hunks)
tests/test_traceloop.py (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes
Applied to files:
tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_error_filter.yaml
tests/integration/cassettes/test_tempo_integration/TestTempoLLMSpans.test_search_traces_with_llm_model_filter.yaml
tests/integration/__init__.py
tests/integration/conftest.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Re-record VCR cassettes when API interactions change
Applied to files:
tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_error_filter.yaml
tests/integration/__init__.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to **/cassettes/**/*.{yaml,yml,json} : Never commit secrets or PII in VCR cassettes; scrub sensitive data
Applied to files:
tests/integration/cassettes/test_tempo_integration/TestTempoSearchTraces.test_search_traces_with_error_filter.yaml
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
Applied to files:
tests/__init__.py
pyproject.toml
src/opentelemetry_mcp/config.py
src/opentelemetry_mcp/__init__.py
src/opentelemetry_mcp/server.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/conftest.py : Use VCR filters (e.g., filter_headers, before_record) or framework equivalents to scrub secrets/PII during recording
Applied to files:
tests/integration/__init__.py
tests/integration/conftest.py
🧬 Code graph analysis (19)
tests/conftest.py (3)
- src/opentelemetry_mcp/attributes.py (1): SpanAttributes (8-133)
- src/opentelemetry_mcp/config.py (1): BackendConfig (16-65)
- src/opentelemetry_mcp/models.py (2): SpanData (226-253), TraceData (425-454)

src/opentelemetry_mcp/backends/__init__.py (1)
- src/opentelemetry_mcp/backends/base.py (1): BaseBackend (12-160)

src/opentelemetry_mcp/tools/expensive_traces.py (2)
- src/opentelemetry_mcp/models.py (4): LLMSpanAttributes (256-382), TraceQuery (457-556), from_span (283-382), from_span (666-682)
- src/opentelemetry_mcp/utils.py (1): parse_iso_timestamp (6-23)

src/opentelemetry_mcp/tools/search_spans.py (4)
- src/opentelemetry_mcp/backends/base.py (2): BaseBackend (12-160), search_spans (80-92)
- src/opentelemetry_mcp/models.py (7): Filter (45-80), SpanQuery (590-647), SpanSummary (650-682), gen_ai_system (246-248), has_error (251-253), from_span (283-382), from_span (666-682)
- src/opentelemetry_mcp/utils.py (1): parse_iso_timestamp (6-23)
- src/opentelemetry_mcp/backends/tempo.py (1): search_spans (141-260)

src/opentelemetry_mcp/tools/model_stats.py (2)
- src/opentelemetry_mcp/models.py (5): LLMSpanAttributes (256-382), TraceQuery (457-556), from_span (283-382), from_span (666-682), has_error (251-253)
- src/opentelemetry_mcp/utils.py (1): parse_iso_timestamp (6-23)

src/opentelemetry_mcp/attributes.py (1)
- src/opentelemetry_mcp/models.py (1): gen_ai_system (246-248)

src/opentelemetry_mcp/tools/trace.py (5)
- src/opentelemetry_mcp/backends/base.py (1): get_trace (95-107)
- src/opentelemetry_mcp/models.py (6): LLMSpanAttributes (256-382), is_llm_span (241-243), from_span (283-382), from_span (666-682), llm_spans (437-439), total_llm_tokens (447-454)
- src/opentelemetry_mcp/backends/jaeger.py (1): get_trace (202-228)
- src/opentelemetry_mcp/backends/tempo.py (1): get_trace (262-271)
- src/opentelemetry_mcp/server.py (1): get_trace (200-217)

tests/integration/test_traceloop_integration.py (3)
- src/opentelemetry_mcp/models.py (7): Filter (45-80), FilterOperator (12-34), FilterType (37-42), SpanQuery (590-647), TraceQuery (457-556), has_error (251-253), gen_ai_system (246-248)
- tests/integration/conftest.py (1): traceloop_backend (213-225)
- src/opentelemetry_mcp/backends/base.py (6): health_check (137-146), list_services (110-119), get_service_operations (122-134), search_traces (65-77), get_trace (95-107), search_spans (80-92)

tests/integration/test_jaeger_integration.py (2)
- src/opentelemetry_mcp/backends/jaeger.py (6): health_check (268-294), list_services (230-246), get_service_operations (248-266), search_traces (42-83), get_trace (202-228), search_spans (123-200)
- src/opentelemetry_mcp/models.py (6): Filter (45-80), FilterOperator (12-34), FilterType (37-42), SpanQuery (590-647), TraceQuery (457-556), has_error (251-253)

src/opentelemetry_mcp/backends/tempo.py (4)
- src/opentelemetry_mcp/attributes.py (4): HealthCheckResponse (155-167), SpanAttributes (8-133), SpanEvent (136-152), get (84-115)
- src/opentelemetry_mcp/backends/base.py (10): BaseBackend (12-160), _create_headers (45-51), get_supported_operators (54-62), search_traces (65-77), client (29-42), search_spans (80-92), get_trace (95-107), list_services (110-119), get_service_operations (122-134), health_check (137-146)
- src/opentelemetry_mcp/backends/filter_engine.py (4): FilterEngine (13-284), apply_filters (18-18), apply_filters (22-22), apply_filters (25-55)
- src/opentelemetry_mcp/models.py (10): Filter (45-80), FilterOperator (12-34), FilterType (37-42), SpanData (226-253), SpanQuery (590-647), TraceData (425-454), TraceQuery (457-556), get_all_filters (497-514), get_all_filters (630-647), has_error (251-253)

src/opentelemetry_mcp/backends/base.py (2)
- src/opentelemetry_mcp/attributes.py (1): HealthCheckResponse (155-167)
- src/opentelemetry_mcp/models.py (5): FilterOperator (12-34), SpanData (226-253), SpanQuery (590-647), TraceData (425-454), TraceQuery (457-556)

src/opentelemetry_mcp/tools/slow_traces.py (6)
- src/opentelemetry_mcp/backends/base.py (3): BaseBackend (12-160), search_traces (65-77), get_trace (95-107)
- src/opentelemetry_mcp/models.py (4): LLMSpanAttributes (256-382), TraceQuery (457-556), from_span (283-382), from_span (666-682)
- src/opentelemetry_mcp/utils.py (1): parse_iso_timestamp (6-23)
- src/opentelemetry_mcp/backends/jaeger.py (2): search_traces (42-83), get_trace (202-228)
- src/opentelemetry_mcp/backends/tempo.py (2): search_traces (58-139), get_trace (262-271)
- src/opentelemetry_mcp/backends/traceloop.py (2): search_traces (85-174), get_trace (299-332)

src/opentelemetry_mcp/backends/filter_engine.py (1)
- src/opentelemetry_mcp/models.py (8): Filter (45-80), FilterOperator (12-34), FilterType (37-42), SpanData (226-253), TraceData (425-454), llm_spans (437-439), total_llm_tokens (447-454), has_errors (442-444)

tests/test_models.py (3)
- src/opentelemetry_mcp/attributes.py (1): SpanAttributes (8-133)
- src/opentelemetry_mcp/models.py (6): LLMSpanAttributes (256-382), SpanData (226-253), TraceData (425-454), from_span (283-382), from_span (666-682), llm_spans (437-439)
- tests/conftest.py (1): sample_trace_data (36-68)

src/opentelemetry_mcp/tools/usage.py (8)
- src/opentelemetry_mcp/backends/base.py (2): BaseBackend (12-160), search_traces (65-77)
- src/opentelemetry_mcp/models.py (8): LLMSpanAttributes (256-382), TraceQuery (457-556), UsageMetrics (385-422), gen_ai_system (246-248), llm_spans (437-439), from_span (283-382), from_span (666-682), add_span (399-422)
- src/opentelemetry_mcp/utils.py (1): parse_iso_timestamp (6-23)
- src/opentelemetry_mcp/server.py (2): get_llm_usage (221-261), search_traces (119-196)
- src/opentelemetry_mcp/backends/jaeger.py (1): search_traces (42-83)
- src/opentelemetry_mcp/backends/tempo.py (1): search_traces (58-139)
- src/opentelemetry_mcp/backends/traceloop.py (1): search_traces (85-174)
- src/opentelemetry_mcp/tools/search.py (1): search_traces (13-117)

src/opentelemetry_mcp/tools/errors.py (8)
- src/opentelemetry_mcp/backends/base.py (2): BaseBackend (12-160), search_traces (65-77)
- src/opentelemetry_mcp/models.py (6): TraceQuery (457-556), TraceSummary (559-587), has_error (251-253), from_trace (574-587), is_llm_span (241-243), gen_ai_system (246-248)
- src/opentelemetry_mcp/utils.py (1): parse_iso_timestamp (6-23)
- src/opentelemetry_mcp/server.py (2): find_errors (281-312), search_traces (119-196)
- src/opentelemetry_mcp/backends/jaeger.py (1): search_traces (42-83)
- src/opentelemetry_mcp/backends/tempo.py (1): search_traces (58-139)
- src/opentelemetry_mcp/backends/traceloop.py (1): search_traces (85-174)
- src/opentelemetry_mcp/tools/search.py (1)
search_traces(13-117)
src/opentelemetry_mcp/models.py (1)
src/opentelemetry_mcp/attributes.py (4)
SpanAttributes(8-133)SpanEvent(136-152)to_dict(60-82)get(84-115)
src/opentelemetry_mcp/tools/services.py (4)
src/opentelemetry_mcp/backends/base.py (3)
BaseBackend(12-160)list_services(110-119)get_service_operations(122-134)src/opentelemetry_mcp/backends/jaeger.py (2)
list_services(230-246)get_service_operations(248-266)src/opentelemetry_mcp/backends/tempo.py (2)
list_services(273-312)get_service_operations(314-344)src/opentelemetry_mcp/server.py (1)
list_services(265-277)
src/opentelemetry_mcp/backends/traceloop.py (3)
src/opentelemetry_mcp/attributes.py (3)
HealthCheckResponse(155-167)SpanAttributes(8-133)SpanEvent(136-152)src/opentelemetry_mcp/backends/filter_engine.py (4)
FilterEngine(13-284)apply_filters(18-18)apply_filters(22-22)apply_filters(25-55)src/opentelemetry_mcp/models.py (5)
Filter(45-80)FilterOperator(12-34)FilterType(37-42)SpanData(226-253)TraceData(425-454)
🪛 GitHub Actions: CI
pyproject.toml
[warning] 1-1: The tool.uv.dev-dependencies field (used in pyproject.toml) is deprecated and will be removed in a future release; use dependency-groups.dev instead
tests/integration/test_traceloop_integration.py
[error] 37-37: Health check failed: traceloop_backend.health_check() returned 'unhealthy' instead of 'healthy'.
tests/integration/test_tempo_integration.py
[error] 395-395: Argument "values" to "Filter" has incompatible type "list[str]"; expected "list[str | int | float | bool] | None" [arg-type].
tests/integration/conftest.py
[error] 140-140: Argument "url" to "BackendConfig" has incompatible type "str"; expected "HttpUrl" [arg-type].
[error] 144-144: The return type of an async generator function should be "AsyncGenerator" or one of its supertypes [misc].
[error] 169-169: Argument "url" to "BackendConfig" has incompatible type "str"; expected "HttpUrl" [arg-type].
[error] 173-173: The return type of an async generator function should be "AsyncGenerator" or one of its supertypes [misc].
[error] 209-209: Argument "url" to "BackendConfig" has incompatible type "str"; expected "HttpUrl" [arg-type].
[error] 213-213: The return type of an async generator function should be "AsyncGenerator" or one of its supertypes [misc].
🪛 LanguageTool
CLAUDE.md
[style] ~111-~111: This phrase is redundant (‘I’ stands for ‘interface’). Use simply “CLI”.
Context: ...y_mcp/server.py) - FastMCP application, CLI interface, tool handlers - [config.py](openteleme...
(ACRONYM_TAUTOLOGY)
[style] ~452-~452: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ueries - Querying all services would be very expensive (especially for spans) - Users can easi...
(EN_WEAK_ADJECTIVE)
🪛 markdownlint-cli2 (0.18.1)
CLAUDE.md
466-466: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
```yaml
  build-test:
    needs: test
    runs-on: ubuntu-latest
    strategy:
      matrix:
        platform: [amd64, arm64]
        include:
          - platform: amd64
            runs-on: ubuntu-latest
            docker-platform: linux/amd64
          - platform: arm64
            runs-on: Linux-ARM64
            docker-platform: linux/arm64
```
🧩 Analysis chain
Clarify matrix configuration for platform-specific runners.
The build-test job specifies runs-on: ubuntu-latest at the job level (line 50), but the matrix includes different runs-on values for each platform (lines 56, 59). The matrix's runs-on field won't override the job-level setting in this configuration.
To properly use different runners per platform, you need to reference the matrix value in the job's runs-on:
```diff
   build-test:
     needs: test
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.runs-on }}
     strategy:
       matrix:
         platform: [amd64, arm64]
         include:
           - platform: amd64
             runs-on: ubuntu-latest
             docker-platform: linux/amd64
           - platform: arm64
             runs-on: Linux-ARM64
             docker-platform: linux/arm64
```

Also note that "Linux-ARM64" runners are typically self-hosted and may not be available in all GitHub organizations.

Verify that the Linux-ARM64 runner is available in your GitHub organization, or consider using GitHub's hosted ARM runners if available in your plan.
I'll verify the review comment's claims about GitHub Actions matrix configuration and available runners.
🌐 Web query:
GitHub Actions matrix runs-on override job level configuration
💡 Result:
Short answer: GitHub Actions doesn't automatically override a job's runs-on from the matrix — you must explicitly use a matrix expression for runs-on (e.g. runs-on: ${{ matrix.os }}) or put runner labels into matrix include values and reference them. For self‑hosted runners some runners (e.g. CodeBuild-hosted) accept label-style overrides inside runs-on entries. [1][2][3]
How to do it:
- Standard: define an os (or runner-label) in your matrix and set runs-on: ${{ matrix.os }}. This makes each matrix leg run on that runner. [2]
- Use include/exclude to add or remove specific matrix combinations (and extra per-leg variables you can reference in runs-on). [2][3]
- Self-hosted/third-party runners: some providers support label overrides inside the runs-on array (provider-specific syntax). Example: CodeBuild-hosted runner supports label overrides like image:${{ matrix.os }}. Check your runner provider docs. [4]
Example minimal snippet:

```yaml
strategy:
  matrix:
    os: [ubuntu-latest, windows-latest]
runs-on: ${{ matrix.os }}  # required to override per-matrix
```

Sources: GitHub Docs (matrix usage) [2], community Q&A explaining runs-on behavior [1], matrix include/exclude examples [3], and provider-specific label overrides (CodeBuild example) [4].
🌐 Web query:
GitHub Actions available runners Linux ARM64
💡 Result:
Available GitHub-hosted Linux ARM64 runner labels (current GitHub docs):
- ubuntu-24.04-arm. [1]
- ubuntu-22.04-arm. [1]
Notes:
- Arm64 standard runners are available for public repositories (and GitHub’s larger/private options exist — see docs for pricing/availability). [2][3]
Sources: [1] GitHub-hosted runners reference; [2] GitHub Changelog (arm64 runners GA); [3] actions/runner-images repo.
Matrix configuration prevents platform-specific runners from being used.
The job-level runs-on: ubuntu-latest (line 50) is not overridden by the matrix include values. The matrix entries for runs-on will be ignored, causing both amd64 and arm64 builds to run on ubuntu-latest.
To use different runners per platform, explicitly reference the matrix value:
```diff
   build-test:
     needs: test
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.runs-on }}
     strategy:
       matrix:
         platform: [amd64, arm64]
         include:
           - platform: amd64
             runs-on: ubuntu-latest
             docker-platform: linux/amd64
           - platform: arm64
             runs-on: Linux-ARM64
             docker-platform: linux/arm64
```

Note: "Linux-ARM64" is not a standard GitHub-hosted runner. Verify it exists as a self-hosted runner in your organization, or consider using GitHub's hosted ARM64 runners (ubuntu-24.04-arm or ubuntu-22.04-arm) if available in your plan.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
  build-test:
    needs: test
    runs-on: ${{ matrix.runs-on }}
    strategy:
      matrix:
        platform: [amd64, arm64]
        include:
          - platform: amd64
            runs-on: ubuntu-latest
            docker-platform: linux/amd64
          - platform: arm64
            runs-on: Linux-ARM64
            docker-platform: linux/arm64
```
🤖 Prompt for AI Agents
.github/workflows/ci.yml around lines 48 to 60: the job-level runs-on:
ubuntu-latest is overriding matrix include values so matrix-specific runners are
ignored; replace the static runs-on with the matrix value (e.g. set runs-on: ${{
matrix.runs-on }}), remove or adjust the top-level runs-on, and ensure the
matrix includes use valid runner labels (verify "Linux-ARM64" exists as a
self-hosted runner or use a GitHub-hosted ARM label like
ubuntu-24.04-arm/ubuntu-22.04-arm); keep docker-platform in the matrix and use
that value in the workflow steps when starting containers.
```python
    def _convert_spans_to_trace(self, trace_id: str, spans_data: list[dict[str, Any]]) -> TraceData | None:
        """Convert Traceloop spans array to TraceData.

        Args:
            trace_id: Trace identifier
            spans_data: List of span data from API

        Returns:
            Complete TraceData with all spans
        """
        spans = []

        for span_data in spans_data:
            # Extract and transform span attributes from llm.* to gen_ai.* format
            raw_attrs = span_data.get("span_attributes", {})
            transformed_attrs = self._transform_llm_attributes_to_gen_ai(raw_attrs)

            # Create strongly-typed SpanAttributes
            span_attributes = SpanAttributes(**transformed_attrs)

            # Transform events if present
            events_data = span_data.get("events", [])
            events = [
                SpanEvent(
                    name=event.get("name", ""),
                    timestamp=event.get("timestamp", 0),
                    attributes=event.get("attributes", {}),
                )
                for event in events_data
            ]

            span = SpanData(
                trace_id=span_data["trace_id"],
                span_id=span_data["span_id"],
                parent_span_id=span_data.get("parent_span_id"),
                operation_name=span_data["span_name"],
                service_name=raw_attrs.get("service.name", ""),
                start_time=datetime.fromtimestamp(span_data["timestamp"] / 1000),  # Convert ms to seconds
                duration_ms=float(span_data["duration"]),
                status=self._status_code_to_status(span_data.get("status_code", "UNSET")),
                attributes=span_attributes,
                events=events,
            )

            spans.append(span)

        # Find root span
        root_spans = [s for s in spans if not s.parent_span_id]
        root_span = root_spans[0] if root_spans else spans[0]

        # Calculate trace duration
        if spans:
            start_times = [s.start_time for s in spans]
            trace_start = min(start_times)
            # Find the maximum end time
            end_times = [datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000)) for s in spans]
            trace_end = max(end_times)
            trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
        else:
            trace_start = root_span.start_time
            trace_duration_ms = root_span.duration_ms

        # Determine status
        trace_status = "OK"
        if any(span.has_error for span in spans):
            trace_status = "ERROR"

        return TraceData(
            trace_id=trace_id,
            spans=spans,
            start_time=trace_start,
            duration_ms=trace_duration_ms,
            service_name=root_span.service_name,
            root_operation=root_span.operation_name,
            status=self._status_code_to_status(trace_status),
        )
```
Guard _convert_spans_to_trace against empty parsed span list
If spans_data is non-empty but every span fails to parse (e.g., schema drift), spans ends up empty. The code then does spans[0] when computing root_span, which raises IndexError instead of failing gracefully.
Add an early check after the loop to bail out when no spans were parsed:
```diff
 def _convert_spans_to_trace(self, trace_id: str, spans_data: list[dict[str, Any]]) -> TraceData | None:
@@
-        spans = []
-
-        for span_data in spans_data:
+        spans: list[SpanData] = []
+
+        for span_data in spans_data:
@@
-            spans.append(span)
-
-        # Find root span
-        root_spans = [s for s in spans if not s.parent_span_id]
-        root_span = root_spans[0] if root_spans else spans[0]
+            spans.append(span)
+
+        if not spans:
+            logger.warning(f"No valid spans in trace {trace_id}")
+            return None
+
+        # Find root span
+        root_spans = [s for s in spans if not s.parent_span_id]
+        root_span = root_spans[0] if root_spans else spans[0]
@@
-        # Calculate trace duration
-        if spans:
-            start_times = [s.start_time for s in spans]
-            trace_start = min(start_times)
-            # Find the maximum end time
-            end_times = [datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000)) for s in spans]
-            trace_end = max(end_times)
-            trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
-        else:
-            trace_start = root_span.start_time
-            trace_duration_ms = root_span.duration_ms
+        # Calculate trace duration
+        start_times = [s.start_time for s in spans]
+        trace_start = min(start_times)
+        end_times = [
+            datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
+            for s in spans
+        ]
+        trace_end = max(end_times)
+        trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
```

This keeps get_trace's None-on-failure behavior consistent with the other backends.
🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/traceloop.py around lines 621-696, the
conversion loop can produce an empty spans list if all incoming spans fail
parsing, but the code later indexes spans[0] causing IndexError; add an early
guard immediately after the for-loop that checks if not spans and return None
(preserving get_trace’s None-on-failure behavior) so the function bails out
gracefully when no spans were parsed and avoids later references to spans[0].
Actionable comments posted: 2
♻️ Duplicate comments (3)
src/opentelemetry_mcp/models.py (1)
84-156: Use `is not None` for `max_duration_ms` to avoid silently dropping a valid `0` filter.

In `_convert_legacy_params_to_filters`, `max_duration_ms` is checked via truthiness (`if max_duration_ms:`). Given the field is declared with `ge=0` on the query models, `0` is a valid value but will be treated as "no filter" and skipped here, unlike `min_duration_ms`, which explicitly checks `is not None`. To keep semantics predictable, mirror the `min_duration_ms` check:

```diff
-        if max_duration_ms:
+        if max_duration_ms is not None:
             all_filters.append(
                 Filter(
                     field="duration",
                     operator=FilterOperator.LTE,
                     value=max_duration_ms,
                     value_type=FilterType.NUMBER,
                 )
             )
```

This keeps legacy param conversion consistent and ensures `max_duration_ms=0` is handled as an actual upper-bound filter.

src/opentelemetry_mcp/backends/traceloop.py (1)
633-715: Guard against empty spans list to prevent IndexError.

If all spans fail to parse in the loop (lines 647-681), the `spans` list remains empty, but line 685 attempts `spans[0]` when no root span exists, causing an `IndexError`. The `if spans:` check at line 688 comes too late to prevent this.

Add an early guard after the loop:

```diff
     for span_data in spans_data:
         ...  # unchanged loop body elided
         spans.append(span)

+    if not spans:
+        logger.warning(f"No valid spans parsed for trace {trace_id}")
+        return None
+
     # Find root span
     root_spans = [s for s in spans if not s.parent_span_id]
     root_span = root_spans[0] if root_spans else spans[0]

     # Calculate trace duration
-    if spans:
-        start_times = [s.start_time for s in spans]
-        trace_start = min(start_times)
-        # Find the maximum end time
-        end_times = [
-            datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
-            for s in spans
-        ]
-        trace_end = max(end_times)
-        trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
-    else:
-        trace_start = root_span.start_time
-        trace_duration_ms = root_span.duration_ms
+    start_times = [s.start_time for s in spans]
+    trace_start = min(start_times)
+    # Find the maximum end time
+    end_times = [
+        datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
+        for s in spans
+    ]
+    trace_end = max(end_times)
+    trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
```

tests/integration/test_traceloop_integration.py (1)
33-40: Make health check test resilient to unhealthy responses.

The test unconditionally asserts `health.status == "healthy"`, which makes CI brittle when the backend (or VCR cassette) reports an unhealthy state. Other tests in this file use `pytest.skip()` for unavailable data; apply the same pattern here.

Add a skip before assertions:

```diff
 async def test_health_check_healthy(self, traceloop_backend: TraceloopBackend) -> None:
     """Test health check returns healthy status when backend is accessible."""
     health = await traceloop_backend.health_check()

+    if health.status != "healthy":
+        pytest.skip(f"Backend reported unhealthy status: {health.error or health.status}")
+
     assert health.status == "healthy"
     assert health.backend == "traceloop"
     assert health.url is not None
```
🧹 Nitpick comments (10)
tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_spans_with_llm_model_filter.yaml (1)
1-394: Very large cassette; consider trimming or normalizing if it becomes a maintenance burden.

This fixture captures rich, realistic LLM prompts/completions, which is great for exercising parsing and filtering. The tradeoff is cassette size and noisy diffs when payloads change. If test execution or PR noise becomes an issue, you might:
- Trim prompt/completion content to shorter strings before recording.
- Or normalize dynamic fields (timestamps, itineraries) via VCR matchers/custom serializers.
Not urgent, just something to keep in mind as the suite grows.
src/opentelemetry_mcp/models.py (1)
483-496: Consider aligning `has_filters` / backend-param checks with the explicit `is not None` semantics for duration fields.

`TraceQuery.has_filters`, `SpanQuery.has_filters`, and `TraceQuery.to_backend_params` all use truthiness checks for `min_duration_ms` / `max_duration_ms`, while the models declare these fields with `ge=0` and `_convert_legacy_params_to_filters` already uses `is not None` for `min_duration_ms`.

This means `0` is effectively treated as "no filter" in these helpers, which is inconsistent with the type constraints and with the fix you're applying in `_convert_legacy_params_to_filters`. If you care about strict semantics, consider:

- Checking `is not None` in `has_filters` and `to_backend_params` for the duration fields, or
- Documenting that `0` is treated as "unset" for those helpers.

Not blocking, but tightening this would make behavior more predictable.
src/opentelemetry_mcp/backends/tempo.py (1)
374-383: `_build_traceql_from_filters` doesn't use its `query` argument.

The docstring mentions using `query` (for time range), but the implementation only walks `filters` and ignores `query`. Given time range is handled via `start` / `end` request params elsewhere, you might:

- Drop the unused `query` parameter from `_build_traceql_from_filters` and its call sites, or
- Actually incorporate information from `query` if you intend to use it here.

Not functionally wrong, but cleaning this up will make the API clearer.
tests/integration/test_jaeger_integration.py (7)
1-10: Module setup and VCR usage look good (minor redundancy only)

The module-level `pytestmark = [pytest.mark.integration, pytest.mark.vcr]` plus per-test `@pytest.mark.vcr` ensures everything is recorded/replayed correctly. The per-test VCR marks are slightly redundant given the module-level mark, but that's purely stylistic and not worth changing unless you want to simplify.
66-87: Consider asserting that at least one trace is returned in the "basic" search test

`test_search_traces_basic` currently only asserts that `traces` is a list and then iterates it; if no traces are returned, the field assertions in the loop never run, so the test still passes. If the cassettes are guaranteed to include at least one trace, adding `assert traces` (or `assert len(traces) > 0`) would turn this into a stronger regression check.
135-165: Strengthen duration/error filter tests by asserting non-empty results

Both `test_search_traces_with_duration_filter` and `test_search_traces_with_error_filter` only assert conditions inside a `for trace in traces` loop. If filtering ever yielded zero traces, these tests would pass without actually validating filtering. Consider adding `assert traces` (or a minimum length) before the loop so the tests fail if the filters stop matching anything under the recorded fixture.
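A minimal sketch of the vacuous-pass problem and the suggested guard (the trace dicts are made-up stand-ins for the real `TraceData` objects):

```python
# Assertions inside a loop over an empty list never execute, so a filter
# test can pass without validating anything. Asserting non-emptiness first
# turns it into a real regression check.

def check_duration_filter(traces: list[dict]) -> None:
    assert traces, "filter returned no traces; the loop assertions would be vacuous"
    for trace in traces:
        assert trace["duration_ms"] >= 100

check_duration_filter([{"duration_ms": 150}, {"duration_ms": 320}])  # passes
```

With the guard, an empty result now fails loudly instead of silently passing.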
191-221: Narrow the expected exception type for invalid trace IDs

In `test_get_trace_invalid_id`, `pytest.raises((ValueError, Exception))` effectively just asserts that some `Exception` is raised, since `Exception` already covers `ValueError`. If the backend has (or can have) a stable error type for this case (e.g., `ValueError` or a specific HTTPX error), it would be better to assert that type explicitly and optionally match part of the message.
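A hedged sketch of the narrower assertion: `ValueError` and the message below are assumptions for illustration; use whatever stable error type the backend actually raises for malformed trace IDs.

```python
import pytest

def get_trace(trace_id: str) -> dict:
    # Stand-in for the backend call; the real code would query the tracing API.
    if len(trace_id) != 32:
        raise ValueError(f"invalid trace ID: {trace_id!r}")
    return {"trace_id": trace_id}

def test_get_trace_invalid_id() -> None:
    # Asserting the specific type plus a message fragment fails fast if the
    # error contract changes, unlike pytest.raises(Exception).
    with pytest.raises(ValueError, match="invalid trace ID"):
        get_trace("not-a-trace-id")
```

The `match=` argument is a regex applied to `str(exc)`, so a partial message is enough.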
227-247: Basic span search test is solid; consider asserting at least one span

`test_search_spans_basic` thoroughly checks required fields on each returned span, which is great. Similar to the basic trace test, adding a simple `assert spans` (assuming the cassette guarantees results) would prevent the test from passing when the backend unexpectedly returns an empty list.
317-338: LLM span search test is correct but could be made stricter

`test_search_llm_spans` nicely exercises the EXISTS operator on `gen_ai.system` and asserts that returned spans are LLM spans with `gen_ai_system` set. As with other filter tests, you might optionally assert that `spans` is non-empty (if your cassette guarantees LLM spans) so the test doesn't silently succeed when no LLM spans are found.
340-361: LLM model filter test is flexible but may pass vacuously for some models

In `test_search_traces_with_llm_model_filter`, iterating over several model names and checking matches only inside the `for trace in traces` / `if llm_spans` blocks is pragmatic. However, if none of the models are present in the fixture, the test will pass without asserting anything. If you want stronger guarantees, consider:

- Restricting to a model you know exists in the Jaeger data, and
- Asserting that at least one trace and one LLM span are returned for that model before checking the `gen_ai_request_model`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- src/opentelemetry_mcp/backends/tempo.py (1 hunks)
- src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
- src/opentelemetry_mcp/models.py (1 hunks)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopFilters.test_search_with_comparison_operators.yaml (1 hunks)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopGetTrace.test_get_trace_invalid_id.yaml (1 hunks)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchSpans.test_search_spans_with_limit.yaml (1 hunks)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_spans_with_llm_model_filter.yaml (1 hunks)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_basic.yaml (1 hunks)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_duration_filter.yaml (1 hunks)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_error_filter.yaml (1 hunks)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_generic_filter.yaml (1 hunks)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_limit.yaml (1 hunks)
- tests/integration/conftest.py (1 hunks)
- tests/integration/test_jaeger_integration.py (1 hunks)
- tests/integration/test_traceloop_integration.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_limit.yaml
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes
Applied to files:
- tests/integration/cassettes/test_traceloop_integration/TestTraceloopGetTrace.test_get_trace_invalid_id.yaml
- tests/integration/conftest.py
- tests/integration/test_traceloop_integration.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Re-record VCR cassettes when API interactions change
Applied to files:
tests/integration/cassettes/test_traceloop_integration/TestTraceloopGetTrace.test_get_trace_invalid_id.yaml
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/conftest.py : Use VCR filters (e.g., filter_headers, before_record) or framework equivalents to scrub secrets/PII during recording
Applied to files:
tests/integration/conftest.py
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from traceloop/node-server-sdk
Applied to files:
tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_spans_with_llm_model_filter.yaml
🧬 Code graph analysis (6)
tests/integration/test_jaeger_integration.py (5)
src/opentelemetry_mcp/backends/jaeger.py (6)
health_check(268-294)list_services(230-246)get_service_operations(248-266)search_traces(42-83)get_trace(202-228)search_spans(123-200)src/opentelemetry_mcp/models.py (9)
Filter(46-81)FilterOperator(13-35)FilterType(38-43)SpanQuery(591-648)TraceQuery(458-557)has_error(252-254)is_llm_span(242-244)gen_ai_system(247-249)llm_spans(438-440)src/opentelemetry_mcp/backends/tempo.py (6)
health_check(347-372)list_services(274-313)get_service_operations(315-345)search_traces(58-139)get_trace(263-272)search_spans(141-261)src/opentelemetry_mcp/tools/trace.py (1)
get_trace(10-80)src/opentelemetry_mcp/tools/search_spans.py (1)
search_spans(13-116)
src/opentelemetry_mcp/models.py (1)
src/opentelemetry_mcp/attributes.py (4)
SpanAttributes(8-133)SpanEvent(136-152)to_dict(60-82)get(84-115)
tests/integration/conftest.py (1)
src/opentelemetry_mcp/config.py (1)
BackendConfig(16-65)
src/opentelemetry_mcp/backends/traceloop.py (3)
src/opentelemetry_mcp/attributes.py (4)
HealthCheckResponse(155-167)SpanAttributes(8-133)SpanEvent(136-152)get(84-115)src/opentelemetry_mcp/backends/filter_engine.py (4)
FilterEngine(13-284)apply_filters(18-18)apply_filters(22-22)apply_filters(25-55)src/opentelemetry_mcp/models.py (9)
Filter(46-81)FilterOperator(13-35)FilterType(38-43)SpanData(227-254)TraceData(426-455)TraceQuery(458-557)get_all_filters(498-515)get_all_filters(631-648)has_error(252-254)
tests/integration/test_traceloop_integration.py (8)
src/opentelemetry_mcp/backends/traceloop.py (7)
TraceloopBackend(23-723)health_check(420-449)list_services(340-385)get_service_operations(387-418)search_traces(85-176)search_spans(178-303)get_trace(305-338)src/opentelemetry_mcp/models.py (7)
Filter(46-81)FilterOperator(13-35)FilterType(38-43)SpanQuery(591-648)TraceQuery(458-557)has_error(252-254)gen_ai_system(247-249)tests/integration/conftest.py (1)
traceloop_backend(265-279)src/opentelemetry_mcp/backends/tempo.py (6)
health_check(347-372)list_services(274-313)get_service_operations(315-345)search_traces(58-139)search_spans(141-261)get_trace(263-272)src/opentelemetry_mcp/backends/base.py (6)
health_check(137-146)list_services(110-119)get_service_operations(122-134)search_traces(65-77)search_spans(80-92)get_trace(95-107)src/opentelemetry_mcp/tools/search.py (1)
search_traces(13-117)src/opentelemetry_mcp/tools/search_spans.py (1)
search_spans(13-116)src/opentelemetry_mcp/tools/trace.py (1)
get_trace(10-80)
src/opentelemetry_mcp/backends/tempo.py (3)
src/opentelemetry_mcp/attributes.py (4)
HealthCheckResponse(155-167)SpanAttributes(8-133)SpanEvent(136-152)get(84-115)src/opentelemetry_mcp/backends/filter_engine.py (4)
FilterEngine(13-284)apply_filters(18-18)apply_filters(22-22)apply_filters(25-55)src/opentelemetry_mcp/models.py (10)
Filter(46-81)FilterOperator(13-35)FilterType(38-43)SpanData(227-254)TraceData(426-455)TraceQuery(458-557)get_all_filters(498-515)get_all_filters(631-648)has_error(252-254)gen_ai_system(247-249)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test (arm64)
🔇 Additional comments (16)
tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_basic.yaml (1)
1-40: Cassette structure and data look appropriate for integration testing.

Request/response payloads are well-formed, scoped to localhost, and don't appear to leak credentials or sensitive runtime configuration. This is fine to keep as a stable fixture.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_duration_filter.yaml (1)

1-40: Duration-filter cassette is consistent and safe to commit.

The recorded request correctly exercises a numeric duration filter, and the response data is similar to the basic case without exposing secrets. Looks good as an integration fixture.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_error_filter.yaml (1)

1-40: Cassette structure and contents look consistent and safe

The recorded request/response for this Traceloop trace search is well-formed, uses only localhost endpoints, and does not expose any obvious secrets or PII; it should be fine to keep as-is for the error-filter integration scenario.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchTraces.test_search_traces_with_generic_filter.yaml (1)

1-40: Generic duration-filter cassette looks correct

The cassette cleanly captures a numeric duration filter request and a representative root_spans response against localhost, with no sensitive data embedded. It's suitable for exercising the generic filter path in the Traceloop backend.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopGetTrace.test_get_trace_invalid_id.yaml (1)

1-32: Invalid trace ID cassette accurately reflects null-result behavior

This cassette clearly records the "invalid trace ID" scenario as a 200 with `{"spans":null}`, which is useful to exercise the Traceloop backend's handling of non-existent traces without leaking any sensitive data.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopFilters.test_search_with_comparison_operators.yaml (1)

1-40: Comparison-operator cassette correctly captures a `>=` duration filter

The recorded interaction for `greater_than_or_equal` duration filtering is structurally sound and free of sensitive data, and should give solid coverage for Traceloop's comparison-operator handling.

tests/integration/cassettes/test_traceloop_integration/TestTraceloopSearchSpans.test_search_spans_with_limit.yaml (1)

1-754: Rich span-limit cassette provides good LLM/tool coverage without leaking secrets

This cassette gives a detailed `spans` payload (including LLM/tool attributes and external HTTP calls) for a limit-10 query, which is great for exercising span search and parsing logic. The data is synthetic/public enough to avoid obvious secret/PII exposure.

tests/integration/test_jaeger_integration.py (2)
167-186: Generic trace filter test aligns well with Filter/FilterOperator API

`test_search_traces_with_generic_filter` correctly constructs a `Filter` with `field="duration_ms"`, `operator=FilterOperator.GT`, and `value_type=FilterType.NUMBER`, then verifies `trace.duration_ms > 50`. This is a good end-to-end check of the generic filter pipeline for traces.

293-311: Generic span filter test mirrors the trace filter test correctly

`test_search_spans_with_generic_filter` uses a `Filter` on `duration_ms` with `FilterOperator.GT` and `FilterType.NUMBER`, matching the models, and validates `span.duration_ms > 10`. This is a good integration check of the span-level filter path.

1-279: LGTM! Past type issues have been resolved.

The integration test fixtures are well-structured with proper VCR configuration, timestamp filtering, and header sanitization. The previous mypy errors regarding `BackendConfig.url` type mismatches and async generator return types have been correctly addressed using `TypeAdapter(HttpUrl).validate_python()` and `AsyncGenerator` type annotations.

src/opentelemetry_mcp/backends/traceloop.py (3)
31-56: LGTM!

The initialization properly validates the required API key and sets sensible defaults for `project_id` and `environments`.

85-176: Well-implemented hybrid filtering strategy.

The method correctly separates API-supported filters from those requiring client-side filtering, with clear logging and sensible defaults for time ranges. The conversion and fallback logic is sound.

451-536: LGTM! Robust filter conversion with proper fallback.

The conversion logic correctly maps between internal filter representations and Traceloop's API format, including the important `gen_ai.system` → `llm.vendor` special case. Unsupported filters properly return `None` to trigger client-side filtering.

48-61: LGTM!

The test gracefully handles empty service lists while properly validating structure, uniqueness, and sorting when data is available.

93-109: LGTM!

Basic trace search test with appropriate structure validation for trace data.

257-273: LGTM!

Basic span search test with appropriate structure validation.
```python
def _build_traceql_from_filters(self, filters: list[Filter], query: TraceQuery) -> str:
    """Build TraceQL query from Filter objects.

    Args:
        filters: List of Filter conditions
        query: Original query (for time range)

    Returns:
        TraceQL query string
    """
    conditions = []

    for filter_obj in filters:
        condition = self._filter_to_traceql(filter_obj)
        if condition:
            conditions.append(condition)

    # If no filters, match all traces
    if not conditions:
        return "{}"

    # Combine with AND logic
    return "{ " + " && ".join(conditions) + " }"
```
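The final query-assembly step above reduces to a simple string join. A standalone sketch of just that tail (the `build_traceql` helper and example conditions are illustrative, not part of the real backend):

```python
def build_traceql(conditions: list[str]) -> str:
    # Mirrors the tail of _build_traceql_from_filters: empty list means match-all.
    if not conditions:
        return "{}"
    # Combine all conditions with AND logic inside a single TraceQL block.
    return "{ " + " && ".join(conditions) + " }"

print(build_traceql([]))                                     # {}
print(build_traceql(["status = error", "duration > 50ms"]))  # { status = error && duration > 50ms }
```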
`has_error=False` / status `NOT_EQUALS` filters are currently ignored for Tempo (native but not translated).

In `search_traces` / `search_spans`, filters are split by operator+field into `native_filters` vs `client_filters` based on `get_supported_operators` and `trace_level_fields`. For `has_error=False`, `_convert_legacy_params_to_filters` produces:

`field="status"`, `operator=FilterOperator.NOT_EQUALS`, `value="ERROR"`.

Because `NOT_EQUALS` is in `get_supported_operators` and `status` is in `trace_level_fields`, this filter is treated as native and removed from `client_filters`. However, in `_filter_to_traceql` the status branch only handles `EQUALS` and returns `None` for other operators, so the filter is silently dropped and never applied on either side.

This means Tempo queries with `has_error=False` won't actually exclude error traces/spans, which is a behavior bug rather than just an optimization issue.

You can fix this by handling `NOT_EQUALS` for status explicitly:
```diff
-        elif field == "status":
-            # Special handling for status
-            if operator == FilterOperator.EQUALS:
-                if value == "ERROR":
-                    return "status = error"
-                elif value == "OK":
-                    return "status = ok"
-            return None
+        elif field == "status":
+            # Special handling for status (OK/ERROR) with both equality and inequality
+            if operator in (FilterOperator.EQUALS, FilterOperator.NOT_EQUALS):
+                op = "=" if operator == FilterOperator.EQUALS else "!="
+                if value == "ERROR":
+                    return f"status {op} error"
+                if value == "OK":
+                    return f"status {op} ok"
+            return None
```

This preserves the "native" classification while ensuring both `has_error=True` and `has_error=False` are respected when querying Tempo.
Also applies to: 398-427, 438-485
🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/tempo.py around lines 374-397, the status
branch in _filter_to_traceql only emits TraceQL for EQUALS and returns None for
other operators, causing native NOT_EQUALS (used for has_error=False -> status
!= "ERROR") to be dropped; update _filter_to_traceql to explicitly handle
FilterOperator.NOT_EQUALS for the status field and emit the corresponding
TraceQL condition (e.g., status != "ERROR"), and apply the same NOT_EQUALS
handling to the parallel status-handling blocks at lines 398-427 and 438-485 so
has_error=True and has_error=False are both respected by Tempo queries.
Actionable comments posted: 0
🧹 Nitpick comments (4)
tests/integration/conftest.py (1)
94-118: Robust VCR header scrubbing; consider case-insensitive matching for response headers.

The VCR config and `filter_sensitive_headers` do a good job of keeping auth data and cookies out of cassettes, matching the project's VCR/PII guidelines. As a small hardening, you could normalize header keys (e.g., iterate over `response_headers.items()` with `.lower()`) so both `"set-cookie"` and `"Set-Cookie"` variants are filtered consistently.

src/opentelemetry_mcp/tools/errors.py (1)
13-121: find_errors implementation is consistent and well-defended.

Timestamp parsing, `TraceQuery` validation, and the error-span formatting all mirror patterns in other tools, and wrapping `TraceQuery` in a `ValidationError` handler keeps the contract of always returning JSON. If you want to be extra defensive, you could skip traces where `error_spans` ends up empty, but with `has_error=True` in the query this is mostly a safeguard.

src/opentelemetry_mcp/backends/traceloop.py (1)
31-56: Backend logic looks robust; just align a couple of docstrings/comments.

The Traceloop backend's hybrid filtering, attribute transformation, and the new empty-spans guard in `_convert_spans_to_trace` all look good and address the earlier failure mode. Two tiny nits you might clean up later:

- The `__init__` docstring says the default environments are `["production"]`, but the code uses `["prd"]`.
- The `health_check` docstring mentions using a "projects endpoint", while the implementation probes `"/v2/warehouse/spans"`.

These don't affect behavior but updating them would avoid confusion for future readers.

Also applies to: 420-449, 633-720
tests/integration/test_traceloop_integration.py (1)
202-217: Move test to the appropriate class and fix terminology.

This test method is in `TestTraceloopSearchTraces` but tests span search functionality (uses `SpanQuery`, calls `search_spans`). It should be moved to the `TestTraceloopSearchSpans` class. Additionally, the docstring and comments incorrectly reference "traces" when they should say "spans".

Move this test to the `TestTraceloopSearchSpans` class and update the terminology:

```diff
-class TestTraceloopSearchTraces:
-    """Test Traceloop trace search functionality."""
-
-    # ... other tests ...
-
-    @pytest.mark.vcr
-    async def test_search_spans_with_llm_model_filter(
-        self, traceloop_backend: TraceloopBackend
-    ) -> None:
-        """Test trace search filtered by LLM model."""
-        query = SpanQuery(gen_ai_request_model="gpt-4o-mini", limit=200)
-
-        spans = await traceloop_backend.search_spans(query)
-
-        # Skip if no matching traces found (test data may not have gpt-4 traces)
-        if len(spans) == 0:
-            pytest.skip("No traces with gpt-4o-mini model found in test data")
-
-        # Verify all returned traces have the correct model
-        for span in spans:
-            # Check if any span has the requested model
-            assert span.attributes.gen_ai_request_model == "gpt-4o-mini"
```

And add it to the `TestTraceloopSearchSpans` class with corrected terminology:

```python
class TestTraceloopSearchSpans:
    """Test Traceloop span search functionality."""

    # ... other tests ...

    @pytest.mark.vcr
    async def test_search_spans_with_llm_model_filter(
        self, traceloop_backend: TraceloopBackend
    ) -> None:
        """Test span search filtered by LLM model."""
        query = SpanQuery(gen_ai_request_model="gpt-4o-mini", limit=200)

        spans = await traceloop_backend.search_spans(query)

        # Skip if no matching spans found (test data may not have gpt-4 spans)
        if len(spans) == 0:
            pytest.skip("No spans with gpt-4o-mini model found in test data")

        # Verify all returned spans have the correct model
        for span in spans:
            assert span.attributes.gen_ai_request_model == "gpt-4o-mini"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (7)
- pyproject.toml (1 hunks)
- src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
- src/opentelemetry_mcp/server.py (1 hunks)
- src/opentelemetry_mcp/tools/errors.py (1 hunks)
- src/opentelemetry_mcp/tools/search_spans.py (1 hunks)
- tests/integration/conftest.py (1 hunks)
- tests/integration/test_traceloop_integration.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/opentelemetry_mcp/tools/search_spans.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
Applied to files:
- src/opentelemetry_mcp/server.py
- pyproject.toml
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/conftest.py : Use VCR filters (e.g., filter_headers, before_record) or framework equivalents to scrub secrets/PII during recording
Applied to files:
tests/integration/conftest.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes
Applied to files:
tests/integration/conftest.py
🧬 Code graph analysis (4)
src/opentelemetry_mcp/tools/errors.py (3)

src/opentelemetry_mcp/backends/base.py (2)

- BaseBackend (12-160), search_traces (65-77)

src/opentelemetry_mcp/models.py (6)

- TraceQuery (458-557), TraceSummary (560-588), has_error (252-254), from_trace (575-588), is_llm_span (242-244), gen_ai_system (247-249)

src/opentelemetry_mcp/utils.py (1)

- parse_iso_timestamp (6-23)

src/opentelemetry_mcp/server.py (8)

src/opentelemetry_mcp/tools/list_llm_tools.py (1)

- list_llm_tools (24-135)

src/opentelemetry_mcp/tools/list_models.py (1)

- list_models (11-106)

src/opentelemetry_mcp/tools/search_spans.py (1)

- search_spans (13-119)

src/opentelemetry_mcp/tools/search.py (1)

- search_traces (13-117)

src/opentelemetry_mcp/tools/trace.py (1)

- get_trace (10-80)

src/opentelemetry_mcp/tools/usage.py (1)

- get_llm_usage (11-114)

src/opentelemetry_mcp/tools/services.py (1)

- list_services (8-25)

src/opentelemetry_mcp/tools/slow_traces.py (1)

- get_slow_traces (10-114)

tests/integration/test_traceloop_integration.py (3)

src/opentelemetry_mcp/models.py (7)

- Filter (46-81), FilterOperator (13-35), FilterType (38-43), SpanQuery (591-648), TraceQuery (458-557), has_error (252-254), gen_ai_system (247-249)

tests/integration/conftest.py (1)

- traceloop_backend (269-283)

src/opentelemetry_mcp/backends/base.py (1)

- health_check (137-146)

tests/integration/conftest.py (1)

src/opentelemetry_mcp/config.py (1)

- BackendConfig (16-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test (arm64)
🔇 Additional comments (3)
pyproject.toml (1)
30-39: Dev dependency configuration looks correct and fixes the previous uv deprecation.

Using `[dependency-groups].dev` instead of `tool.uv.dev-dependencies` aligns with current uv semantics, and the dev stack (pytest, mypy, ruff, etc.) is coherent with the rest of the project.

src/opentelemetry_mcp/server.py (1)

37-363: Centralized tool error handling and MCP wrappers look solid.

`_handle_tool_error` fixes the earlier invalid-JSON pattern and gives you consistent logging, and each `@mcp.tool` wrapper simply fetches the backend and delegates to the corresponding tool module, keeping the server thin and testable. The lazy `_get_backend` initialization with a soft-fail health check also matches the "single FastMCP server per process" constraint the team has documented.

tests/integration/test_traceloop_integration.py (1)

1-26: Clear documentation and appropriate imports.

The module docstring provides excellent guidance on using VCR cassettes for recording and replay, making it easy for developers to work with these integration tests.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/opentelemetry_mcp/backends/traceloop.py (1)
634-707: Trace conversion now guards empty spans; simplify duration logic and status aggregation

The new guard on `spans` (Lines [684-687]) addresses the earlier `IndexError` risk when all spans fail to parse, which is a nice improvement. However:

- The subsequent `if spans: ... else:` duration block (Lines [693-706]) still contains an `else` that is now unreachable due to the early `if not spans: return None`. This can be simplified to just the non-empty branch.
- `trace_status` is derived as `"OK"` if no span has `has_error`, and then passed through `_status_code_to_status` (Lines [709-721]). That means a trace whose spans are all `"UNSET"` is classified as `"OK"` rather than `"UNSET"`.

If you want to preserve "UNSET" semantics, you could compute status along these lines:

```diff
-    trace_status = "OK"
-    if any(span.has_error for span in spans):
-        trace_status = "ERROR"
+    if any(span.has_error for span in spans):
+        trace_status = "ERROR"
+    elif all(span.status == "UNSET" for span in spans):
+        trace_status = "UNSET"
+    else:
+        trace_status = "OK"
```

and drop the unreachable `else` branch in the duration calculation.
src/opentelemetry_mcp/models.py (1)
85-157: Handle min_duration_ms / max_duration_ms using is not None instead of truthiness

`min_duration_ms` / `max_duration_ms` are declared with `ge=0`, so `0` is a valid value. Currently:

- `_convert_legacy_params_to_filters` checks `if max_duration_ms:` (Line [149]).
- `TraceQuery.has_filters` and `SpanQuery.has_filters` treat `min_duration_ms` / `max_duration_ms` as truthy (Lines [490-492], [623-625]).
- `TraceQuery.to_backend_params` only sends `minDuration` / `maxDuration` when the values are truthy (Lines [536-540]).

This means a caller passing `0` will have that filter silently ignored and omitted from backend params and filter lists, which is inconsistent with the field's type constraints and with the `min_duration_ms is not None` check already used in `_convert_legacy_params_to_filters`.

You can fix this by switching to explicit `is not None` checks in all these places, for example:

```diff
-        if max_duration_ms:
+        if max_duration_ms is not None:
             all_filters.append(
                 Filter(
                     field=Fields.DURATION,
                     operator=FilterOperator.LTE,
                     value=max_duration_ms,
                     value_type=FilterType.NUMBER,
                 )
             )
```

and similarly:

```diff
-            or self.min_duration_ms
-            or self.max_duration_ms
+            or self.min_duration_ms is not None
+            or self.max_duration_ms is not None
```

and:

```diff
-        if self.min_duration_ms:
+        if self.min_duration_ms is not None:
             params["minDuration"] = f"{self.min_duration_ms}ms"
-        if self.max_duration_ms:
+        if self.max_duration_ms is not None:
             params["maxDuration"] = f"{self.max_duration_ms}ms"
```

The same pattern should be applied in `SpanQuery.has_filters`.
🧹 Nitpick comments (5)
src/opentelemetry_mcp/attributes.py (2)
64-87: Broaden value types inSpanAttributes.to_dict/getto match actual data
SpanAttributesincludes list-valued fields (for examplegen_ai_response_finish_reasons,llm_response_finish_reasons) and allows arbitrary extras viaextra="allow", but:
to_dictis annotated as returningdict[str, str | int | float | bool]while it will happily emit lists and possibly other types from__pydantic_extra__.get/__getitem__have similarly narrow return annotations, yet can return any type present in attributes/extras.This is more of a typing mismatch than a runtime bug, but it will confuse type-checkers and future readers. Consider widening the value type (e.g. to
Anyor to includeSequence/Mapping) so the annotations reflect actual behavior.Also applies to: 88-120
140-157: Consider allowing array-valued event attributes for better OTEL compatibility

`SpanEvent.attributes` is currently typed as `dict[str, str | int | float | bool]`. Many OpenTelemetry exporters and GenAI semantics use array-valued attributes on events (lists of strings/ints). With the strict union, any such event from a backend will raise a `pydantic` validation error and cause the whole span to be dropped in callers that wrap construction in a broad `try/except`.

If you expect to handle generic OTEL payloads, consider extending the union to include list types (for example `list[str]`, `list[int]`) or loosening it to `Any`, and then normalizing where needed.

src/opentelemetry_mcp/constants.py (1)
31-57: Centralizing semantic conventions here looks good; consider pushing more raw strings into GenAIAttributes

This module nicely aligns with the pattern of keeping AI/LLM semantic conventions centralized (using `semconv_ai.SpanAttributes` plus your own `GenAIAttributes` / `GenAI` / `LegacyLLM` / `Traceloop` / `Fields` wrappers), which matches the team's guidelines around OpenTelemetry GenAI semantics and avoiding scattered magic strings. Based on learnings

For consistency, you could also move the cache token and event-name strings currently hardcoded in `GenAI` (for example `gen_ai.usage.cache_creation_input_tokens`, `gen_ai.content.prompt`) into `GenAIAttributes` and reference them there, so all GenAI attribute keys live in a single namespace.

Also applies to: 74-112, 172-196
src/opentelemetry_mcp/models.py (1)
284-329: Be explicit about 0-token semantics in LLMSpanAttributes and summaries

`LLMSpanAttributes.from_span` and downstream usage currently treat `0` tokens as falsy:

- `prompt_tokens` / `completion_tokens` / `total_tokens` are built with `or` chains and default to `0` (Lines [292-306]).
- `total_tokens` is recomputed when `if not total_tokens:` (Line [314]), so an explicit `0` total is treated as "missing" and recomputed.
- When constructing `LLMSpanAttributes`, fields are set as `prompt_tokens=prompt_tokens if prompt_tokens else None` and similarly for `completion_tokens` / `total_tokens` (Lines [380-382]), so an explicit 0 becomes `None`.

This is not a hard bug (aggregation via `UsageMetrics.add_span` still uses `or 0` and will count `None` and `0` equivalently) but it does conflate "no data" with "explicitly zero tokens", and can change how totals are derived when both aggregated and explicit totals are present.

If you care about distinguishing "0 tokens" from "no metric reported", consider:
- Using
is not Nonechecks instead of truthiness when deciding whether to recompute totals.- Passing
prompt_tokens/completion_tokens/total_tokensthrough as-is (including0) rather than converting them toNone.This would make the model’s behavior more predictable and match the
ge=0semantics used elsewhere.Also applies to: 380-382, 388-425, 450-457, 669-685
src/opentelemetry_mcp/backends/traceloop.py (1)
388-417: Clarify or useservice_nameparameter inget_service_operations
get_service_operationsacceptsservice_namebut does not use it in the Traceloop request; it returns workflow names across all services for the last 7 days.If the Traceloop API doesn’t support service-level filtering, consider:
- Adjusting the docstring to say it returns operations across all services, or
- Adding a comment explaining why
service_nameis intentionally unused, or- If the API supports it, adding a filter (for example via query params or filters) so the result is scoped to the given service.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
pyproject.toml(1 hunks)src/opentelemetry_mcp/attributes.py(1 hunks)src/opentelemetry_mcp/backends/traceloop.py(1 hunks)src/opentelemetry_mcp/constants.py(1 hunks)src/opentelemetry_mcp/models.py(1 hunks)src/opentelemetry_mcp/tools/list_llm_tools.py(1 hunks)src/opentelemetry_mcp/tools/search_spans.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/opentelemetry_mcp/tools/search_spans.py
- src/opentelemetry_mcp/tools/list_llm_tools.py
- pyproject.toml
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from traceloop/ai-semantic-conventions rather than hardcoding strings
Applied to files:
src/opentelemetry_mcp/constants.py
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Semantic conventions must follow the OpenTelemetry GenAI specification
Applied to files:
src/opentelemetry_mcp/constants.py
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/ai-semantic-conventions/src/SemanticAttributes.ts : Define all AI/LLM span attribute constants in packages/ai-semantic-conventions/src/SemanticAttributes.ts
Applied to files:
src/opentelemetry_mcp/constants.py
🧬 Code graph analysis (3)
src/opentelemetry_mcp/constants.py (1)
src/opentelemetry_mcp/attributes.py (1)
SpanAttributes(12-137)
src/opentelemetry_mcp/models.py (2)
src/opentelemetry_mcp/attributes.py (4)
- SpanAttributes (12-137), SpanEvent (140-157), to_dict (64-86), get (88-119)

src/opentelemetry_mcp/constants.py (3)

- Fields (186-196), GenAI (75-111), Service (172-176)
src/opentelemetry_mcp/attributes.py (2)
src/opentelemetry_mcp/constants.py (4)
- GenAI (75-111), LegacyLLM (114-140), Service (172-176), Status (179-182)

src/opentelemetry_mcp/models.py (1)

- gen_ai_system (248-250)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test (arm64)
🔇 Additional comments (2)
src/opentelemetry_mcp/backends/traceloop.py (2)
195-222: Hybrid filtering flow looks solid

The split into API-supported vs client-side filters, plus the fallback when `_filter_to_traceloop` returns `None`, provides a clear and predictable hybrid filtering path. Logging of `client_filters` also makes debugging easier when filters are silently downgraded to client-side.

Also applies to: 259-305

421-450: Health check implementation is straightforward and sufficient

Using a simple GET against `/v2/warehouse/spans` as a proxy health check and mapping the result into `HealthCheckResponse` with `project_id` and `error` fields gives a clear healthy/unhealthy signal without over-complicating things. Logging on failure is also appropriate.
```python
def _filter_to_traceloop(self, filter_obj: Filter) -> dict[str, Any] | None:
    """Convert a Filter object to Traceloop API filter format.

    Args:
        filter_obj: Filter to convert

    Returns:
        Traceloop filter dict or None if not convertible
    """
    field = filter_obj.field
    operator = filter_obj.operator
    value = filter_obj.value

    # Convert gen_ai.* to llm.* for Traceloop backend
    # Traceloop uses legacy llm.* naming convention instead of gen_ai.*
    if field.startswith("gen_ai."):
        # Map gen_ai.system -> llm.vendor (special case)
        if field == GenAI.SYSTEM:
            field = LegacyLLM.VENDOR
        else:
            # General mapping: gen_ai.* -> llm.*
            field = field.replace("gen_ai.", "llm.", 1)
        logger.debug(
            f"Converted filter field to Traceloop format: {filter_obj.field} -> {field}"
        )

    # Map field names to Traceloop API fields
    if field == Service.NAME:
        traceloop_field = Service.NAME
    elif field == "name" or field == Fields.OPERATION_NAME:
        traceloop_field = "span_name"
    elif field == Fields.DURATION:
        traceloop_field = Fields.DURATION
    elif field == "duration_ms":
        traceloop_field = Fields.DURATION
    elif field == Fields.STATUS:
        # Status filtering is not supported by Traceloop API - filter client-side
        logger.debug("Status filter not supported by Traceloop, will apply client-side")
        return None
    elif field.startswith("traceloop."):
        # Traceloop-specific attributes (like traceloop.span.kind) are supported directly
        # without span_attributes. prefix
        traceloop_field = field
    elif field.startswith("llm."):
        # LLM attributes in Traceloop don't need span_attributes. prefix
        traceloop_field = field
    else:
        # Assume it's a span attribute - prefix with span_attributes.
        traceloop_field = f"span_attributes.{field}"

    # Map operators to Traceloop format
    operator_map = {
        FilterOperator.EQUALS: "equals",
        FilterOperator.NOT_EQUALS: "not_equals",
        FilterOperator.GT: "greater_than",
        FilterOperator.LT: "less_than",
        FilterOperator.GTE: "greater_than_or_equal",
        FilterOperator.LTE: "less_than_or_equal",
    }

    traceloop_operator = operator_map.get(operator)
    if not traceloop_operator:
        logger.warning(
            f"Operator {operator} not supported by Traceloop, will filter client-side"
        )
        return None

    # Map value_type to Traceloop format
    value_type_map = {
        FilterType.STRING: "string",
        FilterType.NUMBER: "number",
        FilterType.BOOLEAN: "boolean",
    }
    traceloop_value_type = value_type_map.get(filter_obj.value_type, "string")

    if traceloop_value_type == "boolean":
        serialized_value = "true" if value else "false"
    else:
        serialized_value = str(value)

    return {
        "field": traceloop_field,
        "operator": traceloop_operator,
        "value": serialized_value,
        "value_type": traceloop_value_type,
    }
```
🛠️ Refactor suggestion | 🟠 Major
Tighten boolean handling in _filter_to_traceloop to avoid surprises
The mapping logic generally looks good, but the boolean serialization has a couple of edge cases:
- For `FilterType.BOOLEAN`, `serialized_value = "true" if value else "false"` relies on truthiness. If a caller ever passes `"false"` (string) with `value_type=BOOLEAN`, it will serialize as `"true"`.
- There's no explicit validation that `value` is actually a `bool` when `value_type` is `BOOLEAN`.
Since Filter is public and could be constructed manually, consider:
```diff
-        if traceloop_value_type == "boolean":
-            serialized_value = "true" if value else "false"
+        if traceloop_value_type == "boolean":
+            if isinstance(value, bool):
+                serialized_value = "true" if value else "false"
+            elif isinstance(value, str):
+                v = value.strip().lower()
+                serialized_value = "true" if v in ("1", "true", "yes") else "false"
+            else:
+                raise ValueError(f"Boolean filter value must be bool or 'true'/'false', got {type(value)}")
```

This keeps the API robust against mis-typed filters and avoids silently wrong queries.
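The truthiness pitfall called out here is easy to verify in plain Python; any non-empty string is truthy, including `"false"`:

```python
value = "false"  # a string, not a bool

# Truthiness: non-empty strings are truthy, so this serializes wrongly.
naive = "true" if value else "false"

# Explicit normalization, as the suggested diff does for strings:
normalized = "true" if value.strip().lower() in ("1", "true", "yes") else "false"

print(naive, normalized)  # true false
```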
🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/traceloop.py around lines 452-537, tighten
boolean handling: when filter_obj.value_type is FilterType.BOOLEAN, validate and
normalize the value instead of using truthiness. Specifically, if value is a
bool serialize to "true"/"false"; if value is a string accept only
"true"/"false" (case-insensitive) and normalize; otherwise log a warning about
invalid boolean value and return None so the filter will be applied client-side
(avoid silently coercing arbitrary truthy values).
Actionable comments posted: 2
♻️ Duplicate comments (3)
src/opentelemetry_mcp/models.py (1)
85-157: Treat 0 ms durations as valid filters across helpers and query models.

`min_duration_ms`/`max_duration_ms` are declared with `ge=0`, so `0` is a valid value, but several places treat `0` as "no filter" due to truthiness checks:

- Line 149: `_convert_params_to_filters` uses `if max_duration_ms:`.
- Lines 487–492, 620–625: `TraceQuery.has_filters`/`SpanQuery.has_filters` use `or self.min_duration_ms`/`or self.max_duration_ms`.
- Lines 536–541: `TraceQuery.to_backend_params` uses `if self.min_duration_ms:`/`if self.max_duration_ms:` before emitting `minDuration`/`maxDuration`.

This means `max_duration_ms=0` is silently dropped in filter conversion and backend params, and both queries' `has_filters()` ignore 0, which is inconsistent with the type constraints and with the already-correct `min_duration_ms is not None` check in `_convert_params_to_filters`. Consider tightening all these to explicit `is not None` checks so 0 is treated as an actual bound:

```diff
@@ def _convert_params_to_filters(
-    if max_duration_ms:
+    if max_duration_ms is not None:
@@ class TraceQuery(BaseModel):
-        or self.min_duration_ms
-        or self.max_duration_ms
+        or self.min_duration_ms is not None
+        or self.max_duration_ms is not None
@@ def to_backend_params(self) -> dict[str, str | int]:
-    if self.min_duration_ms:
+    if self.min_duration_ms is not None:
         params["minDuration"] = f"{self.min_duration_ms}ms"
-    if self.max_duration_ms:
+    if self.max_duration_ms is not None:
         params["maxDuration"] = f"{self.max_duration_ms}ms"
@@ class SpanQuery(BaseModel):
-        or self.min_duration_ms
-        or self.max_duration_ms
+        or self.min_duration_ms is not None
+        or self.max_duration_ms is not None
```

This extends the earlier feedback about `max_duration_ms` to all call sites that currently use truthiness for these fields.

Also applies to: 485-498, 536-541, 596-603, 618-631, 639-650
src/opentelemetry_mcp/backends/tempo.py (1)
58-76: NOT_EQUALS status filters (e.g. `has_error=False`) are silently dropped for Tempo.

In both `search_traces` (Lines 70–77) and `search_spans` (Lines 173–181), filters are split into:

- `native_filters`: `f.operator in get_supported_operators()` (and trace-level field for spans),
- `client_filters`: everything else.

Because `FilterOperator.NOT_EQUALS` is in `get_supported_operators()` and `"status"` is a trace-level field, a filter produced from `has_error=False` (`field="status"`, `operator=FilterOperator.NOT_EQUALS`, `value="ERROR"`) is classified as native and removed from `client_filters`.

However, `_filter_to_traceql` (Lines 418–427, 438–443) only handles `status` for `FilterOperator.EQUALS` and returns `None` for all other operators, including NOT_EQUALS. When `None` is returned, the filter is simply skipped and never re-applied client-side, so `has_error=False` queries do not actually exclude error traces/spans.

You can fix this by handling NOT_EQUALS in the `status` branch and emitting the corresponding TraceQL, e.g.:

```diff
-        elif field == "status":
-            # Special handling for status
-            if operator == FilterOperator.EQUALS:
-                if value == "ERROR":
-                    return "status = error"
-                elif value == "OK":
-                    return "status = ok"
-            return None
+        elif field == "status":
+            # Special handling for status (OK/ERROR) with both equality and inequality
+            if operator in (FilterOperator.EQUALS, FilterOperator.NOT_EQUALS):
+                op = "=" if operator == FilterOperator.EQUALS else "!="
+                if value == "ERROR":
+                    return f"status {op} error"
+                if value == "OK":
+                    return f"status {op} ok"
+            return None
```

This keeps the native classification while ensuring both `has_error=True` and `has_error=False` are respected for Tempo queries.

Also applies to: 141-181, 398-485
src/opentelemetry_mcp/backends/traceloop.py (1)
452-537: Tighten boolean value handling in `_filter_to_traceloop` to avoid truthiness surprises.

When `value_type` is `FilterType.BOOLEAN`, the current serialization:

```python
if traceloop_value_type == "boolean":
    serialized_value = "true" if value else "false"
```

uses Python truthiness. If a caller constructs `Filter(value="false", value_type=FilterType.BOOLEAN)`, this will serialize as `"true"`, and non-bool, non-string values are silently coerced.

To make this more robust for public `Filter` usage, consider validating and normalizing booleans explicitly:

```diff
-        if traceloop_value_type == "boolean":
-            serialized_value = "true" if value else "false"
-        else:
-            serialized_value = str(value)
+        if traceloop_value_type == "boolean":
+            if isinstance(value, bool):
+                serialized_value = "true" if value else "false"
+            elif isinstance(value, str):
+                v = value.strip().lower()
+                serialized_value = "true" if v in ("1", "true", "yes") else "false"
+            else:
+                raise ValueError(
+                    f"Boolean filter value must be bool or string 'true'/'false', got {type(value)}"
+                )
+        else:
+            serialized_value = str(value)
```

This keeps the API predictable and avoids silently wrong Traceloop queries when boolean filters are built manually.
🧹 Nitpick comments (2)
src/opentelemetry_mcp/backends/traceloop.py (1)
634-707: Remove unreachable `else` branch after empty-spans guard in `_convert_spans_to_trace`.

After building `spans`, you guard against the empty case:

```python
if not spans:
    logger.error(f"Failed to convert any spans for trace {trace_id}")
    return None
```

Given this early return, the subsequent `if spans: ... else: ...` block (Lines 694–706) will always take the `if spans` path; the `else` is unreachable.

You can simplify the duration calculation and avoid dead code:

```diff
-        # Calculate trace duration
-        if spans:
-            start_times = [s.start_time for s in spans]
-            trace_start = min(start_times)
-            # Find the maximum end time
-            end_times = [
-                datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
-                for s in spans
-            ]
-            trace_end = max(end_times)
-            trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
-        else:
-            trace_start = root_span.start_time
-            trace_duration_ms = root_span.duration_ms
+        # Calculate trace duration
+        start_times = [s.start_time for s in spans]
+        trace_start = min(start_times)
+        end_times = [
+            datetime.fromtimestamp(s.start_time.timestamp() + (s.duration_ms / 1000))
+            for s in spans
+        ]
+        trace_end = max(end_times)
+        trace_duration_ms = (trace_end - trace_start).total_seconds() * 1000
```

This makes the control flow clearer without changing behavior.
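The simplified duration calculation can be checked with stub spans. `Span` below is a stand-in for the PR's `SpanData` model, and `timedelta` arithmetic replaces the original's `fromtimestamp` round-trip for clarity:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Span:
    start_time: datetime
    duration_ms: float


def trace_duration_ms(spans: list[Span]) -> float:
    """Duration from the earliest span start to the latest span end.

    Assumes the caller has already rejected an empty span list (the
    early return above), so no unreachable else-branch is needed.
    """
    trace_start = min(s.start_time for s in spans)
    trace_end = max(
        s.start_time + timedelta(milliseconds=s.duration_ms) for s in spans
    )
    return (trace_end - trace_start).total_seconds() * 1000


t0 = datetime(2024, 1, 1, 12, 0, 0)
spans = [Span(t0, 50.0), Span(t0 + timedelta(milliseconds=30), 40.0)]
# Earliest start is t0; latest end is t0 + 70 ms.
print(round(trace_duration_ms(spans), 3))  # → 70.0
```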
src/opentelemetry_mcp/server.py (1)
53-56: Consider explicit backend shutdown to close HTTP resources
`_backend` is created once and then kept for the process lifetime, but it's never closed, even though `BaseBackend.close()` is available and will clean up underlying HTTP resources. This is fine for short-lived CLIs, but for long-running MCP servers it's cleaner to hook shutdown into FastMCP's lifecycle (if supported) or process teardown and call `await _backend.close()` once.

This is not a correctness bug today, just a lifecycle improvement to avoid relying on implicit GC/exit for connection cleanup.
Also applies to: 101-133
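The shutdown hook can be sketched with plain asyncio. `Backend` and `serve` are illustrative stand-ins, since FastMCP's actual lifecycle hooks may differ; the point is the `try/finally` guarantee around the server loop:

```python
import asyncio


class Backend:
    """Stand-in for BaseBackend with an async close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        # A real backend would close its HTTP client here.
        self.closed = True


async def serve(backend: Backend) -> None:
    try:
        # ... run the MCP server until cancelled or finished ...
        await asyncio.sleep(0)
    finally:
        # Ensure HTTP resources are released even on abnormal exit.
        await backend.close()


backend = Backend()
asyncio.run(serve(backend))
print(backend.closed)  # → True
```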
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- src/opentelemetry_mcp/attributes.py (1 hunks)
- src/opentelemetry_mcp/backends/jaeger.py (1 hunks)
- src/opentelemetry_mcp/backends/tempo.py (1 hunks)
- src/opentelemetry_mcp/backends/traceloop.py (1 hunks)
- src/opentelemetry_mcp/models.py (1 hunks)
- src/opentelemetry_mcp/server.py (1 hunks)
- src/opentelemetry_mcp/tools/search.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
📚 Learning: 2025-09-18T14:36:24.693Z
Learnt from: nina-kollman
Repo: traceloop/openllmetry PR: 3388
File: packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py:20-20
Timestamp: 2025-09-18T14:36:24.693Z
Learning: In packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py, the team is aware that the global self._server_name field is unsafe for multi-server or concurrent usage but has consciously decided not to support multiple FastMCP servers currently.
Applied to files:
src/opentelemetry_mcp/server.py
🧬 Code graph analysis (5)
src/opentelemetry_mcp/server.py (9)
- src/opentelemetry_mcp/backends/base.py (6): `BaseBackend` (12-160), `search_spans` (80-92), `health_check` (137-146), `search_traces` (65-77), `get_trace` (95-107), `list_services` (110-119)
- src/opentelemetry_mcp/tools/list_models.py (1): `list_models` (11-106)
- src/opentelemetry_mcp/tools/search_spans.py (1): `search_spans` (13-119)
- src/opentelemetry_mcp/tools/search.py (1): `search_traces` (13-117)
- src/opentelemetry_mcp/tools/trace.py (1): `get_trace` (10-80)
- src/opentelemetry_mcp/tools/usage.py (1): `get_llm_usage` (11-114)
- src/opentelemetry_mcp/tools/services.py (1): `list_services` (8-25)
- src/opentelemetry_mcp/tools/model_stats.py (1): `get_model_stats` (43-168)
- src/opentelemetry_mcp/tools/slow_traces.py (1): `get_slow_traces` (10-114)
src/opentelemetry_mcp/attributes.py (2)
- src/opentelemetry_mcp/constants.py (1): `GenAI` (75-111)
- src/opentelemetry_mcp/models.py (1): `gen_ai_system` (248-250)
src/opentelemetry_mcp/models.py (2)
- src/opentelemetry_mcp/attributes.py (4): `SpanAttributes` (12-137), `SpanEvent` (140-157), `to_dict` (64-86), `get` (88-119)
- src/opentelemetry_mcp/constants.py (3): `Fields` (186-196), `GenAI` (75-111), `Service` (172-176)
src/opentelemetry_mcp/tools/search.py (3)
- src/opentelemetry_mcp/backends/base.py (2): `BaseBackend` (12-160), `search_traces` (65-77)
- src/opentelemetry_mcp/models.py (6): `Filter` (47-82), `TraceQuery` (460-559), `TraceSummary` (562-590), `gen_ai_system` (248-250), `has_error` (253-255), `from_trace` (577-590)
- src/opentelemetry_mcp/utils.py (1): `parse_iso_timestamp` (6-23)
src/opentelemetry_mcp/backends/jaeger.py (3)
- src/opentelemetry_mcp/attributes.py (4): `HealthCheckResponse` (160-172), `SpanAttributes` (12-137), `SpanEvent` (140-157), `get` (88-119)
- src/opentelemetry_mcp/backends/filter_engine.py (4): `FilterEngine` (13-284), `apply_filters` (18-18), `apply_filters` (22-22), `apply_filters` (25-55)
- src/opentelemetry_mcp/models.py (10): `FilterOperator` (14-36), `SpanData` (228-255), `SpanQuery` (593-650), `TraceData` (428-457), `TraceQuery` (460-559), `get_all_filters` (500-517), `get_all_filters` (633-650), `to_backend_params` (519-559), `has_error` (253-255), `gen_ai_system` (248-250)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test (arm64)
🔇 Additional comments (2)
src/opentelemetry_mcp/tools/search.py (1)
13-115: search_traces wrapper looks solid and idiomatic.

Nice layering from raw MCP params → validated `TraceQuery` → backend call → `TraceSummary` JSON, with clear error handling for timestamp/filter/query/backend failures. I don't see changes needed here.

src/opentelemetry_mcp/backends/jaeger.py (1)

248-267: The review comment's assumption about the Jaeger operations response format is incorrect.

VCR cassettes from the integration tests show the actual `/api/services/{service}/operations` endpoint returns `{"data": [...]}` where `data` contains bare strings, not objects:

```json
{"data": ["/api/metrics/latencies", "/api/traces/{traceID}", "/api/services/{service}/operations", "GET", ...]}
```

The current implementation correctly handles this:

```python
return [str(op) for op in data.get("data", [])]
```

Since `data` already contains strings, calling `str()` on each item returns it unchanged; the code works as intended. The proposed fix would introduce unnecessary complexity without addressing any actual bug, and the tests pass against recorded Jaeger responses confirming the current behavior is correct.

Likely an incorrect or invalid review comment.
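The cassette shape described above is easy to sanity-check in isolation; the payload below is a hypothetical excerpt in the same form, not a real recorded response:

```python
# A response body shaped like the recorded Jaeger cassette: "data" holds
# bare operation-name strings, not objects.
data = {"data": ["/api/traces/{traceID}", "GET", "POST"]}

# str() on a value that is already a str returns it unchanged, so the
# existing list comprehension is a harmless no-op conversion here.
operations = [str(op) for op in data.get("data", [])]
print(operations)  # → ['/api/traces/{traceID}', 'GET', 'POST']
```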
```python
    def to_dict(self) -> dict[str, str | int | float | bool]:
        """
        Convert to dictionary representation with dotted keys.

        Returns only non-None values with their original dotted notation (e.g., "gen_ai.system").
        """
        result: dict[str, str | int | float | bool] = {}

        # Map field names back to their aliases for proper serialization
        for field_name, field_info in self.__class__.model_fields.items():
            value = getattr(self, field_name)
            if value is not None:
                # Use the alias if available, otherwise use field name
                key = field_info.alias or field_name
                result[key] = value

        # Add extra fields that were allowed through ConfigDict(extra='allow')
        if hasattr(self, "__pydantic_extra__") and self.__pydantic_extra__:
            for key, value in self.__pydantic_extra__.items():
                if value is not None:
                    result[key] = value

        return result
```
|
🧩 Analysis chain
Broaden attribute value types and avoid relying on pydantic internals
The helper methods assume attribute values are only str | int | float | bool, but the model defines list fields (e.g., gen_ai_response_finish_reasons, llm_response_finish_reasons). At runtime to_dict, get, and __getitem__ can all return lists, so the current type hints are inaccurate and will confuse static checking.
You’re also accessing __pydantic_extra__ directly, which is an internal pydantic detail. In pydantic v2 the public surface is model_extra; relying on the dunder name may break with upgrades.
Consider something along these lines:
-from typing import Literal
+from typing import Literal, TypeAlias
@@
-class SpanAttributes(BaseModel):
+SpanAttributeValue: TypeAlias = str | int | float | bool | list[str]
+
+
+class SpanAttributes(BaseModel):
@@
- def to_dict(self) -> dict[str, str | int | float | bool]:
+ def to_dict(self) -> dict[str, SpanAttributeValue]:
@@
- result: dict[str, str | int | float | bool] = {}
+ result: dict[str, SpanAttributeValue] = {}
@@
- # Add extra fields that were allowed through ConfigDict(extra='allow')
- if hasattr(self, "__pydantic_extra__") and self.__pydantic_extra__:
- for key, value in self.__pydantic_extra__.items():
- if value is not None:
- result[key] = value
+ # Add extra fields that were allowed through ConfigDict(extra='allow')
+ extra = getattr(self, "model_extra", None)
+ if extra:
+ for key, value in extra.items():
+ if value is not None:
+ result[key] = value # type: ignore[assignment]
@@
- def get(
- self, key: str, default: str | int | float | bool | None = None
- ) -> str | int | float | bool | None:
+ def get(
+ self, key: str, default: SpanAttributeValue | None = None
+ ) -> SpanAttributeValue | None:
@@
- # Try extra fields (dotted notation)
- if (
- hasattr(self, "__pydantic_extra__")
- and self.__pydantic_extra__
- and key in self.__pydantic_extra__
- ):
- return self.__pydantic_extra__[key] # type: ignore[no-any-return]
+ # Try extra fields (dotted notation)
+ extra = getattr(self, "model_extra", None)
+ if extra and key in extra:
+ return extra[key] # type: ignore[no-any-return]
@@
- def __getitem__(self, key: str) -> str | int | float | bool:
+    def __getitem__(self, key: str) -> SpanAttributeValue:

This keeps behavior the same while aligning the types with the actual data and using pydantic's public API for extras. Please double-check the exact attribute name (`model_extra`) against your pinned pydantic version and adjust if it differs.
Also applies to: 88-137
🌐 Web query:
In pydantic v2, what is the documented public attribute or property for accessing extra fields on a `BaseModel` instance, and is `model_extra` the recommended way over using `__pydantic_extra__` directly?
💡 Result:
- The documented public property is BaseModel.model_extra (a read-only property returning the extra fields). [1]
- Yes — use `model_extra` (the public API). `__pydantic_extra__` is the internal attribute that `model_extra` returns, so prefer `model_extra` instead of accessing `__pydantic_extra__` directly. [1][2]
Sources:
[1] Pydantic BaseModel API — model_extra property.
[2] Pydantic Models docs — explanation that extra fields are stored on the instance (in `__pydantic_extra__`).
I need to run scripts in the repository to verify the actual structure. Let me do that now:
---
🏁 Script executed:
```shell
#!/bin/bash
# Find the SpanAttributes model and inspect its field definitions
rg -n "finish_reasons|gen_ai_response_finish_reasons|llm_response_finish_reasons" src/opentelemetry_mcp/attributes.py
```

Length of output: 224
🏁 Script executed:

```shell
#!/bin/bash
# Inspect the attributes.py file, particularly around the specified lines and for list fields
head -150 src/opentelemetry_mcp/attributes.py | tail -100
```

Length of output: 3760
🏁 Script executed:

```shell
#!/bin/bash
# Search for all list-typed fields in SpanAttributes
grep -n "list\[" src/opentelemetry_mcp/attributes.py
```

Length of output: 168
---
🏁 Script executed:
```shell
#!/bin/bash
# Check the pydantic version requirement
grep -i pydantic pyproject.toml requirements.txt 2>/dev/null | head -5
```

Length of output: 38
Broaden attribute value types and use pydantic's public API for extra fields
The model defines list-typed fields (gen_ai_response_finish_reasons, llm_response_finish_reasons at lines 38, 52), but to_dict(), get(), and __getitem__() all have type hints claiming only str | int | float | bool. At runtime these methods can return lists, making the type signatures inaccurate for static checking.
Additionally, all three methods access `__pydantic_extra__` directly, which is an internal pydantic detail. In pydantic v2 the documented public API is `model_extra`.
Apply the following pattern to to_dict() (lines 64–87), get() (lines 88–137), and __getitem__():
-from typing import Literal
+from typing import Literal, TypeAlias
@@
-class SpanAttributes(BaseModel):
+SpanAttributeValue: TypeAlias = str | int | float | bool | list[str]
+
+
+class SpanAttributes(BaseModel):
@@
- def to_dict(self) -> dict[str, str | int | float | bool]:
+ def to_dict(self) -> dict[str, SpanAttributeValue]:
@@
- result: dict[str, str | int | float | bool] = {}
+ result: dict[str, SpanAttributeValue] = {}
@@
- # Add extra fields that were allowed through ConfigDict(extra='allow')
- if hasattr(self, "__pydantic_extra__") and self.__pydantic_extra__:
- for key, value in self.__pydantic_extra__.items():
- if value is not None:
- result[key] = value
+ # Add extra fields that were allowed through ConfigDict(extra='allow')
+ extra = getattr(self, "model_extra", None)
+ if extra:
+ for key, value in extra.items():
+ if value is not None:
+ result[key] = value # type: ignore[assignment]
@@
- def get(
- self, key: str, default: str | int | float | bool | None = None
- ) -> str | int | float | bool | None:
+ def get(
+ self, key: str, default: SpanAttributeValue | None = None
+ ) -> SpanAttributeValue | None:
@@
- # Try extra fields (dotted notation)
- if (
- hasattr(self, "__pydantic_extra__")
- and self.__pydantic_extra__
- and key in self.__pydantic_extra__
- ):
- return self.__pydantic_extra__[key] # type: ignore[no-any-return]
+ # Try extra fields (dotted notation)
+ extra = getattr(self, "model_extra", None)
+ if extra and key in extra:
+ return extra[key] # type: ignore[no-any-return]
@@
- def __getitem__(self, key: str) -> str | int | float | bool:
+ def __getitem__(self, key: str) -> SpanAttributeValue:Committable suggestion skipped: line range outside the PR's diff.
```python
    def __init__(
        self,
        url: str,
        api_key: str | None = None,
        timeout: float = 30.0,
        environments: list[str] | None = None,
    ):
        """Initialize Traceloop backend.

        Args:
            url: Traceloop API base URL (e.g., https://api.traceloop.com)
            api_key: API key for authentication (required, contains project info)
            timeout: Request timeout in seconds
            environments: List of environments to query (default: ["production"])
        """
        super().__init__(url, api_key, timeout)

        if not self.api_key:
            raise ValueError("Traceloop backend requires an API key")

        # Use "default" as project_id - Traceloop resolves actual project from API key
        self.project_id = "default"

        # Store environments for all API requests
        self.environments = environments if environments else ["prd"]
```
|
Align environments default in docstring with implementation.
The __init__ docstring says the default environments are ["production"], but the code actually uses ["prd"] when environments is None:
self.environments = environments if environments else ["prd"]Either update the docstring to reflect ["prd"] or change the default value to ["production"] so behavior matches the documentation.
🤖 Prompt for AI Agents
In src/opentelemetry_mcp/backends/traceloop.py around lines 32 to 57, the
docstring states the default environments is ["production"] but the code sets
self.environments to ["prd"] when environments is None; make them consistent by
either updating the docstring to indicate the default is ["prd"] or changing the
fallback in code to ["production"]. Pick one approach and apply it throughout
(docstring and code) so the documented default matches the actual runtime
default.
Fixes TLP-1126
Fixes TLP-1128
Summary by CodeRabbit
- New Features
- Documentation
- Chores
- Tests