Implement Prometheus metrics endpoint #35
Conversation
**Walkthrough**

Prometheus metrics tracking and exposure have been added to the MCP server. A new metrics module defines request counters and latency histograms, and all tool functions are now decorated for metrics collection. The server now exposes a `/metrics` endpoint for Prometheus scraping.
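For orientation, here is a minimal sketch of what such a metrics module might look like. The metric names and the `tool` label follow the README and review excerpts quoted below; the exact structure is an assumption, not the PR's verbatim code:

```python
from functools import wraps
from typing import Any, Callable

from prometheus_client import Counter, Histogram

REQUEST_COUNT = Counter(
    "assisted_service_mcp_tool_request_count",
    "Number of tool requests",
    ["tool"],
)
REQUEST_LATENCY = Histogram(
    "assisted_service_mcp_tool_request_duration",
    "Time to run the tool, in seconds",
    ["tool"],
)

def track_tool_usage(tool_name: str) -> Callable:
    """Count each call and time it under the given tool label."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            REQUEST_COUNT.labels(tool=tool_name).inc()
            with REQUEST_LATENCY.labels(tool=tool_name).time():
                return await func(*args, **kwargs)
        return wrapper
    return decorator
```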
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant Client
    participant MCP_Server
    participant Prometheus
    Client->>MCP_Server: Call tool function (e.g., list_clusters)
    MCP_Server->>MCP_Server: track_tool_usage decorator increments request count and records latency
    MCP_Server-->>Client: Respond with tool result
    Prometheus->>MCP_Server: GET /metrics
    MCP_Server-->>Prometheus: Return current metrics in Prometheus format
```
**Estimated code review effort:** 3 (90–240 minutes)
Welcome @maorfr! It looks like this is your first PR to openshift-assisted/assisted-service-mcp 🎉
Actionable comments posted: 1
🧹 Nitpick comments (3)
README.md (1)
**148-152: Improve documentation readability**

The documentation accurately describes the metrics, but the repetitive sentence structure makes it harder to read.
Consider restructuring for better readability:
```diff
-* **assisted_service_mcp_tool_request_count** - Number of tool requests.
-* **assisted_service_mcp_tool_request_duration_sum** - Total time to run the tool, in seconds.
-* **assisted_service_mcp_tool_request_duration_count** - Total number of tool requests measured.
-* **assisted_service_mcp_tool_request_duration_bucket** - Number of tool requests organized in buckets.
+* **assisted_service_mcp_tool_request_count** - Total number of tool requests
+* **assisted_service_mcp_tool_request_duration** - Histogram tracking request latency with the following components:
+  * **_sum** - Total time to run tools, in seconds
+  * **_count** - Total number of requests measured
+  * **_bucket** - Requests organized by latency buckets
```
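For reference, a scrape of the restructured metrics might render like this in the Prometheus text format (illustrative values only):

```text
assisted_service_mcp_tool_request_count{tool="list_clusters"} 42.0
assisted_service_mcp_tool_request_duration_bucket{le="0.1",tool="list_clusters"} 30.0
assisted_service_mcp_tool_request_duration_bucket{le="1.0",tool="list_clusters"} 40.0
assisted_service_mcp_tool_request_duration_bucket{le="+Inf",tool="list_clusters"} 42.0
assisted_service_mcp_tool_request_duration_sum{tool="list_clusters"} 12.3
assisted_service_mcp_tool_request_duration_count{tool="list_clusters"} 42.0
```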
service_client/metrics.py (2)

**27-32: Consider adjusting histogram buckets for better observability**

The histogram buckets (0.1, 1.0, 10.0, 30.0, inf) may not provide optimal granularity for API request latencies. Most API calls would likely fall in the 0.1-1.0 second range.
Consider using more granular buckets:
```diff
-    buckets=(0.1, 1.0, 10.0, 30.0, float("inf")),
+    buckets=(0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf")),
```

This provides better granularity for typical API response times.
**52-53: Add type hints and parameter validation**

The metrics endpoint function lacks type hints and doesn't validate the request parameter.
```diff
-async def metrics(request):
+async def metrics(request: Any) -> PlainTextResponse:
+    """Serve Prometheus metrics endpoint"""
     return PlainTextResponse(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```
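If the app is Starlette-based (FastMCP's SSE app is), the parameter could be typed more precisely than `Any`. A self-contained sketch, not the PR's code:

```python
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from starlette.requests import Request
from starlette.responses import PlainTextResponse

async def metrics(request: Request) -> PlainTextResponse:
    """Return current metric samples in the Prometheus text format."""
    return PlainTextResponse(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```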
📜 Review details
📒 Files selected for processing (5)
- README.md (1 hunks)
- pyproject.toml (1 hunks)
- server.py (15 hunks)
- service_client/__init__.py (1 hunks)
- service_client/metrics.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
service_client/__init__.py (1)
- service_client/metrics.py (2)
  - metrics (52-53)
  - track_tool_usage (35-48)
server.py (2)
- service_client/assisted_service_api.py (1)
  - InventoryClient (22-723)
- service_client/metrics.py (2)
  - metrics (52-53)
  - track_tool_usage (35-48)
🪛 LanguageTool
README.md
[style] ~150-~150: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...l time to run the tool, in seconds. * **assisted_service_mcp_tool_request_duration_count...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~151-~151: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...l number of tool requests measured. * **assisted_service_mcp_tool_request_duration_bucke...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (7)
service_client/__init__.py (1)
**10-12: LGTM: Clean module exports**

The imports and `__all__` exports follow Python best practices and correctly expose the new metrics functionality.

service_client/metrics.py (2)
**20-24: LGTM: Well-defined metrics counter**

The REQUEST_COUNT metric is properly defined with appropriate name, description, and labels.
**35-48: Add exception handling to the metrics decorator**

The decorator doesn't handle exceptions, which means failed requests won't be counted and timing context won't be properly closed.
```diff
 def track_tool_usage(tool_name: str) -> Callable:
     """Decorator to track tool usage metrics"""
     def decorator(func: Callable) -> Callable:
         @wraps(func)
         async def wrapper(*args, **kwargs) -> Any:
             REQUEST_COUNT.labels(tool=tool_name).inc()
-            with REQUEST_LATENCY.labels(tool=tool_name).time():
-                response = await func(*args, **kwargs)
-                return response
+            try:
+                with REQUEST_LATENCY.labels(tool=tool_name).time():
+                    response = await func(*args, **kwargs)
+                    return response
+            except Exception:
+                # Re-raise the exception after metrics are recorded
+                raise
         return wrapper
     return decorator
```

This ensures metrics are properly recorded even when functions raise exceptions.
Likely an incorrect or invalid review comment.
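The "incorrect review comment" note is plausible: prometheus_client's `Histogram.time()` context manager observes the duration in `__exit__`, which runs even when an exception propagates, so the `try/except` adds nothing. A quick check (assumes prometheus_client is installed):

```python
from prometheus_client import REGISTRY, Histogram

demo = Histogram("demo_request_duration_seconds", "Demo latency histogram")

try:
    with demo.time():        # the timer records its observation on __exit__
        raise RuntimeError() # ... even when the body raises
except RuntimeError:
    pass

# The _count sample was still incremented despite the exception:
print(REGISTRY.get_sample_value("demo_request_duration_seconds_count"))  # 1.0
```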
server.py (4)
**13-13: LGTM: Required import addition**

The uvicorn import is correctly added to support the new server setup.
**18-18: LGTM: Metrics imports**

The imports for metrics functionality are correctly added.
**548-550: LGTM: Proper server setup with metrics endpoint**

The server setup correctly:

- Creates an SSE app from the MCP instance
- Adds the `/metrics` route for Prometheus scraping
- Uses uvicorn to run the server on all interfaces

The switch from `mcp.run(transport="sse")` to this approach enables both SSE transport and the HTTP metrics endpoint.
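A sketch of that setup, assuming FastMCP's `sse_app()` helper and Starlette routing; the port is a placeholder:

```python
import uvicorn
from starlette.routing import Route

from service_client import metrics  # the /metrics handler

# `mcp` is the FastMCP instance defined in server.py
app = mcp.sse_app()                            # SSE transport as an ASGI app
app.routes.append(Route("/metrics", metrics))  # Prometheus scrape target
uvicorn.run(app, host="0.0.0.0", port=8000)    # listen on all interfaces
```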
**122-122: All @track_tool_usage decorator names match their functions**

Verified that the decorator arguments align exactly with their corresponding `async def` names throughout server.py; no mismatches found.
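For illustration, the pattern being verified presumably looks like this (the tool body here is hypothetical):

```python
@mcp.tool()
@track_tool_usage("list_clusters")  # decorator argument matches the function name
async def list_clusters() -> str:
    """Hypothetical body; the real tool queries InventoryClient."""
    ...
```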
/ok-to-test
/lgtm
jhernand left a comment:
Looks good to me. I only have a minor suggestion to avoid repeating the name of the tool in the metrics decorator.
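One way to implement that suggestion (an assumption, not necessarily the code the PR ended up with) is to let the decorator read the name off the wrapped function:

```python
from functools import wraps
from typing import Any, Callable

# REQUEST_COUNT and REQUEST_LATENCY come from the metrics module

def track_tool_usage(func: Callable) -> Callable:
    """Variant that derives the metrics label from the function itself."""
    tool_name = func.__name__  # e.g. "list_clusters"; no repetition needed

    @wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        REQUEST_COUNT.labels(tool=tool_name).inc()
        with REQUEST_LATENCY.labels(tool=tool_name).time():
            return await func(*args, **kwargs)
    return wrapper
```

Applied as `@track_tool_usage` with no argument, the label then always matches the `async def` name that the review verified above.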
@jhernand: changing LGTM is restricted to collaborators.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
New changes are detected. LGTM label has been removed.
Force-pushed from 320498c to 50bcdbf.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jhernand, maorfr

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Part of https://issues.redhat.com/browse/MGMT-21159
Implemented Prometheus metrics endpoint and instrumentation for MCP tool usage.
The implementation is done using a decorator (inspired by redhat-community-ai-tools/jira-mcp-snowflake#12).
An alternative I considered was using middleware (inspired by #5), but it is challenging to pass the MCP tool name through that layer.
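For contrast, a middleware sketch (an assumption about what a #5-style approach would look like, not that PR's code) only sees the HTTP layer, which is why the tool name is hard to recover there:

```python
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

class MetricsMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Only transport-level data (path, headers) is visible here; the MCP
        # tool name is buried in the JSON-RPC payload, so labeling per-tool
        # metrics would require parsing that payload ourselves.
        return await call_next(request)
```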
**Summary by CodeRabbit**

- New Features: Prometheus metrics are now collected for tool usage and exposed at `/metrics`.