Skip to content

MGMT-21387: Add metrics for just the assisted service API calls#67

Merged
openshift-merge-bot[bot] merged 2 commits intoopenshift-assisted:masterfrom
carbonin:assisted-service-metrics
Aug 17, 2025
Merged

MGMT-21387: Add metrics for just the assisted service API calls#67
openshift-merge-bot[bot] merged 2 commits intoopenshift-assisted:masterfrom
carbonin:assisted-service-metrics

Conversation

@carbonin
Copy link
Copy Markdown
Collaborator

@carbonin carbonin commented Aug 15, 2025

Ideally this should be nearly the whole request, but if there is a difference we want to know about it.

Also skew the buckets toward the smaller end since we don't expect many 10s or longer requests

https://issues.redhat.com/browse/MGMT-21387

Summary by CodeRabbit

  • New Features

    • Added detailed per-method API latency metrics, enabling finer-grained observability in dashboards.
    • Refined latency histogram buckets for tool requests to improve measurement accuracy.
  • Refactor

    • Centralized API call timing through a common wrapper for consistent latency tracking.
    • Reorganized imports to use a dedicated metrics module; no runtime behavior changes.
  • Chores

    • Streamlined metrics module exposure and cleaned up redundant public exports to simplify usage.

Ideally this should be nearly the whole request, but if there is a
difference we want to know about it.

Also skew the buckets toward the smaller end since we don't expect many
10s or longer requests

https://issues.redhat.com/browse/MGMT-21387
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 15, 2025
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Aug 15, 2025

@carbonin: This pull request references MGMT-21387 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.20.0" version, but no target version was set.

Details

In response to this:

Ideally this should be nearly the whole request, but if there is a difference we want to know about it.

Also skew the buckets toward the smaller end since we don't expect many 10s or longer requests

https://issues.redhat.com/browse/MGMT-21387

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from eranco74 and omertuc August 15, 2025 13:44
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Aug 15, 2025

Walkthrough

Consolidates metrics into a dedicated metrics package, updates histogram buckets, adds API_CALL_LATENCY, centralizes API call timing via a new internal _api_call wrapper, and adjusts imports accordingly without changing public method signatures or runtime behavior.

Changes

Cohort / File(s) Summary
Metrics package exports
metrics/__init__.py
Adds package initializer that re-exports initiate_metrics, metrics, track_tool_usage via all.
Metrics definitions and buckets
metrics/metrics.py
Refines MCP tool request histogram buckets; introduces API_CALL_LATENCY histogram labeled by api_method.
Server import adjustments
server.py
Switches metrics imports from service_client to metrics; imports log from service_client.logger explicitly.
Service client public API surface
service_client/__init__.py
Removes metrics-related re-exports; all now only includes InventoryClient and log.
InventoryClient API timing wrapper
service_client/assisted_service_api.py
Adds private _api_call to wrap asyncio.to_thread calls and time via API_CALL_LATENCY; updates all API methods to use the wrapper; public signatures unchanged.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant InventoryClient
  participant _api_call
  participant SDK_Function
  participant Metrics

  Caller->>InventoryClient: get_cluster(...)/list_clusters(...)/...
  InventoryClient->>_api_call: _api_call(func, *args)
  _api_call->>Metrics: API_CALL_LATENCY.labels(api_method).time()
  _api_call->>SDK_Function: asyncio.to_thread(func, *args)
  SDK_Function-->>_api_call: result
  _api_call-->>InventoryClient: result
  InventoryClient-->>Caller: result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

Possibly related PRs

Suggested labels

size/L, ok-to-test

Suggested reviewers

  • jhernand

Poem

A whisk of whiskers, timers set—tick-tock!
I hop through calls, I measure every clock.
New buckets brim, API beats align,
A metric nibble, latency so fine.
Carrot in paw, I chart the line—
Hop, track, report—our service shines! 🥕🐇

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a78492e and 3c157d2.

📒 Files selected for processing (5)
  • metrics/__init__.py (1 hunks)
  • metrics/metrics.py (1 hunks)
  • server.py (1 hunks)
  • service_client/__init__.py (1 hunks)
  • service_client/assisted_service_api.py (19 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
server.py (2)
service_client/assisted_service_api.py (1)
  • InventoryClient (25-536)
metrics/metrics.py (3)
  • metrics (69-71)
  • track_tool_usage (51-65)
  • initiate_metrics (44-48)
service_client/assisted_service_api.py (1)
metrics/metrics.py (1)
  • metrics (69-71)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (7)
service_client/__init__.py (1)

11-11: LGTM! Clean separation of concerns.

The removal of metrics-related exports aligns with the architectural refactoring to consolidate metrics functionality into a dedicated package. This improves modularity and follows the single responsibility principle.

server.py (1)

19-21: LGTM! Import reorganization aligns with metrics consolidation.

The import changes successfully implement the architectural refactoring:

  • Metrics-related functionality (metrics, track_tool_usage, initiate_metrics) now imported from the dedicated metrics package
  • log explicitly imported from service_client.logger for clarity
  • InventoryClient remains the primary export from service_client

This improves separation of concerns and makes the dependency structure more explicit.

metrics/__init__.py (1)

1-10: LGTM! Clean public API establishment for metrics package.

The module properly establishes a public API by:

  • Adding clear documentation about the metrics functionality
  • Re-exporting the core metrics functions (initiate_metrics, metrics, track_tool_usage)
  • Using __all__ to explicitly define the public interface

This enables clean imports from external modules while maintaining encapsulation.

metrics/metrics.py (1)

32-41: LGTM! Histogram bucket adjustments align with PR objectives.

The changes successfully implement the PR requirements:

  1. Updated REQUEST_LATENCY buckets (Line 32): Changed from (0.1, 1.0, 10.0, 30.0, float("inf")) to (0.05, 0.1, 0.5, 1.0, 5.0, 10.0, float("inf")) - properly skewed toward smaller values as requested

  2. New API_CALL_LATENCY histogram (Lines 36-41): Added with appropriate configuration for tracking assisted service API call durations, using the same optimized bucket structure

The bucket configuration correctly addresses the PR objective of skewing toward smaller values since long requests (10s+) are not expected, with most requests likely falling in the sub-second range.

service_client/assisted_service_api.py (3)

11-11: LGTM! Clean imports for generic typing and metrics integration.

The additions properly support the new _api_call wrapper:

  • Callable and TypeVar for generic method signature support
  • API_CALL_LATENCY import from the metrics package for latency instrumentation
  • T = TypeVar("T") for generic return type preservation

Also applies to: 20-22


46-62: LGTM! Centralized API call wrapper with proper latency tracking.

The _api_call method provides excellent centralization of concerns:

Strengths:

  • Consistent latency tracking: Uses API_CALL_LATENCY with method-specific labels
  • Preserved typing: Generic T return type maintains type safety
  • Thread safety: Properly uses asyncio.to_thread for synchronous API calls
  • Error handling: Maintains existing exception handling through decorators

Implementation details:

  • Method name extraction via getattr(func, "__name__", "unknown_api_method") provides meaningful metrics labels
  • Context manager pattern ensures accurate timing measurement
  • Generic typing preserves API contract integrity

This approach successfully centralizes latency instrumentation while maintaining the existing public API signatures and behavior.


137-143: LGTM! Consistent API call instrumentation across all methods.

All public API methods have been systematically updated to use the centralized _api_call wrapper, which provides:

Benefits:

  • Uniform latency tracking: Every assisted service API call is now properly instrumented
  • Preserved functionality: All methods maintain their original signatures and behavior
  • Consistent error handling: The @sanitize_exceptions decorators continue to work as expected
  • Type safety: Return types are preserved through the generic _api_call wrapper

Coverage verification: The instrumentation covers all major API operations:

  • Cluster management (get, list, create, update, install)
  • Infrastructure environments (get, list, create, update, download URLs)
  • Events (cluster and host events)
  • Versions and operators (list, get, add bundles)
  • Host management (update)
  • Credentials (presigned URLs)

This comprehensive instrumentation aligns perfectly with the PR objective to "capture metrics representing nearly the whole request body for assisted service API calls."

Also applies to: 154-156, 190-200, 214-218, 232-239, 274-278, 298-303, 321-327, 358-364, 378-382, 398-403, 414-416, 433-433, 469-477, 499-509, 528-536

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Aug 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 15, 2025
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Aug 15, 2025

@carbonin: This pull request references MGMT-21387 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.20.0" version, but no target version was set.

Details

In response to this:

Ideally this should be nearly the whole request, but if there is a difference we want to know about it.

Also skew the buckets toward the smaller end since we don't expect many 10s or longer requests

https://issues.redhat.com/browse/MGMT-21387

Summary by CodeRabbit

  • New Features

  • Added detailed per-method API latency metrics, enabling finer-grained observability in dashboards.

  • Refined latency histogram buckets for tool requests to improve measurement accuracy.

  • Refactor

  • Centralized API call timing through a common wrapper for consistent latency tracking.

  • Reorganized imports to use a dedicated metrics module; no runtime behavior changes.

  • Chores

  • Streamlined metrics module exposure and cleaned up redundant public exports to simplify usage.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@carbonin
Copy link
Copy Markdown
Collaborator Author

/retest

Copy link
Copy Markdown
Collaborator

@eranco74 eranco74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Aug 17, 2025
@openshift-merge-bot openshift-merge-bot Bot merged commit 8463fc5 into openshift-assisted:master Aug 17, 2025
15 checks passed
@maorfr maorfr mentioned this pull request Aug 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants