MGMT-21844: Enable running Signatures regardless of log bundle availability#128
CrystalChun wants to merge 9 commits into openshift-assisted:master
Conversation
@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: CrystalChun. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Walkthrough

Introduce a parameterless ClusterAnalyzer base class and expose it in the package API; refactor LogAnalyzer to subclass it and load archives when available; add a per-signature `logs_required` flag.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Main as analyze_cluster
    participant API as assisted-service API
    participant ClusterA as ClusterAnalyzer
    participant LogA as LogAnalyzer
    participant Sig as Signature
    Main->>API: get_cluster(cluster_id)
    API-->>Main: cluster (with logs_info)
    alt logs_info.completed == True
        Main->>LogA: instantiate with logs_archive
        Note right of LogA #D6EAF8: Load metadata & events from archive
        LogA->>LogA: set metadata & events
        Main->>Sig: run selected signatures (all allowed)
    else logs not completed
        Main->>ClusterA: instantiate ClusterAnalyzer()
        Main->>ClusterA: set_cluster_metadata(metadata)
        Main->>ClusterA: set_cluster_events(parsed_events)
        Main->>Sig: run signatures where logs_required == False
    end
    Sig-->>Main: SignatureResult(s)
```
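The branch in the diagram can be sketched in a few lines of Python. The class and attribute names follow the PR, but this standalone version is an illustration, not the actual implementation:

```python
# Stand-in signature classes: logs_required defaults to True per the PR,
# and metadata-only signatures explicitly opt out.
class RequiresLogsSig:
    logs_required = True   # needs the downloaded log bundle

class MetadataOnlySig:
    logs_required = False  # runs on cluster metadata/events alone

ALL_SIGNATURES = [RequiresLogsSig, MetadataOnlySig]

def select_signatures(logs_info: str):
    """All signatures when logs are completed; otherwise only log-free ones."""
    if logs_info == "completed":
        return list(ALL_SIGNATURES)
    return [s for s in ALL_SIGNATURES if s.logs_required is False]
```

With logs available every signature is eligible; without them, only those marked `logs_required = False` survive the filter.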
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ Failed checks (1 warning); ✅ Passed checks (2 passed).
Actionable comments posted: 2
🧹 Nitpick comments (1)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
59-86: Consider extracting datetime parsing logic to reduce duplication. The datetime/string handling pattern (lines 61-65 and 70-74) is duplicated and also appears in `format_time`. Consider adding a helper method:

```python
@staticmethod
def _parse_datetime(value: str | datetime) -> datetime:
    """Parse datetime from string or return datetime object."""
    if isinstance(value, str):
        return dateutil.parser.isoparse(value)
    return value
```

Then simplify the code:

```diff
-install_started_at = md["cluster"]["install_started_at"]
-# Handle both datetime objects and ISO strings
-if isinstance(install_started_at, str):
-    installation_start_time = dateutil.parser.isoparse(install_started_at)
-else:
-    # It's already a datetime object
-    installation_start_time = install_started_at
+installation_start_time = self._parse_datetime(md["cluster"]["install_started_at"])
```

And similarly for the `deleted_at` handling.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- assisted_service_mcp/src/utils/log_analyzer/__init__.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2 hunks)
- assisted_service_mcp/src/utils/log_analyzer/main.py (3 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (2 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py (1 hunks)
- tests/test_log_analyzer.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_log_analyzer.py (1)
- assisted_service_mcp/src/service_client/assisted_service_api.py (3): `get_cluster` (117-141), `get_cluster_logs` (157-170), `get_events` (173-214)

assisted_service_mcp/src/utils/log_analyzer/main.py (2)
- assisted_service_mcp/src/service_client/assisted_service_api.py (3): `get_cluster` (117-141), `get_events` (173-214), `get_cluster_logs` (157-170)
- assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4): `ClusterAnalyzer` (22-141), `LogAnalyzer` (144-271), `set_cluster_events` (36-38), `set_cluster_metadata` (29-34)

assisted_service_mcp/src/utils/log_analyzer/__init__.py (2)
- assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2): `ClusterAnalyzer` (22-141), `LogAnalyzer` (144-271)
- assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1): `SignatureResult` (15-48)

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
- tests/test_log_analyzer.py (1): `get` (10-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (17)
assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py (1)

39-40: LGTM! Correctly marked as not requiring logs. The `SNOHostnameHasEtcd` signature only accesses cluster metadata and uses the `get_hostname()` helper, neither of which require log archives. Setting `logs_required = False` is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1)

82-83: LGTM! Correctly marked as not requiring logs. The `EventsInstallationAttempts` signature only accesses cluster events via methods that work without log archives. Setting `logs_required = False` is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py (1)

17-18: LGTM! Correctly marked as not requiring logs. The `ComponentsVersionSignature` signature only accesses cluster metadata (release_tag, versions). Setting `logs_required = False` is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py (1)

22-23: LGTM! Correctly marked as not requiring logs. The `LibvirtRebootFlagSignature` signature only accesses cluster metadata and host inventory JSON (from metadata). Setting `logs_required = False` is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1)

30-31: LGTM! Correctly marked as not requiring logs. The `SNOMachineCidrSignature` signature only accesses cluster metadata and host inventory (from metadata). Setting `logs_required = False` is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py (1)

19-19: LGTM! Correctly marked as not requiring logs. The `SlowImageDownloadSignature` signature only accesses cluster events to extract image download rates. Setting `logs_required = False` is appropriate.

tests/test_log_analyzer.py (2)

3-3: LGTM! Correct import for async mocking. Adding the `AsyncMock` import is necessary for properly mocking the async methods in the client (`get_cluster`, `get_cluster_logs`, `get_events`).

96-111: LGTM! Properly structured mocks for the updated flow. The test correctly mocks the async client and cluster object:
- `AsyncMock` is appropriate for async methods
- `fake_cluster.logs_info = "completed"` triggers the logs-available code path
- `fake_cluster.to_dict()` provides the expected cluster metadata structure
- `fake_archive.get()` returns a JSON string matching the expected format

The test validates the happy path with an empty signatures list.
assisted_service_mcp/src/utils/log_analyzer/__init__.py (2)

7-7: LGTM! Correctly exposes the new ClusterAnalyzer base class. Adding `ClusterAnalyzer` to the imports makes the new base class available for consumers who need to analyze clusters without log archives.

12-12: LGTM! Correctly adds ClusterAnalyzer to the public API. Including `ClusterAnalyzer` in `__all__` properly exposes it as part of the module's public interface while maintaining backward compatibility with existing `LogAnalyzer` usage.

assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)

54-55: LGTM! Conservative default protects existing behavior. The `logs_required` class attribute is well-designed with a safe default that ensures existing signatures continue to require logs unless explicitly marked otherwise.

assisted_service_mcp/src/utils/log_analyzer/main.py (2)

6-6: LGTM! Imports aligned with refactored architecture. Also applies to: 10-18

103-116: LGTM! Clean helper with appropriate warning for unknown signatures.

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)

22-38: LGTM! Clean base class with dependency injection via setters. The `ClusterAnalyzer` base class enables analysis without log archives by accepting metadata and events via setters. The metadata wrapping logic (lines 31-33) handles the structural difference between API responses and log bundle formats.

45-54: LGTM! Safe default behavior for missing events. The `get_all_cluster_events` method returns an empty list when events haven't been set, which is a safe default that prevents downstream code from handling `None`.

130-141: LGTM! Robust hostname extraction with sensible fallbacks. The method correctly tries `requested_hostname`, then parses the inventory JSON, and finally falls back to the host ID. The error handling appropriately catches JSON parsing failures.

144-188: LGTM! Clean subclass specialization for archive-based analysis. The `LogAnalyzer` subclass appropriately overrides `metadata` and `get_all_cluster_events` to load from the log archive instead of requiring injection via setters. The error handling (line 186) aligns with the base class behavior.
```diff
 # first call the api to get the cluster and check if logs are available
 cluster = await api_client.get_cluster(cluster_id)

 if cluster.logs_info != "completed":
     logger.info(
         "Logs are not available for cluster: %s\nDefaulting to signatures that don't require logs",
         cluster_id,
     )

     analyzer = ClusterAnalyzer()

     # Call events API to get the events and set the events in the analyzer
     events = await api_client.get_events(cluster_id)
     analyzer.set_cluster_events(json.loads(events))

     # Set the cluster metadata in the analyzer
     analyzer.set_cluster_metadata(cluster.to_dict())

     # Select signatures that don't require logs
     signatures_to_run = [
         sig for sig in ALL_SIGNATURES if sig.logs_required is False
     ]
-    # Initialize log analyzer
-    log_analyzer = LogAnalyzer(logs_archive)
 else:
     # Download logs
     logs_archive = await api_client.get_cluster_logs(cluster_id)

-    # Determine which signatures to run
-    signatures_to_run = ALL_SIGNATURES
     # Initialize log analyzer
     analyzer = LogAnalyzer(logs_archive)

     # Add all signatures to the list to run
     signatures_to_run = ALL_SIGNATURES

 # If specific signatures are provided, filter the signatures to run
 if specific_signatures:
-    signature_classes = {sig.__name__: sig for sig in ALL_SIGNATURES}
-    signatures_to_run = []
-    for sig_name in specific_signatures:
-        if sig_name in signature_classes:
-            signatures_to_run.append(signature_classes[sig_name])
-        else:
-            logger.warning("Unknown signature: %s", sig_name)
+    signatures_to_run = filter_signatures(
+        signatures_to_run, specific_signatures
+    )
```
Add warning when user-requested signatures are skipped due to log unavailability.
When logs are unavailable, the code filters to signatures where logs_required=False (lines 62-64), then applies user-requested specific_signatures (lines 77-80). If a user requests a signature that requires logs when logs aren't available, it will be silently skipped without any feedback.
Consider adding a warning after line 80:

```diff
 # If specific signatures are provided, filter the signatures to run
 if specific_signatures:
+    # Track which signatures were filtered out
+    before_filter = set(sig.__name__ for sig in signatures_to_run)
     signatures_to_run = filter_signatures(
         signatures_to_run, specific_signatures
     )
+    after_filter = set(sig.__name__ for sig in signatures_to_run)
+    skipped = set(specific_signatures) - after_filter
+    if skipped and cluster.logs_info != "completed":
+        logger.warning(
+            "Skipped signatures requiring logs (logs unavailable): %s",
+            ", ".join(sorted(skipped))
+        )
```

🤖 Prompt for AI Agents
In assisted_service_mcp/src/utils/log_analyzer/main.py around lines 43 to 80,
when logs are unavailable we restrict to signatures with logs_required=False and
then apply user-requested specific_signatures, which can silently skip requested
signatures that require logs; after applying the specific_signatures filter
(i.e., after line ~80) detect the case where cluster.logs_info != "completed"
and specific_signatures was provided, compute which requested signatures were
removed (requested minus resulting), and emit a logger.warning listing those
skipped signatures and that they were skipped because logs are unavailable;
ensure signatures_to_run is a concrete list before computing the difference so
the comparison works.
I didn't look deeply into this, but I will say if the goal is to do some analysis without the logs then we should probably change the description of the tool that calls all of this. I think right now the description is very specific about needing logs available to run.
Force-pushed: 1a30d36 to 0ee69cd
Actionable comments posted: 1
♻️ Duplicate comments (2)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)

80-92: Type-safe time formatting and narrowed exceptions — LGTM. Accepts `str | datetime` and avoids blanket `Exception`; matches earlier review guidance.

assisted_service_mcp/src/utils/log_analyzer/main.py (1)

76-80: Warn when requested signatures are skipped due to missing logs. Currently silent. Add a targeted warning to improve UX. (This mirrors a prior suggestion.)

```diff
 # If specific signatures are provided, filter the signatures to run
 if specific_signatures:
     signatures_to_run = filter_signatures(
         signatures_to_run, specific_signatures
     )
+    # Inform users when requested signatures require logs but logs are unavailable
+    if cluster.logs_info != "completed":
+        requires_logs = {
+            sig.__name__ for sig in ALL_SIGNATURES if getattr(sig, "logs_required", True)
+        }
+        skipped = sorted(set(specific_signatures) & requires_logs)
+        if skipped:
+            logger.warning(
+                "Skipped signatures requiring logs (logs unavailable): %s",
+                ", ".join(skipped),
+            )
```
🧹 Nitpick comments (4)
assisted_service_mcp/src/tools/cluster_tools.py (1)

367-388: Function name no longer matches its behavior; clarify vague docstring terms. The updated docstring correctly reflects that logs are optional and addresses the reviewer's concern. However:

Naming inconsistency: The function name `analyze_cluster_logs` implies logs are the primary or required input, but the docstring now states logs are optional ("if available"). This creates a mismatch between the function name and its actual behavior. Consider renaming to `analyze_cluster` to better reflect the broader scope, though be aware this would be a breaking API change.

Vague terminology:
- "cluster's data" (line 374) is unclear — does this refer to cluster metadata, events, configuration, or something else?
- "Cluster is available (downloadable via the API)" (line 380) is ambiguous — what specifically is "downloadable"? The cluster metadata, the logs, or both?

Please clarify these terms in the docstring to help users understand what analysis is performed and what prerequisites are actually required.

assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)

54-55: Make `logs_required` a `ClassVar` and document the contract. Prevents accidental instance shadowing and clarifies intent for subclass authors.

```diff
-from typing import Optional, Any, Sequence
+from typing import Optional, Any, Sequence, ClassVar
 ...
 class Signature(abc.ABC):
@@
-    logs_required = True
+    # Subclasses may override to opt into running without logs.
+    logs_required: ClassVar[bool] = True
```

assisted_service_mcp/src/utils/log_analyzer/main.py (1)

91-100: Log exceptions with stack traces for easier debugging. Use `logger.exception` to include the traceback.

```diff
-    except Exception as e:
-        logger.error(
-            "Error running signature %s: %s", signature_class.__name__, e
-        )
+    except Exception:
+        logger.exception("Error running signature %s", signature_class.__name__)
@@
-    except Exception as e:
-        logger.error("Error analyzing cluster %s: %s", cluster_id, e)
+    except Exception:
+        logger.exception("Error analyzing cluster %s", cluster_id)
     raise
```

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

171-173: Prefer `logger.exception` for metadata load failures. Adds the traceback to aid debugging.

```diff
-        except Exception as e:
-            logger.error("Failed to load metadata: %s", e)
+        except Exception:
+            logger.exception("Failed to load metadata")
         raise
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- assisted_service_mcp/src/tools/cluster_tools.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2 hunks)
- assisted_service_mcp/src/utils/log_analyzer/main.py (3 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3 hunks)
- tests/test_log_analyzer.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_log_analyzer.py
🧰 Additional context used
🧬 Code graph analysis (2)
assisted_service_mcp/src/utils/log_analyzer/main.py (2)
- assisted_service_mcp/src/service_client/assisted_service_api.py (3): `get_cluster` (117-141), `get_events` (173-214), `get_cluster_logs` (157-170)
- assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4): `ClusterAnalyzer` (22-141), `LogAnalyzer` (144-271), `set_cluster_events` (36-38), `set_cluster_metadata` (29-34)

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
- tests/test_log_analyzer.py (1): `get` (10-15)
- assisted_service_mcp/src/tools/event_tools.py (1): `cluster_events` (12-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (1)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
130-142: Hostname extraction utility — LGTM. Clear precedence and safe fallbacks.
`logs_required` will indicate if a signature requires logs downloaded from the cluster to be run. It will default to True for all signatures unless marked otherwise. The signatures that do not require logs have been marked. The following are signatures that do not need logs:
- ComponentsVersionSignature
- SlowImageDownloadSignature
- SNOMachineCidrSignature
- SNOHostnameHasEtcd
- EventsInstallationAttempts
- LibvirtRebootFlagSignature
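A minimal sketch of that contract, assuming the base class carries the flag as a plain class attribute (the real `Signature` base class has more members, and `SomeLogBasedSignature` is a hypothetical name for illustration):

```python
class Signature:
    # Conservative default: a signature is assumed to need the log bundle
    # unless it explicitly opts out.
    logs_required = True

class ComponentsVersionSignature(Signature):
    # Metadata-only signature, opted out per the list above.
    logs_required = False

class SomeLogBasedSignature(Signature):
    # No override: inherits the default and still requires logs.
    pass
```

Because the flag lives on the class, the runner can check it without instantiating anything, e.g. `sig.logs_required is False` in a list comprehension over the registered signature classes.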
This class will be the parent class of LogAnalyzer and will contain the cluster metadata and events for a cluster.
This function duplicated what `get_last_install_cluster_events` was already doing. Removed the partitioning so that it returns all cluster events, as its name implies.
Moves the functions that are common to both ClusterAnalyzer and LogAnalyzer to the parent class (ClusterAnalyzer) so both can access them.
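The parent/child split described in these commits might look roughly like this; method names follow the PR, but the bodies are simplified stand-ins (a dict replaces the real tar archive, and metadata/events loading is stubbed):

```python
class ClusterAnalyzer:
    """Base analyzer: works from injected metadata and events, no log bundle."""

    def __init__(self):
        self._metadata = None
        self._events = None

    def set_cluster_metadata(self, md):
        # Wrap bare API responses to match the log-bundle {"cluster": ...} shape
        self._metadata = md if "cluster" in md else {"cluster": md}

    def set_cluster_events(self, events):
        self._events = events

    def get_all_cluster_events(self):
        # Safe default: empty list instead of None when events were never set
        return self._events or []

class LogAnalyzer(ClusterAnalyzer):
    """Subclass: sources the same data from a downloaded logs archive."""

    def __init__(self, archive):
        super().__init__()
        self._archive = archive  # dict stand-in for the logs archive

    def get_all_cluster_events(self):
        # Override: load events from the archive instead of injected state
        return self._archive.get("events", [])

base = ClusterAnalyzer()
base.set_cluster_metadata({"name": "c1"})
base.set_cluster_events([{"name": "cluster_installed"}])
archive_backed = LogAnalyzer({"events": [{"name": "from_archive"}]})
```

Signatures written against the base class interface then run unchanged in both modes; only the data source differs.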
Force-pushed: 0ee69cd to 4eb8866

Force-pushed: 4eb8866 to 55949bc
Actionable comments posted: 2
♻️ Duplicate comments (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
57-97: Add error handling for timestamp parsing and missing hosts. The current implementation has two issues:

1. Lines 64-73, 78-85: The `match`/`case` blocks handle type checking but don't catch parsing exceptions from `dateutil.parser.isoparse()`, which can raise `ValueError` or other exceptions.
2. Line 89: Assumes `md["cluster"]["hosts"]` exists, which could raise `KeyError`.

Apply this diff to add proper error handling:

```diff
     @staticmethod
     def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        install_started_at = md.get("cluster", {}).get("install_started_at")
-        if not install_started_at:
-            return md
-
-        match install_started_at:
-            case str():
-                installation_start_time = dateutil.parser.isoparse(install_started_at)
-            case datetime():
-                installation_start_time = install_started_at
-            case _:
-                logger.error(
-                    "Unable to parse install started at: %s", install_started_at
-                )
-                return md
+        cluster = md.get("cluster", {})
+        install_started_at = cluster.get("install_started_at")
+        installation_start_time = None
+
+        if install_started_at:
+            try:
+                match install_started_at:
+                    case str():
+                        installation_start_time = dateutil.parser.isoparse(install_started_at)
+                    case datetime():
+                        installation_start_time = install_started_at
+                    case _:
+                        logger.debug(
+                            "Unable to parse install_started_at (unknown type): %s", install_started_at
+                        )
+            except (ValueError, TypeError) as e:
+                logger.debug("Unable to parse install_started_at %s: %s", install_started_at, e)

         def host_deleted_before_installation_started(host):
+            if not installation_start_time:
+                return False
             if deleted_at := host.get("deleted_at"):
-                # Handle both datetime objects and ISO strings
-                match deleted_at:
-                    case str():
-                        deleted_at_time = dateutil.parser.isoparse(deleted_at)
-                    case datetime():
-                        deleted_at_time = deleted_at
-                    case _:
-                        logger.error("Unable to parse deleted at: %s", deleted_at)
-                        return False
-                return deleted_at_time < installation_start_time
+                try:
+                    match deleted_at:
+                        case str():
+                            deleted_at_time = dateutil.parser.isoparse(deleted_at)
+                        case datetime():
+                            deleted_at_time = deleted_at
+                        case _:
+                            logger.debug("Unable to parse deleted_at (unknown type): %s", deleted_at)
+                            return False
+                    return deleted_at_time < installation_start_time
+                except (ValueError, TypeError) as e:
+                    logger.debug("Unable to parse deleted_at %s: %s", deleted_at, e)
+                    return False
             return False

-        all_hosts = md["cluster"]["hosts"]
-        md["cluster"]["deleted_hosts"] = [
+        all_hosts = list(cluster.get("hosts", []))
+        cluster["deleted_hosts"] = [
             h for h in all_hosts if host_deleted_before_installation_started(h)
         ]
-        md["cluster"]["hosts"] = [
+        cluster["hosts"] = [
             h for h in all_hosts if not host_deleted_before_installation_started(h)
         ]
+
+        md["cluster"] = cluster
         return md
```

assisted_service_mcp/src/utils/log_analyzer/main.py (1)
77-80: Add warning when user-requested signatures are skipped due to log unavailability. When logs are unavailable, signatures are filtered to those with `logs_required=False` (lines 62-64), then user-requested `specific_signatures` are applied (lines 77-80). If a user requests a signature requiring logs when logs aren't available, it will be silently skipped.

Consider logging which requested signatures were skipped:

```diff
 # If specific signatures are provided, filter the signatures to run
 if specific_signatures:
+    before_filter = {sig.__name__ for sig in signatures_to_run}
     signatures_to_run = filter_signatures(
         signatures_to_run, specific_signatures
     )
+    after_filter = {sig.__name__ for sig in signatures_to_run}
+    skipped = set(specific_signatures) - after_filter
+    if skipped and cluster.logs_info != "completed":
+        logger.warning(
+            "Skipped signatures requiring logs (logs unavailable): %s",
+            ", ".join(sorted(skipped))
+        )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- assisted_service_mcp/src/tools/cluster_tools.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/__init__.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (3 hunks)
- assisted_service_mcp/src/utils/log_analyzer/main.py (3 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py (1 hunks)
- assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py (1 hunks)
- tests/test_log_analyzer.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py
- assisted_service_mcp/src/utils/log_analyzer/signatures/base.py
- assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py
- assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py
- assisted_service_mcp/src/tools/cluster_tools.py
- assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_log_analyzer.py (3)
- assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (11): `ClusterAnalyzer` (23-152), `set_cluster_metadata` (30-35), `set_cluster_events` (37-39), `metadata` (42-44), `metadata` (171-185), `cluster_events` (47-49), `get_all_cluster_events` (51-55), `get_all_cluster_events` (187-199), `get_last_install_cluster_events` (99-110), `get_events_by_host` (133-139), `get_hostname` (142-152)
- assisted_service_mcp/src/service_client/assisted_service_api.py (3): `get_cluster` (117-141), `get_cluster_logs` (157-170), `get_events` (173-214)
- assisted_service_mcp/src/utils/log_analyzer/main.py (1): `analyze_cluster` (21-100)

assisted_service_mcp/src/utils/log_analyzer/main.py (3)
- assisted_service_mcp/src/service_client/assisted_service_api.py (3): `get_cluster` (117-141), `get_events` (173-214), `get_cluster_logs` (157-170)
- assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4): `ClusterAnalyzer` (23-152), `LogAnalyzer` (155-282), `set_cluster_events` (37-39), `set_cluster_metadata` (30-35)
- assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3): `SignatureResult` (15-48), `Signature` (51-90), `analyze` (61-70)

assisted_service_mcp/src/utils/log_analyzer/__init__.py (2)
- assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2): `ClusterAnalyzer` (23-152), `LogAnalyzer` (155-282)
- assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1): `SignatureResult` (15-48)

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
- tests/test_log_analyzer.py (1): `get` (10-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (12)
assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1)

82-83: LGTM! Correctly declares the signature does not require logs. The `EventsInstallationAttempts` signature only uses event and metadata APIs (`get_all_cluster_events()`, `partition_cluster_events()`, `get_last_install_cluster_events()`) that are available on the base `ClusterAnalyzer`, so `logs_required = False` is accurate.

assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1)

30-31: LGTM! Correctly declares the signature does not require logs. The `SNOMachineCidrSignature` signature only accesses `log_analyzer.metadata`, which is available on the base `ClusterAnalyzer`, so `logs_required = False` is accurate.

assisted_service_mcp/src/utils/log_analyzer/__init__.py (1)

7-16: LGTM! Correctly exposes the new base class in the public API. The changes appropriately extend the package's public API to include `ClusterAnalyzer` alongside the existing `LogAnalyzer`, supporting the refactoring that introduces a base analyzer without log dependencies.

tests/test_log_analyzer.py (4)

20-109: LGTM! Comprehensive test coverage for ClusterAnalyzer. The test thoroughly validates the base `ClusterAnalyzer` functionality including metadata handling, deleted host separation, event partitioning, and hostname extraction.

183-211: LGTM! Test correctly validates the logs-available path. The test properly mocks the cluster object and logs archive, ensuring the main flow executes the logs-available path with `LogAnalyzer`.

214-248: LGTM! Test validates the no-logs execution path. The test correctly mocks an unavailable logs scenario and verifies that signatures with `logs_required = False` execute properly using `ClusterAnalyzer`.

250-284: LGTM! Test validates signature execution with logs available. The test correctly verifies that a signature requiring logs executes when logs are available.

assisted_service_mcp/src/utils/log_analyzer/main.py (2)

43-65: LGTM! Log availability check and ClusterAnalyzer initialization are correct. The code properly checks log availability via `cluster.logs_info` and initializes `ClusterAnalyzer` with metadata and events when logs are unavailable, correctly filtering to signatures with `logs_required = False`.

103-116: LGTM! Signature filtering logic is correct. The `filter_signatures` function properly maps signature names to classes and warns about unknown signatures.

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (3)

23-56: LGTM! ClusterAnalyzer base class API is well-designed. The parameterless constructor and explicit setter methods (`set_cluster_metadata`, `set_cluster_events`) provide a clean separation between cluster-level analysis (no logs) and log-based analysis, with good defensive programming practices.

155-199: LGTM! LogAnalyzer correctly extends ClusterAnalyzer. The subclass properly overrides `metadata` and `get_all_cluster_events` to load from the logs archive while maintaining the same interface, with appropriate error handling.

141-152: LGTM! Robust hostname extraction with proper fallbacks. The `get_hostname` method handles multiple hostname sources with appropriate error handling and fallback logic.
Force-pushed: 55949bc to 16aa804
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_log_analyzer.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_log_analyzer.py (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (11): `ClusterAnalyzer` (23-152), `set_cluster_metadata` (30-35), `set_cluster_events` (37-39), `metadata` (42-44), `metadata` (171-185), `cluster_events` (47-49), `get_all_cluster_events` (51-55), `get_all_cluster_events` (187-199), `get_last_install_cluster_events` (99-110), `get_events_by_host` (133-139), `get_hostname` (142-152)
assisted_service_mcp/src/utils/log_analyzer/main.py (1): `analyze_cluster` (21-100)
🔇 Additional comments (6)
tests/test_log_analyzer.py (6)
1-18: LGTM! The addition of the `AsyncMock` import and the `make_archive` helper function are well-structured for the test scenarios.
20-109: LGTM! Comprehensive test coverage for the new `ClusterAnalyzer` base class. The test validates metadata handling, event partitioning (including reset events), deleted host filtering, and hostname resolution.
183-212: LGTM! The test has been properly updated to use the new mock structure with `logs_info` and a `to_dict()` return value, correctly triggering the logs-available path.
251-285: LGTM! Past review issues addressed. The test correctly verifies that `ApiInvalidCertificateSignature` runs when logs are available. The mock setup properly configures `to_dict.return_value` and `logs_info`, and the signature name has been corrected from the previous review.
287-318: LGTM! Past review issues resolved. The test correctly validates that `ApiInvalidCertificateSignature` does not run when logs are unavailable. The mock configuration has been fixed per previous feedback: `to_dict.return_value` is properly set and `logs_info = "not_completed"` correctly triggers the no-logs path.
320-348: LGTM! Past review issues addressed. The test properly validates `SlowImageDownloadSignature` behavior without logs. The mock setup has been corrected per previous feedback, with `to_dict.return_value` and `logs_info` properly configured.
```python
assert len(results) == 2
for result in results:
    assert result.title in ["No etcd in SNO hostname", "Slow Image Download"]
    if result.title in "Slow Image Download":
```
Fix the string comparison operator.

Line 247 uses `in` for string comparison, which checks substring membership. This should be an equality check.
Apply this diff:
```diff
- if result.title in "Slow Image Download":
+ if result.title == "Slow Image Download":
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
if result.title == "Slow Image Download":
```
🤖 Prompt for AI Agents
In tests/test_log_analyzer.py around line 247, the code uses the membership
operator `in` to compare result.title to the literal "Slow Image Download",
which tests substring membership rather than equality; change the comparison to
an equality check (use ==) so it only matches when result.title exactly equals
"Slow Image Download".
| """Get cluster metadata.""" | ||
| if self._metadata is None: | ||
| try: | ||
| metadata_content = self.logs_archive.get("cluster_metadata.json") |
It's a bit strange to me that we now sometimes get the metadata from the API and other times get it from the log bundle.
Did you consider making this consistent (I guess just always getting it from the API)?
I could see a situation where if the logs are available we're actually acting on old data where if they are not then we get the data from the live cluster.
Also it's a bit odd that the user of the ClusterAnalyzer is responsible for fetching the metadata and events from outside the object and setting them where LogAnalyzer handles that for the caller. We should probably be more consistent there as well.
Then a follow on from this would be that maybe we don't need separate classes at all.
If the behavior is the same for getting the data (cluster and events) the Analyzer could detect if the logs are available and make that information public. Then the caller could filter the signatures depending on if the Analyzer detected that logs are available or not.
Or even better, the signature could check `analyzer.logs_available` if/when it needs the logs and just no-op when logs are required but unavailable. Then we don't need to do any filtering at all.
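That no-op idea could look roughly like this. It is a sketch only; the `logs_available` attribute and the base-class shape are assumptions for illustration, not the PR's actual `Signature` base.

```python
from typing import Any, Optional


class Signature:
    """Sketch: each signature declares whether it needs the log bundle and
    silently no-ops when the analyzer reports logs are unavailable."""

    logs_required = True  # most signatures need the log bundle

    def analyze(self, analyzer: Any) -> Optional[str]:
        if self.logs_required and not getattr(analyzer, "logs_available", False):
            return None  # no-op: logs required but not present
        return self._analyze(analyzer)

    def _analyze(self, analyzer: Any) -> str:
        raise NotImplementedError


class MetadataOnlySignature(Signature):
    """A signature that can run from API metadata/events alone."""

    logs_required = False

    def _analyze(self, analyzer: Any) -> str:
        return "ran without logs"
```

With this shape, the caller runs every signature unconditionally and the gating lives in one place, in the template method `analyze`.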
Did you consider making this consistent (I guess just always getting it from the API)?
I could see a situation where if the logs are available we're actually acting on old data where if they are not then we get the data from the live cluster.
Yes that's exactly it, the data from the live cluster might differ from what's in the log bundle.
I've considered making it toggleable, i.e. letting the caller choose between the API data and the log bundle data (when available) for the cluster metadata/events. However, that choice would then have to be made by the model when calling the tool, and I'm not sure how to do that...
If the behavior is the same for getting the data (cluster and events) the Analyzer could detect if the logs are available and make that information public. Then the caller could filter the signatures depending on if the Analyzer detected that logs are available or not.
Sure we could do it this way instead too, I was just hesitant to pass in the api client to the analyzer since the client is used outside of it.
Or even better the signature could check analyzer.logs_available if/when it needs them and just no-op if the logs are required. Then we don't need to do any filtering at all.
Yeah that works too, but it just requires more changes where we'd need to update all of the signatures that do require logs to do this check.
In `analyze_cluster`, adds logic to call the API first to see if logs are available, and to initialize the corresponding analyzer. Also updates which signatures to run depending on the analyzer chosen.
Force-pushed 16aa804 to b6530e0.
Force-pushed b6530e0 to 8760682.
Tests new cluster analyzer and running signatures when logs are not available for a cluster.
Force-pushed 8760682 to b4faa94.
Actionable comments posted: 0
♻️ Duplicate comments (3)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)
43-80: Consider warning when user-requested signatures are skipped due to log unavailability. When logs are unavailable, the code filters to signatures where `logs_required = False` (lines 62-64), then applies user-requested `specific_signatures` (lines 77-80). If a user requests a signature that requires logs when logs aren't available, it will be silently skipped without feedback. This matches a concern raised in past review comments. Consider adding a warning after line 80 to inform users which requested signatures were skipped:
```diff
 # If specific signatures are provided, filter the signatures to run
 if specific_signatures:
+    before_filter = {sig.__name__ for sig in signatures_to_run}
     signatures_to_run = filter_signatures(
         signatures_to_run, specific_signatures
     )
+    after_filter = {sig.__name__ for sig in signatures_to_run}
+    skipped = set(specific_signatures) - after_filter
+    if skipped and cluster.logs_info != "completed":
+        logger.warning(
+            "Skipped signatures requiring logs (logs unavailable): %s",
+            ", ".join(sorted(skipped))
+        )
```

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
56-76: Add error handling for missing or invalid timestamps. The `_clean_metadata_json` method assumes `md["cluster"]["install_started_at"]` exists and can be parsed. If the field is missing or contains an invalid timestamp, this will raise `KeyError` or parsing exceptions. This was flagged in previous reviews. Consider adding defensive checks:
```diff
 @staticmethod
 def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
     """Clean metadata JSON by separating deleted hosts."""
-    installation_start_time = dateutil.parser.isoparse(
-        str(md["cluster"]["install_started_at"])
-    )
+    cluster = md.get("cluster", {})
+    install_started_at = cluster.get("install_started_at")
+    installation_start_time = None
+    if install_started_at:
+        try:
+            installation_start_time = dateutil.parser.isoparse(str(install_started_at))
+        except (ValueError, TypeError) as e:
+            logger.debug("Unable to parse install_started_at %s: %s", install_started_at, e)

     def host_deleted_before_installation_started(host):
+        if not installation_start_time:
+            return False
         if deleted_at := host.get("deleted_at"):
-            return dateutil.parser.isoparse(deleted_at) < installation_start_time
+            try:
+                return dateutil.parser.isoparse(deleted_at) < installation_start_time
+            except (ValueError, TypeError) as e:
+                logger.debug("Unable to parse deleted_at %s: %s", deleted_at, e)
+                return False
         return False

-    all_hosts = md["cluster"]["hosts"]
-    md["cluster"]["deleted_hosts"] = [
+    all_hosts = list(cluster.get("hosts", []))
+    cluster["deleted_hosts"] = [
         h for h in all_hosts if host_deleted_before_installation_started(h)
     ]
-    md["cluster"]["hosts"] = [
+    cluster["hosts"] = [
         h for h in all_hosts if not host_deleted_before_installation_started(h)
     ]
+    md["cluster"] = cluster
     return md
```

Note: Previous review comments indicated this was addressed in commits 7547111 to 55949bc, but the current code still lacks these safeguards.

tests/test_log_analyzer.py (1)
tests/test_log_analyzer.py (1)
214-249: Fix string comparison on line 247.Line 247 uses the
inoperator for string comparison, which checks substring membership rather than equality.Apply this diff:
if result.title in "Slow Image Download": - assert "Detected slow image download rate (MBps):" in result.content + if result.title == "Slow Image Download": + assert "Detected slow image download rate (MBps):" in result.contentNote: This issue was flagged in a previous review but remains unresolved.
🧹 Nitpick comments (1)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)
62-64: Consider more idiomatic boolean check. The expression `sig.logs_required is False` works but could be more Pythonic. Apply this diff:
```diff
 # Select signatures that don't require logs
 signatures_to_run = [
-    sig for sig in ALL_SIGNATURES if sig.logs_required is False
+    sig for sig in ALL_SIGNATURES if not sig.logs_required
 ]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
assisted_service_mcp/src/mcp.py (1 hunks)
assisted_service_mcp/src/tools/cluster_tools.py (1 hunks)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (3 hunks)
assisted_service_mcp/src/utils/log_analyzer/main.py (4 hunks)
tests/test_log_analyzer.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_log_analyzer.py (3)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (11): `ClusterAnalyzer` (22-131), `set_cluster_metadata` (29-34), `set_cluster_events` (36-38), `metadata` (41-43), `metadata` (150-164), `cluster_events` (46-48), `get_all_cluster_events` (50-54), `get_all_cluster_events` (166-178), `get_last_install_cluster_events` (78-89), `get_events_by_host` (112-118), `get_hostname` (121-131)
assisted_service_mcp/src/service_client/assisted_service_api.py (3): `get_cluster` (117-141), `get_cluster_logs` (157-170), `get_events` (173-214)
assisted_service_mcp/src/utils/log_analyzer/main.py (1): `analyze_cluster` (21-100)
assisted_service_mcp/src/mcp.py (1)
assisted_service_mcp/src/tools/cluster_tools.py (1)
`troubleshoot_cluster` (368-388)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
tests/test_log_analyzer.py (1)
`get` (10-15)
assisted_service_mcp/src/tools/event_tools.py (1): `cluster_events` (12-41)
assisted_service_mcp/src/utils/log_analyzer/main.py (3)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
`get_cluster` (117-141), `get_events` (173-214), `get_cluster_logs` (157-170)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4): `ClusterAnalyzer` (22-131), `LogAnalyzer` (134-261), `set_cluster_events` (36-38), `set_cluster_metadata` (29-34)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3): `SignatureResult` (15-48), `Signature` (51-85), `analyze` (61-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (8)
assisted_service_mcp/src/mcp.py (1)
78-78: LGTM! Tool registration updated to reflect broader troubleshooting scope. The change from `analyze_cluster_logs` to `troubleshoot_cluster` correctly reflects the new capability to analyze clusters even when logs are unavailable.

assisted_service_mcp/src/tools/cluster_tools.py (1)
368-388: LGTM! Function rename and documentation updates accurately reflect the new behavior. The rename to `troubleshoot_cluster` and the updated docstring correctly communicate that this function can now analyze cluster data even when logs are unavailable.

assisted_service_mcp/src/utils/log_analyzer/main.py (1)
103-116: LGTM! Clean signature filtering implementation. The `filter_signatures` helper correctly maps signature names to classes and warns about unknown signatures.

tests/test_log_analyzer.py (3)
20-109: LGTM! Comprehensive test coverage for ClusterAnalyzer. The test validates metadata handling, deleted host separation, event partitioning, and hostname resolution.
251-318: LGTM! Good test coverage for log-present and log-absent scenarios. These tests correctly verify that signatures requiring logs run only when logs are available and are skipped when logs are unavailable.
320-348: LGTM! Test validates log-independent signature execution. The test correctly verifies that `SlowImageDownloadSignature` runs when logs are not available.

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
22-131: LGTM! Clean base class design for cluster analysis. The `ClusterAnalyzer` base class provides a clear separation of concerns, allowing analysis with or without log archives.
134-178: LGTM! LogAnalyzer correctly extends ClusterAnalyzer. The refactored `LogAnalyzer` properly inherits from `ClusterAnalyzer` and loads metadata and events from the log archive.
|
@CrystalChun: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
```python
# first call the api to get the cluster and check if logs are available
cluster = await api_client.get_cluster(cluster_id)

if cluster.logs_info != "completed":
```
Let's change this to actually check if we can download the logs.
I say this because when you reset a cluster the logs_info gets reset, but the actual log bundle is still present and we should be able to use it.
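A sketch of the download-based check suggested here. It assumes the client's `get_cluster_logs` raises when no bundle exists, which may not match the real client's behavior; production code should catch the client's specific error type rather than bare `Exception`.

```python
import asyncio


async def log_bundle_available(api_client, cluster_id: str) -> bool:
    """Probe for an actual log bundle instead of trusting cluster.logs_info,
    since logs_info is reset on cluster reset while the bundle may remain."""
    try:
        await api_client.get_cluster_logs(cluster_id)
        return True
    except Exception:  # assumption: the client raises when no bundle exists
        return False
```

The trade-off is an extra API round-trip per analysis, but it avoids wrongly skipping log-based signatures after a cluster reset.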
/close

The log bundle will always be available. Events and metadata are uploaded to the "log bundle" on demand when the GET logs API is called.
@CrystalChun: Closed this PR.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Certain Signatures do not require the log bundle to run.
This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.
The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.
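The selection described above can be sketched as follows. The signature classes shown are illustrative stand-ins; only the `logs_required` flag and the two signature names are taken from this PR.

```python
from typing import List, Type


class Signature:
    logs_required = True  # default: most signatures need the log bundle


class ApiInvalidCertificateSignature(Signature):
    logs_required = True  # inspects files inside the log archive


class SlowImageDownloadSignature(Signature):
    logs_required = False  # runs from cluster events/metadata alone


ALL_SIGNATURES: List[Type[Signature]] = [
    ApiInvalidCertificateSignature,
    SlowImageDownloadSignature,
]


def select_signatures(logs_available: bool) -> List[Type[Signature]]:
    """Pick the runnable subset based on log bundle availability."""
    if logs_available:
        return list(ALL_SIGNATURES)
    return [sig for sig in ALL_SIGNATURES if not sig.logs_required]
```

When logs are missing, only the metadata/events-driven signatures survive the filter, which is the behavior the PR's tests exercise.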
NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.
Summary by CodeRabbit
New Features
Improvements
Tests
Documentation