
MGMT-21844: Enable running Signatures regardless of log bundle availability#128

Closed
CrystalChun wants to merge 9 commits into openshift-assisted:master from CrystalChun:logs-not-required-signatures

Conversation

@CrystalChun
Contributor

@CrystalChun CrystalChun commented Oct 17, 2025

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Summary by CodeRabbit

  • New Features

    • Diagnostics run even when logs are incomplete or unavailable; events and metadata can be loaded directly.
    • Select a subset of checks by name; unknown names produce warnings.
    • Troubleshooting tool updated to reflect broader cluster-data handling.
  • Improvements

    • More checks marked as not requiring logs, widening analysis coverage.
    • Unified handling and clearer exposure of cluster events, metadata, and hostname resolution for more reliable diagnostics.
  • Tests

    • Expanded coverage for both log-present and log-missing paths.
  • Documentation

    • Docstring updated to reflect analysis from cluster data or logs.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 17, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 17, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from carbonin and keitwb October 17, 2025 20:18
@openshift-ci

openshift-ci Bot commented Oct 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: CrystalChun
Once this PR has been reviewed and has the lgtm label, please assign carbonin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Oct 17, 2025

Walkthrough

Introduce a parameterless ClusterAnalyzer base class and expose it in the package API; refactor LogAnalyzer to subclass it and load archives when available; add per-signature logs_required flags and a filter_signatures helper; update main workflow, tests, and a troubleshooting tool rename.

Changes

Cohort / File(s) Summary
Core analyzers
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py
Add ClusterAnalyzer (parameterless ctor) with set_cluster_metadata(), set_cluster_events(), cluster_events property, get_all_cluster_events(), and static get_hostname(); add LogAnalyzer(ClusterAnalyzer) that stores logs_archive and loads metadata/events from the archive.
Package exports
assisted_service_mcp/src/utils/log_analyzer/__init__.py
Import ClusterAnalyzer and add it to __all__.
Main workflow
assisted_service_mcp/src/utils/log_analyzer/main.py
analyze_cluster() now selects ClusterAnalyzer when logs are not completed and LogAnalyzer when logs are available; adds filter_signatures() to map requested signature names and respect per-signature logs_required flags; loads events/metadata into analyzer when logs are unavailable.
Signature base
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py
Add logs_required = True on Signature; change format_time() to accept `str | datetime` and avoid blanket exception handling.
Log-optional signatures
assisted_service_mcp/src/utils/log_analyzer/signatures/*.py
Mark signatures as log-optional by adding logs_required = False to: ComponentsVersionSignature, SNOHostnameHasEtcd, SNOMachineCidrSignature, SlowImageDownloadSignature, LibvirtRebootFlagSignature, and EventsInstallationAttempts.
Tests
tests/test_log_analyzer.py
Add/adjust tests for ClusterAnalyzer and LogAnalyzer metadata/events partitioning, hostname resolution, and analyze_cluster flows for both log-present and log-absent scenarios; use AsyncMock and updated mocks; expand signature execution tests.
Tools / API
assisted_service_mcp/src/tools/cluster_tools.py, assisted_service_mcp/src/mcp.py
Rename analyze_cluster_logs → troubleshoot_cluster and update docstring/registration to use troubleshoot_cluster (tool registration updated in mcp.py).
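Based on the change summary above, the per-signature log opt-out and the name filter can be sketched roughly as follows. The names Signature, logs_required, filter_signatures, ALL_SIGNATURES, and ComponentsVersionSignature come from the summary; the bodies (and the MustGatherAnalysis placeholder) are illustrative assumptions, not the PR's actual code:

```python
import logging

logger = logging.getLogger(__name__)


class Signature:
    """Illustrative stand-in for the PR's Signature base class."""

    # Conservative default: signatures need the log bundle unless
    # they explicitly opt out at class level.
    logs_required = True


class ComponentsVersionSignature(Signature):
    # Reads only cluster metadata, so it can run without logs.
    logs_required = False


class MustGatherAnalysis(Signature):
    # Hypothetical log-dependent signature; keeps the default.
    pass


ALL_SIGNATURES = [ComponentsVersionSignature, MustGatherAnalysis]


def filter_signatures(signatures, requested_names):
    """Keep only the requested signatures; warn on unknown names."""
    by_name = {sig.__name__: sig for sig in signatures}
    selected = []
    for name in requested_names:
        if name in by_name:
            selected.append(by_name[name])
        else:
            logger.warning("Unknown signature: %s", name)
    return selected


# When logs are unavailable, restrict to the log-optional subset first.
runnable = [sig for sig in ALL_SIGNATURES if sig.logs_required is False]
print([sig.__name__ for sig in runnable])  # ['ComponentsVersionSignature']
```

Requesting a name not in the pool simply logs a warning and yields an empty selection, mirroring the "unknown names produce warnings" behavior described above.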

Sequence Diagram(s)

sequenceDiagram
    participant Main as analyze_cluster
    participant API as assisted-service API
    participant ClusterA as ClusterAnalyzer
    participant LogA as LogAnalyzer
    participant Sig as Signature

    Main->>API: get_cluster(cluster_id)
    API-->>Main: cluster (with logs_info)

    alt logs_info.completed == True
        Main->>LogA: instantiate with logs_archive
        Note right of LogA #D6EAF8: Load metadata & events from archive
        LogA->>LogA: set metadata & events
        Main->>Sig: run selected signatures (all allowed)
    else logs not completed
        Main->>ClusterA: instantiate ClusterAnalyzer()
        Main->>ClusterA: set_cluster_metadata(metadata)
        Main->>ClusterA: set_cluster_events(parsed_events)
        Main->>Sig: run signatures where logs_required == False
    end

    Sig-->>Main: SignatureResult(s)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

lgtm

Suggested reviewers

  • carbonin

Poem

🐇
I hopped through clusters, metadata in paw,
When archives sleep, I still learn the law.
ClusterAnalyzer hums without a log,
Signatures sing, whether file or fog.
Hooray for rabbits, and tests that jog! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.79% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "MGMT-21844: Enable running Signatures regardless of log bundle availability" accurately captures the primary objective of the changeset. The PR introduces a logs_required attribute to mark signatures that don't need logs, restructures the analyzer hierarchy with a new ClusterAnalyzer base class, and updates the main analysis flow to conditionally use either ClusterAnalyzer or LogAnalyzer depending on log availability. This directly enables signatures to run regardless of whether a log bundle is present. The title is concise, specific, and clearly conveys the main change without vague language or unnecessary details.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 17, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 17, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Summary by CodeRabbit

  • New Features

    • Cluster diagnostics now available even with incomplete or unavailable logs; the system intelligently adapts analysis based on available data.
    • Enhanced cluster event and metadata management for comprehensive diagnostics.

  • Improvements

    • System better handles scenarios with partial cluster information, ensuring analyses complete when possible.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

59-86: Consider extracting datetime parsing logic to reduce duplication.

The datetime/string handling pattern (lines 61-65 and 70-74) is duplicated and also appears in format_time. This could be extracted into a helper method.

Consider adding a helper method:

@staticmethod
def _parse_datetime(value: str | datetime) -> datetime:
    """Parse datetime from string or return datetime object."""
    if isinstance(value, str):
        return dateutil.parser.isoparse(value)
    return value

Then simplify the code:

-        install_started_at = md["cluster"]["install_started_at"]
-        # Handle both datetime objects and ISO strings
-        if isinstance(install_started_at, str):
-            installation_start_time = dateutil.parser.isoparse(install_started_at)
-        else:
-            # It's already a datetime object
-            installation_start_time = install_started_at
+        installation_start_time = self._parse_datetime(md["cluster"]["install_started_at"])

And similar for the deleted_at handling.
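As a stand-alone illustration of the suggested helper, here is a minimal sketch that uses the stdlib's datetime.fromisoformat in place of dateutil.parser.isoparse, so it has no third-party dependency (the function name and behavior follow the suggestion above; the PR itself uses dateutil):

```python
from datetime import datetime, timezone


def parse_datetime(value: "str | datetime") -> datetime:
    """Parse an ISO-8601 string, or pass a datetime through unchanged."""
    if isinstance(value, str):
        # fromisoformat is the stdlib analogue of dateutil's isoparse
        # for the ISO-8601 timestamps cluster metadata contains.
        return datetime.fromisoformat(value)
    return value


start = parse_datetime("2025-10-17T20:18:00+00:00")
same = parse_datetime(start)
print(start == same)  # True
```

Centralizing the isinstance check this way means install_started_at and deleted_at handling both collapse to a single call, which is the duplication the review points out.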

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b37220e and 1a30d36.

📒 Files selected for processing (11)
  • assisted_service_mcp/src/utils/log_analyzer/__init__.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/main.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (2 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py (1 hunks)
  • tests/test_log_analyzer.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_log_analyzer.py (1)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_cluster_logs (157-170)
  • get_events (173-214)
assisted_service_mcp/src/utils/log_analyzer/main.py (2)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_events (173-214)
  • get_cluster_logs (157-170)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)
  • ClusterAnalyzer (22-141)
  • LogAnalyzer (144-271)
  • set_cluster_events (36-38)
  • set_cluster_metadata (29-34)
assisted_service_mcp/src/utils/log_analyzer/__init__.py (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
  • ClusterAnalyzer (22-141)
  • LogAnalyzer (144-271)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)
  • SignatureResult (15-48)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
tests/test_log_analyzer.py (1)
  • get (10-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (17)
assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py (1)

39-40: LGTM! Correctly marked as not requiring logs.

The SNOHostnameHasEtcd signature only accesses cluster metadata and uses the get_hostname() helper, neither of which require log archives. Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1)

82-83: LGTM! Correctly marked as not requiring logs.

The EventsInstallationAttempts signature only accesses cluster events via methods that work without log archives. Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py (1)

17-18: LGTM! Correctly marked as not requiring logs.

The ComponentsVersionSignature signature only accesses cluster metadata (release_tag, versions). Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py (1)

22-23: LGTM! Correctly marked as not requiring logs.

The LibvirtRebootFlagSignature signature only accesses cluster metadata and host inventory JSON (from metadata). Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1)

30-31: LGTM! Correctly marked as not requiring logs.

The SNOMachineCidrSignature signature only accesses cluster metadata and host inventory (from metadata). Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py (1)

19-19: LGTM! Correctly marked as not requiring logs.

The SlowImageDownloadSignature signature only accesses cluster events to extract image download rates. Setting logs_required = False is appropriate.

tests/test_log_analyzer.py (2)

3-3: LGTM! Correct import for async mocking.

Adding AsyncMock import is necessary for properly mocking the async methods in the client (get_cluster, get_cluster_logs, get_events).


96-111: LGTM! Properly structured mocks for the updated flow.

The test correctly mocks the async client and cluster object:

  • AsyncMock is appropriate for async methods
  • fake_cluster.logs_info = "completed" triggers the logs-available code path
  • fake_cluster.to_dict() provides the expected cluster metadata structure
  • fake_archive.get() returns a JSON string matching the expected format

The test validates the happy path with an empty signatures list.

assisted_service_mcp/src/utils/log_analyzer/__init__.py (2)

7-7: LGTM! Correctly exposes the new ClusterAnalyzer base class.

Adding ClusterAnalyzer to the imports makes the new base class available for consumers who need to analyze clusters without log archives.


12-12: LGTM! Correctly adds ClusterAnalyzer to the public API.

Including ClusterAnalyzer in __all__ properly exposes it as part of the module's public interface while maintaining backward compatibility with existing LogAnalyzer usage.

assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)

54-55: LGTM! Conservative default protects existing behavior.

The logs_required class attribute is well-designed with a safe default that ensures existing signatures continue to require logs unless explicitly marked otherwise.

assisted_service_mcp/src/utils/log_analyzer/main.py (2)

6-6: LGTM! Imports aligned with refactored architecture.

Also applies to: 10-18


103-116: LGTM! Clean helper with appropriate warning for unknown signatures.

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)

22-38: LGTM! Clean base class with dependency injection via setters.

The ClusterAnalyzer base class enables analysis without log archives by accepting metadata and events via setters. The metadata wrapping logic (lines 31-33) handles the structural difference between API responses and log bundle formats.


45-54: LGTM! Safe default behavior for missing events.

The get_all_cluster_events method returns an empty list when events haven't been set, which is a safe default that prevents downstream code from handling None.


130-141: LGTM! Robust hostname extraction with sensible fallbacks.

The method correctly tries requested_hostname, then parses the inventory JSON, and finally falls back to the host ID. The error handling appropriately catches JSON parsing failures.
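The precedence the review describes (requested_hostname, then the hostname inside the inventory JSON, then the host ID) can be sketched in isolation like this; the field names mirror the review text, but the function body is an assumption, not the PR's code:

```python
import json
import logging

logger = logging.getLogger(__name__)


def get_hostname(host: dict) -> str:
    """Resolve a display hostname for a host record, with graceful fallbacks."""
    # 1. An explicitly requested hostname wins.
    if host.get("requested_hostname"):
        return host["requested_hostname"]
    # 2. Otherwise try the hostname reported in the inventory JSON blob.
    try:
        inventory = json.loads(host.get("inventory", "") or "{}")
        if inventory.get("hostname"):
            return inventory["hostname"]
    except json.JSONDecodeError:
        logger.warning("Could not parse inventory for host %s", host.get("id"))
    # 3. Fall back to the host ID.
    return host.get("id", "unknown")


print(get_hostname({"inventory": '{"hostname": "master-0"}', "id": "abc"}))  # master-0
```

Catching only json.JSONDecodeError keeps a malformed inventory from masking unrelated errors, which matches the "error handling appropriately catches JSON parsing failures" observation above.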


144-188: LGTM! Clean subclass specialization for archive-based analysis.

The LogAnalyzer subclass appropriately overrides metadata and get_all_cluster_events to load from the log archive instead of requiring injection via setters. The error handling (line 186) aligns with the base class behavior.

Comment on lines +43 to +80
    # first call the api to get the cluster and check if logs are available
    cluster = await api_client.get_cluster(cluster_id)

    if cluster.logs_info != "completed":
        logger.info(
            "Logs are not available for cluster: %s\nDefaulting to signatures that don't require logs",
            cluster_id,
        )

        analyzer = ClusterAnalyzer()

        # Call events API to get the events and set the events in the analyzer
        events = await api_client.get_events(cluster_id)
        analyzer.set_cluster_events(json.loads(events))

        # Set the cluster metadata in the analyzer
        analyzer.set_cluster_metadata(cluster.to_dict())

        # Select signatures that don't require logs
        signatures_to_run = [
            sig for sig in ALL_SIGNATURES if sig.logs_required is False
        ]
    else:
        # Download logs
        logs_archive = await api_client.get_cluster_logs(cluster_id)

        # Initialize log analyzer
        analyzer = LogAnalyzer(logs_archive)

        # Add all signatures to the list to run
        signatures_to_run = ALL_SIGNATURES

    # If specific signatures are provided, filter the signatures to run
    if specific_signatures:
        signatures_to_run = filter_signatures(
            signatures_to_run, specific_signatures
        )


⚠️ Potential issue | 🟡 Minor

Add warning when user-requested signatures are skipped due to log unavailability.

When logs are unavailable, the code filters to signatures where logs_required=False (lines 62-64), then applies user-requested specific_signatures (lines 77-80). If a user requests a signature that requires logs when logs aren't available, it will be silently skipped without any feedback.

Consider adding a warning after line 80:

         # If specific signatures are provided, filter the signatures to run
         if specific_signatures:
+            # Track which signatures were filtered out
+            before_filter = set(sig.__name__ for sig in signatures_to_run)
             signatures_to_run = filter_signatures(
                 signatures_to_run, specific_signatures
             )
+            after_filter = set(sig.__name__ for sig in signatures_to_run)
+            skipped = set(specific_signatures) - after_filter
+            if skipped and cluster.logs_info != "completed":
+                logger.warning(
+                    "Skipped signatures requiring logs (logs unavailable): %s",
+                    ", ".join(sorted(skipped))
+                )
🤖 Prompt for AI Agents
In assisted_service_mcp/src/utils/log_analyzer/main.py around lines 43 to 80,
when logs are unavailable we restrict to signatures with logs_required=False and
then apply user-requested specific_signatures, which can silently skip requested
signatures that require logs; after applying the specific_signatures filter
(i.e., after line ~80) detect the case where cluster.logs_info != "completed"
and specific_signatures was provided, compute which requested signatures were
removed (requested minus resulting), and emit a logger.warning listing those
skipped signatures and that they were skipped because logs are unavailable;
ensure signatures_to_run is a concrete list before computing the difference so
the comparison works.

Comment thread assisted_service_mcp/src/utils/log_analyzer/signatures/base.py Outdated
@carbonin
Collaborator

I didn't look deeply into this, but I will say if the goal is to do some analysis without the logs then we should probably change the description of the tool that calls all of this.

I think right now the description is very specific about needing logs available to run.

@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 1a30d36 to 0ee69cd Compare October 17, 2025 21:26
@openshift-ci-robot

openshift-ci-robot commented Oct 17, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Summary by CodeRabbit

  • New Features

    • Cluster diagnostics now run even when logs are incomplete or unavailable; events and metadata can be loaded directly for analysis.
    • Ability to select a subset of checks to run by name.

  • Improvements

    • Several checks updated to run without requiring logs, enabling broader analysis coverage.
    • Better handling and exposure of cluster events and metadata for more reliable diagnostics.

  • Tests

    • Tests updated to exercise log-missing and log-available paths more explicitly.

  • Documentation

    • Wording clarified to reflect analysis from cluster data or logs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)

80-92: Type-safe time formatting and narrowed exceptions — LGTM.

Accepts str | datetime and avoids blanket Exception; matches earlier review guidance.

assisted_service_mcp/src/utils/log_analyzer/main.py (1)

76-80: Warn when requested signatures are skipped due to missing logs.

Currently silent. Add a targeted warning to improve UX. (This mirrors a prior suggestion.)

         # If specific signatures are provided, filter the signatures to run
         if specific_signatures:
             signatures_to_run = filter_signatures(
                 signatures_to_run, specific_signatures
             )
+            # Inform users when requested signatures require logs but logs are unavailable
+            if cluster.logs_info != "completed":
+                requires_logs = {
+                    sig.__name__ for sig in ALL_SIGNATURES if getattr(sig, "logs_required", True)
+                }
+                skipped = sorted(set(specific_signatures) & requires_logs)
+                if skipped:
+                    logger.warning(
+                        "Skipped signatures requiring logs (logs unavailable): %s",
+                        ", ".join(skipped),
+                    )
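The set arithmetic behind the suggested warning can be exercised in isolation. The helper name and the signature names here are placeholders for illustration, not code from the PR:

```python
def find_skipped(requested_names, surviving_names):
    """Requested signature names that did not survive filtering."""
    return sorted(set(requested_names) - set(surviving_names))


requested = ["ComponentsVersionSignature", "MustGatherAnalysis"]
# When logs are unavailable, only the log-optional subset survives.
after_filter = ["ComponentsVersionSignature"]
print(find_skipped(requested, after_filter))  # ['MustGatherAnalysis']
```

Anything the helper returns is exactly what the user asked for but silently lost, so it is the right input for the logger.warning call proposed above.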
🧹 Nitpick comments (4)
assisted_service_mcp/src/tools/cluster_tools.py (1)

367-388: Function name no longer matches its behavior; clarify vague docstring terms.

The updated docstring correctly reflects that logs are optional and addresses the reviewer's concern. However:

  1. Naming inconsistency: The function name analyze_cluster_logs implies logs are the primary or required input, but the docstring now states logs are optional ("if available"). This creates a mismatch between the function name and its actual behavior. Consider renaming to analyze_cluster to better reflect the broader scope, though be aware this would be a breaking API change.

  2. Vague terminology:

    • "cluster's data" (line 374) is unclear—does this refer to cluster metadata, events, configuration, or something else?
    • "Cluster is available (downloadable via the API)" (line 380) is ambiguous—what specifically is "downloadable"? The cluster metadata, the logs, or both?

Please clarify these terms in the docstring to help users understand what analysis is performed and what prerequisites are actually required.

assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)

54-55: Make logs_required a ClassVar and document the contract.

Prevents accidental instance shadowing and clarifies intent for subclass authors.

-from typing import Optional, Any, Sequence
+from typing import Optional, Any, Sequence, ClassVar
...
-class Signature(abc.ABC):
+class Signature(abc.ABC):
@@
-    logs_required = True
+    # Subclasses may override to opt into running without logs.
+    logs_required: ClassVar[bool] = True
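A quick self-contained check of what the ClassVar annotation communicates: the flag lives on the class, and subclasses override it at class scope rather than per instance. This is a sketch of the pattern, not the PR's actual Signature class:

```python
import abc
from typing import ClassVar


class Signature(abc.ABC):
    # Class-level contract: subclasses opt out of the log requirement
    # by overriding this attribute at class scope.
    logs_required: ClassVar[bool] = True

    @abc.abstractmethod
    def analyze(self) -> str: ...


class EventsOnlySignature(Signature):
    logs_required = False

    def analyze(self) -> str:
        return "ran from events"


# The default and the override are both visible without instantiating.
print(Signature.logs_required, EventsOnlySignature.logs_required)  # True False
```

Because ClassVar tells type checkers the attribute is not an instance field, accidental `self.logs_required = ...` assignments get flagged instead of silently shadowing the class default.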
assisted_service_mcp/src/utils/log_analyzer/main.py (1)

91-100: Log exceptions with stack traces for easier debugging.

Use logger.exception to include traceback.

-            except Exception as e:
-                logger.error(
-                    "Error running signature %s: %s", signature_class.__name__, e
-                )
+            except Exception:
+                logger.exception("Error running signature %s", signature_class.__name__)
@@
-    except Exception as e:
-        logger.error("Error analyzing cluster %s: %s", cluster_id, e)
+    except Exception:
+        logger.exception("Error analyzing cluster %s", cluster_id)
         raise
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

171-173: Prefer logger.exception for metadata load failures.

Adds traceback to aid debugging.

-            except Exception as e:
-                logger.error("Failed to load metadata: %s", e)
+            except Exception:
+                logger.exception("Failed to load metadata")
                 raise
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a30d36 and 0ee69cd.

📒 Files selected for processing (5)
  • assisted_service_mcp/src/tools/cluster_tools.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/main.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3 hunks)
  • tests/test_log_analyzer.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_log_analyzer.py
🧰 Additional context used
🧬 Code graph analysis (2)
assisted_service_mcp/src/utils/log_analyzer/main.py (2)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_events (173-214)
  • get_cluster_logs (157-170)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)
  • ClusterAnalyzer (22-141)
  • LogAnalyzer (144-271)
  • set_cluster_events (36-38)
  • set_cluster_metadata (29-34)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
tests/test_log_analyzer.py (1)
  • get (10-15)
assisted_service_mcp/src/tools/event_tools.py (1)
  • cluster_events (12-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (1)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

130-142: Hostname extraction utility — LGTM.

Clear precedence and safe fallbacks.

Comment thread assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py
@CrystalChun CrystalChun marked this pull request as draft October 20, 2025 14:14
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 20, 2025
logs_required will indicate if a signature
requires logs downloaded from the cluster to be run.
It will default to True for all signatures
unless marked otherwise.

The signatures that do not require logs have been
marked. The following are signatures that do not need
logs:
- ComponentsVersionSignature
- SlowImageDownloadSignature
- SNOMachineCidrSignature
- SNOHostnameHasEtcd
- EventsInstallationAttempts
- LibvirtRebootFlagSignature

This class will be the parent class of LogAnalyzer and
will contain the cluster metadata and events for a cluster.

This function was doing what the get last install
cluster event was also doing. Removed the partitioning
so that it returns all cluster events as it is named.

Moves the functions that are common to both ClusterAnalyzer and
LogAnalyzer to the parent class (ClusterAnalyzer) so both
can access them.
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 0ee69cd to 4eb8866 Compare October 21, 2025 22:01
@openshift-ci openshift-ci Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 21, 2025
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 4eb8866 to 55949bc Compare October 21, 2025 22:08
@CrystalChun CrystalChun marked this pull request as ready for review October 21, 2025 22:08
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 21, 2025
@openshift-ci openshift-ci Bot requested a review from eranco74 October 21, 2025 22:08
@openshift-ci-robot

openshift-ci-robot commented Oct 21, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Summary by CodeRabbit

  • New Features

  • Diagnostics run even when logs are incomplete or unavailable; events and metadata can be loaded directly.

  • Select a subset of checks by name; unknown names produce warnings.

  • Improvements

  • More checks no longer require logs, widening analysis coverage.

  • Improved handling and exposure of cluster events, metadata, and host identification for more reliable diagnostics.

  • Tests

  • Tests expanded to cover both log‑available and log‑missing flows.

  • Documentation

  • Clarified wording to reflect analysis from cluster data or logs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

57-97: Add error handling for timestamp parsing and missing hosts.

The current implementation has two issues:

  1. Lines 64-73, 78-85: The match/case blocks handle type checking but don't catch parsing exceptions from dateutil.parser.isoparse(), which can raise ValueError or other exceptions.

  2. Line 89: Assumes md["cluster"]["hosts"] exists, which could raise KeyError.

Apply this diff to add proper error handling:

     @staticmethod
     def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        install_started_at = md.get("cluster", {}).get("install_started_at")
-        if not install_started_at:
-            return md
-
-        match install_started_at:
-            case str():
-                installation_start_time = dateutil.parser.isoparse(install_started_at)
-            case datetime():
-                installation_start_time = install_started_at
-            case _:
-                logger.error(
-                    "Unable to parse install started at: %s", install_started_at
-                )
-                return md
+        cluster = md.get("cluster", {})
+        install_started_at = cluster.get("install_started_at")
+        installation_start_time = None
+        
+        if install_started_at:
+            try:
+                match install_started_at:
+                    case str():
+                        installation_start_time = dateutil.parser.isoparse(install_started_at)
+                    case datetime():
+                        installation_start_time = install_started_at
+                    case _:
+                        logger.debug(
+                            "Unable to parse install_started_at (unknown type): %s", install_started_at
+                        )
+            except (ValueError, TypeError) as e:
+                logger.debug("Unable to parse install_started_at %s: %s", install_started_at, e)

         def host_deleted_before_installation_started(host):
+            if not installation_start_time:
+                return False
             if deleted_at := host.get("deleted_at"):
-                # Handle both datetime objects and ISO strings
-                match deleted_at:
-                    case str():
-                        deleted_at_time = dateutil.parser.isoparse(deleted_at)
-                    case datetime():
-                        deleted_at_time = deleted_at
-                    case _:
-                        logger.error("Unable to parse deleted at: %s", deleted_at)
-                        return False
-                return deleted_at_time < installation_start_time
+                try:
+                    match deleted_at:
+                        case str():
+                            deleted_at_time = dateutil.parser.isoparse(deleted_at)
+                        case datetime():
+                            deleted_at_time = deleted_at
+                        case _:
+                            logger.debug("Unable to parse deleted_at (unknown type): %s", deleted_at)
+                            return False
+                    return deleted_at_time < installation_start_time
+                except (ValueError, TypeError) as e:
+                    logger.debug("Unable to parse deleted_at %s: %s", deleted_at, e)
+                    return False
             return False

-        all_hosts = md["cluster"]["hosts"]
-        md["cluster"]["deleted_hosts"] = [
+        all_hosts = list(cluster.get("hosts", []))
+        cluster["deleted_hosts"] = [
             h for h in all_hosts if host_deleted_before_installation_started(h)
         ]
-        md["cluster"]["hosts"] = [
+        cluster["hosts"] = [
             h for h in all_hosts if not host_deleted_before_installation_started(h)
         ]
+        
+        md["cluster"] = cluster
         return md
assisted_service_mcp/src/utils/log_analyzer/main.py (1)

77-80: Add warning when user-requested signatures are skipped due to log unavailability.

When logs are unavailable, signatures are filtered to those with logs_required=False (lines 62-64), then user-requested specific_signatures are applied (lines 77-80). If a user requests a signature requiring logs when logs aren't available, it will be silently skipped.

Consider logging which requested signatures were skipped:

         # If specific signatures are provided, filter the signatures to run
         if specific_signatures:
+            before_filter = {sig.__name__ for sig in signatures_to_run}
             signatures_to_run = filter_signatures(
                 signatures_to_run, specific_signatures
             )
+            after_filter = {sig.__name__ for sig in signatures_to_run}
+            skipped = set(specific_signatures) - after_filter
+            if skipped and cluster.logs_info != "completed":
+                logger.warning(
+                    "Skipped signatures requiring logs (logs unavailable): %s",
+                    ", ".join(sorted(skipped))
+                )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee69cd and 55949bc.

📒 Files selected for processing (12)
  • assisted_service_mcp/src/tools/cluster_tools.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/__init__.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/main.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py (1 hunks)
  • tests/test_log_analyzer.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py
  • assisted_service_mcp/src/utils/log_analyzer/signatures/base.py
  • assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py
  • assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py
  • assisted_service_mcp/src/tools/cluster_tools.py
  • assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_log_analyzer.py (3)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (11)
  • ClusterAnalyzer (23-152)
  • set_cluster_metadata (30-35)
  • set_cluster_events (37-39)
  • metadata (42-44)
  • metadata (171-185)
  • cluster_events (47-49)
  • get_all_cluster_events (51-55)
  • get_all_cluster_events (187-199)
  • get_last_install_cluster_events (99-110)
  • get_events_by_host (133-139)
  • get_hostname (142-152)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_cluster_logs (157-170)
  • get_events (173-214)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)
  • analyze_cluster (21-100)
assisted_service_mcp/src/utils/log_analyzer/main.py (3)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_events (173-214)
  • get_cluster_logs (157-170)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)
  • ClusterAnalyzer (23-152)
  • LogAnalyzer (155-282)
  • set_cluster_events (37-39)
  • set_cluster_metadata (30-35)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3)
  • SignatureResult (15-48)
  • Signature (51-90)
  • analyze (61-70)
assisted_service_mcp/src/utils/log_analyzer/__init__.py (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
  • ClusterAnalyzer (23-152)
  • LogAnalyzer (155-282)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)
  • SignatureResult (15-48)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
tests/test_log_analyzer.py (1)
  • get (10-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (12)
assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1)

82-83: LGTM! Correctly declares the signature does not require logs.

The EventsInstallationAttempts signature only uses event and metadata APIs (get_all_cluster_events(), partition_cluster_events(), get_last_install_cluster_events()) that are available on the base ClusterAnalyzer, so logs_required = False is accurate.

assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1)

30-31: LGTM! Correctly declares the signature does not require logs.

The SNOMachineCidrSignature signature only accesses log_analyzer.metadata, which is available on the base ClusterAnalyzer, so logs_required = False is accurate.

assisted_service_mcp/src/utils/log_analyzer/__init__.py (1)

7-16: LGTM! Correctly exposes the new base class in the public API.

The changes appropriately extend the package's public API to include ClusterAnalyzer alongside the existing LogAnalyzer, supporting the refactoring that introduces a base analyzer without log dependencies.

tests/test_log_analyzer.py (4)

20-109: LGTM! Comprehensive test coverage for ClusterAnalyzer.

The test thoroughly validates the base ClusterAnalyzer functionality including metadata handling, deleted host separation, event partitioning, and hostname extraction.


183-211: LGTM! Test correctly validates the logs-available path.

The test properly mocks the cluster object and logs archive, ensuring the main flow executes the logs-available path with LogAnalyzer.


214-248: LGTM! Test validates the no-logs execution path.

The test correctly mocks an unavailable logs scenario and verifies that signatures with logs_required = False execute properly using ClusterAnalyzer.


250-284: LGTM! Test validates signature execution with logs available.

The test correctly verifies that a signature requiring logs executes when logs are available.

assisted_service_mcp/src/utils/log_analyzer/main.py (2)

43-65: LGTM! Log availability check and ClusterAnalyzer initialization are correct.

The code properly checks log availability via cluster.logs_info and initializes ClusterAnalyzer with metadata and events when logs are unavailable, correctly filtering to signatures with logs_required = False.
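The branch this comment approves amounts to something like the following (the `logs_info == "completed"` check mirrors the review; the rest is a sketch under that assumption):

```python
def select_signatures(all_signatures, logs_info):
    """Pick which signatures can run given log availability (sketch)."""
    if logs_info == "completed":
        return list(all_signatures)  # log bundle present: everything may run
    # No usable log bundle: keep only signatures that opted out of logs.
    return [sig for sig in all_signatures if not sig.logs_required]
```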


103-116: LGTM! Signature filtering logic is correct.

The filter_signatures function properly maps signature names to classes and warns about unknown signatures.
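A name-based filter with unknown-name warnings, as described here, could look roughly like this (the function name matches the review; its exact signature and behavior are assumptions):

```python
import logging

logger = logging.getLogger(__name__)


def filter_signatures(signatures, requested_names):
    """Keep only signatures whose class name was requested (sketch)."""
    by_name = {sig.__name__: sig for sig in signatures}
    unknown = sorted(set(requested_names) - set(by_name))
    if unknown:
        # Unknown names produce a warning rather than an error.
        logger.warning("Unknown signatures requested: %s", ", ".join(unknown))
    return [by_name[name] for name in requested_names if name in by_name]
```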

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (3)

23-56: LGTM! ClusterAnalyzer base class API is well-designed.

The parameterless constructor and explicit setter methods (set_cluster_metadata, set_cluster_events) provide a clean separation between cluster-level analysis (no logs) and log-based analysis, with good defensive programming practices.


155-199: LGTM! LogAnalyzer correctly extends ClusterAnalyzer.

The subclass properly overrides metadata and get_all_cluster_events to load from the logs archive while maintaining the same interface, with appropriate error handling.


141-152: LGTM! Robust hostname extraction with proper fallbacks.

The get_hostname method handles multiple hostname sources with appropriate error handling and fallback logic.
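A fallback chain like the one this comment praises might look roughly like this (the field names `requested_hostname`, `inventory`, and `id` are assumptions for illustration, not quoted from the PR):

```python
import json
import logging

logger = logging.getLogger(__name__)


def get_hostname(host):
    """Resolve a host's name from several sources, best first (sketch)."""
    # 1. An explicitly requested hostname wins.
    if hostname := host.get("requested_hostname"):
        return hostname
    # 2. Fall back to the hostname reported in the hardware inventory.
    try:
        inventory = json.loads(host.get("inventory") or "{}")
        if hostname := inventory.get("hostname"):
            return hostname
    except (ValueError, TypeError) as err:
        logger.debug("Could not parse inventory for host %s: %s",
                     host.get("id"), err)
    # 3. Last resort: the host ID, or a placeholder if even that is missing.
    return host.get("id", "unknown")
```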

Comment thread tests/test_log_analyzer.py
Comment thread tests/test_log_analyzer.py
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 55949bc to 16aa804 Compare October 21, 2025 22:25
@openshift-ci-robot

openshift-ci-robot commented Oct 21, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55949bc and 16aa804.

📒 Files selected for processing (1)
  • tests/test_log_analyzer.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_log_analyzer.py (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (11)
  • ClusterAnalyzer (23-152)
  • set_cluster_metadata (30-35)
  • set_cluster_events (37-39)
  • metadata (42-44)
  • metadata (171-185)
  • cluster_events (47-49)
  • get_all_cluster_events (51-55)
  • get_all_cluster_events (187-199)
  • get_last_install_cluster_events (99-110)
  • get_events_by_host (133-139)
  • get_hostname (142-152)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)
  • analyze_cluster (21-100)
🔇 Additional comments (6)
tests/test_log_analyzer.py (6)

1-18: LGTM!

The addition of AsyncMock import and the make_archive helper function are well-structured for the test scenarios.


20-109: LGTM!

Comprehensive test coverage for the new ClusterAnalyzer base class. The test validates metadata handling, event partitioning (including reset events), deleted host filtering, and hostname resolution.


183-212: LGTM!

The test has been properly updated to use the new mock structure with logs_info and to_dict() return value, correctly triggering the logs-available path.


251-285: LGTM! Past review issues addressed.

The test correctly verifies that ApiInvalidCertificateSignature runs when logs are available. The mock setup properly configures to_dict.return_value and logs_info, and the signature name has been corrected from the previous review.


287-318: LGTM! Past review issues resolved.

The test correctly validates that ApiInvalidCertificateSignature does not run when logs are unavailable. The mock configuration has been fixed per previous feedback: to_dict.return_value is properly set and logs_info = "not_completed" correctly triggers the no-logs path.


320-348: LGTM! Past review issues addressed.

The test properly validates SlowImageDownloadSignature behavior without logs. The mock setup has been corrected per previous feedback, with to_dict.return_value and logs_info properly configured.

assert len(results) == 2
for result in results:
    assert result.title in ["No etcd in SNO hostname", "Slow Image Download"]
    if result.title in "Slow Image Download":


⚠️ Potential issue | 🟡 Minor

Fix the string comparison operator.

Line 247 uses in for string comparison, which checks substring membership. This should be an equality check.

Apply this diff:

-        if result.title in "Slow Image Download":
+        if result.title == "Slow Image Download":
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if result.title in "Slow Image Download":
if result.title == "Slow Image Download":
🤖 Prompt for AI Agents
In tests/test_log_analyzer.py around line 247, the code uses the membership
operator `in` to compare result.title to the literal "Slow Image Download",
which tests substring membership rather than equality; change the comparison to
an equality check (use ==) so it only matches when result.title exactly equals
"Slow Image Download".

"""Get cluster metadata."""
if self._metadata is None:
try:
metadata_content = self.logs_archive.get("cluster_metadata.json")
Collaborator


It's a bit strange to me that we now sometimes get the metadata from the API and other times get it from the log bundle.

Did you consider making this consistent (I guess just always getting it from the API)?

I could see a situation where if the logs are available we're actually acting on old data where if they are not then we get the data from the live cluster.

Also it's a bit odd that the user of the ClusterAnalyzer is responsible for fetching the metadata and events from outside the object and setting them where LogAnalyzer handles that for the caller. We should probably be more consistent there as well.

Collaborator


Then a follow on from this would be that maybe we don't need separate classes at all.

If the behavior is the same for getting the data (cluster and events) the Analyzer could detect if the logs are available and make that information public. Then the caller could filter the signatures depending on if the Analyzer detected that logs are available or not.

Or even better the signature could check analyzer.logs_available if/when it needs them and just no-op if the logs are required. Then we don't need to do any filtering at all.
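The no-op variant suggested here could look roughly like this (the attribute names `logs_required` and `logs_available` follow the thread; everything else is hypothetical):

```python
class Signature:
    logs_required = True

    def analyze(self, analyzer):
        # Skip quietly when logs are required but the bundle is missing,
        # so callers never need to pre-filter signatures.
        if self.logs_required and not getattr(analyzer, "logs_available", False):
            return None
        return self._analyze(analyzer)

    def _analyze(self, analyzer):
        raise NotImplementedError
```

The trade-off noted below still holds: every log-dependent signature would route through this check, but no filtering logic is needed at the call site.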

Contributor Author


Did you consider making this consistent (I guess just always getting it from the API)?
I could see a situation where if the logs are available we're actually acting on old data where if they are not then we get the data from the live cluster.

Yes that's exactly it, the data from the live cluster might differ from what's in the log bundle.

I've considered making it toggleable: either use the API data or, if it's available for the cluster, the log bundle data for the metadata/events. However, this would then need to be something that the model decides when calling the tool, and I'm not sure how to do that...


If the behavior is the same for getting the data (cluster and events) the Analyzer could detect if the logs are available and make that information public. Then the caller could filter the signatures depending on if the Analyzer detected that logs are available or not.

Sure we could do it this way instead too, I was just hesitant to pass in the api client to the analyzer since the client is used outside of it.

Or even better the signature could check analyzer.logs_available if/when it needs them and just no-op if the logs are required. Then we don't need to do any filtering at all.

Yeah, that works too, but it requires more changes: we'd need to update all of the signatures that do require logs to perform this check.

Comment thread assisted_service_mcp/src/utils/log_analyzer/signatures/base.py Outdated
Comment thread assisted_service_mcp/src/tools/cluster_tools.py Outdated
In analyze_cluster, adds logic to call the API first to check
whether logs are available, and to initialize the corresponding analyzer.

Also updates which signatures to run depending on the analyzer
chosen.
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 16aa804 to b6530e0 Compare October 22, 2025 18:01
@openshift-ci openshift-ci Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 22, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 22, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.


@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from b6530e0 to 8760682 Compare October 22, 2025 18:08
Tests the new ClusterAnalyzer and running signatures when logs
are not available for a cluster.
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 8760682 to b4faa94 Compare October 22, 2025 18:10
@openshift-ci openshift-ci Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 22, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 22, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)

43-80: Consider warning when user-requested signatures are skipped due to log unavailability.

When logs are unavailable, the code filters to signatures where logs_required=False (lines 62-64), then applies user-requested specific_signatures (lines 77-80). If a user requests a signature that requires logs when logs aren't available, it will be silently skipped without feedback.

This matches a concern raised in past review comments. Consider adding a warning after line 80 to inform users which requested signatures were skipped:

         # If specific signatures are provided, filter the signatures to run
         if specific_signatures:
+            before_filter = {sig.__name__ for sig in signatures_to_run}
             signatures_to_run = filter_signatures(
                 signatures_to_run, specific_signatures
             )
+            after_filter = {sig.__name__ for sig in signatures_to_run}
+            skipped = set(specific_signatures) - after_filter
+            if skipped and cluster.logs_info != "completed":
+                logger.warning(
+                    "Skipped signatures requiring logs (logs unavailable): %s",
+                    ", ".join(sorted(skipped))
+                )
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

56-76: Add error handling for missing or invalid timestamps.

The _clean_metadata_json method assumes md["cluster"]["install_started_at"] exists and can be parsed. If the field is missing or contains an invalid timestamp, this will raise KeyError or parsing exceptions.

This was flagged in previous reviews. Consider adding defensive checks:

     @staticmethod
     def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        installation_start_time = dateutil.parser.isoparse(
-            str(md["cluster"]["install_started_at"])
-        )
+        cluster = md.get("cluster", {})
+        install_started_at = cluster.get("install_started_at")
+        installation_start_time = None
+        if install_started_at:
+            try:
+                installation_start_time = dateutil.parser.isoparse(str(install_started_at))
+            except (ValueError, TypeError) as e:
+                logger.debug("Unable to parse install_started_at %s: %s", install_started_at, e)
 
         def host_deleted_before_installation_started(host):
+            if not installation_start_time:
+                return False
             if deleted_at := host.get("deleted_at"):
-                return dateutil.parser.isoparse(deleted_at) < installation_start_time
+                try:
+                    return dateutil.parser.isoparse(deleted_at) < installation_start_time
+                except (ValueError, TypeError) as e:
+                    logger.debug("Unable to parse deleted_at %s: %s", deleted_at, e)
+                    return False
             return False
 
-        all_hosts = md["cluster"]["hosts"]
-        md["cluster"]["deleted_hosts"] = [
+        all_hosts = list(cluster.get("hosts", []))
+        cluster["deleted_hosts"] = [
             h for h in all_hosts if host_deleted_before_installation_started(h)
         ]
-        md["cluster"]["hosts"] = [
+        cluster["hosts"] = [
             h for h in all_hosts if not host_deleted_before_installation_started(h)
         ]
+        md["cluster"] = cluster
         return md

Note: Previous review comments indicated this was addressed in commits 7547111 to 55949bc, but the current code still lacks these safeguards.

tests/test_log_analyzer.py (1)

214-249: Fix string comparison on line 247.

Line 247 uses the in operator for string comparison, which checks substring membership rather than equality.

Apply this diff:

-        if result.title in "Slow Image Download":
+        if result.title == "Slow Image Download":
             assert "Detected slow image download rate (MBps):" in result.content

Note: This issue was flagged in a previous review but remains unresolved.

🧹 Nitpick comments (1)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)

62-64: Consider more idiomatic boolean check.

The expression sig.logs_required is False works but could be more Pythonic.

Apply this diff:

             # Select signatures that don't require logs
             signatures_to_run = [
-                sig for sig in ALL_SIGNATURES if sig.logs_required is False
+                sig for sig in ALL_SIGNATURES if not sig.logs_required
             ]
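
For what it's worth, the two predicates are only equivalent when logs_required is always a real bool; if the attribute could ever be None or unset, not would also select those signatures (values below are hypothetical):

```python
# hypothetical logs_required values; the point is that the predicates
# diverge for falsy non-bool values such as None
values = [True, False, None]

literal_false = [v for v in values if v is False]   # only the literal False
falsy = [v for v in values if not v]                # False and None

print(literal_false)  # [False]
print(falsy)          # [False, None]
```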
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16aa804 and b4faa94.

📒 Files selected for processing (5)
  • assisted_service_mcp/src/mcp.py (1 hunks)
  • assisted_service_mcp/src/tools/cluster_tools.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/main.py (4 hunks)
  • tests/test_log_analyzer.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_log_analyzer.py (3)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (11)
  • ClusterAnalyzer (22-131)
  • set_cluster_metadata (29-34)
  • set_cluster_events (36-38)
  • metadata (41-43)
  • metadata (150-164)
  • cluster_events (46-48)
  • get_all_cluster_events (50-54)
  • get_all_cluster_events (166-178)
  • get_last_install_cluster_events (78-89)
  • get_events_by_host (112-118)
  • get_hostname (121-131)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_cluster_logs (157-170)
  • get_events (173-214)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)
  • analyze_cluster (21-100)
assisted_service_mcp/src/mcp.py (1)
assisted_service_mcp/src/tools/cluster_tools.py (1)
  • troubleshoot_cluster (368-388)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
tests/test_log_analyzer.py (1)
  • get (10-15)
assisted_service_mcp/src/tools/event_tools.py (1)
  • cluster_events (12-41)
assisted_service_mcp/src/utils/log_analyzer/main.py (3)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_events (173-214)
  • get_cluster_logs (157-170)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)
  • ClusterAnalyzer (22-131)
  • LogAnalyzer (134-261)
  • set_cluster_events (36-38)
  • set_cluster_metadata (29-34)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3)
  • SignatureResult (15-48)
  • Signature (51-85)
  • analyze (61-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (8)
assisted_service_mcp/src/mcp.py (1)

78-78: LGTM! Tool registration updated to reflect broader troubleshooting scope.

The change from analyze_cluster_logs to troubleshoot_cluster correctly reflects the new capability to analyze clusters even when logs are unavailable.

assisted_service_mcp/src/tools/cluster_tools.py (1)

368-388: LGTM! Function rename and documentation updates accurately reflect the new behavior.

The rename to troubleshoot_cluster and the updated docstring correctly communicate that this function can now analyze cluster data even when logs are unavailable.

assisted_service_mcp/src/utils/log_analyzer/main.py (1)

103-116: LGTM! Clean signature filtering implementation.

The filter_signatures helper correctly maps signature names to classes and warns about unknown signatures.
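
A minimal sketch of what such a name-to-class filter can look like. The signature classes, logger setup, and ALL_SIGNATURES list below are stand-ins for illustration, not the repository's actual definitions:

```python
import logging

logger = logging.getLogger(__name__)

# stand-in signature classes; the real ones live under signatures/
class SlowImageDownloadSignature:
    pass

class HostsStatusSignature:
    pass

ALL_SIGNATURES = [SlowImageDownloadSignature, HostsStatusSignature]

def filter_signatures(names):
    """Map requested signature names to classes, warning on unknown names."""
    by_name = {sig.__name__: sig for sig in ALL_SIGNATURES}
    selected = []
    for name in names:
        if name in by_name:
            selected.append(by_name[name])
        else:
            logger.warning("Unknown signature name: %s", name)
    return selected

chosen = filter_signatures(["SlowImageDownloadSignature", "NoSuchSignature"])
print([sig.__name__ for sig in chosen])  # ['SlowImageDownloadSignature']
```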

tests/test_log_analyzer.py (3)

20-109: LGTM! Comprehensive test coverage for ClusterAnalyzer.

The test validates metadata handling, deleted host separation, event partitioning, and hostname resolution.


251-318: LGTM! Good test coverage for log-present and log-absent scenarios.

These tests correctly verify that signatures requiring logs run only when logs are available and are skipped when logs are unavailable.


320-348: LGTM! Test validates log-independent signature execution.

The test correctly verifies that SlowImageDownloadSignature runs when logs are not available.

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)

22-132: LGTM! Clean base class design for cluster analysis.

The ClusterAnalyzer base class provides a clear separation of concerns, allowing analysis with or without log archives.


134-178: LGTM! LogAnalyzer correctly extends ClusterAnalyzer.

The refactored LogAnalyzer properly inherits from ClusterAnalyzer and loads metadata and events from the log archive.
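
The split can be illustrated with a stripped-down sketch. The archive keys and JSON shapes below are assumptions for illustration, not the repository's actual layout:

```python
import json

class ClusterAnalyzer:
    """Entry point when cluster metadata/events are supplied directly."""

    def __init__(self):
        self._metadata = None
        self._events = None

    def set_cluster_metadata(self, md):
        self._metadata = md

    def set_cluster_events(self, events):
        self._events = events

    @property
    def metadata(self):
        return self._metadata

    def get_all_cluster_events(self):
        return self._events or []

class LogAnalyzer(ClusterAnalyzer):
    """Same interface, but sources metadata/events from a log archive."""

    def __init__(self, archive):
        super().__init__()
        # archive is modeled here as a mapping of filename -> JSON text
        self.set_cluster_metadata(json.loads(archive["metadata.json"]))
        self.set_cluster_events(json.loads(archive["cluster_events.json"]))

archive = {
    "metadata.json": json.dumps({"cluster": {"id": "abc"}}),
    "cluster_events.json": json.dumps([{"message": "host registered"}]),
}
la = LogAnalyzer(archive)
print(la.metadata["cluster"]["id"])                # abc
print(la.get_all_cluster_events()[0]["message"])   # host registered
```

Signatures that only need metadata and events can then accept either analyzer, since both expose the same accessors.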

@openshift-ci

openshift-ci Bot commented Oct 22, 2025

@CrystalChun: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/eval-test
Commit: b4faa94
Required: false
Rerun command: /test eval-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

# first call the api to get the cluster and check if logs are available
cluster = await api_client.get_cluster(cluster_id)

if cluster.logs_info != "completed":
Collaborator

Let's change this to actually check if we can download the logs.

I say this because when you reset a cluster the logs_info gets reset, but the actual log bundle is still present and we should be able to use it.
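
One hedged way to "actually check", assuming get_cluster_logs raises (or returns an empty payload) when no bundle exists, is to probe for the bundle directly instead of trusting logs_info. The stub client below is purely illustrative:

```python
import asyncio

async def logs_available(api_client, cluster_id):
    """Probe for the log bundle directly rather than trusting
    cluster.logs_info, since a cluster reset clears logs_info while
    the bundle itself may still be present."""
    try:
        logs = await api_client.get_cluster_logs(cluster_id)
    except Exception:
        return False
    return bool(logs)

# illustrative stub standing in for the real assisted-service client
class StubClient:
    def __init__(self, logs):
        self._logs = logs

    async def get_cluster_logs(self, cluster_id):
        if self._logs is None:
            raise RuntimeError("no log bundle for cluster")
        return self._logs

print(asyncio.run(logs_available(StubClient(b"tar-bytes"), "abc")))  # True
print(asyncio.run(logs_available(StubClient(None), "abc")))          # False
```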

@CrystalChun
Contributor Author

/close

The log bundle will always be available. Events and metadata are uploaded to the "log bundle" on demand when the GET logs API is called.

@openshift-ci openshift-ci Bot closed this Oct 23, 2025
@openshift-ci

openshift-ci Bot commented Oct 23, 2025

@CrystalChun: Closed this PR.

Details

In response to this:

/close

The log bundle will always be available. Events and metadata are uploaded to the "log bundle" on demand when the GET logs API is called.



Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants