
MGMT-21844: Enable running Signatures regardless of log bundle availability#128

Closed
CrystalChun wants to merge 9 commits into openshift-assisted:master from CrystalChun:logs-not-required-signatures

Conversation

@CrystalChun
Contributor

@CrystalChun CrystalChun commented Oct 17, 2025

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Summary by CodeRabbit

  • New Features

    • Diagnostics run even when logs are incomplete or unavailable; events and metadata can be loaded directly.
    • Select a subset of checks by name; unknown names produce warnings.
    • Troubleshooting tool updated to reflect broader cluster-data handling.
  • Improvements

    • More checks marked as not requiring logs, widening analysis coverage.
    • Unified handling and clearer exposure of cluster events, metadata, and hostname resolution for more reliable diagnostics.
  • Tests

    • Expanded coverage for both log-present and log-missing paths.
  • Documentation

    • Docstring updated to reflect analysis from cluster data or logs.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 17, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 17, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from carbonin and keitwb October 17, 2025 20:18
@openshift-ci

openshift-ci Bot commented Oct 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: CrystalChun
Once this PR has been reviewed and has the lgtm label, please assign carbonin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Oct 17, 2025

Walkthrough

Introduce a parameterless ClusterAnalyzer base class and expose it in the package API; refactor LogAnalyzer to subclass it and load archives when available; add per-signature logs_required flags and a filter_signatures helper; update main workflow, tests, and a troubleshooting tool rename.

Changes

Cohort / File(s) Summary
Core analyzers
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py
Add ClusterAnalyzer (parameterless ctor) with set_cluster_metadata(), set_cluster_events(), cluster_events property, get_all_cluster_events(), and static get_hostname(); add LogAnalyzer(ClusterAnalyzer) that stores logs_archive and loads metadata/events from the archive.
Package exports
assisted_service_mcp/src/utils/log_analyzer/__init__.py
Import ClusterAnalyzer and add it to __all__.
Main workflow
assisted_service_mcp/src/utils/log_analyzer/main.py
analyze_cluster() now selects ClusterAnalyzer when logs are not completed and LogAnalyzer when logs are available; adds filter_signatures() to map requested signature names and respect per-signature logs_required flags; loads events/metadata into analyzer when logs are unavailable.
Signature base
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py
Add logs_required = True on Signature; change format_time() to accept `str | datetime` and avoid blanket exception handling.
Log-optional signatures
assisted_service_mcp/src/utils/log_analyzer/signatures/*.py
Mark signatures as log-optional by adding logs_required = False to: ComponentsVersionSignature, SNOHostnameHasEtcd, SNOMachineCidrSignature, SlowImageDownloadSignature, LibvirtRebootFlagSignature, and EventsInstallationAttempts.
Tests
tests/test_log_analyzer.py
Add/adjust tests for ClusterAnalyzer and LogAnalyzer metadata/events partitioning, hostname resolution, and analyze_cluster flows for both log-present and log-absent scenarios; use AsyncMock and updated mocks; expand signature execution tests.
Tools / API
assisted_service_mcp/src/tools/cluster_tools.py, assisted_service_mcp/src/mcp.py
Rename analyze_cluster_logs → troubleshoot_cluster and update docstring/registration to use troubleshoot_cluster (tool registration updated in mcp.py).
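Based on the change summary above, the per-signature log opt-out and the name filter can be sketched roughly as follows. The names Signature, logs_required, filter_signatures, ALL_SIGNATURES, and ComponentsVersionSignature come from the summary; the bodies (and the MustGatherAnalysis placeholder) are illustrative assumptions, not the PR's actual code:

```python
import logging

logger = logging.getLogger(__name__)


class Signature:
    """Illustrative stand-in for the PR's Signature base class."""

    # Conservative default: signatures need the log bundle unless
    # they explicitly opt out at class level.
    logs_required = True


class ComponentsVersionSignature(Signature):
    # Reads only cluster metadata, so it can run without logs.
    logs_required = False


class MustGatherAnalysis(Signature):
    # Hypothetical log-dependent signature; keeps the default.
    pass


ALL_SIGNATURES = [ComponentsVersionSignature, MustGatherAnalysis]


def filter_signatures(signatures, requested_names):
    """Keep only the requested signatures; warn on unknown names."""
    by_name = {sig.__name__: sig for sig in signatures}
    selected = []
    for name in requested_names:
        if name in by_name:
            selected.append(by_name[name])
        else:
            logger.warning("Unknown signature: %s", name)
    return selected


# When logs are unavailable, restrict to the log-optional subset first.
runnable = [sig for sig in ALL_SIGNATURES if sig.logs_required is False]
print([sig.__name__ for sig in runnable])  # ['ComponentsVersionSignature']
```

Requesting a name not in the pool simply logs a warning and yields an empty selection, mirroring the "unknown names produce warnings" behavior described above.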

Sequence Diagram(s)

sequenceDiagram
    participant Main as analyze_cluster
    participant API as assisted-service API
    participant ClusterA as ClusterAnalyzer
    participant LogA as LogAnalyzer
    participant Sig as Signature

    Main->>API: get_cluster(cluster_id)
    API-->>Main: cluster (with logs_info)

    alt logs_info.completed == True
        Main->>LogA: instantiate with logs_archive
        Note right of LogA #D6EAF8: Load metadata & events from archive
        LogA->>LogA: set metadata & events
        Main->>Sig: run selected signatures (all allowed)
    else logs not completed
        Main->>ClusterA: instantiate ClusterAnalyzer()
        Main->>ClusterA: set_cluster_metadata(metadata)
        Main->>ClusterA: set_cluster_events(parsed_events)
        Main->>Sig: run signatures where logs_required == False
    end

    Sig-->>Main: SignatureResult(s)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

lgtm

Suggested reviewers

  • carbonin

Poem

🐇
I hopped through clusters, metadata in paw,
When archives sleep, I still learn the law.
ClusterAnalyzer hums without a log,
Signatures sing, whether file or fog.
Hooray for rabbits, and tests that jog! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.79% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "MGMT-21844: Enable running Signatures regardless of log bundle availability" accurately captures the primary objective of the changeset. The PR introduces a logs_required attribute to mark signatures that don't need logs, restructures the analyzer hierarchy with a new ClusterAnalyzer base class, and updates the main analysis flow to conditionally use either ClusterAnalyzer or LogAnalyzer depending on log availability. This directly enables signatures to run regardless of whether a log bundle is present. The title is concise, specific, and clearly conveys the main change without vague language or unnecessary details.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 17, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 17, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Summary by CodeRabbit

  • New Features

    • Cluster diagnostics now available even with incomplete or unavailable logs; the system intelligently adapts analysis based on available data.
    • Enhanced cluster event and metadata management for comprehensive diagnostics.

  • Improvements

    • System better handles scenarios with partial cluster information, ensuring analyses complete when possible.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

59-86: Consider extracting datetime parsing logic to reduce duplication.

The datetime/string handling pattern (lines 61-65 and 70-74) is duplicated and also appears in format_time. This could be extracted into a helper method.

Consider adding a helper method:

@staticmethod
def _parse_datetime(value: str | datetime) -> datetime:
    """Parse datetime from string or return datetime object."""
    if isinstance(value, str):
        return dateutil.parser.isoparse(value)
    return value

Then simplify the code:

-        install_started_at = md["cluster"]["install_started_at"]
-        # Handle both datetime objects and ISO strings
-        if isinstance(install_started_at, str):
-            installation_start_time = dateutil.parser.isoparse(install_started_at)
-        else:
-            # It's already a datetime object
-            installation_start_time = install_started_at
+        installation_start_time = self._parse_datetime(md["cluster"]["install_started_at"])

And similar for the deleted_at handling.
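As a stand-alone illustration of the suggested helper, here is a minimal sketch that uses the stdlib's datetime.fromisoformat in place of dateutil.parser.isoparse, so it has no third-party dependency (the function name and behavior follow the suggestion above; the PR itself uses dateutil):

```python
from datetime import datetime, timezone


def parse_datetime(value: "str | datetime") -> datetime:
    """Parse an ISO-8601 string, or pass a datetime through unchanged."""
    if isinstance(value, str):
        # fromisoformat is the stdlib analogue of dateutil's isoparse
        # for the ISO-8601 timestamps cluster metadata contains.
        return datetime.fromisoformat(value)
    return value


start = parse_datetime("2025-10-17T20:18:00+00:00")
same = parse_datetime(start)
print(start == same)  # True
```

Centralizing the isinstance check this way means install_started_at and deleted_at handling both collapse to a single call, which is the duplication the review points out.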

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b37220e and 1a30d36.

📒 Files selected for processing (11)
  • assisted_service_mcp/src/utils/log_analyzer/__init__.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/main.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (2 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py (1 hunks)
  • tests/test_log_analyzer.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_log_analyzer.py (1)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_cluster_logs (157-170)
  • get_events (173-214)
assisted_service_mcp/src/utils/log_analyzer/main.py (2)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_events (173-214)
  • get_cluster_logs (157-170)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)
  • ClusterAnalyzer (22-141)
  • LogAnalyzer (144-271)
  • set_cluster_events (36-38)
  • set_cluster_metadata (29-34)
assisted_service_mcp/src/utils/log_analyzer/__init__.py (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
  • ClusterAnalyzer (22-141)
  • LogAnalyzer (144-271)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)
  • SignatureResult (15-48)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
tests/test_log_analyzer.py (1)
  • get (10-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (17)
assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py (1)

39-40: LGTM! Correctly marked as not requiring logs.

The SNOHostnameHasEtcd signature only accesses cluster metadata and uses the get_hostname() helper, neither of which require log archives. Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1)

82-83: LGTM! Correctly marked as not requiring logs.

The EventsInstallationAttempts signature only accesses cluster events via methods that work without log archives. Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py (1)

17-18: LGTM! Correctly marked as not requiring logs.

The ComponentsVersionSignature signature only accesses cluster metadata (release_tag, versions). Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py (1)

22-23: LGTM! Correctly marked as not requiring logs.

The LibvirtRebootFlagSignature signature only accesses cluster metadata and host inventory JSON (from metadata). Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1)

30-31: LGTM! Correctly marked as not requiring logs.

The SNOMachineCidrSignature signature only accesses cluster metadata and host inventory (from metadata). Setting logs_required = False is appropriate.

assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py (1)

19-19: LGTM! Correctly marked as not requiring logs.

The SlowImageDownloadSignature signature only accesses cluster events to extract image download rates. Setting logs_required = False is appropriate.

tests/test_log_analyzer.py (2)

3-3: LGTM! Correct import for async mocking.

Adding AsyncMock import is necessary for properly mocking the async methods in the client (get_cluster, get_cluster_logs, get_events).


96-111: LGTM! Properly structured mocks for the updated flow.

The test correctly mocks the async client and cluster object:

  • AsyncMock is appropriate for async methods
  • fake_cluster.logs_info = "completed" triggers the logs-available code path
  • fake_cluster.to_dict() provides the expected cluster metadata structure
  • fake_archive.get() returns a JSON string matching the expected format

The test validates the happy path with an empty signatures list.

assisted_service_mcp/src/utils/log_analyzer/__init__.py (2)

7-7: LGTM! Correctly exposes the new ClusterAnalyzer base class.

Adding ClusterAnalyzer to the imports makes the new base class available for consumers who need to analyze clusters without log archives.


12-12: LGTM! Correctly adds ClusterAnalyzer to the public API.

Including ClusterAnalyzer in __all__ properly exposes it as part of the module's public interface while maintaining backward compatibility with existing LogAnalyzer usage.

assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)

54-55: LGTM! Conservative default protects existing behavior.

The logs_required class attribute is well-designed with a safe default that ensures existing signatures continue to require logs unless explicitly marked otherwise.

assisted_service_mcp/src/utils/log_analyzer/main.py (2)

6-6: LGTM! Imports aligned with refactored architecture.

Also applies to: 10-18


103-116: LGTM! Clean helper with appropriate warning for unknown signatures.

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)

22-38: LGTM! Clean base class with dependency injection via setters.

The ClusterAnalyzer base class enables analysis without log archives by accepting metadata and events via setters. The metadata wrapping logic (lines 31-33) handles the structural difference between API responses and log bundle formats.


45-54: LGTM! Safe default behavior for missing events.

The get_all_cluster_events method returns an empty list when events haven't been set, which is a safe default that prevents downstream code from handling None.


130-141: LGTM! Robust hostname extraction with sensible fallbacks.

The method correctly tries requested_hostname, then parses the inventory JSON, and finally falls back to the host ID. The error handling appropriately catches JSON parsing failures.
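The precedence the review describes (requested_hostname, then the hostname inside the inventory JSON, then the host ID) can be sketched in isolation like this; the field names mirror the review text, but the function body is an assumption, not the PR's code:

```python
import json
import logging

logger = logging.getLogger(__name__)


def get_hostname(host: dict) -> str:
    """Resolve a display hostname for a host record, with graceful fallbacks."""
    # 1. An explicitly requested hostname wins.
    if host.get("requested_hostname"):
        return host["requested_hostname"]
    # 2. Otherwise try the hostname reported in the inventory JSON blob.
    try:
        inventory = json.loads(host.get("inventory", "") or "{}")
        if inventory.get("hostname"):
            return inventory["hostname"]
    except json.JSONDecodeError:
        logger.warning("Could not parse inventory for host %s", host.get("id"))
    # 3. Fall back to the host ID.
    return host.get("id", "unknown")


print(get_hostname({"inventory": '{"hostname": "master-0"}', "id": "abc"}))  # master-0
```

Catching only json.JSONDecodeError keeps a malformed inventory from masking unrelated errors, which matches the "error handling appropriately catches JSON parsing failures" observation above.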


144-188: LGTM! Clean subclass specialization for archive-based analysis.

The LogAnalyzer subclass appropriately overrides metadata and get_all_cluster_events to load from the log archive instead of requiring injection via setters. The error handling (line 186) aligns with the base class behavior.

Comment on lines +43 to +80
    # first call the api to get the cluster and check if logs are available
    cluster = await api_client.get_cluster(cluster_id)

    if cluster.logs_info != "completed":
        logger.info(
            "Logs are not available for cluster: %s\nDefaulting to signatures that don't require logs",
            cluster_id,
        )

        analyzer = ClusterAnalyzer()

        # Call events API to get the events and set the events in the analyzer
        events = await api_client.get_events(cluster_id)
        analyzer.set_cluster_events(json.loads(events))

        # Set the cluster metadata in the analyzer
        analyzer.set_cluster_metadata(cluster.to_dict())

        # Select signatures that don't require logs
        signatures_to_run = [
            sig for sig in ALL_SIGNATURES if sig.logs_required is False
        ]
    else:
        # Download logs
        logs_archive = await api_client.get_cluster_logs(cluster_id)

        # Initialize log analyzer
        analyzer = LogAnalyzer(logs_archive)

        # Add all signatures to the list to run
        signatures_to_run = ALL_SIGNATURES

    # If specific signatures are provided, filter the signatures to run
    if specific_signatures:
        signatures_to_run = filter_signatures(
            signatures_to_run, specific_signatures
        )


⚠️ Potential issue | 🟡 Minor

Add warning when user-requested signatures are skipped due to log unavailability.

When logs are unavailable, the code filters to signatures where logs_required=False (lines 62-64), then applies user-requested specific_signatures (lines 77-80). If a user requests a signature that requires logs when logs aren't available, it will be silently skipped without any feedback.

Consider adding a warning after line 80:

         # If specific signatures are provided, filter the signatures to run
         if specific_signatures:
+            # Track which signatures were filtered out
+            before_filter = set(sig.__name__ for sig in signatures_to_run)
             signatures_to_run = filter_signatures(
                 signatures_to_run, specific_signatures
             )
+            after_filter = set(sig.__name__ for sig in signatures_to_run)
+            skipped = set(specific_signatures) - after_filter
+            if skipped and cluster.logs_info != "completed":
+                logger.warning(
+                    "Skipped signatures requiring logs (logs unavailable): %s",
+                    ", ".join(sorted(skipped))
+                )
🤖 Prompt for AI Agents
In assisted_service_mcp/src/utils/log_analyzer/main.py around lines 43 to 80,
when logs are unavailable we restrict to signatures with logs_required=False and
then apply user-requested specific_signatures, which can silently skip requested
signatures that require logs; after applying the specific_signatures filter
(i.e., after line ~80) detect the case where cluster.logs_info != "completed"
and specific_signatures was provided, compute which requested signatures were
removed (requested minus resulting), and emit a logger.warning listing those
skipped signatures and that they were skipped because logs are unavailable;
ensure signatures_to_run is a concrete list before computing the difference so
the comparison works.

Comment thread assisted_service_mcp/src/utils/log_analyzer/signatures/base.py Outdated
@carbonin
Collaborator

I didn't look deeply into this, but I will say if the goal is to do some analysis without the logs then we should probably change the description of the tool that calls all of this.

I think right now the description is very specific about needing logs available to run.

@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 1a30d36 to 0ee69cd Compare October 17, 2025 21:26
@openshift-ci-robot

openshift-ci-robot commented Oct 17, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Summary by CodeRabbit

  • New Features

    • Cluster diagnostics now run even when logs are incomplete or unavailable; events and metadata can be loaded directly for analysis.
    • Ability to select a subset of checks to run by name.

  • Improvements

    • Several checks updated to run without requiring logs, enabling broader analysis coverage.
    • Better handling and exposure of cluster events and metadata for more reliable diagnostics.

  • Tests

    • Tests updated to exercise log-missing and log-available paths more explicitly.

  • Documentation

    • Wording clarified to reflect analysis from cluster data or logs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)

80-92: Type-safe time formatting and narrowed exceptions — LGTM.

Accepts str | datetime and avoids blanket Exception; matches earlier review guidance.

assisted_service_mcp/src/utils/log_analyzer/main.py (1)

76-80: Warn when requested signatures are skipped due to missing logs.

Currently silent. Add a targeted warning to improve UX. (This mirrors a prior suggestion.)

         # If specific signatures are provided, filter the signatures to run
         if specific_signatures:
             signatures_to_run = filter_signatures(
                 signatures_to_run, specific_signatures
             )
+            # Inform users when requested signatures require logs but logs are unavailable
+            if cluster.logs_info != "completed":
+                requires_logs = {
+                    sig.__name__ for sig in ALL_SIGNATURES if getattr(sig, "logs_required", True)
+                }
+                skipped = sorted(set(specific_signatures) & requires_logs)
+                if skipped:
+                    logger.warning(
+                        "Skipped signatures requiring logs (logs unavailable): %s",
+                        ", ".join(skipped),
+                    )
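The set arithmetic behind the suggested warning can be exercised in isolation. The helper name and the signature names here are placeholders for illustration, not code from the PR:

```python
def find_skipped(requested_names, surviving_names):
    """Requested signature names that did not survive filtering."""
    return sorted(set(requested_names) - set(surviving_names))


requested = ["ComponentsVersionSignature", "MustGatherAnalysis"]
# When logs are unavailable, only the log-optional subset survives.
after_filter = ["ComponentsVersionSignature"]
print(find_skipped(requested, after_filter))  # ['MustGatherAnalysis']
```

Anything the helper returns is exactly what the user asked for but silently lost, so it is the right input for the logger.warning call proposed above.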
🧹 Nitpick comments (4)
assisted_service_mcp/src/tools/cluster_tools.py (1)

367-388: Function name no longer matches its behavior; clarify vague docstring terms.

The updated docstring correctly reflects that logs are optional and addresses the reviewer's concern. However:

  1. Naming inconsistency: The function name analyze_cluster_logs implies logs are the primary or required input, but the docstring now states logs are optional ("if available"). This creates a mismatch between the function name and its actual behavior. Consider renaming to analyze_cluster to better reflect the broader scope, though be aware this would be a breaking API change.

  2. Vague terminology:

    • "cluster's data" (line 374) is unclear—does this refer to cluster metadata, events, configuration, or something else?
    • "Cluster is available (downloadable via the API)" (line 380) is ambiguous—what specifically is "downloadable"? The cluster metadata, the logs, or both?

Please clarify these terms in the docstring to help users understand what analysis is performed and what prerequisites are actually required.

assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)

54-55: Make logs_required a ClassVar and document the contract.

Prevents accidental instance shadowing and clarifies intent for subclass authors.

-from typing import Optional, Any, Sequence
+from typing import Optional, Any, Sequence, ClassVar
...
-class Signature(abc.ABC):
+class Signature(abc.ABC):
@@
-    logs_required = True
+    # Subclasses may override to opt into running without logs.
+    logs_required: ClassVar[bool] = True
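A quick self-contained check of what the ClassVar annotation communicates: the flag lives on the class, and subclasses override it at class scope rather than per instance. This is a sketch of the pattern, not the PR's actual Signature class:

```python
import abc
from typing import ClassVar


class Signature(abc.ABC):
    # Class-level contract: subclasses opt out of the log requirement
    # by overriding this attribute at class scope.
    logs_required: ClassVar[bool] = True

    @abc.abstractmethod
    def analyze(self) -> str: ...


class EventsOnlySignature(Signature):
    logs_required = False

    def analyze(self) -> str:
        return "ran from events"


# The default and the override are both visible without instantiating.
print(Signature.logs_required, EventsOnlySignature.logs_required)  # True False
```

Because ClassVar tells type checkers the attribute is not an instance field, accidental `self.logs_required = ...` assignments get flagged instead of silently shadowing the class default.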
assisted_service_mcp/src/utils/log_analyzer/main.py (1)

91-100: Log exceptions with stack traces for easier debugging.

Use logger.exception to include traceback.

-            except Exception as e:
-                logger.error(
-                    "Error running signature %s: %s", signature_class.__name__, e
-                )
+            except Exception:
+                logger.exception("Error running signature %s", signature_class.__name__)
@@
-    except Exception as e:
-        logger.error("Error analyzing cluster %s: %s", cluster_id, e)
+    except Exception:
+        logger.exception("Error analyzing cluster %s", cluster_id)
         raise
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

171-173: Prefer logger.exception for metadata load failures.

Adds traceback to aid debugging.

-            except Exception as e:
-                logger.error("Failed to load metadata: %s", e)
+            except Exception:
+                logger.exception("Failed to load metadata")
                 raise
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a30d36 and 0ee69cd.

📒 Files selected for processing (5)
  • assisted_service_mcp/src/tools/cluster_tools.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/main.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3 hunks)
  • tests/test_log_analyzer.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_log_analyzer.py
🧰 Additional context used
🧬 Code graph analysis (2)
assisted_service_mcp/src/utils/log_analyzer/main.py (2)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_events (173-214)
  • get_cluster_logs (157-170)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)
  • ClusterAnalyzer (22-141)
  • LogAnalyzer (144-271)
  • set_cluster_events (36-38)
  • set_cluster_metadata (29-34)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
tests/test_log_analyzer.py (1)
  • get (10-15)
assisted_service_mcp/src/tools/event_tools.py (1)
  • cluster_events (12-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (1)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

130-142: Hostname extraction utility — LGTM.

Clear precedence and safe fallbacks.

Comment thread assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py
@CrystalChun CrystalChun marked this pull request as draft October 20, 2025 14:14
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 20, 2025
logs_required will indicate if a signature
requires logs downloaded from the cluster to be run.
It will default to True for all signatures
unless marked otherwise.

The signatures that do not require logs have been
marked. The following are signatures that do not need
logs:
- ComponentsVersionSignature
- SlowImageDownloadSignature
- SNOMachineCidrSignature
- SNOHostnameHasEtcd
- EventsInstallationAttempts
- LibvirtRebootFlagSignature

This class will be the parent class of LogAnalyzer and
will contain the cluster metadata and events for a cluster.

This function was doing what the get last install
cluster event was also doing. Removed the partitioning
so that it returns all cluster events as it is named.

Moves the functions that are common to both ClusterAnalyzer and
LogAnalyzer to the parent class (ClusterAnalyzer) so both
can access them.
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 0ee69cd to 4eb8866 Compare October 21, 2025 22:01
@openshift-ci openshift-ci Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 21, 2025
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 4eb8866 to 55949bc Compare October 21, 2025 22:08
@CrystalChun CrystalChun marked this pull request as ready for review October 21, 2025 22:08
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 21, 2025
@openshift-ci openshift-ci Bot requested a review from eranco74 October 21, 2025 22:08
@openshift-ci-robot

openshift-ci-robot commented Oct 21, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Certain Signatures do not require the log bundle to run.

This introduces a way to identify which Signatures don't require logs, and to enable them to run even if the log bundle is not available.

The log_analyzer can determine if a log bundle is available for the cluster and will select the corresponding Signatures.

NOTE: This PR is broken up into multiple commits to help the reviewer view the changes one step at a time.

Summary by CodeRabbit

  • New Features

  • Diagnostics run even when logs are incomplete or unavailable; events and metadata can be loaded directly.

  • Select a subset of checks by name; unknown names produce warnings.

  • Improvements

  • More checks no longer require logs, widening analysis coverage.

  • Improved handling and exposure of cluster events, metadata, and host identification for more reliable diagnostics.

  • Tests

  • Tests expanded to cover both log‑available and log‑missing flows.

  • Documentation

  • Clarified wording to reflect analysis from cluster data or logs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

57-97: Add error handling for timestamp parsing and missing hosts.

The current implementation has two issues:

  1. Lines 64-73, 78-85: The match/case blocks handle type checking but don't catch parsing exceptions from dateutil.parser.isoparse(), which can raise ValueError or other exceptions.

  2. Line 89: Assumes md["cluster"]["hosts"] exists, which could raise KeyError.

Apply this diff to add proper error handling:

     @staticmethod
     def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        install_started_at = md.get("cluster", {}).get("install_started_at")
-        if not install_started_at:
-            return md
-
-        match install_started_at:
-            case str():
-                installation_start_time = dateutil.parser.isoparse(install_started_at)
-            case datetime():
-                installation_start_time = install_started_at
-            case _:
-                logger.error(
-                    "Unable to parse install started at: %s", install_started_at
-                )
-                return md
+        cluster = md.get("cluster", {})
+        install_started_at = cluster.get("install_started_at")
+        installation_start_time = None
+        
+        if install_started_at:
+            try:
+                match install_started_at:
+                    case str():
+                        installation_start_time = dateutil.parser.isoparse(install_started_at)
+                    case datetime():
+                        installation_start_time = install_started_at
+                    case _:
+                        logger.debug(
+                            "Unable to parse install_started_at (unknown type): %s", install_started_at
+                        )
+            except (ValueError, TypeError) as e:
+                logger.debug("Unable to parse install_started_at %s: %s", install_started_at, e)

         def host_deleted_before_installation_started(host):
+            if not installation_start_time:
+                return False
             if deleted_at := host.get("deleted_at"):
-                # Handle both datetime objects and ISO strings
-                match deleted_at:
-                    case str():
-                        deleted_at_time = dateutil.parser.isoparse(deleted_at)
-                    case datetime():
-                        deleted_at_time = deleted_at
-                    case _:
-                        logger.error("Unable to parse deleted at: %s", deleted_at)
-                        return False
-                return deleted_at_time < installation_start_time
+                try:
+                    match deleted_at:
+                        case str():
+                            deleted_at_time = dateutil.parser.isoparse(deleted_at)
+                        case datetime():
+                            deleted_at_time = deleted_at
+                        case _:
+                            logger.debug("Unable to parse deleted_at (unknown type): %s", deleted_at)
+                            return False
+                    return deleted_at_time < installation_start_time
+                except (ValueError, TypeError) as e:
+                    logger.debug("Unable to parse deleted_at %s: %s", deleted_at, e)
+                    return False
             return False

-        all_hosts = md["cluster"]["hosts"]
-        md["cluster"]["deleted_hosts"] = [
+        all_hosts = list(cluster.get("hosts", []))
+        cluster["deleted_hosts"] = [
             h for h in all_hosts if host_deleted_before_installation_started(h)
         ]
-        md["cluster"]["hosts"] = [
+        cluster["hosts"] = [
             h for h in all_hosts if not host_deleted_before_installation_started(h)
         ]
+        
+        md["cluster"] = cluster
         return md
assisted_service_mcp/src/utils/log_analyzer/main.py (1)

77-80: Add warning when user-requested signatures are skipped due to log unavailability.

When logs are unavailable, signatures are filtered to those with logs_required=False (lines 62-64), then user-requested specific_signatures are applied (lines 77-80). If a user requests a signature requiring logs when logs aren't available, it will be silently skipped.

Consider logging which requested signatures were skipped:

         # If specific signatures are provided, filter the signatures to run
         if specific_signatures:
+            before_filter = {sig.__name__ for sig in signatures_to_run}
             signatures_to_run = filter_signatures(
                 signatures_to_run, specific_signatures
             )
+            after_filter = {sig.__name__ for sig in signatures_to_run}
+            skipped = set(specific_signatures) - after_filter
+            if skipped and cluster.logs_info != "completed":
+                logger.warning(
+                    "Skipped signatures requiring logs (logs unavailable): %s",
+                    ", ".join(sorted(skipped))
+                )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee69cd and 55949bc.

📒 Files selected for processing (12)
  • assisted_service_mcp/src/tools/cluster_tools.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/__init__.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/main.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py (1 hunks)
  • tests/test_log_analyzer.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • assisted_service_mcp/src/utils/log_analyzer/signatures/platform_specific.py
  • assisted_service_mcp/src/utils/log_analyzer/signatures/base.py
  • assisted_service_mcp/src/utils/log_analyzer/signatures/basic_info.py
  • assisted_service_mcp/src/utils/log_analyzer/signatures/error_detection.py
  • assisted_service_mcp/src/tools/cluster_tools.py
  • assisted_service_mcp/src/utils/log_analyzer/signatures/performance.py
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_log_analyzer.py (3)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (11)
  • ClusterAnalyzer (23-152)
  • set_cluster_metadata (30-35)
  • set_cluster_events (37-39)
  • metadata (42-44)
  • metadata (171-185)
  • cluster_events (47-49)
  • get_all_cluster_events (51-55)
  • get_all_cluster_events (187-199)
  • get_last_install_cluster_events (99-110)
  • get_events_by_host (133-139)
  • get_hostname (142-152)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_cluster_logs (157-170)
  • get_events (173-214)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)
  • analyze_cluster (21-100)
assisted_service_mcp/src/utils/log_analyzer/main.py (3)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_events (173-214)
  • get_cluster_logs (157-170)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)
  • ClusterAnalyzer (23-152)
  • LogAnalyzer (155-282)
  • set_cluster_events (37-39)
  • set_cluster_metadata (30-35)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3)
  • SignatureResult (15-48)
  • Signature (51-90)
  • analyze (61-70)
assisted_service_mcp/src/utils/log_analyzer/__init__.py (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
  • ClusterAnalyzer (23-152)
  • LogAnalyzer (155-282)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (1)
  • SignatureResult (15-48)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)
tests/test_log_analyzer.py (1)
  • get (10-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (12)
assisted_service_mcp/src/utils/log_analyzer/signatures/advanced_analysis.py (1)

82-83: LGTM! Correctly declares the signature does not require logs.

The EventsInstallationAttempts signature only uses event and metadata APIs (get_all_cluster_events(), partition_cluster_events(), get_last_install_cluster_events()) that are available on the base ClusterAnalyzer, so logs_required = False is accurate.

assisted_service_mcp/src/utils/log_analyzer/signatures/networking.py (1)

30-31: LGTM! Correctly declares the signature does not require logs.

The SNOMachineCidrSignature signature only accesses log_analyzer.metadata, which is available on the base ClusterAnalyzer, so logs_required = False is accurate.

assisted_service_mcp/src/utils/log_analyzer/__init__.py (1)

7-16: LGTM! Correctly exposes the new base class in the public API.

The changes appropriately extend the package's public API to include ClusterAnalyzer alongside the existing LogAnalyzer, supporting the refactoring that introduces a base analyzer without log dependencies.

tests/test_log_analyzer.py (4)

20-109: LGTM! Comprehensive test coverage for ClusterAnalyzer.

The test thoroughly validates the base ClusterAnalyzer functionality including metadata handling, deleted host separation, event partitioning, and hostname extraction.


183-211: LGTM! Test correctly validates the logs-available path.

The test properly mocks the cluster object and logs archive, ensuring the main flow executes the logs-available path with LogAnalyzer.


214-248: LGTM! Test validates the no-logs execution path.

The test correctly mocks an unavailable logs scenario and verifies that signatures with logs_required = False execute properly using ClusterAnalyzer.


250-284: LGTM! Test validates signature execution with logs available.

The test correctly verifies that a signature requiring logs executes when logs are available.

assisted_service_mcp/src/utils/log_analyzer/main.py (2)

43-65: LGTM! Log availability check and ClusterAnalyzer initialization are correct.

The code properly checks log availability via cluster.logs_info and initializes ClusterAnalyzer with metadata and events when logs are unavailable, correctly filtering to signatures with logs_required = False.
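The branch this comment approves amounts to something like the following (the `logs_info == "completed"` check mirrors the review; the rest is a sketch under that assumption):

```python
def select_signatures(all_signatures, logs_info):
    """Pick which signatures can run given log availability (sketch)."""
    if logs_info == "completed":
        return list(all_signatures)  # log bundle present: everything may run
    # No usable log bundle: keep only signatures that opted out of logs.
    return [sig for sig in all_signatures if not sig.logs_required]
```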


103-116: LGTM! Signature filtering logic is correct.

The filter_signatures function properly maps signature names to classes and warns about unknown signatures.
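A name-based filter with unknown-name warnings, as described here, could look roughly like this (the function name matches the review; its exact signature and behavior are assumptions):

```python
import logging

logger = logging.getLogger(__name__)


def filter_signatures(signatures, requested_names):
    """Keep only signatures whose class name was requested (sketch)."""
    by_name = {sig.__name__: sig for sig in signatures}
    unknown = sorted(set(requested_names) - set(by_name))
    if unknown:
        # Unknown names produce a warning rather than an error.
        logger.warning("Unknown signatures requested: %s", ", ".join(unknown))
    return [by_name[name] for name in requested_names if name in by_name]
```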

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (3)

23-56: LGTM! ClusterAnalyzer base class API is well-designed.

The parameterless constructor and explicit setter methods (set_cluster_metadata, set_cluster_events) provide a clean separation between cluster-level analysis (no logs) and log-based analysis, with good defensive programming practices.


155-199: LGTM! LogAnalyzer correctly extends ClusterAnalyzer.

The subclass properly overrides metadata and get_all_cluster_events to load from the logs archive while maintaining the same interface, with appropriate error handling.


141-152: LGTM! Robust hostname extraction with proper fallbacks.

The get_hostname method handles multiple hostname sources with appropriate error handling and fallback logic.
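A fallback chain like the one this comment praises might look roughly like this (the field names `requested_hostname`, `inventory`, and `id` are assumptions for illustration, not quoted from the PR):

```python
import json
import logging

logger = logging.getLogger(__name__)


def get_hostname(host):
    """Resolve a host's name from several sources, best first (sketch)."""
    # 1. An explicitly requested hostname wins.
    if hostname := host.get("requested_hostname"):
        return hostname
    # 2. Fall back to the hostname reported in the hardware inventory.
    try:
        inventory = json.loads(host.get("inventory") or "{}")
        if hostname := inventory.get("hostname"):
            return hostname
    except (ValueError, TypeError) as err:
        logger.debug("Could not parse inventory for host %s: %s",
                     host.get("id"), err)
    # 3. Last resort: the host ID, or a placeholder if even that is missing.
    return host.get("id", "unknown")
```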

Comment thread tests/test_log_analyzer.py
Comment thread tests/test_log_analyzer.py
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 55949bc to 16aa804 Compare October 21, 2025 22:25
@openshift-ci-robot

openshift-ci-robot commented Oct 21, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55949bc and 16aa804.

📒 Files selected for processing (1)
  • tests/test_log_analyzer.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_log_analyzer.py (2)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (11)
  • ClusterAnalyzer (23-152)
  • set_cluster_metadata (30-35)
  • set_cluster_events (37-39)
  • metadata (42-44)
  • metadata (171-185)
  • cluster_events (47-49)
  • get_all_cluster_events (51-55)
  • get_all_cluster_events (187-199)
  • get_last_install_cluster_events (99-110)
  • get_events_by_host (133-139)
  • get_hostname (142-152)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)
  • analyze_cluster (21-100)
🔇 Additional comments (6)
tests/test_log_analyzer.py (6)

1-18: LGTM!

The addition of AsyncMock import and the make_archive helper function are well-structured for the test scenarios.


20-109: LGTM!

Comprehensive test coverage for the new ClusterAnalyzer base class. The test validates metadata handling, event partitioning (including reset events), deleted host filtering, and hostname resolution.


183-212: LGTM!

The test has been properly updated to use the new mock structure with logs_info and to_dict() return value, correctly triggering the logs-available path.


251-285: LGTM! Past review issues addressed.

The test correctly verifies that ApiInvalidCertificateSignature runs when logs are available. The mock setup properly configures to_dict.return_value and logs_info, and the signature name has been corrected from the previous review.


287-318: LGTM! Past review issues resolved.

The test correctly validates that ApiInvalidCertificateSignature does not run when logs are unavailable. The mock configuration has been fixed per previous feedback: to_dict.return_value is properly set and logs_info = "not_completed" correctly triggers the no-logs path.


320-348: LGTM! Past review issues addressed.

The test properly validates SlowImageDownloadSignature behavior without logs. The mock setup has been corrected per previous feedback, with to_dict.return_value and logs_info properly configured.

assert len(results) == 2
for result in results:
    assert result.title in ["No etcd in SNO hostname", "Slow Image Download"]
    if result.title in "Slow Image Download":


⚠️ Potential issue | 🟡 Minor

Fix the string comparison operator.

Line 247 uses in for string comparison, which checks substring membership. This should be an equality check.

Apply this diff:

-        if result.title in "Slow Image Download":
+        if result.title == "Slow Image Download":
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if result.title in "Slow Image Download":
if result.title == "Slow Image Download":
🤖 Prompt for AI Agents
In tests/test_log_analyzer.py around line 247, the code uses the membership
operator `in` to compare result.title to the literal "Slow Image Download",
which tests substring membership rather than equality; change the comparison to
an equality check (use ==) so it only matches when result.title exactly equals
"Slow Image Download".

"""Get cluster metadata."""
if self._metadata is None:
try:
metadata_content = self.logs_archive.get("cluster_metadata.json")
Collaborator


It's a bit strange to me that we now sometimes get the metadata from the API and other times get it from the log bundle.

Did you consider making this consistent (I guess just always getting it from the API)?

I could see a situation where if the logs are available we're actually acting on old data where if they are not then we get the data from the live cluster.

Also it's a bit odd that the user of the ClusterAnalyzer is responsible for fetching the metadata and events from outside the object and setting them where LogAnalyzer handles that for the caller. We should probably be more consistent there as well.

Collaborator


Then a follow on from this would be that maybe we don't need separate classes at all.

If the behavior is the same for getting the data (cluster and events) the Analyzer could detect if the logs are available and make that information public. Then the caller could filter the signatures depending on if the Analyzer detected that logs are available or not.

Or even better the signature could check analyzer.logs_available if/when it needs them and just no-op if the logs are required. Then we don't need to do any filtering at all.
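The no-op variant suggested here could look roughly like this (the attribute names `logs_required` and `logs_available` follow the thread; everything else is hypothetical):

```python
class Signature:
    logs_required = True

    def analyze(self, analyzer):
        # Skip quietly when logs are required but the bundle is missing,
        # so callers never need to pre-filter signatures.
        if self.logs_required and not getattr(analyzer, "logs_available", False):
            return None
        return self._analyze(analyzer)

    def _analyze(self, analyzer):
        raise NotImplementedError
```

The trade-off noted below still holds: every log-dependent signature would route through this check, but no filtering logic is needed at the call site.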

Contributor Author


Did you consider making this consistent (I guess just always getting it from the API)?
I could see a situation where if the logs are available we're actually acting on old data where if they are not then we get the data from the live cluster.

Yes that's exactly it, the data from the live cluster might differ from what's in the log bundle.

I've considered making it toggleable: either use the API data or, if it's available for the cluster, the log bundle data for the metadata/events. However, this would then need to be something that the model decides when calling the tool, and I'm not sure how to do that...


If the behavior is the same for getting the data (cluster and events) the Analyzer could detect if the logs are available and make that information public. Then the caller could filter the signatures depending on if the Analyzer detected that logs are available or not.

Sure we could do it this way instead too, I was just hesitant to pass in the api client to the analyzer since the client is used outside of it.

Or even better the signature could check analyzer.logs_available if/when it needs them and just no-op if the logs are required. Then we don't need to do any filtering at all.

Yeah, that works too, but it requires more changes: we'd need to update all of the signatures that do require logs to perform this check.

Comment thread assisted_service_mcp/src/utils/log_analyzer/signatures/base.py Outdated
Comment thread assisted_service_mcp/src/tools/cluster_tools.py Outdated
In analyze_cluster, adds logic to call the API first to check
whether logs are available, and to initialize the corresponding analyzer.

Also updates which signatures to run depending on the analyzer
chosen.
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 16aa804 to b6530e0 Compare October 22, 2025 18:01
@openshift-ci openshift-ci Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 22, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 22, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.


@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from b6530e0 to 8760682 Compare October 22, 2025 18:08
Tests the new ClusterAnalyzer and running signatures when logs
are not available for a cluster.
@CrystalChun CrystalChun force-pushed the logs-not-required-signatures branch from 8760682 to b4faa94 Compare October 22, 2025 18:10
@openshift-ci openshift-ci Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 22, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 22, 2025

@CrystalChun: This pull request references MGMT-21844 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)

43-80: Consider warning when user-requested signatures are skipped due to log unavailability.

When logs are unavailable, the code filters to signatures where logs_required=False (lines 62-64), then applies user-requested specific_signatures (lines 77-80). If a user requests a signature that requires logs when logs aren't available, it will be silently skipped without feedback.

This matches a concern raised in past review comments. Consider adding a warning after line 80 to inform users which requested signatures were skipped:

         # If specific signatures are provided, filter the signatures to run
         if specific_signatures:
+            before_filter = {sig.__name__ for sig in signatures_to_run}
             signatures_to_run = filter_signatures(
                 signatures_to_run, specific_signatures
             )
+            after_filter = {sig.__name__ for sig in signatures_to_run}
+            skipped = set(specific_signatures) - after_filter
+            if skipped and cluster.logs_info != "completed":
+                logger.warning(
+                    "Skipped signatures requiring logs (logs unavailable): %s",
+                    ", ".join(sorted(skipped))
+                )
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (1)

56-76: Add error handling for missing or invalid timestamps.

The _clean_metadata_json method assumes md["cluster"]["install_started_at"] exists and can be parsed. If the field is missing or contains an invalid timestamp, this will raise KeyError or parsing exceptions.

This was flagged in previous reviews. Consider adding defensive checks:

     @staticmethod
     def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        installation_start_time = dateutil.parser.isoparse(
-            str(md["cluster"]["install_started_at"])
-        )
+        cluster = md.get("cluster", {})
+        install_started_at = cluster.get("install_started_at")
+        installation_start_time = None
+        if install_started_at:
+            try:
+                installation_start_time = dateutil.parser.isoparse(str(install_started_at))
+            except (ValueError, TypeError) as e:
+                logger.debug("Unable to parse install_started_at %s: %s", install_started_at, e)
 
         def host_deleted_before_installation_started(host):
+            if not installation_start_time:
+                return False
             if deleted_at := host.get("deleted_at"):
-                return dateutil.parser.isoparse(deleted_at) < installation_start_time
+                try:
+                    return dateutil.parser.isoparse(deleted_at) < installation_start_time
+                except (ValueError, TypeError) as e:
+                    logger.debug("Unable to parse deleted_at %s: %s", deleted_at, e)
+                    return False
             return False
 
-        all_hosts = md["cluster"]["hosts"]
-        md["cluster"]["deleted_hosts"] = [
+        all_hosts = list(cluster.get("hosts", []))
+        cluster["deleted_hosts"] = [
             h for h in all_hosts if host_deleted_before_installation_started(h)
         ]
-        md["cluster"]["hosts"] = [
+        cluster["hosts"] = [
             h for h in all_hosts if not host_deleted_before_installation_started(h)
         ]
+        md["cluster"] = cluster
         return md

Note: Previous review comments indicated this was addressed in commits 7547111 to 55949bc, but the current code still lacks these safeguards.

tests/test_log_analyzer.py (1)

214-249: Fix string comparison on line 247.

Line 247 uses the in operator for string comparison, which checks substring membership rather than equality.

Apply this diff:

-        if result.title in "Slow Image Download":
+        if result.title == "Slow Image Download":
             assert "Detected slow image download rate (MBps):" in result.content

Note: This issue was flagged in a previous review but remains unresolved.

🧹 Nitpick comments (1)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)

62-64: Consider more idiomatic boolean check.

The expression sig.logs_required is False works but could be more Pythonic.

Apply this diff:

             # Select signatures that don't require logs
             signatures_to_run = [
-                sig for sig in ALL_SIGNATURES if sig.logs_required is False
+                sig for sig in ALL_SIGNATURES if not sig.logs_required
             ]
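
For what it's worth, the two predicates are only equivalent when logs_required is always a real bool; if the attribute could ever be None or unset, not would also select those signatures (values below are hypothetical):

```python
# hypothetical logs_required values; the point is that the predicates
# diverge for falsy non-bool values such as None
values = [True, False, None]

literal_false = [v for v in values if v is False]   # only the literal False
falsy = [v for v in values if not v]                # False and None

print(literal_false)  # [False]
print(falsy)          # [False, None]
```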
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16aa804 and b4faa94.

📒 Files selected for processing (5)
  • assisted_service_mcp/src/mcp.py (1 hunks)
  • assisted_service_mcp/src/tools/cluster_tools.py (1 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (3 hunks)
  • assisted_service_mcp/src/utils/log_analyzer/main.py (4 hunks)
  • tests/test_log_analyzer.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_log_analyzer.py (3)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (11)
  • ClusterAnalyzer (22-131)
  • set_cluster_metadata (29-34)
  • set_cluster_events (36-38)
  • metadata (41-43)
  • metadata (150-164)
  • cluster_events (46-48)
  • get_all_cluster_events (50-54)
  • get_all_cluster_events (166-178)
  • get_last_install_cluster_events (78-89)
  • get_events_by_host (112-118)
  • get_hostname (121-131)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_cluster_logs (157-170)
  • get_events (173-214)
assisted_service_mcp/src/utils/log_analyzer/main.py (1)
  • analyze_cluster (21-100)
assisted_service_mcp/src/mcp.py (1)
assisted_service_mcp/src/tools/cluster_tools.py (1)
  • troubleshoot_cluster (368-388)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)
tests/test_log_analyzer.py (1)
  • get (10-15)
assisted_service_mcp/src/tools/event_tools.py (1)
  • cluster_events (12-41)
assisted_service_mcp/src/utils/log_analyzer/main.py (3)
assisted_service_mcp/src/service_client/assisted_service_api.py (3)
  • get_cluster (117-141)
  • get_events (173-214)
  • get_cluster_logs (157-170)
assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (4)
  • ClusterAnalyzer (22-131)
  • LogAnalyzer (134-261)
  • set_cluster_events (36-38)
  • set_cluster_metadata (29-34)
assisted_service_mcp/src/utils/log_analyzer/signatures/base.py (3)
  • SignatureResult (15-48)
  • Signature (51-85)
  • analyze (61-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (8)
assisted_service_mcp/src/mcp.py (1)

78-78: LGTM! Tool registration updated to reflect broader troubleshooting scope.

The change from analyze_cluster_logs to troubleshoot_cluster correctly reflects the new capability to analyze clusters even when logs are unavailable.

assisted_service_mcp/src/tools/cluster_tools.py (1)

368-388: LGTM! Function rename and documentation updates accurately reflect the new behavior.

The rename to troubleshoot_cluster and the updated docstring correctly communicate that this function can now analyze cluster data even when logs are unavailable.

assisted_service_mcp/src/utils/log_analyzer/main.py (1)

103-116: LGTM! Clean signature filtering implementation.

The filter_signatures helper correctly maps signature names to classes and warns about unknown signatures.
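
A minimal sketch of what such a name-to-class filter can look like. The signature classes, logger setup, and ALL_SIGNATURES list below are stand-ins for illustration, not the repository's actual definitions:

```python
import logging

logger = logging.getLogger(__name__)

# stand-in signature classes; the real ones live under signatures/
class SlowImageDownloadSignature:
    pass

class HostsStatusSignature:
    pass

ALL_SIGNATURES = [SlowImageDownloadSignature, HostsStatusSignature]

def filter_signatures(names):
    """Map requested signature names to classes, warning on unknown names."""
    by_name = {sig.__name__: sig for sig in ALL_SIGNATURES}
    selected = []
    for name in names:
        if name in by_name:
            selected.append(by_name[name])
        else:
            logger.warning("Unknown signature name: %s", name)
    return selected

chosen = filter_signatures(["SlowImageDownloadSignature", "NoSuchSignature"])
print([sig.__name__ for sig in chosen])  # ['SlowImageDownloadSignature']
```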

tests/test_log_analyzer.py (3)

20-109: LGTM! Comprehensive test coverage for ClusterAnalyzer.

The test validates metadata handling, deleted host separation, event partitioning, and hostname resolution.


251-318: LGTM! Good test coverage for log-present and log-absent scenarios.

These tests correctly verify that signatures requiring logs run only when logs are available and are skipped when logs are unavailable.


320-348: LGTM! Test validates log-independent signature execution.

The test correctly verifies that SlowImageDownloadSignature runs when logs are not available.

assisted_service_mcp/src/utils/log_analyzer/log_analyzer.py (2)

22-132: LGTM! Clean base class design for cluster analysis.

The ClusterAnalyzer base class provides a clear separation of concerns, allowing analysis with or without log archives.


134-178: LGTM! LogAnalyzer correctly extends ClusterAnalyzer.

The refactored LogAnalyzer properly inherits from ClusterAnalyzer and loads metadata and events from the log archive.
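
The split can be illustrated with a stripped-down sketch. The archive keys and JSON shapes below are assumptions for illustration, not the repository's actual layout:

```python
import json

class ClusterAnalyzer:
    """Entry point when cluster metadata/events are supplied directly."""

    def __init__(self):
        self._metadata = None
        self._events = None

    def set_cluster_metadata(self, md):
        self._metadata = md

    def set_cluster_events(self, events):
        self._events = events

    @property
    def metadata(self):
        return self._metadata

    def get_all_cluster_events(self):
        return self._events or []

class LogAnalyzer(ClusterAnalyzer):
    """Same interface, but sources metadata/events from a log archive."""

    def __init__(self, archive):
        super().__init__()
        # archive is modeled here as a mapping of filename -> JSON text
        self.set_cluster_metadata(json.loads(archive["metadata.json"]))
        self.set_cluster_events(json.loads(archive["cluster_events.json"]))

archive = {
    "metadata.json": json.dumps({"cluster": {"id": "abc"}}),
    "cluster_events.json": json.dumps([{"message": "host registered"}]),
}
la = LogAnalyzer(archive)
print(la.metadata["cluster"]["id"])                # abc
print(la.get_all_cluster_events()[0]["message"])   # host registered
```

Signatures that only need metadata and events can then accept either analyzer, since both expose the same accessors.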

@openshift-ci

openshift-ci Bot commented Oct 22, 2025

@CrystalChun: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/eval-test
Commit: b4faa94
Required: false
Rerun command: /test eval-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

# first call the api to get the cluster and check if logs are available
cluster = await api_client.get_cluster(cluster_id)

if cluster.logs_info != "completed":
Collaborator

Let's change this to actually check if we can download the logs.

I say this because when you reset a cluster the logs_info gets reset, but the actual log bundle is still present and we should be able to use it.
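
One hedged way to "actually check", assuming get_cluster_logs raises (or returns an empty payload) when no bundle exists, is to probe for the bundle directly instead of trusting logs_info. The stub client below is purely illustrative:

```python
import asyncio

async def logs_available(api_client, cluster_id):
    """Probe for the log bundle directly rather than trusting
    cluster.logs_info, since a cluster reset clears logs_info while
    the bundle itself may still be present."""
    try:
        logs = await api_client.get_cluster_logs(cluster_id)
    except Exception:
        return False
    return bool(logs)

# illustrative stub standing in for the real assisted-service client
class StubClient:
    def __init__(self, logs):
        self._logs = logs

    async def get_cluster_logs(self, cluster_id):
        if self._logs is None:
            raise RuntimeError("no log bundle for cluster")
        return self._logs

print(asyncio.run(logs_available(StubClient(b"tar-bytes"), "abc")))  # True
print(asyncio.run(logs_available(StubClient(None), "abc")))          # False
```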

@CrystalChun
Contributor Author

/close

The log bundle will always be available. Events and metadata are uploaded to the "log bundle" on demand when the GET logs API is called.

@openshift-ci openshift-ci Bot closed this Oct 23, 2025
@openshift-ci

openshift-ci Bot commented Oct 23, 2025

@CrystalChun: Closed this PR.

Details

In response to this:

/close

The log bundle will always be available. Events and metadata are uploaded to the "log bundle" on demand when the GET logs API is called.



Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants