
Add log-analyzer to repo#120

Merged
openshift-merge-bot[bot] merged 1 commit into openshift-assisted:master from keitwb:log-analyzer
Oct 14, 2025

Conversation

@keitwb
Contributor

@keitwb keitwb commented Oct 7, 2025

Summary by CodeRabbit

  • New Features

    • Added a Log Analyzer that inspects cluster logs and produces human-readable findings across basics, networking, performance, platform-specific, error detection, and advanced analyses.
    • Added a runtime troubleshooting tool (gated by ENABLE_TROUBLESHOOTING_TOOLS) to run analyses and display formatted results.
  • Chores

    • Included the log analyzer in the container image, centralized type-check config, updated tooling/dependencies (added log/archive/table-formatting libs), simplified make targets, and removed the pydocstyle CI workflow.

@openshift-ci openshift-ci Bot requested review from eranco74 and omertuc on October 7, 2025 at 20:12
@openshift-ci openshift-ci Bot added the labels approved (Indicates a PR has been approved by an approver from all required OWNERS files) and size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files) on Oct 7, 2025
@coderabbitai

coderabbitai Bot commented Oct 7, 2025

Walkthrough

Introduces a log analysis subsystem (LogAnalyzer, signature framework, many signatures), CLI/server entrypoints to run analyses, InventoryClient support to download cluster logs, packaging and container updates, CI/mypy config moved to pyproject.toml, and a feature gate to enable troubleshooting tools.

Changes

Cohort / File(s) Summary
CI & dev tooling
.github/workflows/mypy.yaml, .github/workflows/pydocstyle.yaml, Makefile, pyproject.toml
Replace explicit mypy flags with --config-file pyproject.toml and add [tool.mypy] configuration in pyproject.toml; remove pydocstyle workflow; bump/add dev/runtime deps and update Makefile targets (check-types, verify, format).
Container image
Dockerfile
Add COPY log_analyzer ./log_analyzer/ to include the package in the image.
Package entry & metadata
log_analyzer/__init__.py
Add package docstring, __version__, __all__, and re-export LogAnalyzer, ALL_SIGNATURES, SignatureResult.
Core analyzer
log_analyzer/log_analyzer.py
New LogAnalyzer class: lazy metadata loading/cleaning, event extraction/partitioning, host grouping, and helpers to read host logs, journals, controller logs and must-gather supporting new/old bundle layouts.
Orchestration / CLI / server
log_analyzer/main.py, server.py, template.yaml
Add analyze_cluster(...) and print_results(...); add analyze_cluster_logs tool and conditionally register it when ENABLE_TROUBLESHOOTING_TOOLS is enabled; template.yaml adds parameter and env binding.
Service client
service_client/assisted_service_api.py
Add InventoryClient.get_cluster_logs(cluster_id) returning a nestedarchive.RemoteNestedArchive from a presigned URL; import nestedarchive and PresignedUrl.
Signatures framework & exports
log_analyzer/signatures/base.py, log_analyzer/signatures/__init__.py
Add Signature, ErrorSignature, SignatureResult with helpers (table formatting, time parsing); signatures package auto-collects and exposes ALL_SIGNATURES.
Signatures: basic info
log_analyzer/signatures/basic_info.py
Add signatures for hosts status, component versions, failure description, host extra details, interfaces, and storage details.
Signatures: error detection
log_analyzer/signatures/error_detection.py
Add multiple error-detection signatures (certificates, ignition/pull failures, release pull errors, cleanupInstallDevice, missing MC, pod RW layer errors, etc.) and a pattern-search helper.
Signatures: networking
log_analyzer/signatures/networking.py
Add networking signatures (SNO machine CIDR, duplicate VIPs, nameserver overlap, MTU mismatch, dual-stack errors, UMN load-balancer checks).
Signatures: performance
log_analyzer/signatures/performance.py
Add performance signatures (slow image downloads, OS installation time, disk FIO detection) with event extraction helpers and a NoEventFound flow.
Signatures: advanced analysis
log_analyzer/signatures/advanced_analysis.py
Add cross-source analyses and operator/controller-focused signatures (installation attempts, must-gather checks, controller warnings, user-login checks, etc.).
Signatures: platform-specific
log_analyzer/signatures/platform_specific.py
Add Libvirt reboot-flag detection and IP-changed-after-reboot signature (journal parsing, inventory address mapping).
Signatures package initializer
log_analyzer/signatures/__init__.py
Auto-collect signature classes, sort deterministically, and expose ALL_SIGNATURES.
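The auto-collection described for the signatures package initializer can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the concrete signature class names here are hypothetical, and the real initializer scans the package's own module globals.

```python
import inspect


class Signature:
    """Base class for all log-analysis signatures (stand-in)."""


class HostsStatusSignature(Signature):
    """Example signature (hypothetical name)."""


class SlowImageDownloadSignature(Signature):
    """Example signature (hypothetical name)."""


def collect_signatures(namespace):
    """Auto-collect concrete Signature subclasses from a module namespace,
    sorted by class name so the ordering is deterministic."""
    classes = [
        obj
        for obj in namespace.values()
        if inspect.isclass(obj) and issubclass(obj, Signature) and obj is not Signature
    ]
    return sorted(classes, key=lambda cls: cls.__name__)


# In the real package this would scan the signatures module's globals().
ALL_SIGNATURES = collect_signatures(dict(globals()))
```

Sorting by `__name__` is what makes the "sort deterministically" guarantee hold regardless of import order.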

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Main as log_analyzer.main
  participant API as InventoryClient
  participant NA as RemoteNestedArchive
  participant LA as LogAnalyzer
  participant SIG as Signatures (ALL_SIGNATURES)

  User->>Main: analyze_cluster(cluster_id, api_client)
  Main->>API: get_cluster_logs(cluster_id)
  API->>NA: RemoteNestedArchive(presigned_url, init_download=True)
  API-->>Main: RemoteNestedArchive
  Main->>LA: LogAnalyzer(archive)
  loop for each signature
    Main->>SIG: instantiate & analyze(la)
    SIG->>LA: request logs/metadata
    LA-->>SIG: logs / metadata / events
    SIG-->>Main: Optional SignatureResult
  end
  Main-->>User: List[SignatureResult]
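The signature loop in the diagram above amounts to a few lines of orchestration. The sketch below is illustrative only — the analyzer, signature classes, and result shape are stand-ins, not the PR's actual implementation; it shows the key behaviors (skip `None` results, don't let one failing signature abort the run):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SignatureResult:
    name: str
    text: str


class FakeAnalyzer:
    """Stand-in for LogAnalyzer backed by a downloaded log archive."""


class OkSignature:
    def analyze(self, la) -> Optional[SignatureResult]:
        return SignatureResult("OkSignature", "all good")


class QuietSignature:
    def analyze(self, la) -> Optional[SignatureResult]:
        return None  # nothing noteworthy found


class BrokenSignature:
    def analyze(self, la) -> Optional[SignatureResult]:
        raise RuntimeError("boom")


def run_signatures(la, signature_classes) -> List[SignatureResult]:
    """Instantiate each signature, run it, and keep non-None results."""
    results = []
    for cls in signature_classes:
        try:
            result = cls().analyze(la)
        except Exception:
            continue  # a failing signature shouldn't stop the others
        if result is not None:
            results.append(result)
    return results


results = run_signatures(FakeAnalyzer(), [OkSignature, BrokenSignature, QuietSignature])
```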
sequenceDiagram
  autonumber
  actor Operator
  participant Server
  participant Tool as analyze_cluster_logs()
  participant Main as analyze_cluster()
  participant API as InventoryClient

  Note over Server: registered only if ENABLE_TROUBLESHOOTING_TOOLS=1
  Operator->>Server: invoke analyze_cluster_logs(cluster_id)
  Server->>Tool: analyze_cluster_logs(cluster_id)
  Tool->>Main: analyze_cluster(cluster_id, api_client)
  Main->>API: get_cluster_logs(cluster_id)
  API-->>Main: RemoteNestedArchive
  Main-->>Tool: formatted results
  Tool-->>Operator: analysis text

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

lgtm

Poem

I thump my paw on tarballs tight,
I sniff through logs by day and night.
Signatures hop and flags appear,
I point the oddities loud and clear.
Enable the gate — I’ll lend a bite. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 63.53%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The title "Add log-analyzer to repo" succinctly captures the main change of introducing the log-analyzer component into the repository; it is clear, concise, and avoids unnecessary detail while conveying the primary intent to any reviewer scanning the history.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (12)
pyproject.toml (1)

81-89: Consider incremental type-checking for log_analyzer.

Excluding the entire log_analyzer/ package from type checking creates a maintenance burden and reduces code quality. While the PR description notes "too many errors," a better approach would be to:

  1. Enable type checking with less strict settings initially (e.g., only disallow_untyped_defs = false)
  2. Add # type: ignore comments for specific problematic lines
  3. Incrementally address type errors over time

This prevents the excluded code from accumulating more type issues.

Consider this incremental approach:

 [tool.mypy]
-exclude = "log_analyzer/"
 follow_imports = "skip"
 explicit_package_bases = true
 disallow_untyped_calls = true
 disallow_untyped_defs = true 
 disallow_incomplete_defs = true
 ignore_missing_imports = true 
 disable_error_code = ["attr-defined"]
+
+# Less strict settings for log_analyzer during migration
+[[tool.mypy.overrides]]
+module = "log_analyzer.*"
+disallow_untyped_defs = false
+disallow_incomplete_defs = false
log_analyzer/signatures/__init__.py (2)

9-14: Consider explicit imports instead of wildcards.

While wildcard imports with # noqa are acceptable in package __init__.py files for re-exporting, they can:

  1. Hide unused imports and make it harder to track which symbols are actually used
  2. Make static analysis tools less effective
  3. Import unexpected symbols if signature modules export additional items

Consider using explicit imports:

-from .basic_info import *  # noqa
-from .error_detection import *  # noqa
-from .performance import *  # noqa
-from .networking import *  # noqa
-from .advanced_analysis import *  # noqa
-from .platform_specific import *  # noqa
+from . import basic_info
+from . import error_detection
+from . import performance
+from . import networking
+from . import advanced_analysis
+from . import platform_specific

Then update the discovery loop to scan these module objects instead of the current module.


16-31: Dynamic signature discovery is clever but has tradeoffs.

The dynamic discovery of Signature subclasses using inspect is flexible and automatically includes new signatures. However:

Pros:

  • Automatically includes new signature classes without manual registration
  • Clean and minimal code

Cons:

  • Less explicit; developers must understand the magic
  • Harder to debug if a signature doesn't appear in ALL_SIGNATURES
  • Can't easily control ordering beyond alphabetical

Consider whether an explicit registry would be clearer for maintainability:

# Alternative explicit approach
from .basic_info import BasicInfoSignature  # example
from .error_detection import ErrorDetectionSignature  # example
# ... more imports

ALL_SIGNATURES = [
    BasicInfoSignature,
    ErrorDetectionSignature,
    # ... explicit list
]

This makes it immediately clear which signatures are included and allows custom ordering.

log_analyzer/main.py (1)

16-23: Prefer logger.exception in exception handlers to include stack traces.

Improves debuggability for signature failures and overall analysis errors.

Apply this diff:

 def setup_logging(verbose: bool = False) -> None:
-    """Set up colored logging."""
+    """Set up basic logging."""
@@
-        except Exception as e:
-            logger.error(
-                "Error running signature %s: %s", signature_class.__name__, e
-            )
+        except Exception:
+            logger.exception("Error running signature %s", signature_class.__name__)
@@
-    except Exception as e:
-        logger.error("Error analyzing cluster %s: %s", cluster_id, e)
+    except Exception:
+        logger.exception("Error analyzing cluster %s", cluster_id)
         raise
@@
-    except Exception as e:
-        logging.getLogger(__name__).error("Analysis failed: %s", e)
+    except Exception:
+        logging.getLogger(__name__).exception("Analysis failed")
         return 1

Also applies to: 75-79, 82-85, 167-169

log_analyzer/signatures/platform_specific.py (1)

62-74: Harden parsing (regex, JSON, IPs) and journal decompression.

Prevents false negatives and crashes on common log shapes/corruption.

Apply this diff:

-    agent_log_regex = re.compile(
-        r"Sending step <inventory-[0-9a-f]{8}> reply output <(.+)> error <.*> exit-code <0>"
-    )
+    agent_log_regex = re.compile(
+        r"Sending step <inventory-[0-9a-f]{8}> reply output <(.+?)> error <.*> exit-code <0>",
+        re.DOTALL,
+    )
@@
-    def _get_inventory(self, log_analyzer, host_id):
+    def _get_inventory(self, log_analyzer, host_id):
         try:
             agent_logs = log_analyzer.get_host_log_file(host_id, "agent.logs")
         except FileNotFoundError:
             return None
-        match = self.agent_log_regex.search(agent_logs)
-        return json.loads(match.group(1).replace('\\"', '"')) if match else None
+        match = self.agent_log_regex.search(agent_logs)
+        if not match:
+            return None
+        try:
+            return json.loads(match.group(1).replace('\\"', '"'))
+        except json.JSONDecodeError:
+            return None
@@
-    def _get_address_map(self, inventory):
+    def _get_address_map(self, inventory):
         address_map = {}
-        interfaces = inventory.get("interfaces")
-        if interfaces:
-            for interface in interfaces:
-                for key in ["ipv4_addresses", "ipv6_addresses"]:
-                    addresses = interface.get(key)
-                    if addresses:
-                        for addr in addresses:
-                            intf = ipaddress.ip_interface(addr)
-                            address_map[str(intf.ip)] = str(intf.network)
+        interfaces = inventory.get("interfaces") or []
+        for interface in interfaces:
+            for key in ["ipv4_addresses", "ipv6_addresses"]:
+                addresses = interface.get(key) or []
+                for addr in addresses:
+                    try:
+                        intf = ipaddress.ip_interface(addr)
+                    except ValueError:
+                        continue
+                    address_map[str(intf.ip)] = str(intf.network)
         return address_map
@@
-                except FileNotFoundError:
+                except (FileNotFoundError, OSError, gzip.BadGzipFile):
                     continue

Also applies to: 75-86, 101-109

log_analyzer/signatures/basic_info.py (2)

221-227: Avoid KeyError for checked_in_at

checked_in_at may be absent; use get with default.

-                        last_contacted=self.format_time(host["checked_in_at"]),
+                        last_contacted=self.format_time(host.get("checked_in_at", "")),

298-306: Prefer plain strings over json.dumps for interface fields

json.dumps adds quotes/brackets; format readable strings instead.

-            interfaces_details["mac_address"].append(
-                json.dumps(interface.get("mac_address"))
-            )
-            interfaces_details["ipv4_addresses"].append(
-                json.dumps(interface.get("ipv4_addresses", []))
-            )
-            interfaces_details["ipv6_addresses"].append(
-                json.dumps(interface.get("ipv6_addresses", []))
-            )
+            interfaces_details["mac_address"].append(
+                str(interface.get("mac_address", ""))
+            )
+            interfaces_details["ipv4_addresses"].append(
+                ", ".join(interface.get("ipv4_addresses", []))
+            )
+            interfaces_details["ipv6_addresses"].append(
+                ", ".join(interface.get("ipv6_addresses", []))
+            )
log_analyzer/signatures/error_detection.py (2)

153-172: Also scan kube-apiserver-*.log variant (path parity with networking signature)

Covers both bootstrap-control-plane/kube-apiserver.log and kube-apiserver-*.log.

-        new_logs_path = f"{NEW_LOG_BUNDLE_PATH}/bootstrap/containers/bootstrap-control-plane/kube-apiserver.log"
-        old_logs_path = f"{OLD_LOG_BUNDLE_PATH}/bootstrap/containers/bootstrap-control-plane/kube-apiserver.log"
-        for path in (new_logs_path, old_logs_path):
+        paths = (
+            f"{NEW_LOG_BUNDLE_PATH}/bootstrap/containers/bootstrap-control-plane/kube-apiserver.log",
+            f"{OLD_LOG_BUNDLE_PATH}/bootstrap/containers/bootstrap-control-plane/kube-apiserver.log",
+            f"{NEW_LOG_BUNDLE_PATH}/bootstrap/containers/kube-apiserver-*.log",
+            f"{OLD_LOG_BUNDLE_PATH}/bootstrap/containers/kube-apiserver-*.log",
+        )
+        for path in paths:

23-31: Remove unused helper _search_patterns_in_string. It’s defined but never used; delete it to clean up dead code.

log_analyzer/signatures/performance.py (1)

246-250: Avoid double regex work per event

Compute duration once in the generator.

-        return (
-            (event, get_duration(event))
-            for event in events
-            if get_duration(event) is not None
-        )
+        return (
+            (event, duration)
+            for event in events
+            if (duration := get_duration(event)) is not None
+        )
log_analyzer/signatures/advanced_analysis.py (2)

242-249: Make node condition formatting resilient to missing keys

Avoid KeyError when reason/message/status absent.

-                def get_by_type(t, node):
+                def get_by_type(t, node):
                     conds = node.get("status", {}).get("conditions", [])
 
                     c = next((c for c in conds if c.get("type") == t), None)
                     if not c:
                         return "(Condition not found)"
-                    return f"Status {c['status']} with reason {c['reason']}, message {c['message']}"
+                    return (
+                        f"Status {c.get('status', '?')} "
+                        f"with reason {c.get('reason', '?')}, "
+                        f"message {c.get('message', '')}"
+                    )

407-413: Default conditions to list, not dict

Prevents accidental iteration over dict keys.

-                ready = [
+                ready = [
                     condition.get("status") == "True"
-                    for condition in controller_pod.get("status", {}).get(
-                        "conditions", {}
-                    )
+                    for condition in controller_pod.get("status", {}).get(
+                        "conditions", []
+                    )
                     if condition.get("type") == "Ready"
                 ][0]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 986abb7 and fc76c98.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/workflows/mypy.yaml (1 hunks)
  • .github/workflows/pydocstyle.yaml (0 hunks)
  • Dockerfile (1 hunks)
  • Makefile (1 hunks)
  • log_analyzer/__init__.py (1 hunks)
  • log_analyzer/api_client.py (1 hunks)
  • log_analyzer/log_analyzer.py (1 hunks)
  • log_analyzer/main.py (1 hunks)
  • log_analyzer/signatures/__init__.py (1 hunks)
  • log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • log_analyzer/signatures/base.py (1 hunks)
  • log_analyzer/signatures/basic_info.py (1 hunks)
  • log_analyzer/signatures/error_detection.py (1 hunks)
  • log_analyzer/signatures/networking.py (1 hunks)
  • log_analyzer/signatures/performance.py (1 hunks)
  • log_analyzer/signatures/platform_specific.py (1 hunks)
  • pyproject.toml (2 hunks)
  • server.py (4 hunks)
  • template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/pydocstyle.yaml
🧰 Additional context used
🧬 Code graph analysis (11)
log_analyzer/__init__.py (3)
log_analyzer/api_client.py (1)
  • AssistedInstallerAPIClient (14-78)
log_analyzer/log_analyzer.py (1)
  • LogAnalyzer (22-228)
log_analyzer/signatures/base.py (1)
  • SignatureResult (15-48)
log_analyzer/signatures/networking.py (3)
log_analyzer/signatures/advanced_analysis.py (11)
  • operator_statuses_from_controller_logs (17-44)
  • filter_operators (57-71)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_controller_logs (199-206)
log_analyzer/signatures/__init__.py (1)
log_analyzer/signatures/base.py (3)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
log_analyzer/signatures/basic_info.py (2)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • format_time (78-83)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_hostname (218-228)
log_analyzer/signatures/error_detection.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
server.py (2)
log_analyzer/main.py (2)
  • main (101-169)
  • analyze_cluster (26-84)
metrics/metrics.py (1)
  • track_tool_usage (51-65)
log_analyzer/signatures/platform_specific.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_host_log_file (134-165)
  • get_journal_log (167-197)
log_analyzer/signatures/advanced_analysis.py (2)
log_analyzer/signatures/base.py (4)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (8)
  • get_all_cluster_events (90-103)
  • partition_cluster_events (106-124)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_must_gather (208-215)
  • get_events_by_host (126-132)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
log_analyzer/signatures/performance.py (2)
log_analyzer/signatures/base.py (6)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_events_by_host (126-132)
  • get_hostname (218-228)
log_analyzer/main.py (1)
log_analyzer/api_client.py (2)
  • AssistedInstallerAPIClient (14-78)
  • download_logs (58-78)
log_analyzer/signatures/base.py (3)
log_analyzer/signatures/advanced_analysis.py (9)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/basic_info.py (6)
  • analyze (20-81)
  • analyze (125-162)
  • analyze (168-205)
  • analyze (211-249)
  • analyze (255-286)
  • analyze (314-366)
log_analyzer/signatures/error_detection.py (1)
  • analyze (37-87)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (12)
Dockerfile (1)

17-17: LGTM!

The addition of the log_analyzer directory to the Docker image is correctly placed and consistent with the other package COPY statements.

pyproject.toml (2)

18-19: LGTM!

The new dependencies nestedarchive and tabulate are appropriate for the log analyzer functionality. Based on learnings, these are stable packages suitable for processing nested archives and formatting tabular output.


25-25: No changes required. mypy 1.18.2 is the current release and has no known security advisories.

template.yaml (1)

46-48: LGTM!

The new ENABLE_TROUBLESHOOTING_TOOLS parameter and corresponding environment variable are correctly configured. The default value of "true" enables the feature by default, which is appropriate for a troubleshooting capability.

Also applies to: 107-108

.github/workflows/mypy.yaml (1)

22-22: LGTM!

Migrating to config-file-based mypy invocation is a best practice that centralizes type-checking configuration and ensures consistency between CI and local development.

Makefile (3)

58-58: LGTM!

Aligning the check-types target with the centralized mypy configuration in pyproject.toml ensures consistency across CI and local development environments.


60-60: Note: Removed docstyle from verify target.

The PR description indicates that pydocstyle was disabled as "very low value." While this is a reasonable decision for some projects, docstrings do provide value for API documentation and maintainability. Consider whether selective docstring enforcement (e.g., only for public APIs) might be a middle ground.


64-64: LGTM!

Adding ruff check --fix to the format target is a helpful improvement that automatically fixes linting issues during the formatting pass.

server.py (2)

41-43: LGTM!

The environment variable handling for ENABLE_TROUBLESHOOTING_TOOLS is correctly implemented with a sensible default and proper boolean conversion.


1049-1051: LGTM!

The conditional registration of the troubleshooting tool is correctly implemented and aligns with the environment-based feature flag approach.
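The gating pattern approved here — a default-on boolean environment flag around tool registration — can be sketched as follows. The `enabled` helper and `register_tool` function are illustrative stand-ins, not the server's actual code; only the `ENABLE_TROUBLESHOOTING_TOOLS` variable name comes from the PR:

```python
import os


def enabled(var: str, default: str = "true") -> bool:
    """Interpret an environment variable as a boolean feature flag."""
    return os.environ.get(var, default).strip().lower() in ("1", "true", "yes", "on")


registered_tools = []


def register_tool(name: str) -> None:
    """Stand-in for the MCP server's tool registration."""
    registered_tools.append(name)


# Register the troubleshooting tool only when the feature gate is on.
if enabled("ENABLE_TROUBLESHOOTING_TOOLS"):
    register_tool("analyze_cluster_logs")
```

Reading the flag once at startup keeps the tool list stable for the lifetime of the server process.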

log_analyzer/__init__.py (1)

1-17: LGTM!

The package API is well-defined with appropriate exports. The use of __all__ ensures a clean public interface, and the version follows semantic versioning conventions.

log_analyzer/signatures/advanced_analysis.py (1)

80-87: get_all_cluster_events returns the full event list; partition_cluster_events will correctly separate multiple installation attempts.

Comment thread log_analyzer/api_client.py Outdated
Comment on lines +17 to +21
def __init__(
self,
base_url: str = "https://api.openshift.com",
auth_token: Optional[str] = None,
):


⚠️ Potential issue | 🟠 Major

Add explicit HTTP timeouts and validate presigned URL field.

Requests without timeouts can hang indefinitely; also guard against missing 'url' in the response.

Apply this diff (uses a sane default timeout):

 class AssistedInstallerAPIClient:
@@
-    def __init__(
+    def __init__(
         self,
         base_url: str = "https://api.openshift.com",
         auth_token: Optional[str] = None,
-    ):
+        timeout: float | tuple[float, float] = (10.0, 120.0),
+    ):
@@
         self.base_url = base_url.rstrip("/")
         self.auth_token = auth_token
         self.session = requests.Session()
+        self.timeout = timeout
@@
-        response = self.session.get(url, params=params)
+        response = self.session.get(url, params=params, timeout=self.timeout)
         response.raise_for_status()
@@
-        download_info = self.get_logs_download_url(cluster_id)
-        logs_url = download_info["url"]
+        download_info = self.get_logs_download_url(cluster_id)
+        logs_url = download_info.get("url")
+        if not logs_url:
+            raise ValueError("Presigned URL response missing 'url' field")

Based on learnings: always set explicit timeouts with requests for reliability.

Also applies to: 29-35, 53-55, 71-78

Comment thread log_analyzer/log_analyzer.py
Comment thread log_analyzer/main.py Outdated
Comment on lines +26 to +31
def analyze_cluster(
cluster_id: str,
auth_token: Optional[str] = None,
specific_signatures: Optional[List[str]] = None,
) -> List[SignatureResult]:
"""


⚠️ Potential issue | 🔴 Critical

Fix: --api-url is parsed but ignored; pass it through to the API client.

Currently, AssistedInstallerAPIClient is always constructed with the default base URL, making --api-url ineffective. Wire the value through analyze_cluster and into the client.

Apply this diff:

 def analyze_cluster(
     cluster_id: str,
     auth_token: Optional[str] = None,
-    specific_signatures: Optional[List[str]] = None,
+    specific_signatures: Optional[List[str]] = None,
+    api_url: str = "https://api.openshift.com",
 ) -> List[SignatureResult]:
@@
-    Args:
+    Args:
         cluster_id: UUID of the cluster to analyze
         auth_token: Authentication token for API access
         specific_signatures: List of specific signature names to run (None for all)
+        api_url: Base URL for the OpenShift API
@@
-    api_client = AssistedInstallerAPIClient(auth_token=auth_token)
+    api_client = AssistedInstallerAPIClient(base_url=api_url, auth_token=auth_token)
@@
-        results = analyze_cluster(
+        results = analyze_cluster(
             cluster_id=args.cluster_id,
             auth_token=args.auth_token,
             specific_signatures=args.signatures,
+            api_url=args.api_url,
         )

Also applies to: 46-47, 116-121, 151-157

🤖 Prompt for AI Agents
In log_analyzer/main.py around lines 26-31 (and also adjust lines 46-47,
116-121, 151-157), the --api-url CLI flag is parsed but never forwarded to
AssistedInstallerAPIClient; update analyze_cluster to accept an optional
api_url: Optional[str] parameter, pass that api_url into the
AssistedInstallerAPIClient constructor inside analyze_cluster, and propagate the
new parameter through all callers (update the CLI invocation and any other calls
at the indicated line ranges) so the client is constructed with the provided
base URL instead of the default.

Comment thread log_analyzer/signatures/basic_info.py
Comment thread log_analyzer/signatures/basic_info.py
Comment thread log_analyzer/signatures/networking.py
Comment thread log_analyzer/signatures/platform_specific.py
Comment thread server.py
Collaborator

@carbonin carbonin left a comment


I'm fine with the approach. We can try to address the linting stuff in a followup. Mainly we should address the coderabbit comments that affect actual functionality (of which there are a few) and ensure the API url matches between where the MCP server is pointing to and the one in the analyzer tool.

Comment thread log_analyzer/signatures/base.py Outdated

header = f"=== {self.title} ==="
if self.severity == "error":
header = f"🔴 {header}"
Collaborator


Are these icons useful for the model? Maybe it would be better to add the severity as text?
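A severity-as-text variant of the header formatting under discussion might look like this. It is only a sketch: the `title`/`severity` names follow the snippet above, and the exact `[ERROR]` label text is an assumption, not the PR's resolution.

```python
def format_header(title: str, severity: str) -> str:
    """Render the section header with the severity spelled out instead of an icon."""
    header = f"=== {title} ==="
    if severity == "error":
        header = f"[ERROR] {header}"
    return header
```

Plain-text labels tokenize predictably for a model, whereas emoji rendering can vary across clients.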

Comment thread log_analyzer/signatures/basic_info.py
Comment thread log_analyzer/signatures/basic_info.py
Comment thread server.py Outdated
Comment thread server.py Outdated
@keitwb keitwb force-pushed the log-analyzer branch 2 times, most recently from b1723fd to 50f40ee on October 13, 2025 at 13:18

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Dockerfile (1)

1-1: Python version mismatch will break builds (image uses 3.11; pyproject requires >=3.13).

Container is based on Python 3.11 but pyproject.toml requires >=3.13. uv sync/install will likely fail. Align versions.

Fix option A (recommended): lower project requirement to 3.11+

- requires-python = ">=3.13"
+ requires-python = ">=3.11"
pyproject.toml (1)

6-6: Align Python requirement with runtime (Dockerfile is Python 3.11).

Set requires-python to >=3.11 to match the base image and avoid uv sync failures.

-requires-python = ">=3.13"
+requires-python = ">=3.11"
♻️ Duplicate comments (2)
log_analyzer/log_analyzer.py (1)

55-75: Fix event loading semantics and guard missing install_started_at.

  • _clean_metadata_json should handle missing cluster.install_started_at safely.
  • get_all_cluster_events must return all events; selecting last partition belongs only in get_last_install_cluster_events.

Apply:

     @staticmethod
     def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        installation_start_time = dateutil.parser.isoparse(
-            md["cluster"]["install_started_at"]
-        )
+        installation_started_at = md.get("cluster", {}).get("install_started_at")
+        if not installation_started_at:
+            return md
+        installation_start_time = dateutil.parser.isoparse(installation_started_at)
@@
-    def get_last_install_cluster_events(self) -> List[Dict[str, Any]]:
+    def get_last_install_cluster_events(self) -> List[Dict[str, Any]]:
         """Get the cluster installation events for the most recent attempt."""
         try:
-            all_events = self.get_all_cluster_events()
-
-            # Get the last partition (latest installation attempt)
-            events = self.partition_cluster_events(all_events)[-1]
+            all_events = self.get_all_cluster_events()
+            events = self.partition_cluster_events(all_events)[-1]
         except Exception as e:
             logger.error("Failed to load cluster events: %s", e)
             return []
@@
-    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
+    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
         """Get all the cluster installation events."""
-        if self._cluster_events is None:
+        if getattr(self, "_all_events", None) is None:
             try:
                 events_content = self.logs_archive.get("cluster_events.json")
-                all_events = json.loads(cast(str | bytes, events_content))
-
-                # Get the last partition (latest installation attempt)
-                self._cluster_events = self.partition_cluster_events(all_events)[-1]
+                self._all_events = json.loads(cast(str | bytes, events_content))
             except Exception as e:
                 logger.error("Failed to load cluster events: %s", e)
-                self._cluster_events = []
-
-        return self._cluster_events
+                self._all_events = []
+        return self._all_events  # type: ignore[attr-defined]

Also applies to: 77-104

log_analyzer/signatures/networking.py (1)

41-51: Guard machine_networks and skip invalid gateways (prevents spurious failures)

Indexing cluster["machine_networks"][0]["cidr"] can raise on missing/empty data. Also, parsing invalid gateway IPs should not abort the analysis.

-            # The first one is the machine_cidr that will be used for network configuration
-            machine_cidr = ipaddress.ip_network(cluster["machine_networks"][0]["cidr"])
+            # The first entry (if present) is used for network configuration
+            mn = cluster.get("machine_networks") or []
+            if not mn or not mn[0].get("cidr"):
+                return None
+            machine_cidr = ipaddress.ip_network(mn[0]["cidr"])
 
-            for route in inventory.get("routes", []):
+            for route in inventory.get("routes", []):
                 # currently only relevant for ipv4
-                if (
-                    route.get("destination") == "0.0.0.0"
-                    and route.get("gateway")
-                    and ipaddress.ip_address(route["gateway"]) in machine_cidr
-                ):
-                    return None
+                try:
+                    if (
+                        route.get("destination") == "0.0.0.0"
+                        and route.get("gateway")
+                        and ipaddress.ip_address(route["gateway"]) in machine_cidr
+                    ):
+                        return None
+                except ValueError:
+                    # Ignore invalid gateway values
+                    continue
🧹 Nitpick comments (4)
log_analyzer/signatures/basic_info.py (2)

299-307: Avoid json.dumps in interface fields; produces quoted strings in output

json.dumps adds quotes and brackets. Use plain strings and join lists for readability.

-            interfaces_details["mac_address"].append(
-                json.dumps(interface.get("mac_address"))
-            )
-            interfaces_details["ipv4_addresses"].append(
-                json.dumps(interface.get("ipv4_addresses", []))
-            )
-            interfaces_details["ipv6_addresses"].append(
-                json.dumps(interface.get("ipv6_addresses", []))
-            )
+            interfaces_details["mac_address"].append(interface.get("mac_address", "") or "")
+            interfaces_details["ipv4_addresses"].append(
+                ", ".join(interface.get("ipv4_addresses", []))
+            )
+            interfaces_details["ipv6_addresses"].append(
+                ", ".join(interface.get("ipv6_addresses", []))
+            )

181-190: Prefer “N/A” over empty strings for missing timestamps

Empty strings render poorly. Use “N/A” defaults; format_time already returns the input if parsing fails.

-                "Installation Started At": self.format_time(
-                    cluster.get("install_started_at", "")
-                ),
-                "Failed On": self.format_time(cluster.get("status_updated_at", "")),
+                "Installation Started At": self.format_time(
+                    cluster.get("install_started_at", "N/A")
+                ),
+                "Failed On": self.format_time(cluster.get("status_updated_at", "N/A")),
log_analyzer/signatures/networking.py (1)

106-112: Tighten VIP matching to avoid substring false positives

Using if vip in ip_addr can match substrings (e.g., 10.0.0.1 in 110.0.0.10). Prefer IP-boundary-aware regex or parse addresses from ip-addr output.

Option A (quick): for IPv4, use word boundaries; for IPv6, use hex/colon boundaries.

import ipaddress, re
def vip_in_text(vip: str, text: str) -> bool:
    try:
        ip = ipaddress.ip_address(vip)
        if isinstance(ip, ipaddress.IPv4Address):
            pat = re.compile(rf"(?<!\d){re.escape(vip)}(?!\d)")
        else:
            pat = re.compile(rf"(?<![0-9a-fA-F:]){re.escape(vip)}(?![0-9a-fA-F:])")
        return bool(pat.search(text))
    except ValueError:
        return vip in text  # fallback

Replace checks:

  • if vip in ip_addr -> if vip_in_text(vip, ip_addr)
  • Same for bootstrap and OLD paths.

Please verify against a few real ip-addr samples to confirm no regressions.

Also applies to: 123-131, 137-141, 147-151
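The Option A helper can be exercised against made-up `ip addr` fragments to confirm the boundary behavior (the sample lines below are invented, not taken from real logs):

```python
import ipaddress
import re

def vip_in_text(vip: str, text: str) -> bool:
    # IP-boundary-aware search: avoids matching 10.0.0.1 inside 110.0.0.10.
    try:
        ip = ipaddress.ip_address(vip)
        if isinstance(ip, ipaddress.IPv4Address):
            pat = re.compile(rf"(?<!\d){re.escape(vip)}(?!\d)")
        else:
            pat = re.compile(rf"(?<![0-9a-fA-F:]){re.escape(vip)}(?![0-9a-fA-F:])")
        return bool(pat.search(text))
    except ValueError:
        return vip in text  # fallback for non-IP strings

# Substring false positive that a plain `in` check would hit:
print(vip_in_text("10.0.0.1", "inet 110.0.0.10/24 scope global eth0"))  # False
print(vip_in_text("10.0.0.1", "inet 10.0.0.1/24 scope global eth0"))    # True
```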

log_analyzer/signatures/performance.py (1)

246-250: Avoid double computation in _get_fio_events

get_duration(event) is called twice per event. Compute once and reuse.

-        return (
-            (event, get_duration(event))
-            for event in events
-            if get_duration(event) is not None
-        )
+        def gen():
+            for event in events:
+                duration = get_duration(event)
+                if duration is not None:
+                    yield event, duration
+        return gen()
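If a named inner generator feels heavy, an assignment expression (Python 3.8+) achieves the same single call per event; `get_duration` below is a stand-in stub, not the real helper:

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

def get_duration(event: Dict[str, Any]) -> Optional[float]:
    # Stand-in for the real duration helper.
    return event.get("duration")

def fio_events(events: List[Dict[str, Any]]) -> Iterable[Tuple[Dict[str, Any], float]]:
    # The walrus operator binds the duration once per event, so get_duration
    # is no longer evaluated twice.
    return ((event, d) for event in events if (d := get_duration(event)) is not None)

print(list(fio_events([{"duration": 1.5}, {}, {"duration": 0.2}])))
```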
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc76c98 and 50f40ee.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/workflows/mypy.yaml (1 hunks)
  • .github/workflows/pydocstyle.yaml (0 hunks)
  • Dockerfile (1 hunks)
  • Makefile (1 hunks)
  • log_analyzer/__init__.py (1 hunks)
  • log_analyzer/log_analyzer.py (1 hunks)
  • log_analyzer/main.py (1 hunks)
  • log_analyzer/signatures/__init__.py (1 hunks)
  • log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • log_analyzer/signatures/base.py (1 hunks)
  • log_analyzer/signatures/basic_info.py (1 hunks)
  • log_analyzer/signatures/error_detection.py (1 hunks)
  • log_analyzer/signatures/networking.py (1 hunks)
  • log_analyzer/signatures/performance.py (1 hunks)
  • log_analyzer/signatures/platform_specific.py (1 hunks)
  • pyproject.toml (2 hunks)
  • server.py (4 hunks)
  • service_client/assisted_service_api.py (2 hunks)
  • template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (7)
  • log_analyzer/signatures/__init__.py
  • .github/workflows/mypy.yaml
  • log_analyzer/signatures/base.py
  • log_analyzer/signatures/platform_specific.py
  • log_analyzer/signatures/error_detection.py
  • template.yaml
  • server.py
🧰 Additional context used
🧬 Code graph analysis (7)
service_client/assisted_service_api.py (1)
service_client/exceptions.py (1)
  • sanitize_exceptions (24-58)
log_analyzer/main.py (3)
service_client/assisted_service_api.py (1)
  • get_cluster_logs (161-174)
log_analyzer/log_analyzer.py (1)
  • LogAnalyzer (22-228)
log_analyzer/signatures/base.py (2)
  • SignatureResult (15-48)
  • analyze (59-68)
log_analyzer/signatures/networking.py (3)
log_analyzer/signatures/advanced_analysis.py (11)
  • operator_statuses_from_controller_logs (17-44)
  • filter_operators (57-71)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_controller_logs (199-206)
log_analyzer/signatures/advanced_analysis.py (2)
log_analyzer/signatures/base.py (4)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (8)
  • get_all_cluster_events (90-103)
  • partition_cluster_events (106-124)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_must_gather (208-215)
  • get_events_by_host (126-132)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
log_analyzer/signatures/basic_info.py (2)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • format_time (78-83)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_hostname (218-228)
log_analyzer/signatures/performance.py (2)
log_analyzer/signatures/base.py (6)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_events_by_host (126-132)
  • get_hostname (218-228)
log_analyzer/__init__.py (2)
log_analyzer/log_analyzer.py (1)
  • LogAnalyzer (22-228)
log_analyzer/signatures/base.py (1)
  • SignatureResult (15-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (3)
Dockerfile (1)

17-17: LGTM: include log_analyzer in image.

COPY of the new package is correct.

log_analyzer/__init__.py (1)

7-15: Public API surface looks good.

Re-exports and `__all__` are precise and useful.

Makefile (1)

57-61: Type-check invocation via config looks good.

Using --config-file aligns with pyproject settings; verify target order is sensible.

Comment thread log_analyzer/main.py
Comment thread log_analyzer/signatures/advanced_analysis.py
Comment on lines +30 to +43
                inventory = json.loads(host["inventory"])
                if host.get("bootstrap", False):
                    role = "bootstrap"

                hosts.append(
                    OrderedDict(
                        id=host["id"],
                        hostname=log_analyzer.get_hostname(host),
                        progress=host["progress"]["current_stage"],
                        status=host["status"],
                        role=role,
                        boot_mode=inventory.get("boot", {}).get(
                            "current_boot_mode", "N/A"
                        ),

⚠️ Potential issue | 🟠 Major

Guard JSON decode and missing keys to avoid aborting the whole signature

A single bad/missing inventory JSON will throw and cause the signature to return None. Parse defensively and default to {} so the table can still render for other hosts.

-                inventory = json.loads(host["inventory"])
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except json.JSONDecodeError:
+                    inventory = {}
                 if host.get("bootstrap", False):
                     role = "bootstrap"
 
                 hosts.append(
                     OrderedDict(
                         id=host["id"],
                         hostname=log_analyzer.get_hostname(host),
                         progress=host["progress"]["current_stage"],
                         status=host["status"],
                         role=role,
-                        boot_mode=inventory.get("boot", {}).get(
+                        boot_mode=inventory.get("boot", {}).get(
                             "current_boot_mode", "N/A"
                         ),
                         status_info=str(info),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                inventory = json.loads(host["inventory"])
                if host.get("bootstrap", False):
                    role = "bootstrap"
                hosts.append(
                    OrderedDict(
                        id=host["id"],
                        hostname=log_analyzer.get_hostname(host),
                        progress=host["progress"]["current_stage"],
                        status=host["status"],
                        role=role,
                        boot_mode=inventory.get("boot", {}).get(
                            "current_boot_mode", "N/A"
                        ),
                try:
                    inventory = json.loads(host.get("inventory", "{}"))
                except json.JSONDecodeError:
                    inventory = {}
                if host.get("bootstrap", False):
                    role = "bootstrap"
                hosts.append(
                    OrderedDict(
                        id=host["id"],
                        hostname=log_analyzer.get_hostname(host),
                        progress=host["progress"]["current_stage"],
                        status=host["status"],
                        role=role,
                        boot_mode=inventory.get("boot", {}).get(
                            "current_boot_mode", "N/A"
                        ),
                        status_info=str(info),
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 30 to 43, the code calls
json.loads(host["inventory"]) and indexes host keys directly which will raise on
missing or invalid inventory and abort the signature; wrap the inventory parse
in a try/except (catch json.JSONDecodeError and ValueError) and default
inventory = {} on failure (or use host.get("inventory", "{}") as the input), and
replace direct indexing of host keys with .get(..., default) or safe nested
.get(...) calls (e.g., progress = host.get("progress", {}).get("current_stage",
"N/A"), status = host.get("status", "N/A"), and boot_mode =
inventory.get("boot", {}).get("current_boot_mode", "N/A")); ensure role has a
sensible default when host.get("bootstrap") is missing so a bad inventory does
not cause the whole signature to return None.

Comment on lines +220 to +236
                inventory = json.loads(host["inventory"])
                hosts.append(
                    OrderedDict(
                        id=host["id"],
                        hostname=inventory["hostname"],
                        requested_hostname=host.get("requested_hostname", "N/A"),
                        last_contacted=self.format_time(host["checked_in_at"]),
                        installation_disk=host.get("installation_disk_path", "N/A"),
                        product_name=inventory["system_vendor"].get(
                            "product_name", "Unavailable"
                        ),
                        manufacturer=inventory["system_vendor"].get(
                            "manufacturer", "Unavailable"
                        ),
                        virtual_host=inventory["system_vendor"].get("virtual", False),
                        disks_count=len(inventory["disks"]),
                    )

⚠️ Potential issue | 🟠 Major

Harden host extra details: robust inventory parsing and safe lookups

Direct indexing (inventory["hostname"], inventory["system_vendor"], host["checked_in_at"], inventory["disks"]) can KeyError and drop the entire signature.

-                inventory = json.loads(host["inventory"])
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except json.JSONDecodeError:
+                    inventory = {}
+                system_vendor = inventory.get("system_vendor", {})
                 hosts.append(
                     OrderedDict(
                         id=host["id"],
-                        hostname=inventory["hostname"],
+                        hostname=log_analyzer.get_hostname(host),
                         requested_hostname=host.get("requested_hostname", "N/A"),
-                        last_contacted=self.format_time(host["checked_in_at"]),
+                        last_contacted=self.format_time(host.get("checked_in_at", "")),
                         installation_disk=host.get("installation_disk_path", "N/A"),
-                        product_name=inventory["system_vendor"].get(
-                            "product_name", "Unavailable"
-                        ),
-                        manufacturer=inventory["system_vendor"].get(
-                            "manufacturer", "Unavailable"
-                        ),
-                        virtual_host=inventory["system_vendor"].get("virtual", False),
-                        disks_count=len(inventory["disks"]),
+                        product_name=system_vendor.get("product_name", "Unavailable"),
+                        manufacturer=system_vendor.get("manufacturer", "Unavailable"),
+                        virtual_host=system_vendor.get("virtual", False),
+                        disks_count=len(inventory.get("disks", [])),
                     )
                 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                inventory = json.loads(host["inventory"])
                hosts.append(
                    OrderedDict(
                        id=host["id"],
                        hostname=inventory["hostname"],
                        requested_hostname=host.get("requested_hostname", "N/A"),
                        last_contacted=self.format_time(host["checked_in_at"]),
                        installation_disk=host.get("installation_disk_path", "N/A"),
                        product_name=inventory["system_vendor"].get(
                            "product_name", "Unavailable"
                        ),
                        manufacturer=inventory["system_vendor"].get(
                            "manufacturer", "Unavailable"
                        ),
                        virtual_host=inventory["system_vendor"].get("virtual", False),
                        disks_count=len(inventory["disks"]),
                    )
                try:
                    inventory = json.loads(host.get("inventory", "{}"))
                except json.JSONDecodeError:
                    inventory = {}
                system_vendor = inventory.get("system_vendor", {})
                hosts.append(
                    OrderedDict(
                        id=host["id"],
                        hostname=log_analyzer.get_hostname(host),
                        requested_hostname=host.get("requested_hostname", "N/A"),
                        last_contacted=self.format_time(host.get("checked_in_at", "")),
                        installation_disk=host.get("installation_disk_path", "N/A"),
                        product_name=system_vendor.get("product_name", "Unavailable"),
                        manufacturer=system_vendor.get("manufacturer", "Unavailable"),
                        virtual_host=system_vendor.get("virtual", False),
                        disks_count=len(inventory.get("disks", [])),
                    )
                )
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 220 to 236, the code
assumes inventory and several nested keys always exist which can raise
KeyError/TypeError and abort the signature; instead, wrap
json.loads(host.get("inventory", "{}")) in a try/except (fall back to {} on
parse error), use .get with sensible defaults for top-level lookups (hostname,
system_vendor, disks) and for nested system_vendor fields (product_name,
manufacturer, virtual), verify types (e.g., ensure disks is a list before taking
len, else 0), and guard host["checked_in_at"] via host.get(...) so format_time
is only called when a value exists (fallback like "N/A"); update the OrderedDict
fields to use these safe lookups and defaults so missing or malformed inventory
data cannot raise exceptions.

Comment on lines +323 to +341
                inventory = json.loads(host["inventory"])
                disks = inventory["disks"]

                disks_details = {
                    "type": [],
                    "bootable": [],
                    "name": [],
                    "path": [],
                    "by-path": [],
                }

                for d in disks:
                    disk_type = d.get("drive_type", "Not available")
                    disks_details["type"].append(disk_type)
                    disks_details["bootable"].append(str(d.get("bootable", False)))
                    disks_details["name"].append(d.get("name", "Not available"))
                    disks_details["path"].append(d.get("path", "Not available"))
                    disks_details["by-path"].append(d.get("by_path", "Not available"))


⚠️ Potential issue | 🟠 Major

Guard missing/invalid inventory and disks list to prevent KeyError

Directly indexing inventory["disks"] can fail; parse safely and default to [].

-                inventory = json.loads(host["inventory"])
-                disks = inventory["disks"]
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except json.JSONDecodeError:
+                    inventory = {}
+                disks = inventory.get("disks", [])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                inventory = json.loads(host["inventory"])
                disks = inventory["disks"]
                disks_details = {
                    "type": [],
                    "bootable": [],
                    "name": [],
                    "path": [],
                    "by-path": [],
                }
                for d in disks:
                    disk_type = d.get("drive_type", "Not available")
                    disks_details["type"].append(disk_type)
                    disks_details["bootable"].append(str(d.get("bootable", False)))
                    disks_details["name"].append(d.get("name", "Not available"))
                    disks_details["path"].append(d.get("path", "Not available"))
                    disks_details["by-path"].append(d.get("by_path", "Not available"))
                try:
                    inventory = json.loads(host.get("inventory", "{}"))
                except json.JSONDecodeError:
                    inventory = {}
                disks = inventory.get("disks", [])
                disks_details = {
                    "type": [],
                    "bootable": [],
                    "name": [],
                    "path": [],
                    "by-path": [],
                }
                for d in disks:
                    disk_type = d.get("drive_type", "Not available")
                    disks_details["type"].append(disk_type)
                    disks_details["bootable"].append(str(d.get("bootable", False)))
                    disks_details["name"].append(d.get("name", "Not available"))
                    disks_details["path"].append(d.get("path", "Not available"))
                    disks_details["by-path"].append(d.get("by_path", "Not available"))
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 323 to 341, the code
assumes host["inventory"] exists and that inventory["disks"] is present and a
list; this can raise KeyError or TypeError. Safely load inventory using
host.get("inventory", "{}") and wrap json.loads in try/except (or default to {}
on failure), then obtain disks with inventory.get("disks", []) and coerce to an
empty list if it's None or not a list; proceed to iterate over that safe disks
list so missing/invalid inventory or disks won't crash the code.

Comment thread service_client/assisted_service_api.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (9)
log_analyzer/log_analyzer.py (2)

90-103: Method name misleading: get_all_cluster_events returns only last partition.

The method is named get_all_cluster_events but line 98 actually returns only the last partition (latest installation attempt), not all events. This contradicts the docstring "Get all the cluster installation events" and could confuse callers expecting all events.

Consider this approach to fix the semantic mismatch:

+    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
+        """Get all the cluster installation events (all attempts)."""
+        if self._all_events is None:
+            try:
+                events_content = self.logs_archive.get("cluster_events.json")
+                self._all_events = json.loads(cast(str | bytes, events_content))
+            except Exception as e:
+                logger.error("Failed to load cluster events: %s", e)
+                self._all_events = []
+        return self._all_events
+
-    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
-        """Get all the cluster installation events."""
+    def _get_last_partition_events(self) -> List[Dict[str, Any]]:
+        """Get events from the last installation partition only."""
         if self._cluster_events is None:
             try:
-                events_content = self.logs_archive.get("cluster_events.json")
-                all_events = json.loads(cast(str | bytes, events_content))
+                all_events = self.get_all_cluster_events()

                 # Get the last partition (latest installation attempt)
                 self._cluster_events = self.partition_cluster_events(all_events)[-1]
             except Exception as e:
                 logger.error("Failed to load cluster events: %s", e)
                 self._cluster_events = []

         return self._cluster_events

Then update get_last_install_cluster_events (line 80) to call self._get_last_partition_events() instead of re-partitioning.


55-75: Guard against missing install_started_at field.

Line 58-60 assumes md["cluster"]["install_started_at"] exists without checking. If this field is missing or null, the code will raise a KeyError or pass None to isoparse, causing a crash.

Apply this diff to add a safe fallback:

 @staticmethod
 def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
     """Clean metadata JSON by separating deleted hosts."""
-    installation_start_time = dateutil.parser.isoparse(
-        md["cluster"]["install_started_at"]
-    )
+    installation_started_at = md.get("cluster", {}).get("install_started_at")
+    if not installation_started_at:
+        # Cannot partition deleted hosts without installation start time
+        return md
+    installation_start_time = dateutil.parser.isoparse(installation_started_at)
server.py (1)

1018-1027: Add error handling for robustness.

The function lacks error handling around analyze_cluster, which could cause the MCP tool to fail ungracefully if log download or analysis fails. Consider wrapping in a try-except to return user-friendly error messages.

Apply this diff to add error handling:

 @track_tool_usage()
 async def analyze_cluster_logs(
     cluster_id: Annotated[str, Field(description="The ID of the cluster")],
 ) -> str:
     """
     Analyze the cluster logs for the given cluster_id and return the results.
+    
+    Returns:
+        str: A formatted string containing signature analysis results.
     """
-    client = InventoryClient(get_access_token())
-    results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
-    return "\n\n".join([str(r) for r in results])
+    try:
+        client = InventoryClient(get_access_token())
+        results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
+        if not results:
+            return "No issues detected in cluster logs."
+        return "\n\n".join([str(r) for r in results])
+    except Exception as e:
+        log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True)
+        return f"Error analyzing cluster logs: {str(e)}"
service_client/assisted_service_api.py (1)

160-174: Security concern: Presigned URL logged with credentials.

Line 171 logs the full presigned URL, which includes query parameters containing temporary credentials. This could leak credentials in log aggregation systems.

Apply this diff to redact sensitive query parameters:

+from urllib.parse import urlparse
+
 @sanitize_exceptions
 async def get_cluster_logs(
     self, cluster_id: str
 ) -> nestedarchive.RemoteNestedArchive:
     result = await self._api_call(
         self._installer_api().v2_get_presigned_for_cluster_files,
         cluster_id=cluster_id,
         file_name="logs",
     )

     logs_url = cast(PresignedUrl, result).url
-    log.info("Downloading logs from %s", logs_url)
+    parsed = urlparse(cast(str, logs_url))
+    safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+    log.info("Downloading logs from %s", safe_url)
     return nestedarchive.RemoteNestedArchive(
         cast(str, logs_url), init_download=True
     )

Note: The import for urlparse is already present at line 12.
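The redaction is easy to sanity-check in isolation; the presigned URL below is invented for illustration:

```python
from urllib.parse import urlparse

# Presigned URLs carry temporary credentials in the query string.
presigned = (
    "https://s3.example.com/bucket/logs.tar"
    "?X-Amz-Credential=SECRET&X-Amz-Signature=abc123"
)
parsed = urlparse(presigned)
# Rebuilding from scheme/netloc/path drops the query string entirely.
safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
print(safe_url)  # https://s3.example.com/bucket/logs.tar
```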

log_analyzer/signatures/basic_info.py (3)

30-49: Harden host parsing: guard inventory JSON and missing keys

A bad/missing host["inventory"] or missing progress/status will crash the signature. Parse defensively and use .get with defaults.

-                inventory = json.loads(host["inventory"])
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except (ValueError, json.JSONDecodeError):
+                    inventory = {}
                 if host.get("bootstrap", False):
                     role = "bootstrap"
 
                 hosts.append(
                     OrderedDict(
                         id=host["id"],
                         hostname=log_analyzer.get_hostname(host),
-                        progress=host["progress"]["current_stage"],
-                        status=host["status"],
+                        progress=host.get("progress", {}).get("current_stage", "N/A"),
+                        status=host.get("status", "N/A"),
                         role=role,
                         boot_mode=inventory.get("boot", {}).get(
                             "current_boot_mode", "N/A"
                         ),
                         status_info=str(info),
                         logs_info=host.get("logs_info", ""),
-                        last_checked_in_at=self.format_time(
-                            host.get("checked_in_at", str(datetime.min))
-                        ),
+                        last_checked_in_at=self.format_time(host.get("checked_in_at", "")),
                     )
                 )

218-236: Harden HostsExtraDetailSignature: robust inventory parsing and safe lookups

This block can KeyError/JSONDecodeError; parse inventory safely and guard nested keys.

-                inventory = json.loads(host["inventory"])
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except (ValueError, json.JSONDecodeError):
+                    inventory = {}
+                system_vendor = inventory.get("system_vendor", {})
                 hosts.append(
                     OrderedDict(
                         id=host["id"],
-                        hostname=inventory["hostname"],
+                        hostname=log_analyzer.get_hostname(host),
                         requested_hostname=host.get("requested_hostname", "N/A"),
-                        last_contacted=self.format_time(host["checked_in_at"]),
+                        last_contacted=self.format_time(host.get("checked_in_at", "")),
                         installation_disk=host.get("installation_disk_path", "N/A"),
-                        product_name=inventory["system_vendor"].get(
-                            "product_name", "Unavailable"
-                        ),
-                        manufacturer=inventory["system_vendor"].get(
-                            "manufacturer", "Unavailable"
-                        ),
-                        virtual_host=inventory["system_vendor"].get("virtual", False),
-                        disks_count=len(inventory["disks"]),
+                        product_name=system_vendor.get("product_name", "Unavailable"),
+                        manufacturer=system_vendor.get("manufacturer", "Unavailable"),
+                        virtual_host=system_vendor.get("virtual", False),
+                        disks_count=len(inventory.get("disks", [])),
                     )
                 )

323-325: Guard missing/invalid inventory and disks list

inventory["disks"] can raise; default to [].

-                inventory = json.loads(host["inventory"])
-                disks = inventory["disks"]
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except (ValueError, json.JSONDecodeError):
+                    inventory = {}
+                disks = inventory.get("disks", [])
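The guarded-parse pattern suggested above can be exercised on its own; a minimal standalone sketch (not the signature class itself — `parse_inventory` is a hypothetical helper name):

```python
import json

def parse_inventory(host: dict) -> dict:
    """Parse a host's inventory JSON, falling back to {} on bad or missing input."""
    try:
        parsed = json.loads(host.get("inventory", "{}"))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {}
    # Guard against valid JSON that is not an object (e.g. a list)
    return parsed if isinstance(parsed, dict) else {}

# Malformed, missing, and valid inventories all yield a usable dict
print(parse_inventory({"inventory": "not json"}))           # {}
print(parse_inventory({}))                                  # {}
print(parse_inventory({"inventory": '{"disks": [1, 2]}'}))  # {'disks': [1, 2]}
```

With this shape, downstream code like `inventory.get("disks", [])` never sees a crash from bad input.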
log_analyzer/signatures/platform_specific.py (1)

38-39: Don’t abort signature on bootstrap host; skip it

Returning None drops other valid matches. Use continue.

-                if host.get("role") == "bootstrap":
-                    return None
+                if host.get("role") == "bootstrap":
+                    continue
log_analyzer/signatures/advanced_analysis.py (1)

176-193: Guard missing validation-name matches to avoid exceptions

Don’t assume .match() succeeds; skip when it doesn’t.

-                succeed_to_failing_counter = Counter(
-                    cast(
-                        re.Match[str],
-                        self.validation_name_regexp.match(event["message"]),
-                    ).groups()[0]
-                    for event in events
-                    if self.succeed_to_failing_regexp.match(event["message"])
-                )
+                succeed_to_failing_counter = Counter(
+                    m.groups()[0]
+                    for event in events
+                    if self.succeed_to_failing_regexp.match(event["message"])
+                    for m in [self.validation_name_regexp.match(event["message"])]
+                    if m is not None
+                )
 
-                now_fixed = Counter(
-                    cast(
-                        re.Match[str],
-                        self.validation_name_regexp.match(event["message"]),
-                    ).groups()[0]
-                    for event in events
-                    if self.now_fixed_regexp.match(event["message"])
-                )
+                now_fixed = Counter(
+                    m.groups()[0]
+                    for event in events
+                    if self.now_fixed_regexp.match(event["message"])
+                    for m in [self.validation_name_regexp.match(event["message"])]
+                    if m is not None
+                )
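The nested-`for` trick in the suggestion binds the match object inside the comprehension so `None` results can be filtered out before calling `.groups()`. A self-contained illustration — the regexes and event messages here are hypothetical stand-ins, not the actual signature patterns:

```python
import re
from collections import Counter

validation_name_regexp = re.compile(r"Host .*: validation '(\S+)'")
succeed_to_failing_regexp = re.compile(r".*that used to succeed is now failing")

events = [
    {"message": "Host a: validation 'ntp-synced' that used to succeed is now failing"},
    {"message": "Host b: validation 'ntp-synced' that used to succeed is now failing"},
    {"message": "unrelated message"},
]

# Bind the match once per event via the single-element list, then skip None
counter = Counter(
    m.groups()[0]
    for event in events
    if succeed_to_failing_regexp.match(event["message"])
    for m in [validation_name_regexp.match(event["message"])]
    if m is not None
)
print(counter)  # Counter({'ntp-synced': 2})
```

Unlike the `cast(re.Match[str], ...)` version, a non-matching message is silently skipped instead of raising `AttributeError`.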
🧹 Nitpick comments (3)
log_analyzer/signatures/basic_info.py (1)

291-307: Safer interface extraction; avoid json.dumps in table fields

Parse inventory defensively; present addresses plainly instead of JSON-encoded strings.

-        inventory = json.loads(host["inventory"])
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except (ValueError, json.JSONDecodeError):
+            inventory = {}
         interfaces_details = defaultdict(list)
 
         for interface in inventory.get("interfaces", []):
             name = interface.get("name")
             if not name:
                 continue
             interfaces_details["name"].append(name)
-            interfaces_details["mac_address"].append(
-                json.dumps(interface.get("mac_address"))
-            )
-            interfaces_details["ipv4_addresses"].append(
-                json.dumps(interface.get("ipv4_addresses", []))
-            )
-            interfaces_details["ipv6_addresses"].append(
-                json.dumps(interface.get("ipv6_addresses", []))
-            )
+            interfaces_details["mac_address"].append(
+                str(interface.get("mac_address", ""))
+            )
+            interfaces_details["ipv4_addresses"].append(
+                ", ".join(interface.get("ipv4_addresses", []))
+            )
+            interfaces_details["ipv6_addresses"].append(
+                ", ".join(interface.get("ipv6_addresses", []))
+            )
log_analyzer/signatures/error_detection.py (1)

23-31: Escape patterns when building combined regex

Without re.escape, metacharacters in patterns can change matching unintentionally.

-    combined_regex = re.compile(
-        f'({"|".join(fr".*{pattern}.*" for pattern in patterns)})'
-    )
+    escaped = [re.escape(p) for p in patterns]
+    combined_regex = re.compile(f'({"|".join(fr".*{p}.*" for p in escaped)})')
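The effect of `re.escape` is easy to demonstrate; a compressed variant of the pattern-joining above (using `search` for brevity), with hypothetical patterns:

```python
import re

patterns = ["error.", "panic"]  # "." is a metacharacter when unescaped

unescaped = re.compile("|".join(f".*{p}.*" for p in patterns))
escaped = re.compile("|".join(f".*{re.escape(p)}.*" for p in patterns))

line = "errorX in module"  # contains no literal "error."
print(bool(unescaped.search(line)))                   # True — the "." matched "X"
print(bool(escaped.search(line)))                     # False
print(bool(escaped.search("fatal error. aborting")))  # True — literal match
```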
log_analyzer/signatures/advanced_analysis.py (1)

259-275: Try both bundle paths before emitting a warning

Currently returns a warning after the first empty nodes.json, skipping the fallback path. Continue to next base and only warn if both have no items.

-            if nodes_table:
-                return SignatureResult(
-                    signature_name=self.name,
-                    title="Collected nodes.json from installer gather",
-                    content=self.generate_table(nodes_table),
-                    severity="info",
-                )
-            return SignatureResult(
-                signature_name=self.name,
-                title="Collected nodes.json from installer gather",
-                content=(
-                    "The nodes.json file doesn't have any node resources in it. You should probably check the kubelet logs for the 2 non-bootstrap control-plane hosts"
-                ),
-                severity="warning",
-            )
-        return None
+            if nodes_table:
+                return SignatureResult(
+                    signature_name=self.name,
+                    title="Collected nodes.json from installer gather",
+                    content=self.generate_table(nodes_table),
+                    severity="info",
+                )
+            # No nodes found in this file; try the next base path
+            continue
+        # Neither path had node items; emit a warning
+        return SignatureResult(
+            signature_name=self.name,
+            title="Collected nodes.json from installer gather",
+            content=(
+                "The nodes.json file doesn't have any node resources in it. You should probably check the kubelet logs for the 2 non-bootstrap control-plane hosts"
+            ),
+            severity="warning",
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50f40ee and c5ef1b3.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/workflows/mypy.yaml (1 hunks)
  • .github/workflows/pydocstyle.yaml (0 hunks)
  • Dockerfile (1 hunks)
  • Makefile (1 hunks)
  • log_analyzer/__init__.py (1 hunks)
  • log_analyzer/log_analyzer.py (1 hunks)
  • log_analyzer/main.py (1 hunks)
  • log_analyzer/signatures/__init__.py (1 hunks)
  • log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • log_analyzer/signatures/base.py (1 hunks)
  • log_analyzer/signatures/basic_info.py (1 hunks)
  • log_analyzer/signatures/error_detection.py (1 hunks)
  • log_analyzer/signatures/networking.py (1 hunks)
  • log_analyzer/signatures/performance.py (1 hunks)
  • log_analyzer/signatures/platform_specific.py (1 hunks)
  • pyproject.toml (2 hunks)
  • server.py (4 hunks)
  • service_client/assisted_service_api.py (2 hunks)
  • template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
  • template.yaml
  • .github/workflows/mypy.yaml
  • log_analyzer/signatures/networking.py
  • log_analyzer/signatures/__init__.py
  • log_analyzer/signatures/performance.py
  • Makefile
🧰 Additional context used
🧬 Code graph analysis (9)
log_analyzer/signatures/error_detection.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
server.py (3)
log_analyzer/main.py (1)
  • analyze_cluster (14-71)
metrics/metrics.py (1)
  • track_tool_usage (51-65)
service_client/assisted_service_api.py (1)
  • InventoryClient (27-561)
log_analyzer/signatures/basic_info.py (2)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • format_time (78-83)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_hostname (218-228)
log_analyzer/signatures/platform_specific.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_host_log_file (134-165)
  • get_journal_log (167-197)
log_analyzer/signatures/base.py (3)
log_analyzer/signatures/advanced_analysis.py (9)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/basic_info.py (6)
  • analyze (20-81)
  • analyze (125-163)
  • analyze (169-206)
  • analyze (212-250)
  • analyze (256-287)
  • analyze (315-367)
log_analyzer/signatures/error_detection.py (1)
  • analyze (37-87)
log_analyzer/signatures/advanced_analysis.py (2)
log_analyzer/signatures/base.py (4)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (8)
  • get_all_cluster_events (90-103)
  • partition_cluster_events (106-124)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_must_gather (208-215)
  • get_events_by_host (126-132)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
log_analyzer/main.py (3)
service_client/assisted_service_api.py (1)
  • get_cluster_logs (161-174)
log_analyzer/log_analyzer.py (1)
  • LogAnalyzer (22-228)
log_analyzer/signatures/base.py (2)
  • SignatureResult (15-48)
  • analyze (59-68)
service_client/assisted_service_api.py (1)
service_client/exceptions.py (1)
  • sanitize_exceptions (24-58)
log_analyzer/__init__.py (2)
log_analyzer/log_analyzer.py (1)
  • LogAnalyzer (22-228)
log_analyzer/signatures/base.py (1)
  • SignatureResult (15-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (14)
pyproject.toml (2)

18-19: LGTM! Dependencies align with established versions.

The added dependencies (nestedarchive and tabulate) use versions that are the latest official releases on PyPI. Based on learnings, nestedarchive 0.2.4 (Aug 2022) and tabulate 0.9.0 (Oct 2022) are stable and appropriate for this use case.


81-89: LGTM! Mypy configuration is appropriate for gradual typing adoption.

The new mypy configuration excludes log_analyzer/ (as noted in the PR description) while enabling strict type checking for the rest of the codebase. This is a reasonable approach for integrating copied code that may need additional type annotation work.

Dockerfile (1)

17-17: LGTM! Docker layer appropriately includes the new package.

The new COPY instruction correctly bundles the log_analyzer directory into the container image, making the log analysis modules available at runtime.

server.py (2)

41-43: LGTM! Sensible default for troubleshooting feature gate.

The TROUBLESHOOTING_ENABLED flag defaults to disabled ("0"), which is appropriate for a new troubleshooting feature that should be explicitly opt-in.


1047-1048: LGTM! Appropriate conditional registration of troubleshooting tool.

The feature gate correctly enables the log analysis tool only when TROUBLESHOOTING_ENABLED is true, preventing accidental exposure of the troubleshooting feature in production.

log_analyzer/__init__.py (1)

1-15: LGTM! Clean package interface with appropriate public API surface.

The __init__.py properly defines the public API by re-exporting key classes and setting __all__. The version string "1.0.0" is appropriate for an initial release.

log_analyzer/main.py (2)

14-72: LGTM! Well-structured log analysis orchestration.

The analyze_cluster function properly:

  • Downloads logs via the API client
  • Handles signature filtering when specific signatures are requested
  • Includes per-signature error handling to prevent one failure from blocking others
  • Logs appropriate information at each step

74-85: LGTM! Clean result presentation.

The print_results function provides a clear, formatted output with appropriate handling of empty results.

log_analyzer/signatures/base.py (3)

15-48: LGTM! Clear and consistent result representation.

The SignatureResult class provides a clean interface for signature results with appropriate severity handling in the string representation.


51-83: LGTM! Solid base class with useful utilities.

The Signature abstract base class provides helpful utilities:

  • generate_table for consistent formatting
  • format_time with graceful fallback for parsing errors

86-120: LGTM! Appropriate specialization for error signatures.

The ErrorSignature class provides a convenient create_result helper that defaults to "error" severity, which is appropriate for error-focused signatures.

log_analyzer/log_analyzer.py (3)

134-165: LGTM! Robust path fallback for different log formats.

The get_host_log_file method properly handles both new and old log path formats with appropriate fallback logic.


167-197: LGTM! Consistent fallback pattern for journal logs.

The get_journal_log method follows the same robust pattern as get_host_log_file, handling both new and old path formats.


217-228: LGTM! Hostname resolution with sensible fallback.

The get_hostname static method provides a clear fallback chain: requested_hostname → inventory hostname → host ID → "unknown".
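That chain can be sketched as a simplified standalone function (an illustration of the described behavior, not the actual static method in log_analyzer.py):

```python
import json

def get_hostname(host: dict) -> str:
    """Resolve a display hostname:
    requested_hostname -> inventory hostname -> host id -> 'unknown'."""
    if host.get("requested_hostname"):
        return host["requested_hostname"]
    try:
        inventory = json.loads(host.get("inventory", "{}"))
    except ValueError:
        inventory = {}
    return inventory.get("hostname") or host.get("id") or "unknown"

print(get_hostname({"requested_hostname": "node0"}))      # node0
print(get_hostname({"inventory": '{"hostname": "n1"}'}))  # n1
print(get_hostname({"id": "abc-123"}))                    # abc-123
print(get_hostname({}))                                   # unknown
```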

Comment on lines +407 to +421
ready = [
condition.get("status") == "True"
for condition in controller_pod.get("status", {}).get(
"conditions", {}
)
if condition.get("type") == "Ready"
][0]
except Exception:
ready = False
conditions_tbl = self.generate_table(
controller_pod.get("status", {}).get("conditions", [])
)
containers_tbl = self.generate_table(
controller_pod.get("status", {}).get("containerStatuses", [])
)

⚠️ Potential issue | 🟡 Minor

Use list default for conditions/containerStatuses to avoid type issues

Defaulting to {} means that if the value is ever a dict, iteration yields its string keys and condition.get raises AttributeError; a [] default keeps the comprehension iterating over a list.

-                ready = [
-                    condition.get("status") == "True"
-                    for condition in controller_pod.get("status", {}).get(
-                        "conditions", {}
-                    )
+                ready = [
+                    condition.get("status") == "True"
+                    for condition in controller_pod.get("status", {}).get(
+                        "conditions", []
+                    )
                 ][0]
@@
-            conditions_tbl = self.generate_table(
-                controller_pod.get("status", {}).get("conditions", [])
-            )
-            containers_tbl = self.generate_table(
-                controller_pod.get("status", {}).get("containerStatuses", [])
-            )
+            conditions_tbl = self.generate_table(
+                controller_pod.get("status", {}).get("conditions", [])
+            )
+            containers_tbl = self.generate_table(
+                controller_pod.get("status", {}).get("containerStatuses", [])
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ready = [
condition.get("status") == "True"
for condition in controller_pod.get("status", {}).get(
"conditions", {}
)
if condition.get("type") == "Ready"
][0]
except Exception:
ready = False
conditions_tbl = self.generate_table(
controller_pod.get("status", {}).get("conditions", [])
)
containers_tbl = self.generate_table(
controller_pod.get("status", {}).get("containerStatuses", [])
)
ready = [
condition.get("status") == "True"
for condition in controller_pod.get("status", {}).get(
"conditions", []
)
if condition.get("type") == "Ready"
][0]
except Exception:
ready = False
conditions_tbl = self.generate_table(
controller_pod.get("status", {}).get("conditions", [])
)
containers_tbl = self.generate_table(
controller_pod.get("status", {}).get("containerStatuses", [])
)
🤖 Prompt for AI Agents
In log_analyzer/signatures/advanced_analysis.py around lines 407-421, the code
uses {} as the default for "conditions" (and should also use [] for
"containerStatuses") which causes iteration over dict keys and AttributeError
when calling condition.get; replace the defaults with empty lists (use
.get("conditions", []) and .get("containerStatuses", [])) so the list
comprehension iterates over a list and generate_table calls receive lists; keep
existing try/except behavior for empty results.
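The pitfall is straightforward to reproduce with a toy payload (hypothetical data, mirroring the pod-status shape above):

```python
status = {"conditions": {"Ready": "True"}}  # malformed payload: a dict, not a list

# Iterating a dict yields its string keys, so .get blows up:
try:
    [c.get("type") for c in status.get("conditions", {})]
except AttributeError as exc:
    print("iterating a dict yields string keys:", exc)

# With a well-formed list payload and a [] default, iteration is safe:
status = {"conditions": [{"type": "Ready", "status": "True"}]}
ready = [
    c.get("status") == "True"
    for c in status.get("conditions", [])
    if c.get("type") == "Ready"
]
print(ready)  # [True]
```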

Comment thread log_analyzer/signatures/basic_info.py
Comment on lines +86 to +93
some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts)

for host in hosts:
if host["role"] not in ["master", "bootstrap"]:
continue
if host["progress"]["current_stage"] == "Done" or host["status"] != "error":
continue


⚠️ Potential issue | 🟠 Major

Guard host summary against missing keys and progress

Direct indexing of host["progress"]["current_stage"], host["role"], host["status"] can raise. Use safe access.

-        some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts)
+        some_done = any(
+            host.get("progress", {}).get("current_stage") == "Done" for host in hosts
+        )
 
         for host in hosts:
-            if host["role"] not in ["master", "bootstrap"]:
+            role = host.get("role")
+            if role not in ["master", "bootstrap"]:
                 continue
-            if host["progress"]["current_stage"] == "Done" or host["status"] != "error":
+            current_stage = host.get("progress", {}).get("current_stage", "")
+            if current_stage == "Done" or host.get("status") != "error":
                 continue
 
-            host_comment = {
+            host_comment = {
                 "Waiting for bootkube": "bootkube.service never completed",
                 "Rebooting": "Node never pulled Ignition",
                 "Configuring": "Node pulled Ignition, but never started kubelet",
                 "Joined": (
                     "The Node k8s resource associated with this host is not Ready"
                     + (
                         " or the Assisted Controller is not running on the cluster"
                         if not some_done
                         else ""
                     )
                 ),
                 "Waiting for control plane": "Masters never formed 2-node cluster",
                 "Waiting for controller": "Assisted installer controller pod never started",
                 "Writing image to disk": "Image probably failed to be written on disk",
-            }.get(host["progress"]["current_stage"], "Unknown")
+            }.get(current_stage, "Unknown")
 
             host_summary.append(
                 OrderedDict(
                     hostname=log_analyzer.get_hostname(host),
-                    role=host["role"],
+                    role=role or "N/A",
                     description=host_comment,
                 )
             )

Also applies to: 111-117

🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 86-93 (and similarly
111-117), direct indexing like host["progress"]["current_stage"], host["role"],
and host["status"] can raise KeyError or TypeError when keys or progress are
missing; update these checks to use safe accessors (e.g., host.get("role"),
host.get("status"), and host.get("progress", {}).get("current_stage")) and guard
against non-dict progress values before comparing, defaulting to a safe value
(None or "") so the conditions become robust and skip hosts without the expected
keys.

Comment thread log_analyzer/signatures/error_detection.py
Comment on lines +31 to +37
inventory = json.loads(host.get("inventory", "{}"))
if (
len(inventory.get("disks", [])) == 1
and "KVM" in inventory.get("system_vendor", {}).get("product_name", "")
and host.get("progress", {}).get("current_stage") == "Rebooting"
and host.get("status") == "error"
):

⚠️ Potential issue | 🟠 Major

Guard inventory JSON and nested lookups

A bad/missing inventory will crash the loop. Parse safely.

-            inventory = json.loads(host.get("inventory", "{}"))
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (ValueError, json.JSONDecodeError):
+                inventory = {}
             if (
                 len(inventory.get("disks", [])) == 1
                 and "KVM" in inventory.get("system_vendor", {}).get("product_name", "")
                 and host.get("progress", {}).get("current_stage") == "Rebooting"
                 and host.get("status") == "error"
             ):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
inventory = json.loads(host.get("inventory", "{}"))
if (
len(inventory.get("disks", [])) == 1
and "KVM" in inventory.get("system_vendor", {}).get("product_name", "")
and host.get("progress", {}).get("current_stage") == "Rebooting"
and host.get("status") == "error"
):
try:
inventory = json.loads(host.get("inventory", "{}"))
except (ValueError, json.JSONDecodeError):
inventory = {}
if (
len(inventory.get("disks", [])) == 1
and "KVM" in inventory.get("system_vendor", {}).get("product_name", "")
and host.get("progress", {}).get("current_stage") == "Rebooting"
and host.get("status") == "error"
):
🤖 Prompt for AI Agents
In log_analyzer/signatures/platform_specific.py around lines 31 to 37, the code
assumes host["inventory"] is valid JSON and that nested keys exist, which will
raise on bad/missing inventory; wrap json.loads in a try/except and fall back to
an empty dict on failure, ensure the parsed inventory is a dict before using
.get, and replace chained index/lookup assumptions with safe .get calls and type
checks (e.g., treat disks only if inventory.get("disks") is a list,
system_vendor as dict) so that on malformed/missing inventory you skip this
branch (continue) instead of crashing the loop.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pyproject.toml (2)

6-6: Fix Python version mismatch (breaks uv sync).

Base image runs Python 3.11, but requires-python is ">=3.13". uv sync will fail. Align with the runtime.

-requires-python = ">=3.13"
+requires-python = ">=3.11"

7-20: Add runtime dependency: python-dateutil.

log_analyzer imports dateutil.parser at runtime; only types are in dev. This will crash at import without python-dateutil installed.

 dependencies = [
     "assisted-service-client>=2.41.0.post3",
     "mcp>=1.15.0",
     "netaddr>=1.3.0",
     "requests>=2.32.3",
     "retry>=0.9.2",
     "types-requests>=2.32.4.20250611",
     "prometheus_client>=0.22.1",
     "pyyaml>=6",
     "jinja2>=3.1",
     "pydantic>=2",
     "nestedarchive>=0.2.4",
     "tabulate>=0.9.0",
+    "python-dateutil>=2.9.0",
 ]

As per retrieved learnings, python-dateutil 2.9.0 is stable and compatible with Python 3.11.

♻️ Duplicate comments (4)
server.py (1)

1018-1028: Add error handling and clearer output for analyze_cluster_logs.

Catch failures from analyze_cluster and return a user-friendly string. Also handle empty results.

 @track_tool_usage()
 async def analyze_cluster_logs(
     cluster_id: Annotated[str, Field(description="The ID of the cluster")],
 ) -> str:
     """
-    Analyze the cluster logs for the given cluster_id and return the results.
+    Analyze the cluster logs for the given cluster_id and return the results.
+
+    Returns:
+        str: A formatted string containing signature analysis results with severity indicators.
     """
-    client = InventoryClient(get_access_token())
-    results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
-    return "\n\n".join([str(r) for r in results])
+    try:
+        client = InventoryClient(get_access_token())
+        results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
+        if not results:
+            return "No issues detected in cluster logs."
+        return "\n".join(str(r) for r in results if str(r).strip())
+    except Exception as e:
+        log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True)
+        return f"Error analyzing cluster logs: {e}"
service_client/assisted_service_api.py (1)

160-175: Redact presigned URL in logs; keep full URL only for download.

Logging the full presigned URL leaks credentials. Log only scheme+host+path.

     result = await self._api_call(
         self._installer_api().v2_get_presigned_for_cluster_files,
         cluster_id=cluster_id,
         file_name="logs",
     )
 
-    logs_url = cast(PresignedUrl, result).url
-    log.info("Downloading logs from %s", logs_url)
-    return nestedarchive.RemoteNestedArchive(
-        cast(str, logs_url), init_download=True
-    )
+    logs_url = cast(PresignedUrl, result).url
+    parsed = urlparse(cast(str, logs_url))
+    safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+    log.info("Downloading logs from %s", safe_url)
+    return nestedarchive.RemoteNestedArchive(cast(str, logs_url), init_download=True)
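The redaction step can be checked in isolation; a minimal sketch with a made-up presigned URL:

```python
from urllib.parse import urlparse

presigned = (
    "https://s3.example.com/bucket/cluster-logs.tar"
    "?X-Amz-Signature=deadbeef&X-Amz-Credential=AKIA-example"
)

# Keep only scheme, host, and path; the query string carries the credentials
parsed = urlparse(presigned)
safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
print(safe_url)  # https://s3.example.com/bucket/cluster-logs.tar
```

The full URL is still passed to the downloader; only the logged form is stripped.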
log_analyzer/log_analyzer.py (2)

55-75: Guard missing install_started_at in metadata cleaning.

Current code assumes the key exists and may KeyError. Fall back safely.

-    def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
+    def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        installation_start_time = dateutil.parser.isoparse(
-            md["cluster"]["install_started_at"]
-        )
+        installation_started_at = md.get("cluster", {}).get("install_started_at")
+        if not installation_started_at:
+            return md
+        installation_start_time = dateutil.parser.isoparse(installation_started_at)

90-104: Fix event loading semantics; return all events here.

get_all_cluster_events currently returns only the last partition; either rename the method to match that behavior, or return all events and partition downstream.

-    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
+    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
         """Get all the cluster installation events."""
-        if self._cluster_events is None:
+        if self._all_events is None:
             try:
                 events_content = self.logs_archive.get("cluster_events.json")
-                all_events = json.loads(cast(str | bytes, events_content))
-
-                # Get the last partition (latest installation attempt)
-                self._cluster_events = self.partition_cluster_events(all_events)[-1]
+                self._all_events = json.loads(cast(str | bytes, events_content))
             except Exception as e:
                 logger.error("Failed to load cluster events: %s", e)
-                self._cluster_events = []
-
-        return self._cluster_events
+                self._all_events = []
+        return self._all_events
🧹 Nitpick comments (4)
server.py (1)

41-43: Make feature flag parsing more flexible (optional).

Accept common truthy values ("1", "true", "yes") to ease ops.

-TROUBLESHOOTING_ENABLED = (
-    os.environ.get("ENABLE_TROUBLESHOOTING_TOOLS", "0").lower() == "1"
-)
+TROUBLESHOOTING_ENABLED = os.environ.get(
+    "ENABLE_TROUBLESHOOTING_TOOLS", "0"
+).lower() in ("1", "true", "yes")
log_analyzer/log_analyzer.py (1)

34-37: Initialize cache for all events.

Prepare a separate cache for all events to avoid overloading _cluster_events with partitioned data.

-        self._metadata = None
-        self._cluster_events = None
+        self._metadata = None
+        self._all_events: List[Dict[str, Any]] | None = None
+        self._cluster_events = None
log_analyzer/signatures/base.py (1)

92-102: Remove unused attributes or document their purpose.

The _function_impact_label and _label attributes are stored but never referenced in create_result or anywhere else in the codebase. Either remove these unused parameters or document how subclasses should utilize them.

Apply this diff to remove the unused attributes:

-    def __init__(self, function_impact_label=None, label=None):
+    def __init__(self):
         """
         Initialize error signature.
-
-        Args:
-            function_impact_label: Optional function impact classification
-            label: Optional custom label for categorization
         """
         super().__init__()
-        self._function_impact_label = function_impact_label
-        self._label = label
log_analyzer/signatures/advanced_analysis.py (1)

309-321: Use safe key access for host ID.

Direct indexing of host["id"] on line 313 assumes the key always exists. Use .get("id") with a fallback to handle missing keys gracefully.

Apply this diff:

         for host in cluster.get("hosts", []):
-            host_id = host["id"]
+            host_id = host.get("id")
+            if not host_id:
+                continue
             try:
                 journal_logs = log_analyzer.get_host_log_file(host_id, "journal.logs")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c5ef1b3 and 4aea738.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/workflows/mypy.yaml (1 hunks)
  • .github/workflows/pydocstyle.yaml (0 hunks)
  • Dockerfile (1 hunks)
  • Makefile (1 hunks)
  • log_analyzer/__init__.py (1 hunks)
  • log_analyzer/log_analyzer.py (1 hunks)
  • log_analyzer/main.py (1 hunks)
  • log_analyzer/signatures/__init__.py (1 hunks)
  • log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • log_analyzer/signatures/base.py (1 hunks)
  • log_analyzer/signatures/basic_info.py (1 hunks)
  • log_analyzer/signatures/error_detection.py (1 hunks)
  • log_analyzer/signatures/networking.py (1 hunks)
  • log_analyzer/signatures/performance.py (1 hunks)
  • log_analyzer/signatures/platform_specific.py (1 hunks)
  • pyproject.toml (2 hunks)
  • server.py (4 hunks)
  • service_client/assisted_service_api.py (2 hunks)
  • template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • Makefile
  • log_analyzer/signatures/__init__.py
  • .github/workflows/mypy.yaml
  • log_analyzer/main.py
  • log_analyzer/signatures/error_detection.py
🧰 Additional context used
🧬 Code graph analysis (9)
service_client/assisted_service_api.py (1)
service_client/exceptions.py (1)
  • sanitize_exceptions (24-58)
log_analyzer/signatures/networking.py (3)
log_analyzer/signatures/advanced_analysis.py (11)
  • operator_statuses_from_controller_logs (17-44)
  • filter_operators (57-71)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_controller_logs (199-206)
log_analyzer/signatures/advanced_analysis.py (2)
log_analyzer/signatures/base.py (4)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (8)
  • get_all_cluster_events (90-103)
  • partition_cluster_events (106-124)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_must_gather (208-215)
  • get_events_by_host (126-132)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
log_analyzer/signatures/basic_info.py (2)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • format_time (78-83)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_hostname (218-228)
log_analyzer/signatures/platform_specific.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_host_log_file (134-165)
  • get_journal_log (167-197)
log_analyzer/signatures/base.py (3)
log_analyzer/signatures/advanced_analysis.py (9)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/basic_info.py (6)
  • analyze (20-81)
  • analyze (125-163)
  • analyze (169-206)
  • analyze (212-250)
  • analyze (256-287)
  • analyze (315-367)
log_analyzer/signatures/error_detection.py (1)
  • analyze (37-87)
log_analyzer/signatures/performance.py (2)
log_analyzer/signatures/base.py (6)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_events_by_host (126-132)
  • get_hostname (218-228)
server.py (3)
log_analyzer/main.py (1)
  • analyze_cluster (14-71)
metrics/metrics.py (1)
  • track_tool_usage (51-65)
service_client/assisted_service_api.py (1)
  • InventoryClient (27-561)
log_analyzer/__init__.py (2)
log_analyzer/log_analyzer.py (1)
  • LogAnalyzer (22-228)
log_analyzer/signatures/base.py (1)
  • SignatureResult (15-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (4)
Dockerfile (1)

17-17: Include analyzer in image — LGTM.

Copying the log_analyzer package into the image is correct and matches imports in server.py.

template.yaml (1)

46-49: Gating parameter and env wiring — LGTM.

Parameter ENABLE_TROUBLESHOOTING_TOOLS and its env passthrough correctly gate tool registration in server.py.

Also applies to: 107-108

log_analyzer/__init__.py (1)

1-15: Public API exports — LGTM.

Re-exporting LogAnalyzer, ALL_SIGNATURES, and SignatureResult simplifies usage.

log_analyzer/signatures/platform_specific.py (1)

67-73: Make the None-match guard explicit.

A conditional expression evaluates its if match guard before the true branch, so .group(1) on line 73 only runs when the search succeeded and the original one-liner cannot actually raise AttributeError. The diff below only turns the guard into an explicit early return.

Apply this diff:

     def _get_inventory(self, log_analyzer, host_id):
         try:
             agent_logs = log_analyzer.get_host_log_file(host_id, "agent.logs")
         except FileNotFoundError:
             return None
         match = self.agent_log_regex.search(agent_logs)
-        return json.loads(match.group(1).replace('\\"', '"')) if match else None
+        if not match:
+            return None
+        return json.loads(match.group(1).replace('\\"', '"'))

Likely an incorrect or invalid review comment.

Comment on lines +133 to +136
            if (
                bootstrap_node["progress"]["current_stage"]
                not in eligible_bootstrap_stages
            ):

⚠️ Potential issue | 🟠 Major

Guard nested progress access.

Direct indexing of bootstrap_node["progress"]["current_stage"] can raise KeyError if the progress key is missing or the value is not a dict.

Apply this diff to use safe access:

-            if (
-                bootstrap_node["progress"]["current_stage"]
-                not in eligible_bootstrap_stages
-            ):
+            current_stage = bootstrap_node.get("progress", {}).get("current_stage", "")
+            if current_stage not in eligible_bootstrap_stages:
                 return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            if (
-                bootstrap_node["progress"]["current_stage"]
-                not in eligible_bootstrap_stages
-            ):
+            current_stage = bootstrap_node.get("progress", {}).get("current_stage", "")
+            if current_stage not in eligible_bootstrap_stages:
                 return None
🤖 Prompt for AI Agents
In log_analyzer/signatures/advanced_analysis.py around lines 133 to 136, the
code directly indexes bootstrap_node["progress"]["current_stage"], which can
raise KeyError or TypeError if progress is missing or not a dict; change the
check to safely access progress by first retrieving progress =
bootstrap_node.get("progress") and confirming it's a dict (or defaulting to an
empty dict), then read current_stage = progress.get("current_stage") (or None)
and use that variable in the if condition so missing or non-dict progress won’t
cause an exception.

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2025
@openshift-ci

openshift-ci Bot commented Oct 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin, keitwb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@carbonin
Collaborator

/retest

@carbonin
Collaborator

carbonin commented Oct 13, 2025

Something in the container build is not happy:

         Compiling pydantic-core v2.33.2
      (/opt/app-root/src/.cache/uv/sdists-v9/pypi/pydantic-core/2.33.2/v5IoqqZTLjjZSmoT1jThD/src)
      error: failed to run custom build command for `pyo3-ffi v0.24.1`

      Caused by:
        process didn't exit successfully:
      `/opt/app-root/src/.cache/uv/sdists-v9/pypi/pydantic-core/2.33.2/v5IoqqZTLjjZSmoT1jThD/src/target/release/build/pyo3-ffi-c8d07e87bb1f672f/build-script-build`
      (exit status: 1)
        --- stdout
        cargo:rustc-check-cfg=cfg(Py_LIMITED_API)
        cargo:rustc-check-cfg=cfg(Py_GIL_DISABLED)
        cargo:rustc-check-cfg=cfg(PyPy)
        cargo:rustc-check-cfg=cfg(GraalPy)
        cargo:rustc-check-cfg=cfg(py_sys_config, values("Py_DEBUG",
      "Py_REF_DEBUG", "Py_TRACE_REFS", "COUNT_ALLOCS"))
        cargo:rustc-check-cfg=cfg(pyo3_disable_reference_pool)
        cargo:rustc-check-cfg=cfg(pyo3_leak_on_drop_without_reference_pool)
        cargo:rustc-check-cfg=cfg(Py_3_7)
        cargo:rustc-check-cfg=cfg(Py_3_8)
        cargo:rustc-check-cfg=cfg(Py_3_9)
        cargo:rustc-check-cfg=cfg(Py_3_10)
        cargo:rustc-check-cfg=cfg(Py_3_11)
        cargo:rustc-check-cfg=cfg(Py_3_12)
        cargo:rustc-check-cfg=cfg(Py_3_13)
        cargo:rustc-check-cfg=cfg(Py_3_14)
        cargo:rerun-if-env-changed=PYO3_CROSS
        cargo:rerun-if-env-changed=PYO3_CROSS_LIB_DIR
        cargo:rerun-if-env-changed=PYO3_CROSS_PYTHON_VERSION
        cargo:rerun-if-env-changed=PYO3_CROSS_PYTHON_IMPLEMENTATION
        cargo:rerun-if-env-changed=PYO3_PRINT_CONFIG
        cargo:rerun-if-env-changed=PYO3_USE_ABI3_FORWARD_COMPATIBILITY

        --- stderr
        error: the configured Python interpreter version (3.14) is newer than
      PyO3's maximum supported version (3.13)
        = help: please check if an updated version of PyO3 is available.
      Current version: 0.24.1
        = help: set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 to suppress this
      check and build anyway using the stable ABI
      warning: build failed, waiting for other jobs to finish...
      💥 maturin failed
        Caused by: Failed to build a native library through cargo
        Caused by: Cargo build finished with "exit status:
      101": `env -u CARGO PYO3_BUILD_EXTENSION_MODULE="1"
      PYO3_ENVIRONMENT_SIGNATURE="cpython-3.14-64bit"
      PYO3_PYTHON="/opt/app-root/src/.cache/uv/builds-v0/.tmpMeQG0Q/bin/python"
      PYTHON_SYS_EXECUTABLE="/opt/app-root/src/.cache/uv/builds-v0/.tmpMeQG0Q/bin/python"
      "cargo" "rustc" "--features" "pyo3/extension-module"
      "--message-format" "json-render-diagnostics" "--manifest-path"
      "/opt/app-root/src/.cache/uv/sdists-v9/pypi/pydantic-core/2.33.2/v5IoqqZTLjjZSmoT1jThD/src/Cargo.toml"
      "--release" "--lib" "--crate-type" "cdylib"`
      Error: command ['maturin', 'pep517', 'build-wheel', '-i',
      '/opt/app-root/src/.cache/uv/builds-v0/.tmpMeQG0Q/bin/python',
      '--compatibility', 'off'] returned non-zero exit status 1

      hint: This usually indicates a problem with the package or the build
      environment.
  help: `pydantic-core` (v2.33.2) was included because `assisted-service-mcp`
        (v0.1.0) depends on `pydantic` (v2.11.10) which depends on
        `pydantic-core`
subprocess exited with status 1
subprocess exited with status 1
Error: building at STEP "RUN uv sync": exit status 1

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server.py (1)

861-869: Avoid logging full presigned URLs (sensitive).

result likely contains the URL; log a redacted form.

Apply:

-    log.info(
-        "Successfully retrieved presigned URL for cluster %s credentials file %s - %s",
-        cluster_id,
-        file_name,
-        result,
-    )
+    # Redact query/credentials when logging
+    try:
+        from urllib.parse import urlparse
+        parsed = urlparse(result.url or "")
+        safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+    except Exception:
+        safe_url = "<redacted>"
+    log.info(
+        "Successfully retrieved presigned URL for cluster %s credentials file %s - %s",
+        cluster_id,
+        file_name,
+        safe_url,
+    )
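The redaction helper sketched in the diff can be exercised standalone; the URL below is a made-up example of a presigned link, not one the service actually issues:

```python
from urllib.parse import urlparse

def redact_presigned_url(url: str) -> str:
    """Keep scheme, host, and path; drop the query string carrying the signature."""
    try:
        parsed = urlparse(url or "")
        return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    except Exception:
        return "<redacted>"

url = "https://s3.example.com/bucket/cluster-logs.tar?X-Amz-Signature=abc123&X-Amz-Expires=300"
# The signature never reaches the log line
assert redact_presigned_url(url) == "https://s3.example.com/bucket/cluster-logs.tar"
assert "X-Amz-Signature" not in redact_presigned_url(url)
```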
♻️ Duplicate comments (14)
log_analyzer/signatures/advanced_analysis.py (2)

176-192: Handle unmatched validation messages in FlappingValidations

self.validation_name_regexp.match(event["message"]) may return None; calling .groups() on it raises and causes the analyzer to bail out. Filter out unmatched lines before dereferencing the match.

-                succeed_to_failing_counter = Counter(
-                    cast(
-                        re.Match[str],
-                        self.validation_name_regexp.match(event["message"]),
-                    ).groups()[0]
-                    for event in events
-                    if self.succeed_to_failing_regexp.match(event["message"])
-                )
+                succeed_to_failing_counter = Counter(
+                    match.group(1)
+                    for event in events
+                    if self.succeed_to_failing_regexp.match(event["message"])
+                    for match in [self.validation_name_regexp.match(event["message"])]
+                    if match is not None
+                )
@@
-                now_fixed = Counter(
-                    cast(
-                        re.Match[str],
-                        self.validation_name_regexp.match(event["message"]),
-                    ).groups()[0]
-                    for event in events
-                    if self.now_fixed_regexp.match(event["message"])
-                )
+                now_fixed = Counter(
+                    match.group(1)
+                    for event in events
+                    if self.now_fixed_regexp.match(event["message"])
+                    for match in [self.validation_name_regexp.match(event["message"])]
+                    if match is not None
+                )
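The nested-for trick binds each match exactly once and silently drops unmatched messages; a self-contained sketch with illustrative regexes (not the signature's real patterns):

```python
import re
from collections import Counter

# Illustrative stand-ins for the signature's compiled patterns
validation_name_regexp = re.compile(r"Host validation '([\w-]+)'")
succeed_to_failing_regexp = re.compile(r".*that used to succeed is now failing")

events = [
    {"message": "Host validation 'ntp-synced' that used to succeed is now failing"},
    {"message": "Host validation 'ntp-synced' that used to succeed is now failing"},
    {"message": "unrelated event with no validation name"},
]

# The single-element inner list evaluates the match once per event;
# the trailing "if match is not None" filters out unmatched lines
# before .group(1) is ever called.
succeed_to_failing_counter = Counter(
    match.group(1)
    for event in events
    if succeed_to_failing_regexp.match(event["message"])
    for match in [validation_name_regexp.match(event["message"])]
    if match is not None
)
print(succeed_to_failing_counter)  # Counter({'ntp-synced': 2})
```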

128-136: Guard bootstrap progress lookup before comparing stages

bootstrap_node["progress"]["current_stage"] will raise when progress is missing/None, which happens in incomplete bundles and aborts the signature. Please fetch the stage via .get() (with an empty dict default) before checking membership.

-            if (
-                bootstrap_node["progress"]["current_stage"]
-                not in eligible_bootstrap_stages
-            ):
+            current_stage = (
+                (bootstrap_node.get("progress") or {}).get("current_stage")
+            )
+            if current_stage not in eligible_bootstrap_stages:
                 return None
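The `(x or {})` idiom suggested above also covers the case where the key is present but its value is None, which plain `.get("progress", {})` does not; a minimal sketch with hypothetical host records:

```python
# Hypothetical bootstrap-node payloads; "progress" may be missing or None
eligible_bootstrap_stages = {"WaitingForControllers", "Done"}

def stage_is_eligible(bootstrap_node: dict) -> bool:
    # (x or {}) falls back to {} for both a missing key and an explicit None
    current_stage = (bootstrap_node.get("progress") or {}).get("current_stage")
    return current_stage in eligible_bootstrap_stages

assert stage_is_eligible({"progress": {"current_stage": "Done"}})
assert not stage_is_eligible({})                   # key missing entirely
assert not stage_is_eligible({"progress": None})   # key present but None
```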
log_analyzer/signatures/error_detection.py (1)

47-71: Protect MasterFailedToPullIgnitionSignature from missing host fields

A single host lacking progress, inventory, or status data will throw and drop the entire signature. Please add safe lookups and reuse get_hostname so malformed inventory blobs don’t explode the analysis.

-                if (
-                    host["role"] == "bootstrap"
-                    and host["progress"]["current_stage"] == "WaitingForControlPlane"
-                ):
+                role = host.get("role")
+                current_stage = (host.get("progress") or {}).get("current_stage")
+                if (
+                    role == "bootstrap"
+                    and current_stage == "WaitingForControlPlane"
+                ):
                     # Probably failed due to other issues
                     return None
 
-                inventory = json.loads(host["inventory"])
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except json.JSONDecodeError:
+                    inventory = {}
                 boot_mode = inventory.get("boot", {}).get("current_boot_mode", "N/A")
 
                 if (
-                    host["role"] == "master"
-                    and host["progress"]["current_stage"] == "Rebooting"
-                    and host["status"] == "error"
+                    role == "master"
+                    and current_stage == "Rebooting"
+                    and host.get("status") == "error"
                     and boot_mode == "bios"
                 ):
                     hosts.append(
                         OrderedDict(
-                            HostID=host["id"],
-                            Hostname=inventory["hostname"],
-                            Progress=host["progress"]["current_stage"],
+                            HostID=host.get("id", "unknown"),
+                            Hostname=log_analyzer.get_hostname(host),
+                            Progress=current_stage or "N/A",
                         )
                     )
log_analyzer/signatures/basic_info.py (4)

27-48: Parse host inventory defensively before building the hosts table

A single host with a missing/invalid inventory, progress, or status field will raise and abort the entire signature, so the analyzer silently returns None. Please harden the loop before appending rows.

-                info = host["status_info"]
-                role = host["role"]
-                inventory = json.loads(host["inventory"])
+                info = host.get("status_info", "")
+                role = host.get("role", "unknown")
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except json.JSONDecodeError:
+                    inventory = {}
                 if host.get("bootstrap", False):
                     role = "bootstrap"
 
                 hosts.append(
                     OrderedDict(
-                        id=host["id"],
+                        id=host.get("id", "unknown"),
                         hostname=log_analyzer.get_hostname(host),
-                        progress=host["progress"]["current_stage"],
-                        status=host["status"],
+                        progress=(host.get("progress") or {}).get(
+                            "current_stage", "N/A"
+                        ),
+                        status=host.get("status", "unknown"),
                         role=role,

…and keep using the guarded inventory, info, and .get() defaults for the remaining fields in the OrderedDict.
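The guarded json.loads pattern above can be factored into a small helper; a sketch with made-up payloads:

```python
import json

def parse_inventory(host: dict) -> dict:
    """Tolerate a missing or malformed inventory blob instead of raising."""
    try:
        return json.loads(host.get("inventory", "{}"))
    except json.JSONDecodeError:
        return {}

assert parse_inventory({"inventory": '{"hostname": "node-0"}'}) == {"hostname": "node-0"}
assert parse_inventory({}) == {}                         # key missing
assert parse_inventory({"inventory": "not json"}) == {}  # malformed blob
```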


218-236: Harden host extra details against malformed inventory

Directly indexing host["inventory"], inventory["hostname"], or host["checked_in_at"] raises and drops the whole signature when any field is missing. Please wrap the JSON load and switch to safe lookups.

-                inventory = json.loads(host["inventory"])
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except json.JSONDecodeError:
+                    inventory = {}
+                system_vendor = inventory.get("system_vendor", {}) or {}
+                disks = inventory.get("disks", [])
+                disks_count = len(disks) if isinstance(disks, list) else 0
                 hosts.append(
                     OrderedDict(
-                        id=host["id"],
-                        hostname=inventory["hostname"],
+                        id=host.get("id", "unknown"),
+                        hostname=log_analyzer.get_hostname(host),
                         requested_hostname=host.get("requested_hostname", "N/A"),
-                        last_contacted=self.format_time(host["checked_in_at"]),
+                        last_contacted=self.format_time(host.get("checked_in_at", "")),
                         installation_disk=host.get("installation_disk_path", "N/A"),
-                        product_name=inventory["system_vendor"].get(
-                            "product_name", "Unavailable"
-                        ),
-                        manufacturer=inventory["system_vendor"].get(
-                            "manufacturer", "Unavailable"
-                        ),
-                        virtual_host=inventory["system_vendor"].get("virtual", False),
-                        disks_count=len(inventory["disks"]),
+                        product_name=system_vendor.get("product_name", "Unavailable"),
+                        manufacturer=system_vendor.get("manufacturer", "Unavailable"),
+                        virtual_host=system_vendor.get("virtual", False),
+                        disks_count=disks_count,
                     )
                 )

323-340: Do not assume disks exists or is well-formed

Some bundles omit inventory["disks"] or send non-list values; the current code raises and the entire storage report disappears. Guard the JSON load and default disks to an empty list.

-                inventory = json.loads(host["inventory"])
-                disks = inventory["disks"]
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except json.JSONDecodeError:
+                    inventory = {}
+                disks = inventory.get("disks", [])
+                if not isinstance(disks, list):
+                    disks = []

86-117: Guard host progress/role lookups in the summary builder

host["progress"]["current_stage"] and similar subscripts will throw when the key is absent or progress is not a dict, which is common in partially collected bundles. Please switch to .get() with sensible defaults before building the summary.

-        host_summary = []
-        some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts)
+        host_summary = []
+        some_done = any(
+            (host.get("progress") or {}).get("current_stage") == "Done"
+            for host in hosts
+        )
 
         for host in hosts:
-            if host["role"] not in ["master", "bootstrap"]:
+            role = host.get("role")
+            if role not in ["master", "bootstrap"]:
                 continue
-            if host["progress"]["current_stage"] == "Done" or host["status"] != "error":
+            current_stage = (host.get("progress") or {}).get("current_stage")
+            if current_stage == "Done" or host.get("status") != "error":
                 continue
@@
-            }.get(host["progress"]["current_stage"], "Unknown")
+            }.get(current_stage or "", "Unknown")
 
             host_summary.append(
                 OrderedDict(
                     hostname=log_analyzer.get_hostname(host),
-                    role=host["role"],
+                    role=role or "N/A",
                     description=host_comment,
                 )
             )
service_client/assisted_service_api.py (2)

14-14: Import looks correct.

Using the top-level nestedarchive import resolves prior mis-import.


170-174: Redact presigned URL when logging (omit query/credentials).

Log only scheme+host+path; return the full URL to the client unchanged.

Apply (assumes urlparse is imported from urllib.parse at module scope):

-        logs_url = cast(PresignedUrl, result).url
-        log.info("Downloading logs from %s", logs_url)
-        return nestedarchive.RemoteNestedArchive(
-            cast(str, logs_url), init_download=True
-        )
+        logs_url = cast(PresignedUrl, result).url
+        parsed = urlparse(cast(str, logs_url))
+        safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+        log.info("Downloading logs from %s", safe_url)
+        return nestedarchive.RemoteNestedArchive(cast(str, logs_url), init_download=True)
server.py (1)

1018-1027: Handle errors and improve empty-results formatting in analyze_cluster_logs.

Add try/except and return a friendly message on empty results.

Apply:

 @track_tool_usage()
 async def analyze_cluster_logs(
     cluster_id: Annotated[str, Field(description="The ID of the cluster")],
 ) -> str:
     """
     Analyze the cluster logs for the given cluster_id and return the results.
     """
-    client = InventoryClient(get_access_token())
-    results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
-    return "\n\n".join([str(r) for r in results])
+    try:
+        client = InventoryClient(get_access_token())
+        results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
+        if not results:
+            return "No issues detected in cluster logs."
+        return "\n".join(str(r) for r in results if str(r).strip())
+    except Exception as e:
+        log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True)
+        return f"Error analyzing cluster logs: {e}"
log_analyzer/log_analyzer.py (3)

90-104: Fix get_all_cluster_events to return all events (not just last partition).

The current implementation returns only the last partition of events, which contradicts the method's name.

Apply:

-    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
-        """Get all the cluster installation events."""
-        if self._cluster_events is None:
-            try:
-                events_content = self.logs_archive.get("cluster_events.json")
-                all_events = json.loads(cast(str | bytes, events_content))
-
-                # Get the last partition (latest installation attempt)
-                self._cluster_events = self.partition_cluster_events(all_events)[-1]
-            except Exception as e:
-                logger.error("Failed to load cluster events: %s", e)
-                self._cluster_events = []
-
-        return self._cluster_events
+    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
+        """Get all cluster installation events (unpartitioned)."""
+        if self._all_events is None:
+            try:
+                events_content = self.logs_archive.get("cluster_events.json")
+                self._all_events = json.loads(cast(str | bytes, events_content))
+            except Exception as e:
+                logger.error("Failed to load cluster events: %s", e)
+                self._all_events = []
+        return self._all_events

55-61: Guard missing install_started_at to avoid KeyError.

Clusters without this field will crash current code.

Apply:

-        installation_start_time = dateutil.parser.isoparse(
-            md["cluster"]["install_started_at"]
-        )
+        installation_started_at = md.get("cluster", {}).get("install_started_at")
+        if not installation_started_at:
+            return md
+        installation_start_time = dateutil.parser.isoparse(installation_started_at)

25-37: Maintain separate caches for all events vs. last attempt.

Prevents double-partitioning and clarifies semantics.

Apply:

-    _metadata: dict[str, Any] | None
+    _metadata: dict[str, Any] | None
+    _all_events: list[dict[str, Any]] | None
@@
-        self._metadata = None
-        self._cluster_events = None
+        self._metadata = None
+        self._all_events = None
+        self._cluster_events = None
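The two-cache split can be illustrated in isolation; a minimal sketch, with the class name and payload invented for the example:

```python
import json

class EventsCache:
    """Sketch: separate lazy caches for all events vs. the last install attempt."""

    def __init__(self, raw: str):
        self._raw = raw
        self._all_events = None      # every event in the bundle
        self._cluster_events = None  # only the latest installation attempt

    def get_all_cluster_events(self):
        # Parse once, reuse on every later call
        if self._all_events is None:
            self._all_events = json.loads(self._raw)
        return self._all_events

cache = EventsCache('[{"name": "a"}, {"name": "b"}]')
assert cache.get_all_cluster_events() == [{"name": "a"}, {"name": "b"}]
# Cached: both calls return the same parsed object
assert cache.get_all_cluster_events() is cache.get_all_cluster_events()
```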
log_analyzer/signatures/networking.py (1)

36-44: Guard machine_networks and inventory JSON (duplicate of prior feedback).

Indexing cluster["machine_networks"][0]["cidr"] and unguarded JSON load on host["inventory"] can raise; handle gracefully instead of relying on the broad except. Also consider strict=False for ip_network.

Apply this diff:

-            if not cluster.get("hosts"):
-                return None
-            host = cluster["hosts"][0]
-            inventory = json.loads(host["inventory"])
-
-            # The first one is the machine_cidr that will be used for network configuration
-            machine_cidr = ipaddress.ip_network(cluster["machine_networks"][0]["cidr"])
+            hosts = cluster.get("hosts") or []
+            if not hosts:
+                return None
+            host = hosts[0]
+
+            inv_raw = host.get("inventory")
+            if not inv_raw:
+                return None
+            try:
+                inventory = json.loads(inv_raw)
+            except json.JSONDecodeError:
+                return None
+
+            # The first one is the machine_cidr that will be used for network configuration
+            mn = cluster.get("machine_networks") or []
+            if not mn or not mn[0].get("cidr"):
+                return None
+            machine_cidr = ipaddress.ip_network(mn[0]["cidr"], strict=False)
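The strict=False suggestion matters because metadata sometimes records a host address rather than the network address; the standard library rejects set host bits by default:

```python
import ipaddress

# With strict=True (the default), a CIDR with host bits set is rejected
try:
    ipaddress.ip_network("192.168.1.5/24")
except ValueError as e:
    print(e)  # 192.168.1.5/24 has host bits set

# strict=False masks the host bits down to the network address instead
net = ipaddress.ip_network("192.168.1.5/24", strict=False)
print(net)  # 192.168.1.0/24
assert ipaddress.ip_address("192.168.1.200") in net
```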
🧹 Nitpick comments (8)
server.py (1)

17-21: CI build failure: pydantic-core doesn’t build on Python 3.14 with PyO3 0.24.1.

Fix options:

  • Prefer: use Python <= 3.13 in the build image.
  • Or: upgrade pydantic/pydantic-core to a release that supports 3.14.
  • Or (interim): set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 during build (Dockerfile/CI env) before uv sync.

Given the failure log, the minimal unblock is the env var; long-term, pin a compatible Python/runtime or bump deps.

log_analyzer/signatures/__init__.py (1)

9-14: Avoid import * and narrow discovery scope for clarity.

Import modules explicitly and inspect only those to reduce namespace noise.

Apply:

-from .basic_info import *  # noqa
-from .error_detection import *  # noqa
-from .performance import *  # noqa
-from .networking import *  # noqa
-from .advanced_analysis import *  # noqa
-from .platform_specific import *  # noqa
+from . import basic_info, error_detection, performance, networking, advanced_analysis, platform_specific
@@
-ALL_SIGNATURES = []
-
-current_module = sys.modules[__name__]
-for name, obj in inspect.getmembers(current_module):
+ALL_SIGNATURES = []
+for module in (
+    basic_info,
+    error_detection,
+    performance,
+    networking,
+    advanced_analysis,
+    platform_specific,
+):
+    for name, obj in inspect.getmembers(module):
         if (
             inspect.isclass(obj)
             and issubclass(obj, Signature)
             and obj is not Signature
             and obj is not ErrorSignature
             and obj is not SignatureResult
         ):
             ALL_SIGNATURES.append(obj)

Also applies to: 19-31

log_analyzer/signatures/networking.py (2)

88-110: Avoid substring false-positives when matching VIPs in ip-addr output.

Checking if vip in ip_addr can match partial IPs (e.g., 10.0.0.1 in 10.0.0.10). Use a word-boundary-like regex per VIP.

Apply this diff:

-            vips = [vip["ip"] for vip in cluster.get("api_vips", [])]
+            vips = [vip["ip"] for vip in cluster.get("api_vips", [])]
+            vip_patterns = {vip: re.compile(rf"(?<!\d){re.escape(vip)}(?!\d)") for vip in vips}
@@
-                    node_ip = os.path.basename(node_dir)
+                    node_ip = getattr(node_dir, "name", os.path.basename(str(node_dir)))
@@
-                        for vip in vips:
-                            if vip in ip_addr:
+                        for vip in vips:
+                            if vip_patterns[vip].search(ip_addr):
                             collisions.append((vip, node_ip))
@@
-                for vip in vips:
-                    if vip in bootstrap_ip_addr:
+                for vip in vips:
+                    if vip_patterns[vip].search(bootstrap_ip_addr):
                     collisions.append((vip, "bootstrap"))
@@
-                        for vip in vips:
-                            if vip in ip_addr:
+                        for vip in vips:
+                            if vip_patterns[vip].search(ip_addr):
                             collisions.append((vip, node_ip))
@@
-                for vip in vips:
-                    if vip in bootstrap_ip_addr:
+                for vip in vips:
+                    if vip_patterns[vip].search(bootstrap_ip_addr):
                     collisions.append((vip, "bootstrap"))

Also, please confirm what type iterdir() yields from logs_archive.get(...) (Path-like vs custom object). If not Path-like, prefer str(node_dir) for basename extraction.

Also applies to: 123-151, 158-169
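
To see the failure mode in isolation, here is a standalone sketch (the ip-addr output below is hypothetical, not taken from actual cluster logs) showing how a plain substring check misfires and how the digit-boundary pattern avoids it:

```python
import re

# Naive substring check: "10.0.0.1" is a substring of "10.0.0.10",
# so a host that merely holds 10.0.0.10 would be flagged as a VIP collision.
ip_addr_output = "inet 10.0.0.10/24 brd 10.0.0.255 scope global eth0"
vip = "10.0.0.1"
assert vip in ip_addr_output  # false positive

# Bounded pattern: reject a digit immediately before or after the VIP.
pattern = re.compile(rf"(?<!\d){re.escape(vip)}(?!\d)")
assert pattern.search(ip_addr_output) is None  # 10.0.0.10 no longer matches
assert pattern.search("inet 10.0.0.1/24 scope global eth0") is not None
```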


181-201: Tighten nameserver parsing to ignore comments and trailing text.

Current regex captures entire line; IP parsing then relies on exceptions. Capture only the token.

-    NAMESERVER_PATTERN = re.compile(r"^nameserver (.*)$")
+    NAMESERVER_PATTERN = re.compile(r"^\s*nameserver\s+([^\s#;]+)")
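
As a quick check of the tightened pattern against a hypothetical resolv.conf (comments, trailing annotations, leading whitespace):

```python
import re

NAMESERVER_PATTERN = re.compile(r"^\s*nameserver\s+([^\s#;]+)")

resolv_conf = """\
# Generated by NetworkManager
nameserver 192.168.1.1  # primary
nameserver 10.0.0.53
search example.com
"""

# Capture only the address token; comment lines and non-nameserver
# directives simply fail to match.
servers = [
    m.group(1)
    for line in resolv_conf.splitlines()
    if (m := NAMESERVER_PATTERN.match(line))
]
assert servers == ["192.168.1.1", "10.0.0.53"]
```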
log_analyzer/signatures/performance.py (4)

56-66: Use search instead of match for robustness.

Event messages may have prefixes; search is safer than start-anchored match.

-        def get_image_download_info(event):
-            match = cls.image_download_regex.match(event["message"])
+        def get_image_download_info(event):
+            match = cls.image_download_regex.search(event["message"])
             if match:
                 return match.groupdict()
             return None
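
The difference is easy to demonstrate; the actual image_download_regex is not shown in this review, so the pattern below is a hypothetical stand-in with the same shape:

```python
import re

# Hypothetical pattern standing in for cls.image_download_regex.
pattern = re.compile(r"Image download rate: (?P<download_rate>[\d.]+) MB/s")

msg = "Host master-0: Image download rate: 3.2 MB/s"
assert pattern.match(msg) is None       # match() anchors at the start of the string
assert pattern.search(msg) is not None  # search() finds the pattern anywhere
assert pattern.search(msg)["download_rate"] == "3.2"
```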

90-96: Guard missing/empty hosts in metadata.

Avoid KeyError if cluster["hosts"] is missing/empty.

-            host_entries = []
-            for host in cluster["hosts"]:
+            host_entries = []
+            hosts = cluster.get("hosts") or []
+            if not hosts:
+                return None
+            for host in hosts:
                 host_id = host["id"]
                 host_events = events_by_host.get(host_id, [])
                 entry = self._create_host_entry(host, host_events, log_analyzer)
                 host_entries.append(entry)

246-250: Avoid double regex evaluation in generator.

get_duration(event) runs twice per event. Use walrus.

-        return (
-            (event, get_duration(event))
-            for event in events
-            if get_duration(event) is not None
-        )
+        return (
+            (event, duration)
+            for event in events
+            if (duration := get_duration(event)) is not None
+        )

6-12: CI failure: Python 3.14 + pydantic-core (PyO3 0.24.1) workaround.

The container build fails while compiling pydantic-core because PyO3 0.24.1 supports Python only up to 3.13. Options:

  • Prefer: pin base Python to 3.13 for now.
  • Or: set ENV PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 before running uv/pip to use stable ABI.
  • Or: upgrade to a pydantic/pydantic-core release with wheels/pyO3 that support 3.14 (if available).

Example Dockerfile tweak:

- RUN uv sync
+ ENV PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
+ RUN uv sync

Please verify which route you prefer and whether newer pydantic-core wheels for cp314 are available in your build environment.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4aea738 and aaa72fd.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/workflows/mypy.yaml (1 hunks)
  • .github/workflows/pydocstyle.yaml (0 hunks)
  • Dockerfile (1 hunks)
  • Makefile (1 hunks)
  • log_analyzer/__init__.py (1 hunks)
  • log_analyzer/log_analyzer.py (1 hunks)
  • log_analyzer/main.py (1 hunks)
  • log_analyzer/signatures/__init__.py (1 hunks)
  • log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • log_analyzer/signatures/base.py (1 hunks)
  • log_analyzer/signatures/basic_info.py (1 hunks)
  • log_analyzer/signatures/error_detection.py (1 hunks)
  • log_analyzer/signatures/networking.py (1 hunks)
  • log_analyzer/signatures/performance.py (1 hunks)
  • log_analyzer/signatures/platform_specific.py (1 hunks)
  • pyproject.toml (2 hunks)
  • server.py (4 hunks)
  • service_client/assisted_service_api.py (2 hunks)
  • template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
  • template.yaml
  • log_analyzer/main.py
  • .github/workflows/mypy.yaml
  • log_analyzer/signatures/platform_specific.py
  • Makefile
  • Dockerfile
🧰 Additional context used
🧬 Code graph analysis (10)
log_analyzer/signatures/base.py (3)
log_analyzer/signatures/advanced_analysis.py (9)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/basic_info.py (6)
  • analyze (20-81)
  • analyze (125-163)
  • analyze (169-206)
  • analyze (212-250)
  • analyze (256-287)
  • analyze (315-367)
log_analyzer/signatures/error_detection.py (1)
  • analyze (37-87)
log_analyzer/signatures/networking.py (3)
log_analyzer/signatures/advanced_analysis.py (11)
  • operator_statuses_from_controller_logs (17-44)
  • filter_operators (57-71)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_controller_logs (199-206)
log_analyzer/signatures/error_detection.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
log_analyzer/signatures/basic_info.py (2)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • format_time (78-83)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_hostname (218-228)
log_analyzer/signatures/__init__.py (1)
log_analyzer/signatures/base.py (3)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
server.py (3)
log_analyzer/main.py (1)
  • analyze_cluster (14-71)
metrics/metrics.py (1)
  • track_tool_usage (51-65)
service_client/assisted_service_api.py (1)
  • InventoryClient (27-561)
service_client/assisted_service_api.py (1)
service_client/exceptions.py (1)
  • sanitize_exceptions (24-58)
log_analyzer/signatures/performance.py (2)
log_analyzer/signatures/base.py (6)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_events_by_host (126-132)
  • get_hostname (218-228)
log_analyzer/__init__.py (2)
log_analyzer/log_analyzer.py (1)
  • LogAnalyzer (22-228)
log_analyzer/signatures/base.py (1)
  • SignatureResult (15-48)
log_analyzer/signatures/advanced_analysis.py (2)
log_analyzer/signatures/base.py (4)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (8)
  • get_all_cluster_events (90-103)
  • partition_cluster_events (106-124)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_must_gather (208-215)
  • get_events_by_host (126-132)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
🔇 Additional comments (2)
log_analyzer/__init__.py (1)

7-15: Public surface looks good.

Re-exporting LogAnalyzer, ALL_SIGNATURES, and SignatureResult is appropriate.

log_analyzer/log_analyzer.py (1)

77-84: Derive last attempt from partitioning of all events.

After fixing get_all_cluster_events, this remains correct; keep as-is or clarify via local var names.

Comment on lines +289 to +307
def _get_interfaces(self, host):
"""Extract interface information from host."""
inventory = json.loads(host["inventory"])
interfaces_details = defaultdict(list)

for interface in inventory.get("interfaces", []):
name = interface.get("name")
if not name:
continue
interfaces_details["name"].append(name)
interfaces_details["mac_address"].append(
json.dumps(interface.get("mac_address"))
)
interfaces_details["ipv4_addresses"].append(
json.dumps(interface.get("ipv4_addresses", []))
)
interfaces_details["ipv6_addresses"].append(
json.dumps(interface.get("ipv6_addresses", []))
)

⚠️ Potential issue | 🟠 Major

Make _get_interfaces resilient to bad inventory blobs

Parsing host["inventory"] without guards can still throw and short-circuit the parent signature. Load via .get() and catch JSONDecodeError before iterating interfaces.

-        inventory = json.loads(host["inventory"])
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except json.JSONDecodeError:
+            inventory = {}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _get_interfaces(self, host):
"""Extract interface information from host."""
inventory = json.loads(host["inventory"])
interfaces_details = defaultdict(list)
for interface in inventory.get("interfaces", []):
name = interface.get("name")
if not name:
continue
interfaces_details["name"].append(name)
interfaces_details["mac_address"].append(
json.dumps(interface.get("mac_address"))
)
interfaces_details["ipv4_addresses"].append(
json.dumps(interface.get("ipv4_addresses", []))
)
interfaces_details["ipv6_addresses"].append(
json.dumps(interface.get("ipv6_addresses", []))
)
def _get_interfaces(self, host):
"""Extract interface information from host."""
try:
inventory = json.loads(host.get("inventory", "{}"))
except json.JSONDecodeError:
inventory = {}
interfaces_details = defaultdict(list)
for interface in inventory.get("interfaces", []):
name = interface.get("name")
if not name:
continue
interfaces_details["name"].append(name)
interfaces_details["mac_address"].append(
json.dumps(interface.get("mac_address"))
)
interfaces_details["ipv4_addresses"].append(
json.dumps(interface.get("ipv4_addresses", []))
)
interfaces_details["ipv6_addresses"].append(
json.dumps(interface.get("ipv6_addresses", []))
)
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 289 to 307,
_get_interfaces currently calls json.loads(host["inventory"]) directly which can
raise KeyError or JSONDecodeError and abort the signature; change it to use
host.get("inventory") with a safe default (e.g., "{}" or empty dict), attempt
json.loads inside a try/except that catches json.JSONDecodeError (and TypeError
if non-string) and fall back to an empty dict, then proceed to iterate
inventory.get("interfaces", []) so the function is resilient to missing or
malformed inventory blobs.

Comment thread: pyproject.toml (Outdated)
Comment on lines +17 to +19
"pydantic>=2.11.9",
"nestedarchive>=0.2.4",
"tabulate>=0.9.0",

⚠️ Potential issue | 🔴 Critical

Add runtime dependency on python-dateutil

log_analyzer/signatures/base.py imports dateutil.parser, but only the typing stub (types-python-dateutil) was added. Without the real python-dateutil wheel the analyzer aborts with ModuleNotFoundError. Please add the runtime package next to the other new dependencies.

     "pydantic>=2.11.9",
+    "python-dateutil>=2.9.0.post0",
     "nestedarchive>=0.2.4",

Based on learnings

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"pydantic>=2.11.9",
"nestedarchive>=0.2.4",
"tabulate>=0.9.0",
"pydantic>=2.11.9",
"python-dateutil>=2.9.0.post0",
"nestedarchive>=0.2.4",
"tabulate>=0.9.0",
🤖 Prompt for AI Agents
In pyproject.toml around lines 17 to 19, the code imports dateutil.parser but
only the typing stub was added; add the runtime dependency "python-dateutil"
(pin to a compatible minimum version, e.g. >=2.8.2) to the same dependencies
list alongside pydantic, nestedarchive and tabulate so the real package is
installed at runtime and prevents ModuleNotFoundError.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 14, 2025
@keitwb
Contributor Author

keitwb commented Oct 14, 2025

Something in the container build is not happy:

Fixed it by upgrading pydantic to latest.

- Copied it in from https://github.com/carbonin/assisted-log-analyzer
- Merged API Client method into existing api client
- Got rid of cli main method
- Disabled pydocstyle checker as it is very low value
- Skipping mypy check for the new log_analyzer package as there are too many errors
- Moving mypy config to pyproject.toml
- Adding tool to call the analyze logs function and return the string dump of the signatures.
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 14, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (14)
log_analyzer/signatures/advanced_analysis.py (3)

176-192: Make FlappingValidations resilient to unexpected message formats.

Lines 176-183 and 185-192 use cast(re.Match[str], self.validation_name_regexp.match(...)) which assumes the match always succeeds. If validation_name_regexp doesn't match a message (even though succeed_to_failing_regexp did), the cast will fail and crash the signature.

Apply this diff to filter out non-matching messages:

             succeed_to_failing_counter = Counter(
-                cast(
-                    re.Match[str],
-                    self.validation_name_regexp.match(event["message"]),
-                ).groups()[0]
+                m.groups()[0]
                 for event in events
                 if self.succeed_to_failing_regexp.match(event["message"])
+                for m in [self.validation_name_regexp.match(event["message"])]
+                if m is not None
             )

             now_fixed = Counter(
-                cast(
-                    re.Match[str],
-                    self.validation_name_regexp.match(event["message"]),
-                ).groups()[0]
+                m.groups()[0]
                 for event in events
                 if self.now_fixed_regexp.match(event["message"])
+                for m in [self.validation_name_regexp.match(event["message"])]
+                if m is not None
             )

Based on past review comments.
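
The filter-then-bind pattern can be seen in miniature below; the regex is a hypothetical stand-in for validation_name_regexp, whose real shape is not shown in this diff:

```python
import re
from collections import Counter

# Hypothetical stand-in for self.validation_name_regexp.
validation_name_regexp = re.compile(r"validation '([\w-]+)'")

events = [
    {"message": "validation 'ntp-sync' that used to succeed is now failing"},
    {"message": "unrelated controller message"},
]

# Bind the Optional match and filter out None instead of cast()-ing it,
# so a non-matching message is skipped rather than crashing the signature.
counter = Counter(
    m.group(1)
    for event in events
    if (m := validation_name_regexp.match(event["message"])) is not None
)
assert counter == Counter({"ntp-sync": 1})
```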


133-136: Guard nested progress access to avoid KeyError.

Line 134 directly indexes bootstrap_node["progress"]["current_stage"] without checking if the progress key exists or is a dict. This will raise KeyError if progress is missing or TypeError if it's not a dict.

Apply this diff:

-        if (
-            bootstrap_node["progress"]["current_stage"]
-            not in eligible_bootstrap_stages
-        ):
+        current_stage = bootstrap_node.get("progress", {}).get("current_stage", "")
+        if current_stage not in eligible_bootstrap_stages:
             return None

Based on past review comments.


407-421: Use list default for conditions/containerStatuses to avoid iteration errors.

Lines 410 and 417 use {} as the default for "conditions" and "containerStatuses". Since these are lists in the Kubernetes API, defaulting to {} will cause the code to iterate over dict keys instead of list items, leading to AttributeError when calling condition.get(...).

Apply this diff:

             ready = [
                 condition.get("status") == "True"
                 for condition in controller_pod.get("status", {}).get(
-                    "conditions", {}
+                    "conditions", []
                 )
                 if condition.get("type") == "Ready"
             ][0]
@@
         conditions_tbl = self.generate_table(
-            controller_pod.get("status", {}).get("conditions", {})
+            controller_pod.get("status", {}).get("conditions", [])
         )
         containers_tbl = self.generate_table(
-            controller_pod.get("status", {}).get("containerStatuses", {})
+            controller_pod.get("status", {}).get("containerStatuses", [])
         )

Based on past review comments.

server.py (1)

1018-1028: Add error handling and improve result formatting.

The function has several issues flagged in past reviews:

  1. No error handling: If analyze_cluster raises an exception (e.g., network error, invalid cluster ID), it will crash the MCP tool call with an unhandled exception.
  2. Suboptimal formatting: Using list str() representation produces output like [SignatureResult(...), ...] instead of formatted analysis results.

Apply this diff:

 @track_tool_usage()
 async def analyze_cluster_logs(
     cluster_id: Annotated[str, Field(description="The ID of the cluster")],
 ) -> str:
     """
     Analyze the cluster logs for the given cluster_id and return the results.
+    
+    Returns:
+        str: Formatted string containing signature analysis results with severity indicators.
     """
-    client = InventoryClient(get_access_token())
-    results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
-    return "\n\n".join([str(r) for r in results])
+    try:
+        client = InventoryClient(get_access_token())
+        results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
+        if not results:
+            return "No issues detected in cluster logs."
+        return "\n\n".join(str(result) for result in results)
+    except Exception as e:
+        log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True)
+        return f"Error analyzing cluster logs: {str(e)}"

Based on past review comments.

log_analyzer/log_analyzer.py (2)

90-103: Fix incorrect event loading semantics.

The method get_all_cluster_events should return all events, but line 98 filters to only the last partition. This creates confusion:

  1. The method name says "all" but returns only the latest attempt
  2. Line 83 in get_last_install_cluster_events calls this method and partitions again (unnecessary)
  3. The cache variable _cluster_events stores only the last partition, not all events

Apply this diff to fix the semantics:

+    _all_events: List[Dict[str, Any]] | None
+    _cluster_events: List[Dict[str, Any]] | None
+
     def __init__(self, logs_archive: nestedarchive.RemoteNestedArchive):
         """
         Initialize the log analyzer.

         Args:
             logs_archive: RemoteNestedArchive containing the cluster logs
         """
         self.logs_archive = logs_archive
         self._metadata = None
+        self._all_events = None
         self._cluster_events = None
@@
     def get_all_cluster_events(self) -> List[Dict[str, Any]]:
         """Get all the cluster installation events."""
-        if self._cluster_events is None:
+        if self._all_events is None:
             try:
                 events_content = self.logs_archive.get("cluster_events.json")
-                all_events = json.loads(cast(str | bytes, events_content))
-
-                # Get the last partition (latest installation attempt)
-                self._cluster_events = self.partition_cluster_events(all_events)[-1]
+                self._all_events = json.loads(cast(str | bytes, events_content))
             except Exception as e:
                 logger.error("Failed to load cluster events: %s", e)
-                self._cluster_events = []
-
-        return self._cluster_events
+                self._all_events = []
+        return self._all_events

Based on past review comments.


55-75: Guard missing install_started_at to prevent crashes.

Lines 58-60 assume install_started_at exists in the metadata. If this field is missing (which can happen with incomplete or corrupted metadata), KeyError will crash the metadata loading.

Apply this diff:

 @staticmethod
 def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
     """Clean metadata JSON by separating deleted hosts."""
-    installation_start_time = dateutil.parser.isoparse(
-        md["cluster"]["install_started_at"]
-    )
+    install_started_at = md.get("cluster", {}).get("install_started_at")
+    if not install_started_at:
+        # If no installation start time, we can't partition deleted hosts
+        return md
+    
+    installation_start_time = dateutil.parser.isoparse(install_started_at)

Based on past review comments.

log_analyzer/signatures/error_detection.py (1)

49-72: Harden host parsing to prevent crashes on malformed data.

Lines 49-72 have multiple unsafe accesses that can crash on missing keys or invalid JSON:

  1. Lines 50-51: Direct indexing of host["role"] and host["progress"]["current_stage"] without .get()
  2. Line 56: json.loads(host["inventory"]) without try/except for JSONDecodeError
  3. Line 68: Direct indexing of inventory["hostname"]

Apply this diff to handle missing/malformed data gracefully:

         for host in cluster_hosts:
             if (
-                host["role"] == "bootstrap"
-                and host["progress"]["current_stage"] == "WaitingForControlPlane"
+                host.get("role") == "bootstrap"
+                and host.get("progress", {}).get("current_stage") == "WaitingForControlPlane"
             ):
                 # Probably failed due to other issues
                 return None

-            inventory = json.loads(host["inventory"])
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (ValueError, json.JSONDecodeError):
+                logger.warning("Invalid inventory for host %s", host.get("id"))
+                continue
             boot_mode = inventory.get("boot", {}).get("current_boot_mode", "N/A")

             if (
-                host["role"] == "master"
-                and host["progress"]["current_stage"] == "Rebooting"
-                and host["status"] == "error"
+                host.get("role") == "master"
+                and host.get("progress", {}).get("current_stage") == "Rebooting"
+                and host.get("status") == "error"
                 and boot_mode == "bios"
             ):
                 hosts.append(
                     OrderedDict(
                         HostID=host["id"],
-                        Hostname=inventory["hostname"],
-                        Progress=host["progress"]["current_stage"],
+                        Hostname=log_analyzer.get_hostname(host),
+                        Progress=host.get("progress", {}).get("current_stage", "N/A"),
                     )
                 )

Based on past review comments.

log_analyzer/signatures/platform_specific.py (1)

31-49: Guard JSON parsing to prevent signature crashes.

Line 31 parses host.get("inventory", "{}") without error handling. If inventory contains invalid JSON (which can happen with corrupted or incomplete data), json.loads will raise JSONDecodeError and abort the entire signature analysis, preventing other signatures from running.

Apply this diff to handle malformed inventory gracefully:

     hosts = []
     for host in cluster.get("hosts", []):
-        inventory = json.loads(host.get("inventory", "{}"))
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except (ValueError, json.JSONDecodeError):
+            logger.warning("Invalid inventory JSON for host %s, skipping", host.get("id"))
+            continue
         if (
             len(inventory.get("disks", [])) == 1

Based on past review comments that explicitly requested this fix.
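
Since the same guard recurs across several signatures, it can be factored into a small helper; the name safe_inventory below is hypothetical, not part of the PR:

```python
import json

def safe_inventory(host: dict) -> dict:
    """Hypothetical helper: parse a host's inventory blob, tolerating a
    missing key, a None value, or malformed JSON."""
    try:
        return json.loads(host.get("inventory") or "{}")
    except (json.JSONDecodeError, TypeError):
        return {}

assert safe_inventory({"inventory": '{"disks": [1, 2]}'}) == {"disks": [1, 2]}
assert safe_inventory({"inventory": "not json"}) == {}
assert safe_inventory({}) == {}
```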

log_analyzer/signatures/basic_info.py (5)

26-50: Don’t let a single bad host abort the whole signature; parse inventory and keys defensively

Guard JSON decoding and missing keys so one malformed host doesn’t drop the entire HostsStatusSignature result.

Apply:

-            for host in cluster["hosts"]:
-                info = host["status_info"]
-                role = host["role"]
-                inventory = json.loads(host["inventory"])
-                if host.get("bootstrap", False):
-                    role = "bootstrap"
+            for host in cluster.get("hosts", []):
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except (json.JSONDecodeError, TypeError):
+                    inventory = {}
+                info = host.get("status_info", "")
+                role = "bootstrap" if host.get("bootstrap", False) else host.get("role")
@@
-                        id=host["id"],
+                        id=host.get("id", "unknown"),
                         hostname=log_analyzer.get_hostname(host),
-                        progress=host["progress"]["current_stage"],
-                        status=host["status"],
-                        role=role,
+                        progress=host.get("progress", {}).get("current_stage", "N/A"),
+                        status=host.get("status", "N/A"),
+                        role=role or "N/A",
                         boot_mode=inventory.get("boot", {}).get(
                             "current_boot_mode", "N/A"
                         ),
                         status_info=str(info),
                         logs_info=host.get("logs_info", ""),
-                        last_checked_in_at=self.format_time(
-                            host.get("checked_in_at", str(datetime.min))
-                        ),
+                        last_checked_in_at=self.format_time(host.get("checked_in_at", "")),

Based on learnings


86-117: Harden hosts summary: safe progress/role/status access to avoid KeyError

Use .get for nested keys and compute current_stage once.

-        some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts)
+        some_done = any(
+            host.get("progress", {}).get("current_stage") == "Done" for host in hosts
+        )
 
-        for host in hosts:
-            if host["role"] not in ["master", "bootstrap"]:
+        for host in hosts:
+            role = host.get("role")
+            if role not in ["master", "bootstrap"]:
                 continue
-            if host["progress"]["current_stage"] == "Done" or host["status"] != "error":
+            current_stage = host.get("progress", {}).get("current_stage", "")
+            if current_stage == "Done" or host.get("status") != "error":
                 continue
@@
-            }.get(host["progress"]["current_stage"], "Unknown")
+            }.get(current_stage, "Unknown")
@@
-                    role=host["role"],
+                    role=role or "N/A",

218-237: Harden host extra details: robust inventory parsing and safe lookups

Avoid KeyError/JSON errors; prefer get_hostname and defaults.

-            for host in cluster["hosts"]:
-                inventory = json.loads(host["inventory"])
+            for host in cluster.get("hosts", []):
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except (json.JSONDecodeError, TypeError):
+                    inventory = {}
+                system_vendor = inventory.get("system_vendor", {})
                 hosts.append(
                     OrderedDict(
-                        id=host["id"],
-                        hostname=inventory["hostname"],
+                        id=host.get("id", "unknown"),
+                        hostname=log_analyzer.get_hostname(host),
                         requested_hostname=host.get("requested_hostname", "N/A"),
-                        last_contacted=self.format_time(host["checked_in_at"]),
+                        last_contacted=self.format_time(host.get("checked_in_at", "")),
                         installation_disk=host.get("installation_disk_path", "N/A"),
-                        product_name=inventory["system_vendor"].get(
-                            "product_name", "Unavailable"
-                        ),
-                        manufacturer=inventory["system_vendor"].get(
-                            "manufacturer", "Unavailable"
-                        ),
-                        virtual_host=inventory["system_vendor"].get("virtual", False),
-                        disks_count=len(inventory["disks"]),
+                        product_name=system_vendor.get("product_name", "Unavailable"),
+                        manufacturer=system_vendor.get("manufacturer", "Unavailable"),
+                        virtual_host=system_vendor.get("virtual", False),
+                        disks_count=len(inventory.get("disks", [])),
                     )
                 )

289-309: Make _get_interfaces resilient; avoid quoting MAC address

Guard JSON parse and avoid json.dumps for scalars to prevent extra quotes.

-    def _get_interfaces(self, host):
+    def _get_interfaces(self, host):
         """Extract interface information from host."""
-        inventory = json.loads(host["inventory"])
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except (json.JSONDecodeError, TypeError):
+            inventory = {}
         interfaces_details = defaultdict(list)
 
         for interface in inventory.get("interfaces", []):
             name = interface.get("name")
             if not name:
                 continue
             interfaces_details["name"].append(name)
-            interfaces_details["mac_address"].append(
-                json.dumps(interface.get("mac_address"))
-            )
+            interfaces_details["mac_address"].append(str(interface.get("mac_address", "")))
             interfaces_details["ipv4_addresses"].append(
                 json.dumps(interface.get("ipv4_addresses", []))
             )
             interfaces_details["ipv6_addresses"].append(
                 json.dumps(interface.get("ipv6_addresses", []))
             )

321-350: Guard inventory/disks; default to empty list

Prevent KeyError/TypeError on missing or malformed inventory.

-            for host in cluster["hosts"]:
-                inventory = json.loads(host["inventory"])
-                disks = inventory["disks"]
+            for host in cluster.get("hosts", []):
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except (json.JSONDecodeError, TypeError):
+                    inventory = {}
+                disks = inventory.get("disks", []) or []
log_analyzer/signatures/networking.py (1)

36-45: Guard machine_networks and inventory parsing to avoid IndexError/KeyError

Handle empty hosts/machine_networks and malformed inventory gracefully.

-            if not cluster.get("hosts"):
+            if not cluster.get("hosts"):
                 return None
             host = cluster["hosts"][0]
-            inventory = json.loads(host["inventory"])
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, TypeError):
+                inventory = {}
 
-            # The first one is the machine_cidr that will be used for network configuration
-            machine_cidr = ipaddress.ip_network(cluster["machine_networks"][0]["cidr"])
+            # The first one is the machine_cidr that will be used for network configuration
+            mn = cluster.get("machine_networks") or []
+            if not mn or not mn[0].get("cidr"):
+                return None
+            machine_cidr = ipaddress.ip_network(mn[0]["cidr"])
🧹 Nitpick comments (6)
log_analyzer/signatures/base.py (1)

40-46: Minor: Simplify redundant else branch.

Line 46 assigns header to itself unnecessarily. The else branch can be removed since lines 41-44 already handle the error and warning cases.

Apply this diff:

     header = f"=== {self.title} ==="
     if self.severity == "error":
         header = f"ERROR: {header}"
     elif self.severity == "warning":
         header = f"WARNING: {header}"
-    else:
-        header = f"{header}"
log_analyzer/signatures/basic_info.py (1)

175-190: Optional: use safe accessors to avoid aborting on missing fields

FailureDescription will crash on missing created_at/status/status_info. Prefer .get defaults.

-                "Cluster ID": cluster["id"],
+                "Cluster ID": cluster.get("id", "N/A"),
@@
-                "Created At": self.format_time(cluster["created_at"]),
+                "Created At": self.format_time(cluster.get("created_at", "")),
@@
-                "Status": cluster["status"],
-                "Status Info": cluster["status_info"],
+                "Status": cluster.get("status", "N/A"),
+                "Status Info": cluster.get("status_info", "N/A"),
log_analyzer/signatures/networking.py (1)

6-12: CI build fix for Python 3.14 + pydantic-core/PyO3

The container build fails while compiling pydantic-core because PyO3 0.24.1 supports Python 3.13 at most. Options:

  • Set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 in the build stage (uv sync step).
  • Or pin Python base image to 3.13.
  • Or upgrade pydantic-core/pydantic to a version with Python 3.14 wheels.
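The first option could look like this in the image build stage (a sketch; the exact `uv sync` flags depend on the project's Dockerfile):

```dockerfile
# Allow PyO3 0.24.x to build against Python 3.14 via the abi3 forward-compat flag
ENV PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
RUN uv sync
```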
log_analyzer/signatures/performance.py (3)

33-35: Avoid ValueError on non-numeric download_rate

Guard float conversion so a single odd event doesn’t drop the signature.

-            for image_info in image_info_list:
-                if float(image_info["download_rate"]) < self.minimum_download_rate_mb:
-                    abnormal_image_info.append(image_info)
+            for image_info in image_info_list:
+                try:
+                    rate = float(image_info["download_rate"])
+                except (TypeError, ValueError):
+                    continue
+                if rate < self.minimum_download_rate_mb:
+                    abnormal_image_info.append(image_info)

126-135: Per-host resilience: don’t abort on timestamp parsing errors

Catch generic exceptions when deriving start/end to keep other hosts’ results.

-        try:
+        try:
             start_time = cls._get_start_event_timestamp(host_events)
             end_time = cls._get_end_event_timestamp(host_events)
             total_duration_seconds = (end_time - start_time).total_seconds()
-        except cls.NoEventFound:
+        except Exception:  # broad catch subsumes cls.NoEventFound
             return entry

246-250: Avoid double regex work in _get_fio_events

Compute duration once with the walrus operator.

-        return (
-            (event, get_duration(event))
-            for event in events
-            if get_duration(event) is not None
-        )
+        return (
+            (event, duration)
+            for event in events
+            if (duration := get_duration(event)) is not None
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aaa72fd and cc2c14d.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/workflows/mypy.yaml (1 hunks)
  • .github/workflows/pydocstyle.yaml (0 hunks)
  • Dockerfile (1 hunks)
  • Makefile (1 hunks)
  • log_analyzer/__init__.py (1 hunks)
  • log_analyzer/log_analyzer.py (1 hunks)
  • log_analyzer/main.py (1 hunks)
  • log_analyzer/signatures/__init__.py (1 hunks)
  • log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • log_analyzer/signatures/base.py (1 hunks)
  • log_analyzer/signatures/basic_info.py (1 hunks)
  • log_analyzer/signatures/error_detection.py (1 hunks)
  • log_analyzer/signatures/networking.py (1 hunks)
  • log_analyzer/signatures/performance.py (1 hunks)
  • log_analyzer/signatures/platform_specific.py (1 hunks)
  • pyproject.toml (2 hunks)
  • server.py (4 hunks)
  • service_client/assisted_service_api.py (2 hunks)
  • template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
  • log_analyzer/signatures/__init__.py
  • pyproject.toml
  • log_analyzer/__init__.py
  • service_client/assisted_service_api.py
  • .github/workflows/mypy.yaml
  • Dockerfile
🧰 Additional context used
🧬 Code graph analysis (9)
log_analyzer/main.py (3)
service_client/assisted_service_api.py (1)
  • get_cluster_logs (161-174)
log_analyzer/log_analyzer.py (1)
  • LogAnalyzer (22-228)
log_analyzer/signatures/base.py (2)
  • SignatureResult (15-48)
  • analyze (59-68)
log_analyzer/signatures/base.py (3)
log_analyzer/signatures/advanced_analysis.py (9)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/basic_info.py (6)
  • analyze (20-81)
  • analyze (125-163)
  • analyze (169-206)
  • analyze (212-250)
  • analyze (256-287)
  • analyze (315-367)
log_analyzer/signatures/error_detection.py (1)
  • analyze (37-87)
log_analyzer/signatures/error_detection.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
log_analyzer/signatures/performance.py (2)
log_analyzer/signatures/base.py (6)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_events_by_host (126-132)
  • get_hostname (218-228)
log_analyzer/signatures/basic_info.py (2)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • format_time (78-83)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_hostname (218-228)
log_analyzer/signatures/networking.py (3)
log_analyzer/signatures/advanced_analysis.py (11)
  • operator_statuses_from_controller_logs (17-44)
  • filter_operators (57-71)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_controller_logs (199-206)
server.py (2)
log_analyzer/main.py (1)
  • analyze_cluster (14-71)
service_client/assisted_service_api.py (1)
  • InventoryClient (27-561)
log_analyzer/signatures/advanced_analysis.py (2)
log_analyzer/signatures/base.py (4)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (8)
  • get_all_cluster_events (90-103)
  • partition_cluster_events (106-124)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_must_gather (208-215)
  • get_events_by_host (126-132)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
log_analyzer/signatures/platform_specific.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_host_log_file (134-165)
  • get_journal_log (167-197)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (5)
Makefile (1)

57-64: Good call aligning mypy and format targets.

Pointing mypy at pyproject.toml and extending format with Ruff autofixes keeps the workflow consistent with the new configuration.

template.yaml (1)

46-48: LGTM!

The new parameter and environment variable configuration correctly implement the troubleshooting tools feature flag, defaulting to disabled ("0") as requested in past reviews.

Also applies to: 107-108

log_analyzer/main.py (2)

14-71: LGTM!

The analyze_cluster function is well-structured with:

  • Proper error handling at both the outer level (lines 69-71) and per-signature level (lines 62-65)
  • Clear logging for debugging
  • Graceful handling of unknown signatures
  • Clean async/await pattern

74-85: LGTM!

The print_results function provides clear, user-friendly output formatting with appropriate handling of empty results.

log_analyzer/signatures/basic_info.py (1)

125-159: LGTM: versions/release_tag access is wrap-aware and robust

Using metadata.get(...) with cluster_md fallback is correct.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (15)
service_client/assisted_service_api.py (1)

160-174: Redact credentials from presigned URL before logging.

Line 171 logs the full presigned URL, which typically includes temporary credentials in the query string (e.g., X-Amz-Signature, tokens). Logging these credentials can expose them in log aggregation systems.

Apply this diff to log only the safe parts of the URL:

+from urllib.parse import urlparse
+
 @sanitize_exceptions
 async def get_cluster_logs(
     self, cluster_id: str
 ) -> nestedarchive.RemoteNestedArchive:
     result = await self._api_call(
         self._installer_api().v2_get_presigned_for_cluster_files,
         cluster_id=cluster_id,
         file_name="logs",
     )

     logs_url = cast(PresignedUrl, result).url
-    log.info("Downloading logs from %s", logs_url)
+    parsed = urlparse(cast(str, logs_url))
+    safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+    log.info("Downloading logs from %s", safe_url)
     return nestedarchive.RemoteNestedArchive(
         cast(str, logs_url), init_download=True
     )
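The redaction can be checked standalone; query-string credentials drop while scheme, host, and path survive (the URL below is made up):

```python
from urllib.parse import urlparse

url = "https://bucket.s3.amazonaws.com/cluster/logs?X-Amz-Signature=abc123&X-Amz-Expires=300"
parsed = urlparse(url)
# Rebuild the URL without the query string, which carries the temporary credentials
safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
print(safe_url)  # https://bucket.s3.amazonaws.com/cluster/logs
```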
pyproject.toml (1)

17-19: Add runtime dependency on python-dateutil.

log_analyzer/signatures/base.py imports dateutil.parser, but only the typing stub (types-python-dateutil in dev dependencies) is present. Without the runtime package, the analyzer will fail with ModuleNotFoundError.

Based on learnings

Apply this diff:

     "pydantic>=2.12.1",
+    "python-dateutil>=2.9.0.post0",
     "nestedarchive>=0.2.4",
log_analyzer/signatures/error_detection.py (1)

37-87: Harden host parsing with safe JSON decoding and key access.

Lines 50-51, 56, 60-63, 68-69 directly access dictionaries and parse JSON without guards. This will crash the signature on missing keys or invalid inventory.

Apply this diff:

         hosts = []
         for host in cluster_hosts:
             if (
-                host["role"] == "bootstrap"
-                and host["progress"]["current_stage"] == "WaitingForControlPlane"
+                host.get("role") == "bootstrap"
+                and host.get("progress", {}).get("current_stage") == "WaitingForControlPlane"
             ):
                 # Probably failed due to other issues
                 return None

-            inventory = json.loads(host["inventory"])
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, ValueError):
+                inventory = {}
             boot_mode = inventory.get("boot", {}).get("current_boot_mode", "N/A")

             if (
-                host["role"] == "master"
-                and host["progress"]["current_stage"] == "Rebooting"
-                and host["status"] == "error"
+                host.get("role") == "master"
+                and host.get("progress", {}).get("current_stage") == "Rebooting"
+                and host.get("status") == "error"
                 and boot_mode == "bios"
             ):
                 hosts.append(
                     OrderedDict(
                         HostID=host["id"],
-                        Hostname=inventory["hostname"],
-                        Progress=host["progress"]["current_stage"],
+                        Hostname=log_analyzer.get_hostname(host),
+                        Progress=host.get("progress", {}).get("current_stage", "N/A"),
                     )
                 )
server.py (1)

1018-1027: Add error handling and improve docstring.

The function lacks exception handling for analyze_cluster, which can cause the MCP tool to fail with an unhandled exception. The docstring should also describe the return format.

Apply this diff:

 @track_tool_usage()
 async def analyze_cluster_logs(
     cluster_id: Annotated[str, Field(description="The ID of the cluster")],
 ) -> str:
     """
     Analyze the cluster logs for the given cluster_id and return the results.
+    
+    Returns:
+        str: A formatted string containing signature analysis results with severity indicators,
+             or an error message if analysis fails.
     """
-    client = InventoryClient(get_access_token())
-    results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
-    return "\n\n".join([str(r) for r in results])
+    try:
+        client = InventoryClient(get_access_token())
+        results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
+        if not results:
+            return "No issues detected in cluster logs."
+        return "\n\n".join(str(r) for r in results)
+    except Exception as e:
+        log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True)
+        return f"Error analyzing cluster logs: {str(e)}"
log_analyzer/signatures/basic_info.py (4)

289-309: Guard JSON parsing in _get_interfaces helper.

Line 291 directly parses JSON without error handling. A malformed inventory will cause the helper to crash.

Apply this diff:

     def _get_interfaces(self, host):
         """Extract interface information from host."""
-        inventory = json.loads(host["inventory"])
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except (json.JSONDecodeError, ValueError):
+            inventory = {}
         interfaces_details = defaultdict(list)

83-119: Use safe dictionary access in host summary generation.

Lines 86, 89, 91, 109, 114 directly index dictionaries without checking for key existence. This can raise KeyError when keys are missing.

Apply this diff:

     def _generate_hosts_summary(self, hosts, log_analyzer):
         """Generate a summary of host issues."""
         host_summary = []
-        some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts)
+        some_done = any(
+            host.get("progress", {}).get("current_stage") == "Done" for host in hosts
+        )

         for host in hosts:
-            if host["role"] not in ["master", "bootstrap"]:
+            role = host.get("role")
+            if role not in ["master", "bootstrap"]:
                 continue
-            if host["progress"]["current_stage"] == "Done" or host["status"] != "error":
+            current_stage = host.get("progress", {}).get("current_stage", "")
+            if current_stage == "Done" or host.get("status") != "error":
                 continue

             host_comment = {
                 "Waiting for bootkube": "bootkube.service never completed",
                 "Rebooting": "Node never pulled Ignition",
                 "Configuring": "Node pulled Ignition, but never started kubelet",
                 "Joined": (
                     "The Node k8s resource associated with this host is not Ready"
                     + (
                         " or the Assisted Controller is not running on the cluster"
                         if not some_done
                         else ""
                     )
                 ),
                 "Waiting for control plane": "Masters never formed 2-node cluster",
                 "Waiting for controller": "Assisted installer controller pod never started",
                 "Writing image to disk": "Image probably failed to be written on disk",
-            }.get(host["progress"]["current_stage"], "Unknown")
+            }.get(current_stage, "Unknown")

             host_summary.append(
                 OrderedDict(
                     hostname=log_analyzer.get_hostname(host),
-                    role=host["role"],
+                    role=role or "N/A",
                     description=host_comment,
                 )
             )

209-250: Add defensive parsing in HostsExtraDetailSignature.

Lines 220-236 directly parse JSON and access nested keys without error handling. Missing or malformed inventory will crash the signature.

Apply this diff:

         hosts = []
         for host in cluster["hosts"]:
-            inventory = json.loads(host["inventory"])
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, ValueError):
+                inventory = {}
+            system_vendor = inventory.get("system_vendor", {})
             hosts.append(
                 OrderedDict(
                     id=host["id"],
-                    hostname=inventory["hostname"],
+                    hostname=log_analyzer.get_hostname(host),
                     requested_hostname=host.get("requested_hostname", "N/A"),
-                    last_contacted=self.format_time(host["checked_in_at"]),
+                    last_contacted=self.format_time(host.get("checked_in_at", "")),
                     installation_disk=host.get("installation_disk_path", "N/A"),
-                    product_name=inventory["system_vendor"].get(
-                        "product_name", "Unavailable"
-                    ),
-                    manufacturer=inventory["system_vendor"].get(
-                        "manufacturer", "Unavailable"
-                    ),
-                    virtual_host=inventory["system_vendor"].get("virtual", False),
-                    disks_count=len(inventory["disks"]),
+                    product_name=system_vendor.get("product_name", "Unavailable"),
+                    manufacturer=system_vendor.get("manufacturer", "Unavailable"),
+                    virtual_host=system_vendor.get("virtual", False),
+                    disks_count=len(inventory.get("disks", [])),
                 )
             )

312-367: Add error handling in StorageDetailSignature.

Lines 323-324 directly parse JSON and access the disks array without guards. Missing or invalid inventory will crash the signature.

Apply this diff:

         hosts = []
         for host in cluster["hosts"]:
-            inventory = json.loads(host["inventory"])
-            disks = inventory["disks"]
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, ValueError):
+                inventory = {}
+            disks = inventory.get("disks", [])
log_analyzer/signatures/platform_specific.py (1)

22-56: Add JSON parse error handling in LibvirtRebootFlagSignature.

Line 31 calls json.loads without a try/except. If the inventory contains invalid JSON, the signature will crash instead of gracefully handling the error.

Apply this diff:

         hosts = []
         for host in cluster.get("hosts", []):
-            inventory = json.loads(host.get("inventory", "{}"))
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, ValueError):
+                inventory = {}
             if (
log_analyzer/signatures/advanced_analysis.py (3)

407-413: Use list default for pod conditions to prevent type issues.

Defaulting to {} means the comprehension iterates over dict keys (strings), so condition.get raises AttributeError.

Apply this diff:

@@
-                ready = [
+                ready = [
                     condition.get("status") == "True"
                     for condition in controller_pod.get("status", {}).get(
-                        "conditions", {}
+                        "conditions", []
                     )
                     if condition.get("type") == "Ready"
                 ][0]

176-193: Make FlappingValidations robust to unexpected message formats.

validation_name_regexp may not match; current code assumes a match and will raise.

Apply this diff:

@@
-                succeed_to_failing_counter = Counter(
-                    cast(
-                        re.Match[str],
-                        self.validation_name_regexp.match(event["message"]),
-                    ).groups()[0]
-                    for event in events
-                    if self.succeed_to_failing_regexp.match(event["message"])
-                )
+                succeed_to_failing_counter = Counter(
+                    m.groups()[0]
+                    for event in events
+                    if self.succeed_to_failing_regexp.match(event["message"])
+                    for m in [self.validation_name_regexp.match(event["message"])]
+                    if m is not None
+                )
@@
-                now_fixed = Counter(
-                    cast(
-                        re.Match[str],
-                        self.validation_name_regexp.match(event["message"]),
-                    ).groups()[0]
-                    for event in events
-                    if self.now_fixed_regexp.match(event["message"])
-                )
+                now_fixed = Counter(
+                    m.groups()[0]
+                    for event in events
+                    if self.now_fixed_regexp.match(event["message"])
+                    for m in [self.validation_name_regexp.match(event["message"])]
+                    if m is not None
+                )
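The `for m in [...] if m is not None` idiom binds the match object inside the comprehension, so an unmatched message is skipped instead of raising; a minimal sketch with made-up validation messages:

```python
import re
from collections import Counter

validation_name_regexp = re.compile(r"validation '(\w+)'")
succeed_to_failing_regexp = re.compile(r".*that used to succeed is now failing")

events = [
    {"message": "validation 'ntp_sync' that used to succeed is now failing"},
    {"message": "validation 'ntp_sync' that used to succeed is now failing"},
    {"message": "validation 'dns' is now fixed"},
    {"message": "garbled message that used to succeed is now failing"},
]

# The single-element list binds the match once; unmatched messages fall through
counter = Counter(
    m.groups()[0]
    for event in events
    if succeed_to_failing_regexp.match(event["message"])
    for m in [validation_name_regexp.match(event["message"])]
    if m is not None
)
print(counter)  # Counter({'ntp_sync': 2})
```

The fourth event passes the outer filter but has no parseable validation name, and is dropped silently rather than raising.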

133-136: Guard bootstrap progress access in MissingMustGatherLogs.

Direct indexing bootstrap_node["progress"]["current_stage"] can KeyError.

Apply this diff:

@@
-            if (
-                bootstrap_node["progress"]["current_stage"]
-                not in eligible_bootstrap_stages
-            ):
+            current_stage = bootstrap_node.get("progress", {}).get("current_stage", "")
+            if current_stage not in eligible_bootstrap_stages:
                 return None
log_analyzer/log_analyzer.py (2)

34-37: Cache all events separately; don’t conflate with “last attempt”.

Current get_all_cluster_events returns only the last partition; this breaks callers (e.g., EventsInstallationAttempts) and contradicts the docstring. Also add a dedicated cache for all events.

Apply this diff:

@@
-        self.logs_archive = logs_archive
-        self._metadata = None
-        self._cluster_events = None
+        self.logs_archive = logs_archive
+        self._metadata = None
+        self._all_events = None
+        self._cluster_events = None

And update get_all_cluster_events:

@@
-    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
+    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
         """Get all the cluster installation events."""
-        if self._cluster_events is None:
+        if self._all_events is None:
             try:
                 events_content = self.logs_archive.get("cluster_events.json")
-                all_events = json.loads(cast(str | bytes, events_content))
-
-                # Get the last partition (latest installation attempt)
-                self._cluster_events = self.partition_cluster_events(all_events)[-1]
+                self._all_events = json.loads(cast(str | bytes, events_content))
             except Exception as e:
                 logger.error("Failed to load cluster events: %s", e)
-                self._cluster_events = []
-
-        return self._cluster_events
+                self._all_events = []
+        return self._all_events

55-75: Guard missing install_started_at in metadata cleaning.

Accessing md["cluster"]["install_started_at"] raises KeyError when the field is absent, aborting the entire analysis.

Apply this diff:

@@
-        installation_start_time = dateutil.parser.isoparse(
-            md["cluster"]["install_started_at"]
-        )
+        installation_started_at = md.get("cluster", {}).get("install_started_at")
+        if not installation_started_at:
+            return md
+        installation_start_time = dateutil.parser.isoparse(installation_started_at)
@@
-        all_hosts = md["cluster"]["hosts"]
+        all_hosts = md["cluster"].get("hosts", [])
         md["cluster"]["deleted_hosts"] = [
             h for h in all_hosts if host_deleted_before_installation_started(h)
         ]
         md["cluster"]["hosts"] = [
             h for h in all_hosts if not host_deleted_before_installation_started(h)
         ]
log_analyzer/signatures/networking.py (1)

27-66: Add guards for hosts, machine_networks, and inventory JSON.

Indexing cluster["machine_networks"][0]["cidr"] and host["inventory"] can raise; handle gracefully.

Apply this diff:

@@
-            if not cluster.get("hosts"):
+            if not cluster.get("hosts"):
                 return None
             host = cluster["hosts"][0]
-            inventory = json.loads(host["inventory"])
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except json.JSONDecodeError:
+                return None
@@
-            # The first one is the machine_cidr that will be used for network configuration
-            machine_cidr = ipaddress.ip_network(cluster["machine_networks"][0]["cidr"])
+            # The first entry is the machine_cidr used for network configuration
+            mn = cluster.get("machine_networks") or []
+            if not mn or not mn[0].get("cidr"):
+                return None
+            machine_cidr = ipaddress.ip_network(mn[0]["cidr"])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aaa72fd and aca3062.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/workflows/mypy.yaml (1 hunks)
  • .github/workflows/pydocstyle.yaml (0 hunks)
  • Dockerfile (1 hunks)
  • Makefile (1 hunks)
  • log_analyzer/__init__.py (1 hunks)
  • log_analyzer/log_analyzer.py (1 hunks)
  • log_analyzer/main.py (1 hunks)
  • log_analyzer/signatures/__init__.py (1 hunks)
  • log_analyzer/signatures/advanced_analysis.py (1 hunks)
  • log_analyzer/signatures/base.py (1 hunks)
  • log_analyzer/signatures/basic_info.py (1 hunks)
  • log_analyzer/signatures/error_detection.py (1 hunks)
  • log_analyzer/signatures/networking.py (1 hunks)
  • log_analyzer/signatures/performance.py (1 hunks)
  • log_analyzer/signatures/platform_specific.py (1 hunks)
  • pyproject.toml (2 hunks)
  • server.py (4 hunks)
  • service_client/assisted_service_api.py (2 hunks)
  • template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
  • .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
  • log_analyzer/__init__.py
  • template.yaml
  • .github/workflows/mypy.yaml
  • Makefile
  • log_analyzer/signatures/base.py
  • log_analyzer/main.py
🧰 Additional context used
🧬 Code graph analysis (9)
service_client/assisted_service_api.py (1)
service_client/exceptions.py (1)
  • sanitize_exceptions (24-58)
log_analyzer/signatures/error_detection.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
log_analyzer/signatures/platform_specific.py (2)
log_analyzer/signatures/base.py (5)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (4)
  • metadata (39-53)
  • get_hostname (218-228)
  • get_host_log_file (134-165)
  • get_journal_log (167-197)
server.py (3)
log_analyzer/main.py (1)
  • analyze_cluster (14-71)
metrics/metrics.py (1)
  • track_tool_usage (51-65)
service_client/assisted_service_api.py (1)
  • InventoryClient (27-561)
log_analyzer/signatures/__init__.py (1)
log_analyzer/signatures/base.py (3)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
log_analyzer/signatures/basic_info.py (2)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • format_time (78-83)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_hostname (218-228)
log_analyzer/signatures/performance.py (2)
log_analyzer/signatures/base.py (6)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
  • create_result (104-120)
log_analyzer/log_analyzer.py (4)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_events_by_host (126-132)
  • get_hostname (218-228)
log_analyzer/signatures/networking.py (3)
log_analyzer/signatures/advanced_analysis.py (11)
  • operator_statuses_from_controller_logs (17-44)
  • filter_operators (57-71)
  • analyze (80-108)
  • analyze (114-157)
  • analyze (169-222)
  • analyze (228-274)
  • analyze (280-299)
  • analyze (309-329)
  • analyze (342-378)
  • analyze (384-432)
  • analyze (443-461)
log_analyzer/signatures/base.py (5)
  • Signature (51-83)
  • ErrorSignature (86-120)
  • SignatureResult (15-48)
  • analyze (59-68)
  • create_result (104-120)
log_analyzer/log_analyzer.py (2)
  • metadata (39-53)
  • get_controller_logs (199-206)
log_analyzer/signatures/advanced_analysis.py (2)
log_analyzer/signatures/base.py (4)
  • Signature (51-83)
  • SignatureResult (15-48)
  • analyze (59-68)
  • generate_table (71-75)
log_analyzer/log_analyzer.py (8)
  • get_all_cluster_events (90-103)
  • partition_cluster_events (106-124)
  • get_last_install_cluster_events (77-88)
  • metadata (39-53)
  • get_must_gather (208-215)
  • get_events_by_host (126-132)
  • get_controller_logs (199-206)
  • get_host_log_file (134-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (2)
Dockerfile (1)

17-17: LGTM!

The addition correctly includes the log_analyzer directory in the Docker image, following the established pattern for other directories.

log_analyzer/signatures/__init__.py (1)

1-33: LGTM!

The signature collection logic correctly uses introspection to gather all Signature subclasses while filtering out base types. The deterministic sorting ensures consistent execution order.

Comment on lines +389 to +391
        if bootstrap[0]["progress"]["current_stage"] != "Waiting for controller":
            return None
        for base in (NEW_LOG_BUNDLE_PATH, OLD_LOG_BUNDLE_PATH):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard bootstrap progress access in ControllerFailedToStart.

Same risk here; avoid KeyError/TypeError on missing progress/current_stage.

Apply this diff:

@@
-        if bootstrap[0]["progress"]["current_stage"] != "Waiting for controller":
+        progress_stage = bootstrap[0].get("progress", {}).get("current_stage", "")
+        if progress_stage != "Waiting for controller":
             return None
🤖 Prompt for AI Agents
In log_analyzer/signatures/advanced_analysis.py around lines 389 to 391, the
code directly indexes bootstrap[0]["progress"]["current_stage"] which can raise
KeyError/TypeError if bootstrap is empty or keys are missing; update the guard
to first verify bootstrap is a non-empty sequence and bootstrap[0] is a dict,
then obtain progress = bootstrap[0].get("progress") and current_stage =
progress.get("current_stage") if isinstance(progress, dict) (or use
bootstrap[0].get("progress", {}) and .get("current_stage")) and only compare
current_stage to "Waiting for controller"; if any check fails or current_stage
is not the expected string, return None to avoid exceptions.
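The guarded access the bot suggests can be sketched in isolation. This is a minimal, hypothetical helper (not code from the PR); it shows how chained `.get()` calls with defaults turn the KeyError/TypeError cases into a harmless empty-string result:

```python
def current_stage(bootstrap: list) -> str:
    """Return the bootstrap host's install stage, or "" when any
    field is missing -- the chained .get() calls never raise."""
    if not bootstrap or not isinstance(bootstrap[0], dict):
        return ""
    return bootstrap[0].get("progress", {}).get("current_stage", "")


# Well-formed record: the stage comes back unchanged.
ok = [{"progress": {"current_stage": "Waiting for controller"}}]
print(current_stage(ok))  # Waiting for controller

# Records that would raise with direct indexing now yield "".
print(repr(current_stage([{}])))  # ''
print(repr(current_stage([])))    # ''
```

One caveat with this pattern: if `progress` is present but explicitly `None`, `.get("progress", {})` returns `None` and the second `.get` still raises, so an `isinstance` check is needed when that shape is possible.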

Comment on lines +20 to +50
    def analyze(self, log_analyzer) -> Optional[SignatureResult]:
        """Analyze host status."""
        try:
            metadata = log_analyzer.metadata
            cluster = metadata["cluster"]

            hosts = []
            for host in cluster["hosts"]:
                info = host["status_info"]
                role = host["role"]
                inventory = json.loads(host["inventory"])
                if host.get("bootstrap", False):
                    role = "bootstrap"

                hosts.append(
                    OrderedDict(
                        id=host["id"],
                        hostname=log_analyzer.get_hostname(host),
                        progress=host["progress"]["current_stage"],
                        status=host["status"],
                        role=role,
                        boot_mode=inventory.get("boot", {}).get(
                            "current_boot_mode", "N/A"
                        ),
                        status_info=str(info),
                        logs_info=host.get("logs_info", ""),
                        last_checked_in_at=self.format_time(
                            host.get("checked_in_at", str(datetime.min))
                        ),
                    )
                )
⚠️ Potential issue | 🟠 Major

Guard against missing keys and invalid JSON in host status parsing.

Lines 30, 38-40 directly parse JSON and access nested dictionaries without error handling. This will crash the signature on malformed inventory or missing keys.

Apply this diff:

         hosts = []
         for host in cluster["hosts"]:
             info = host["status_info"]
-            role = host["role"]
-            inventory = json.loads(host["inventory"])
+            role = host.get("role")
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, ValueError):
+                inventory = {}
             if host.get("bootstrap", False):
                 role = "bootstrap"

             hosts.append(
                 OrderedDict(
                     id=host["id"],
                     hostname=log_analyzer.get_hostname(host),
-                    progress=host["progress"]["current_stage"],
-                    status=host["status"],
+                    progress=host.get("progress", {}).get("current_stage", "N/A"),
+                    status=host.get("status", "N/A"),
                     role=role,
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 20 to 50, the host parsing
assumes keys exist and json.loads always succeeds; wrap inventory parsing in a
try/except and default to {} on JSONDecodeError or non-string input, replace
direct indexing like cluster["hosts"], host["status_info"], host["role"],
host["inventory"], host["id"], host["progress"]["current_stage"],
host["status"], host.get("logs_info") with safe .get calls and sensible defaults
(e.g., empty list/dict, "N/A", or datetime.min), and protect nested lookups
(progress/current_stage, inventory.get("boot") etc.) by using .get with defaults
so malformed or missing fields don't raise exceptions; ensure role is determined
via host.get("role") before overriding with bootstrap, and last_checked_in_at
uses the safe fallback when formatting.
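The defensive parsing described above can be condensed into a standalone sketch. The helper below is hypothetical (field names mirror the review comment; the `"N/A"` defaults are our choice, not the PR's): malformed inventory JSON falls back to `{}`, and every nested lookup goes through `.get()` with a default.

```python
import json
from datetime import datetime


def parse_host(host: dict) -> dict:
    """Extract display fields from a host record, tolerating missing
    keys and malformed inventory JSON instead of raising."""
    try:
        inventory = json.loads(host.get("inventory", "{}"))
    except (json.JSONDecodeError, ValueError):
        inventory = {}
    role = "bootstrap" if host.get("bootstrap", False) else host.get("role", "N/A")
    return {
        "id": host.get("id", "N/A"),
        "progress": host.get("progress", {}).get("current_stage", "N/A"),
        "status": host.get("status", "N/A"),
        "role": role,
        "boot_mode": inventory.get("boot", {}).get("current_boot_mode", "N/A"),
        "checked_in_at": host.get("checked_in_at", str(datetime.min)),
    }


# A malformed record: no role, no progress, invalid inventory JSON.
broken = {"id": "h1", "inventory": "{not json", "status": "insufficient"}
print(parse_host(broken))
```

With direct indexing, the `broken` record above would raise `json.JSONDecodeError` on the inventory and `KeyError` on `role` and `progress`; here it degrades to `"N/A"` placeholders, which is usually the right behavior for a diagnostic report that should render as much as it can.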

Comment on lines +67 to +73
    def _get_inventory(self, log_analyzer, host_id):
        try:
            agent_logs = log_analyzer.get_host_log_file(host_id, "agent.logs")
        except FileNotFoundError:
            return None
        match = self.agent_log_regex.search(agent_logs)
        return json.loads(match.group(1).replace('\\"', '"')) if match else None
⚠️ Potential issue | 🟠 Major

Guard JSON parsing in _get_inventory helper.

Line 73 parses JSON from a regex match without error handling. Invalid JSON will cause the method to crash instead of returning None.

Apply this diff:

     def _get_inventory(self, log_analyzer, host_id):
         try:
             agent_logs = log_analyzer.get_host_log_file(host_id, "agent.logs")
         except FileNotFoundError:
             return None
         match = self.agent_log_regex.search(agent_logs)
-        return json.loads(match.group(1).replace('\\"', '"')) if match else None
+        if not match:
+            return None
+        try:
+            return json.loads(match.group(1).replace('\\"', '"'))
+        except (json.JSONDecodeError, ValueError):
+            return None
🤖 Prompt for AI Agents
In log_analyzer/signatures/platform_specific.py around lines 67 to 73, the
helper _get_inventory calls json.loads on the regex match content without
guarding for invalid JSON; update the function to catch JSONDecodeError (and
ValueError for older Python) around the json.loads call and return None if
parsing fails, so the method doesn't crash on malformed JSON; ensure the
try/except only wraps the parsing step (not the file read) and still returns
None when no regex match is found.
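A self-contained sketch of the guarded extraction. The regex and log format here are hypothetical stand-ins for `agent_log_regex` and the agent log lines (the real pattern is in the PR); the point is the shape of the control flow: no match and unparseable payload both return `None`.

```python
import json
import re

# Hypothetical pattern: captures a JSON object embedded in a log line
# whose inner quotes are backslash-escaped, as in the review's snippet.
INVENTORY_RE = re.compile(r"inventory=(\{.*\})")


def extract_inventory(agent_logs: str):
    """Return the parsed inventory dict, or None when the marker line
    is absent or the captured payload is not valid JSON."""
    match = INVENTORY_RE.search(agent_logs)
    if not match:
        return None
    try:
        return json.loads(match.group(1).replace('\\"', '"'))
    except (json.JSONDecodeError, ValueError):
        return None


print(extract_inventory(r'msg=inventory={\"cpus\": 4}'))  # {'cpus': 4}
print(extract_inventory("no inventory here"))             # None
print(extract_inventory("inventory={not: json}"))         # None
```

Note that the `try` wraps only `json.loads`, per the AI prompt above: a `FileNotFoundError` from reading the log stays a separate, earlier guard rather than being swallowed by the JSON handler.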

@openshift-ci

openshift-ci Bot commented Oct 14, 2025

@keitwb: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/eval-test | aca3062 | link | false | `/test eval-test` |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@carbonin
Collaborator

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2025
@openshift-merge-bot openshift-merge-bot Bot merged commit f25ad94 into openshift-assisted:master Oct 14, 2025
14 of 15 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Nov 14, 2025

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

3 participants