Add log-analyzer to repo #120

openshift-merge-bot[bot] merged 1 commit into openshift-assisted:master
Conversation
Walkthrough

Introduces a log analysis subsystem (LogAnalyzer, signature framework, many signatures), CLI/server entrypoints to run analyses, InventoryClient support to download cluster logs, packaging and container updates, CI/mypy config moved to pyproject.toml, and a feature gate to enable troubleshooting tools.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Main as log_analyzer.main
    participant API as InventoryClient
    participant NA as RemoteNestedArchive
    participant LA as LogAnalyzer
    participant SIG as Signatures (ALL_SIGNATURES)
    User->>Main: analyze_cluster(cluster_id, api_client)
    Main->>API: get_cluster_logs(cluster_id)
    API->>NA: RemoteNestedArchive(presigned_url, init_download=True)
    API-->>Main: RemoteNestedArchive
    Main->>LA: LogAnalyzer(archive)
    loop for each signature
        Main->>SIG: instantiate & analyze(la)
        SIG->>LA: request logs/metadata
        LA-->>SIG: logs / metadata / events
        SIG-->>Main: Optional SignatureResult
    end
    Main-->>User: List[SignatureResult]
```
```mermaid
sequenceDiagram
    autonumber
    actor Operator
    participant Server
    participant Tool as analyze_cluster_logs()
    participant Main as analyze_cluster()
    participant API as InventoryClient
    Note over Server: registered only if ENABLE_TROUBLESHOOTING_TOOLS=1
    Operator->>Server: invoke analyze_cluster_logs(cluster_id)
    Server->>Tool: analyze_cluster_logs(cluster_id)
    Tool->>Main: analyze_cluster(cluster_id, api_client)
    Main->>API: get_cluster_logs(cluster_id)
    API-->>Main: RemoteNestedArchive
    Main-->>Tool: formatted results
    Tool-->>Operator: analysis text
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 8
🧹 Nitpick comments (12)
pyproject.toml (1)
81-89: Consider incremental type-checking for log_analyzer.

Excluding the entire `log_analyzer/` package from type checking creates a maintenance burden and reduces code quality. While the PR description notes "too many errors," a better approach would be to:
- Enable type checking with less strict settings initially (e.g., only `disallow_untyped_defs = false`)
- Add `# type: ignore` comments for specific problematic lines
- Incrementally address type errors over time
This prevents the excluded code from accumulating more type issues.
Consider this incremental approach:
```diff
 [tool.mypy]
-exclude = "log_analyzer/"
 follow_imports = "skip"
 explicit_package_bases = true
 disallow_untyped_calls = true
 disallow_untyped_defs = true
 disallow_incomplete_defs = true
 ignore_missing_imports = true
 disable_error_code = ["attr-defined"]
+
+# Less strict settings for log_analyzer during migration
+[[tool.mypy.overrides]]
+module = "log_analyzer.*"
+disallow_untyped_defs = false
+disallow_incomplete_defs = false
```

log_analyzer/signatures/__init__.py (2)
9-14: Consider explicit imports instead of wildcards.

While wildcard imports with `# noqa` are acceptable in package `__init__.py` files for re-exporting, they can:
- Hide unused imports and make it harder to track which symbols are actually used
- Make static analysis tools less effective
- Import unexpected symbols if signature modules export additional items
Consider using explicit imports:
```diff
-from .basic_info import *  # noqa
-from .error_detection import *  # noqa
-from .performance import *  # noqa
-from .networking import *  # noqa
-from .advanced_analysis import *  # noqa
-from .platform_specific import *  # noqa
+from . import basic_info
+from . import error_detection
+from . import performance
+from . import networking
+from . import advanced_analysis
+from . import platform_specific
```

Then update the discovery loop to scan these module objects instead of the current module.
16-31: Dynamic signature discovery is clever but has tradeoffs.

The dynamic discovery of `Signature` subclasses using `inspect` is flexible and automatically includes new signatures. However:

Pros:
- Automatically includes new signature classes without manual registration
- Clean and minimal code
Cons:
- Less explicit; developers must understand the magic
- Harder to debug if a signature doesn't appear in `ALL_SIGNATURES`
- Can't easily control ordering beyond alphabetical
Consider whether an explicit registry would be clearer for maintainability:
```python
# Alternative explicit approach
from .basic_info import BasicInfoSignature  # example
from .error_detection import ErrorDetectionSignature  # example
# ... more imports

ALL_SIGNATURES = [
    BasicInfoSignature,
    ErrorDetectionSignature,
    # ... explicit list
]
```

This makes it immediately clear which signatures are included and allows custom ordering.
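For reference, the inspect-style discovery under discussion can be sketched like this. The class names below are hypothetical stand-ins; the real package scans its signature modules rather than the current namespace:

```python
import inspect


class Signature:
    """Base class; real signatures implement analyze()."""


# Hypothetical subclasses standing in for the real signature classes.
class ExampleInfoSignature(Signature):
    pass


class ExampleErrorSignature(Signature):
    pass


def discover_signatures(namespace: dict) -> list:
    """Collect every Signature subclass found in a namespace."""
    return [
        obj
        for obj in namespace.values()
        if inspect.isclass(obj)
        and issubclass(obj, Signature)
        and obj is not Signature
    ]


ALL_SIGNATURES = discover_signatures(dict(globals()))
print(sorted(cls.__name__ for cls in ALL_SIGNATURES))
# → ['ExampleErrorSignature', 'ExampleInfoSignature']
```

The failure mode the comment describes is visible here: a class that doesn't subclass `Signature` silently never appears in `ALL_SIGNATURES`, with no error to point at.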
log_analyzer/main.py (1)
16-23: Prefer `logger.exception` in exception handlers to include stack traces.
Apply this diff:
```diff
 def setup_logging(verbose: bool = False) -> None:
-    """Set up colored logging."""
+    """Set up basic logging."""
@@
-    except Exception as e:
-        logger.error(
-            "Error running signature %s: %s", signature_class.__name__, e
-        )
+    except Exception:
+        logger.exception("Error running signature %s", signature_class.__name__)
@@
-    except Exception as e:
-        logger.error("Error analyzing cluster %s: %s", cluster_id, e)
+    except Exception:
+        logger.exception("Error analyzing cluster %s", cluster_id)
         raise
@@
-    except Exception as e:
-        logging.getLogger(__name__).error("Analysis failed: %s", e)
+    except Exception:
+        logging.getLogger(__name__).exception("Analysis failed")
         return 1
```

Also applies to: 75-79, 82-85, 167-169
log_analyzer/signatures/platform_specific.py (1)
62-74: Harden parsing (regex, JSON, IPs) and journal decompression.

Prevents false negatives and crashes on common log shapes/corruption.
Apply this diff:
```diff
-    agent_log_regex = re.compile(
-        r"Sending step <inventory-[0-9a-f]{8}> reply output <(.+)> error <.*> exit-code <0>"
-    )
+    agent_log_regex = re.compile(
+        r"Sending step <inventory-[0-9a-f]{8}> reply output <(.+?)> error <.*> exit-code <0>",
+        re.DOTALL,
+    )
@@
     def _get_inventory(self, log_analyzer, host_id):
         try:
             agent_logs = log_analyzer.get_host_log_file(host_id, "agent.logs")
         except FileNotFoundError:
             return None
-        match = self.agent_log_regex.search(agent_logs)
-        return json.loads(match.group(1).replace('\\"', '"')) if match else None
+        match = self.agent_log_regex.search(agent_logs)
+        if not match:
+            return None
+        try:
+            return json.loads(match.group(1).replace('\\"', '"'))
+        except json.JSONDecodeError:
+            return None
@@
     def _get_address_map(self, inventory):
         address_map = {}
-        interfaces = inventory.get("interfaces")
-        if interfaces:
-            for interface in interfaces:
-                for key in ["ipv4_addresses", "ipv6_addresses"]:
-                    addresses = interface.get(key)
-                    if addresses:
-                        for addr in addresses:
-                            intf = ipaddress.ip_interface(addr)
-                            address_map[str(intf.ip)] = str(intf.network)
+        interfaces = inventory.get("interfaces") or []
+        for interface in interfaces:
+            for key in ["ipv4_addresses", "ipv6_addresses"]:
+                addresses = interface.get(key) or []
+                for addr in addresses:
+                    try:
+                        intf = ipaddress.ip_interface(addr)
+                    except ValueError:
+                        continue
+                    address_map[str(intf.ip)] = str(intf.network)
         return address_map
@@
-            except FileNotFoundError:
+            except (FileNotFoundError, OSError, gzip.BadGzipFile):
                 continue
```

Also applies to: 75-86, 101-109
log_analyzer/signatures/basic_info.py (2)
221-227: Avoid KeyError for checked_in_at

`checked_in_at` may be absent; use `get` with a default.

```diff
-                last_contacted=self.format_time(host["checked_in_at"]),
+                last_contacted=self.format_time(host.get("checked_in_at", "")),
```
298-306: Prefer plain strings over json.dumps for interface fields

`json.dumps` adds quotes/brackets; format readable strings instead.

```diff
-            interfaces_details["mac_address"].append(
-                json.dumps(interface.get("mac_address"))
-            )
-            interfaces_details["ipv4_addresses"].append(
-                json.dumps(interface.get("ipv4_addresses", []))
-            )
-            interfaces_details["ipv6_addresses"].append(
-                json.dumps(interface.get("ipv6_addresses", []))
-            )
+            interfaces_details["mac_address"].append(
+                str(interface.get("mac_address", ""))
+            )
+            interfaces_details["ipv4_addresses"].append(
+                ", ".join(interface.get("ipv4_addresses", []))
+            )
+            interfaces_details["ipv6_addresses"].append(
+                ", ".join(interface.get("ipv6_addresses", []))
+            )
```

log_analyzer/signatures/error_detection.py (2)
153-172: Also scan the kube-apiserver-*.log variant (path parity with networking signature)

Covers both bootstrap-control-plane/kube-apiserver.log and kube-apiserver-*.log.

```diff
-        new_logs_path = f"{NEW_LOG_BUNDLE_PATH}/bootstrap/containers/bootstrap-control-plane/kube-apiserver.log"
-        old_logs_path = f"{OLD_LOG_BUNDLE_PATH}/bootstrap/containers/bootstrap-control-plane/kube-apiserver.log"
-        for path in (new_logs_path, old_logs_path):
+        paths = (
+            f"{NEW_LOG_BUNDLE_PATH}/bootstrap/containers/bootstrap-control-plane/kube-apiserver.log",
+            f"{OLD_LOG_BUNDLE_PATH}/bootstrap/containers/bootstrap-control-plane/kube-apiserver.log",
+            f"{NEW_LOG_BUNDLE_PATH}/bootstrap/containers/kube-apiserver-*.log",
+            f"{OLD_LOG_BUNDLE_PATH}/bootstrap/containers/kube-apiserver-*.log",
+        )
+        for path in paths:
```
23-31: Remove unused helper `_search_patterns_in_string`

It's defined but never used; delete to clean up dead code.

log_analyzer/signatures/performance.py (1)
246-250: Avoid double regex work per event

Compute duration once in the generator.

```diff
-    return (
-        (event, get_duration(event))
-        for event in events
-        if get_duration(event) is not None
-    )
+    return (
+        (event, duration)
+        for event in events
+        if (duration := get_duration(event)) is not None
+    )
```

log_analyzer/signatures/advanced_analysis.py (2)
242-249: Make node condition formatting resilient to missing keys

Avoid KeyError when reason/message/status absent.

```diff
     def get_by_type(t, node):
         conds = node.get("status", {}).get("conditions", [])
         c = next((c for c in conds if c.get("type") == t), None)
         if not c:
             return "(Condition not found)"
-        return f"Status {c['status']} with reason {c['reason']}, message {c['message']}"
+        return (
+            f"Status {c.get('status', '?')} "
+            f"with reason {c.get('reason', '?')}, "
+            f"message {c.get('message', '')}"
+        )
```
407-413: Default conditions to list, not dict

Prevents accidental iteration over dict keys.

```diff
         ready = [
             condition.get("status") == "True"
-            for condition in controller_pod.get("status", {}).get(
-                "conditions", {}
-            )
+            for condition in controller_pod.get("status", {}).get(
+                "conditions", []
+            )
             if condition.get("type") == "Ready"
         ][0]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (19)
- .github/workflows/mypy.yaml (1 hunks)
- .github/workflows/pydocstyle.yaml (0 hunks)
- Dockerfile (1 hunks)
- Makefile (1 hunks)
- log_analyzer/__init__.py (1 hunks)
- log_analyzer/api_client.py (1 hunks)
- log_analyzer/log_analyzer.py (1 hunks)
- log_analyzer/main.py (1 hunks)
- log_analyzer/signatures/__init__.py (1 hunks)
- log_analyzer/signatures/advanced_analysis.py (1 hunks)
- log_analyzer/signatures/base.py (1 hunks)
- log_analyzer/signatures/basic_info.py (1 hunks)
- log_analyzer/signatures/error_detection.py (1 hunks)
- log_analyzer/signatures/networking.py (1 hunks)
- log_analyzer/signatures/performance.py (1 hunks)
- log_analyzer/signatures/platform_specific.py (1 hunks)
- pyproject.toml (2 hunks)
- server.py (4 hunks)
- template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/pydocstyle.yaml
🧰 Additional context used
🧬 Code graph analysis (11)
log_analyzer/__init__.py (3)
- log_analyzer/api_client.py (1): AssistedInstallerAPIClient (14-78)
- log_analyzer/log_analyzer.py (1): LogAnalyzer (22-228)
- log_analyzer/signatures/base.py (1): SignatureResult (15-48)

log_analyzer/signatures/networking.py (3)
- log_analyzer/signatures/advanced_analysis.py (11): operator_statuses_from_controller_logs (17-44), filter_operators (57-71), analyze (80-108, 114-157, 169-222, 228-274, 280-299, 309-329, 342-378, 384-432, 443-461)
- log_analyzer/signatures/base.py (5): Signature (51-83), ErrorSignature (86-120), SignatureResult (15-48), analyze (59-68), create_result (104-120)
- log_analyzer/log_analyzer.py (2): metadata (39-53), get_controller_logs (199-206)

log_analyzer/signatures/__init__.py (1)
- log_analyzer/signatures/base.py (3): Signature (51-83), ErrorSignature (86-120), SignatureResult (15-48)

log_analyzer/signatures/basic_info.py (2)
- log_analyzer/signatures/base.py (5): Signature (51-83), SignatureResult (15-48), analyze (59-68), format_time (78-83), generate_table (71-75)
- log_analyzer/log_analyzer.py (2): metadata (39-53), get_hostname (218-228)

log_analyzer/signatures/error_detection.py (2)
- log_analyzer/signatures/base.py (5): ErrorSignature (86-120), SignatureResult (15-48), analyze (59-68), generate_table (71-75), create_result (104-120)
- log_analyzer/log_analyzer.py (4): metadata (39-53), get_hostname (218-228), get_controller_logs (199-206), get_host_log_file (134-165)

server.py (2)
- log_analyzer/main.py (2): main (101-169), analyze_cluster (26-84)
- metrics/metrics.py (1): track_tool_usage (51-65)

log_analyzer/signatures/platform_specific.py (2)
- log_analyzer/signatures/base.py (5): ErrorSignature (86-120), SignatureResult (15-48), analyze (59-68), create_result (104-120), generate_table (71-75)
- log_analyzer/log_analyzer.py (4): metadata (39-53), get_hostname (218-228), get_host_log_file (134-165), get_journal_log (167-197)

log_analyzer/signatures/advanced_analysis.py (2)
- log_analyzer/signatures/base.py (4): Signature (51-83), SignatureResult (15-48), analyze (59-68), generate_table (71-75)
- log_analyzer/log_analyzer.py (8): get_all_cluster_events (90-103), partition_cluster_events (106-124), get_last_install_cluster_events (77-88), metadata (39-53), get_must_gather (208-215), get_events_by_host (126-132), get_controller_logs (199-206), get_host_log_file (134-165)

log_analyzer/signatures/performance.py (2)
- log_analyzer/signatures/base.py (6): Signature (51-83), ErrorSignature (86-120), SignatureResult (15-48), analyze (59-68), generate_table (71-75), create_result (104-120)
- log_analyzer/log_analyzer.py (4): get_last_install_cluster_events (77-88), metadata (39-53), get_events_by_host (126-132), get_hostname (218-228)

log_analyzer/main.py (1)
- log_analyzer/api_client.py (2): AssistedInstallerAPIClient (14-78), download_logs (58-78)

log_analyzer/signatures/base.py (3)
- log_analyzer/signatures/advanced_analysis.py (9): analyze (80-108, 114-157, 169-222, 228-274, 280-299, 309-329, 342-378, 384-432, 443-461)
- log_analyzer/signatures/basic_info.py (6): analyze (20-81, 125-162, 168-205, 211-249, 255-286, 314-366)
- log_analyzer/signatures/error_detection.py (1): analyze (37-87)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (12)
Dockerfile (1)
17-17: LGTM! The addition of the log_analyzer directory to the Docker image is correctly placed and consistent with the other package COPY statements.
pyproject.toml (2)
18-19: LGTM! The new dependencies `nestedarchive` and `tabulate` are appropriate for the log analyzer functionality. Based on learnings, these are stable packages suitable for processing nested archives and formatting tabular output.
25-25: No changes required. mypy 1.18.2 is the current release and has no known security advisories.template.yaml (1)
46-48: LGTM! The new `ENABLE_TROUBLESHOOTING_TOOLS` parameter and corresponding environment variable are correctly configured. The default value of "true" enables the feature by default, which is appropriate for a troubleshooting capability.

Also applies to: 107-108
.github/workflows/mypy.yaml (1)
22-22: LGTM! Migrating to config-file-based mypy invocation is a best practice that centralizes type-checking configuration and ensures consistency between CI and local development.
Makefile (3)
58-58: LGTM! Aligning the `check-types` target with the centralized mypy configuration in `pyproject.toml` ensures consistency across CI and local development environments.
60-60: Note: Removed docstyle from verify target.

The PR description indicates that pydocstyle was disabled as "very low value." While this is a reasonable decision for some projects, docstrings do provide value for API documentation and maintainability. Consider whether selective docstring enforcement (e.g., only for public APIs) might be a middle ground.
64-64: LGTM! Adding `ruff check --fix` to the format target is a helpful improvement that automatically fixes linting issues during the formatting pass.

server.py (2)
41-43: LGTM! The environment variable handling for `ENABLE_TROUBLESHOOTING_TOOLS` is correctly implemented with a sensible default and proper boolean conversion.
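The gating pattern being reviewed can be sketched as follows. This shows only the variable handling; the function name and the exact set of accepted truthy values are assumptions, and the real `server.py` wiring may differ:

```python
import os


def troubleshooting_tools_enabled() -> bool:
    """Read the feature gate, defaulting to enabled."""
    raw = os.environ.get("ENABLE_TROUBLESHOOTING_TOOLS", "true")
    # Normalize so "TRUE", " true ", "1", etc. all count as enabled.
    return raw.strip().lower() in ("1", "true", "yes", "on")


os.environ["ENABLE_TROUBLESHOOTING_TOOLS"] = "0"
print(troubleshooting_tools_enabled())  # → False
```

The server would then register the troubleshooting tool only when this returns `True`, which matches the conditional registration noted below.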
1049-1051: LGTM! The conditional registration of the troubleshooting tool is correctly implemented and aligns with the environment-based feature flag approach.
log_analyzer/__init__.py (1)
1-17: LGTM! The package API is well-defined with appropriate exports. The use of `__all__` ensures a clean public interface, and the version follows semantic versioning conventions.
80-87: get_all_cluster_events returns the full event list; partition_cluster_events will correctly separate multiple installation attempts.
```python
def __init__(
    self,
    base_url: str = "https://api.openshift.com",
    auth_token: Optional[str] = None,
):
```
Add explicit HTTP timeouts and validate presigned URL field.
Requests without timeouts can hang indefinitely; also guard against missing 'url' in the response.
Apply this diff (uses a sane default timeout):
```diff
 class AssistedInstallerAPIClient:
 @@
-    def __init__(
+    def __init__(
         self,
         base_url: str = "https://api.openshift.com",
         auth_token: Optional[str] = None,
-    ):
+        timeout: float | tuple[float, float] = (10.0, 120.0),
+    ):
 @@
         self.base_url = base_url.rstrip("/")
         self.auth_token = auth_token
         self.session = requests.Session()
+        self.timeout = timeout
 @@
-        response = self.session.get(url, params=params)
+        response = self.session.get(url, params=params, timeout=self.timeout)
         response.raise_for_status()
 @@
-        download_info = self.get_logs_download_url(cluster_id)
-        logs_url = download_info["url"]
+        download_info = self.get_logs_download_url(cluster_id)
+        logs_url = download_info.get("url")
+        if not logs_url:
+            raise ValueError("Presigned URL response missing 'url' field")
```

Based on learnings: always set explicit timeouts with requests for reliability.
Also applies to: 29-35, 53-55, 71-78
```python
def analyze_cluster(
    cluster_id: str,
    auth_token: Optional[str] = None,
    specific_signatures: Optional[List[str]] = None,
) -> List[SignatureResult]:
    """
```
Fix: --api-url is parsed but ignored; pass it through to the API client.
Currently, AssistedInstallerAPIClient is always constructed with the default base URL, making --api-url ineffective. Wire the value through analyze_cluster and into the client.
Apply this diff:
```diff
 def analyze_cluster(
     cluster_id: str,
     auth_token: Optional[str] = None,
-    specific_signatures: Optional[List[str]] = None,
+    specific_signatures: Optional[List[str]] = None,
+    api_url: str = "https://api.openshift.com",
 ) -> List[SignatureResult]:
 @@
-    Args:
+    Args:
         cluster_id: UUID of the cluster to analyze
         auth_token: Authentication token for API access
         specific_signatures: List of specific signature names to run (None for all)
+        api_url: Base URL for the OpenShift API
 @@
-    api_client = AssistedInstallerAPIClient(auth_token=auth_token)
+    api_client = AssistedInstallerAPIClient(base_url=api_url, auth_token=auth_token)
 @@
-    results = analyze_cluster(
+    results = analyze_cluster(
         cluster_id=args.cluster_id,
         auth_token=args.auth_token,
         specific_signatures=args.signatures,
+        api_url=args.api_url,
     )
```

Also applies to: 46-47, 116-121, 151-157
🤖 Prompt for AI Agents
In log_analyzer/main.py around lines 26-31 (and also adjust lines 46-47,
116-121, 151-157), the --api-url CLI flag is parsed but never forwarded to
AssistedInstallerAPIClient; update analyze_cluster to accept an optional
api_url: Optional[str] parameter, pass that api_url into the
AssistedInstallerAPIClient constructor inside analyze_cluster, and propagate the
new parameter through all callers (update the CLI invocation and any other calls
at the indicated line ranges) so the client is constructed with the provided
base URL instead of the default.
carbonin left a comment:
I'm fine with the approach. We can try to address the linting stuff in a followup. Mainly we should address the coderabbit comments that affect actual functionality (of which there are a few) and ensure the API url matches between where the MCP server is pointing to and the one in the analyzer tool.
```python
header = f"=== {self.title} ==="
if self.severity == "error":
    header = f"🔴 {header}"
```
Are these icons useful for the model? Maybe it would be better to add the severity as text?
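A text-only variant along the lines the reviewer suggests might look like this. The `title`/`severity` attributes mirror the snippet above, but the surrounding class shape is a hypothetical stand-in, not the actual result object:

```python
class SignatureResult:
    """Minimal stand-in for the result object; only the fields used here."""

    def __init__(self, title: str, severity: str):
        self.title = title
        self.severity = severity

    def header(self) -> str:
        # Spell the severity out as text so a model consuming the output
        # does not have to interpret emoji; uppercase makes it greppable.
        return f"=== [{self.severity.upper()}] {self.title} ==="


print(SignatureResult("Host Failure", "error").header())
# → === [ERROR] Host Failure ===
```

This keeps severity machine-readable for every level, not just "error", and avoids relying on emoji rendering.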
Force-pushed: b1723fd to 50f40ee
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Dockerfile (1)
1-1: Python version mismatch will break builds (image uses 3.11; pyproject requires >=3.13).

Container is based on Python 3.11 but pyproject.toml requires >=3.13. uv sync/install will likely fail. Align versions.
Fix option A (recommended): lower project requirement to 3.11+
```diff
-requires-python = ">=3.13"
+requires-python = ">=3.11"
```

pyproject.toml (1)
6-6: Align Python requirement with runtime (Dockerfile is Python 3.11).

Set requires-python to >=3.11 to match the base image and avoid uv sync failures.

```diff
-requires-python = ">=3.13"
+requires-python = ">=3.11"
```
♻️ Duplicate comments (2)
log_analyzer/log_analyzer.py (1)
55-75: Fix event loading semantics and guard missing install_started_at.
- _clean_metadata_json should handle missing cluster.install_started_at safely.
- get_all_cluster_events must return all events; selecting last partition belongs only in get_last_install_cluster_events.
Apply:
```diff
     @staticmethod
     def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        installation_start_time = dateutil.parser.isoparse(
-            md["cluster"]["install_started_at"]
-        )
+        installation_started_at = md.get("cluster", {}).get("install_started_at")
+        if not installation_started_at:
+            return md
+        installation_start_time = dateutil.parser.isoparse(installation_started_at)
@@
     def get_last_install_cluster_events(self) -> List[Dict[str, Any]]:
         """Get the cluster installation events for the most recent attempt."""
         try:
-            all_events = self.get_all_cluster_events()
-
-            # Get the last partition (latest installation attempt)
-            events = self.partition_cluster_events(all_events)[-1]
+            all_events = self.get_all_cluster_events()
+            events = self.partition_cluster_events(all_events)[-1]
         except Exception as e:
             logger.error("Failed to load cluster events: %s", e)
             return []
@@
     def get_all_cluster_events(self) -> List[Dict[str, Any]]:
         """Get all the cluster installation events."""
-        if self._cluster_events is None:
+        if getattr(self, "_all_events", None) is None:
             try:
                 events_content = self.logs_archive.get("cluster_events.json")
-                all_events = json.loads(cast(str | bytes, events_content))
-
-                # Get the last partition (latest installation attempt)
-                self._cluster_events = self.partition_cluster_events(all_events)[-1]
+                self._all_events = json.loads(cast(str | bytes, events_content))
             except Exception as e:
                 logger.error("Failed to load cluster events: %s", e)
-                self._cluster_events = []
-
-        return self._cluster_events
+                self._all_events = []
+        return self._all_events
```

Also applies to: 77-104
log_analyzer/signatures/networking.py (1)
41-51: Guard machine_networks and skip invalid gateways (prevents spurious failures)

Indexing `cluster["machine_networks"][0]["cidr"]` can raise on missing/empty data. Also, parsing invalid gateway IPs should not abort the analysis.
```diff
-        # The first one is the machine_cidr that will be used for network configuration
-        machine_cidr = ipaddress.ip_network(cluster["machine_networks"][0]["cidr"])
+        # The first entry (if present) is used for network configuration
+        mn = cluster.get("machine_networks") or []
+        if not mn or not mn[0].get("cidr"):
+            return None
+        machine_cidr = ipaddress.ip_network(mn[0]["cidr"])

         for route in inventory.get("routes", []):
             # currently only relevant for ipv4
-            if (
-                route.get("destination") == "0.0.0.0"
-                and route.get("gateway")
-                and ipaddress.ip_address(route["gateway"]) in machine_cidr
-            ):
-                return None
+            try:
+                if (
+                    route.get("destination") == "0.0.0.0"
+                    and route.get("gateway")
+                    and ipaddress.ip_address(route["gateway"]) in machine_cidr
+                ):
+                    return None
+            except ValueError:
+                # Ignore invalid gateway values
+                continue
```
🧹 Nitpick comments (4)
log_analyzer/signatures/basic_info.py (2)
299-307: Avoid json.dumps in interface fields; produces quoted strings in output

`json.dumps` adds quotes and brackets. Use plain strings and join lists for readability.

```diff
-            interfaces_details["mac_address"].append(
-                json.dumps(interface.get("mac_address"))
-            )
-            interfaces_details["ipv4_addresses"].append(
-                json.dumps(interface.get("ipv4_addresses", []))
-            )
-            interfaces_details["ipv6_addresses"].append(
-                json.dumps(interface.get("ipv6_addresses", []))
-            )
+            interfaces_details["mac_address"].append(interface.get("mac_address", "") or "")
+            interfaces_details["ipv4_addresses"].append(
+                ", ".join(interface.get("ipv4_addresses", []))
+            )
+            interfaces_details["ipv6_addresses"].append(
+                ", ".join(interface.get("ipv6_addresses", []))
+            )
```
181-190: Prefer "N/A" over empty strings for missing timestamps

Empty strings render poorly. Use "N/A" defaults; format_time already returns the input if parsing fails.

```diff
-            "Installation Started At": self.format_time(
-                cluster.get("install_started_at", "")
-            ),
-            "Failed On": self.format_time(cluster.get("status_updated_at", "")),
+            "Installation Started At": self.format_time(
+                cluster.get("install_started_at", "N/A")
+            ),
+            "Failed On": self.format_time(cluster.get("status_updated_at", "N/A")),
```

log_analyzer/signatures/networking.py (1)
106-112: Tighten VIP matching to avoid substring false positives

Using `if vip in ip_addr` can match substrings (e.g., 10.0.0.1 in 110.0.0.10). Prefer IP-boundary-aware regex or parse addresses from ip-addr output.
Option A (quick): for IPv4, use word boundaries; for IPv6, use hex/colon boundaries.
```python
import ipaddress
import re


def vip_in_text(vip: str, text: str) -> bool:
    try:
        ip = ipaddress.ip_address(vip)
        if isinstance(ip, ipaddress.IPv4Address):
            pat = re.compile(rf"(?<!\d){re.escape(vip)}(?!\d)")
        else:
            pat = re.compile(rf"(?<![0-9a-fA-F:]){re.escape(vip)}(?![0-9a-fA-F:])")
        return bool(pat.search(text))
    except ValueError:
        return vip in text  # fallback
```

Replace checks:
- `if vip in ip_addr` -> `if vip_in_text(vip, ip_addr)`
- Same for bootstrap and OLD paths.
Please verify against a few real ip-addr samples to confirm no regressions.
Also applies to: 123-131, 137-141, 147-151
log_analyzer/signatures/performance.py (1)
246-250: Avoid double computation in `_get_fio_events`

`get_duration(event)` is called twice per event. Compute once and reuse.
```diff
-    return (
-        (event, get_duration(event))
-        for event in events
-        if get_duration(event) is not None
-    )
+    def gen():
+        for event in events:
+            duration = get_duration(event)
+            if duration is not None:
+                yield event, duration
+    return gen()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (19)
- .github/workflows/mypy.yaml (1 hunks)
- .github/workflows/pydocstyle.yaml (0 hunks)
- Dockerfile (1 hunks)
- Makefile (1 hunks)
- log_analyzer/__init__.py (1 hunks)
- log_analyzer/log_analyzer.py (1 hunks)
- log_analyzer/main.py (1 hunks)
- log_analyzer/signatures/__init__.py (1 hunks)
- log_analyzer/signatures/advanced_analysis.py (1 hunks)
- log_analyzer/signatures/base.py (1 hunks)
- log_analyzer/signatures/basic_info.py (1 hunks)
- log_analyzer/signatures/error_detection.py (1 hunks)
- log_analyzer/signatures/networking.py (1 hunks)
- log_analyzer/signatures/performance.py (1 hunks)
- log_analyzer/signatures/platform_specific.py (1 hunks)
- pyproject.toml (2 hunks)
- server.py (4 hunks)
- service_client/assisted_service_api.py (2 hunks)
- template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (7)
- log_analyzer/signatures/__init__.py
- .github/workflows/mypy.yaml
- log_analyzer/signatures/base.py
- log_analyzer/signatures/platform_specific.py
- log_analyzer/signatures/error_detection.py
- template.yaml
- server.py
🧰 Additional context used
🧬 Code graph analysis (7)
service_client/assisted_service_api.py (1)
- service_client/exceptions.py (1): sanitize_exceptions (24-58)

log_analyzer/main.py (3)
- service_client/assisted_service_api.py (1): get_cluster_logs (161-174)
- log_analyzer/log_analyzer.py (1): LogAnalyzer (22-228)
- log_analyzer/signatures/base.py (2): SignatureResult (15-48), analyze (59-68)

log_analyzer/signatures/networking.py (3)
- log_analyzer/signatures/advanced_analysis.py (11): operator_statuses_from_controller_logs (17-44), filter_operators (57-71), analyze (80-108, 114-157, 169-222, 228-274, 280-299, 309-329, 342-378, 384-432, 443-461)
- log_analyzer/signatures/base.py (5): Signature (51-83), ErrorSignature (86-120), SignatureResult (15-48), analyze (59-68), create_result (104-120)
- log_analyzer/log_analyzer.py (2): metadata (39-53), get_controller_logs (199-206)

log_analyzer/signatures/advanced_analysis.py (2)
- log_analyzer/signatures/base.py (4): Signature (51-83), SignatureResult (15-48), analyze (59-68), generate_table (71-75)
- log_analyzer/log_analyzer.py (8): get_all_cluster_events (90-103), partition_cluster_events (106-124), get_last_install_cluster_events (77-88), metadata (39-53), get_must_gather (208-215), get_events_by_host (126-132), get_controller_logs (199-206), get_host_log_file (134-165)

log_analyzer/signatures/basic_info.py (2)
- log_analyzer/signatures/base.py (5): Signature (51-83), SignatureResult (15-48), analyze (59-68), format_time (78-83), generate_table (71-75)
- log_analyzer/log_analyzer.py (2): metadata (39-53), get_hostname (218-228)

log_analyzer/signatures/performance.py (2)
- log_analyzer/signatures/base.py (6): Signature (51-83), ErrorSignature (86-120), SignatureResult (15-48), analyze (59-68), generate_table (71-75), create_result (104-120)
- log_analyzer/log_analyzer.py (4): get_last_install_cluster_events (77-88), metadata (39-53), get_events_by_host (126-132), get_hostname (218-228)

log_analyzer/__init__.py (2)
- log_analyzer/log_analyzer.py (1): LogAnalyzer (22-228)
- log_analyzer/signatures/base.py (1): SignatureResult (15-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (3)
Dockerfile (1)
17-17: LGTM: include log_analyzer in image. COPY of the new package is correct.
log_analyzer/__init__.py (1)
7-15: Public API surface looks good. Re-exports and `__all__` are precise and useful.
Makefile (1)
57-61: Type-check invocation via config looks good. Using --config-file aligns with pyproject settings; verify target order is sensible.
| inventory = json.loads(host["inventory"]) | ||
| if host.get("bootstrap", False): | ||
| role = "bootstrap" | ||
| hosts.append( | ||
| OrderedDict( | ||
| id=host["id"], | ||
| hostname=log_analyzer.get_hostname(host), | ||
| progress=host["progress"]["current_stage"], | ||
| status=host["status"], | ||
| role=role, | ||
| boot_mode=inventory.get("boot", {}).get( | ||
| "current_boot_mode", "N/A" | ||
| ), |
Guard JSON decode and missing keys to avoid aborting the whole signature
A single bad/missing inventory JSON will throw and cause the signature to return None. Parse defensively and default to {} so the table can still render for other hosts.
- inventory = json.loads(host["inventory"])
+ try:
+ inventory = json.loads(host.get("inventory", "{}"))
+ except json.JSONDecodeError:
+ inventory = {}
if host.get("bootstrap", False):
role = "bootstrap"
hosts.append(
OrderedDict(
id=host["id"],
hostname=log_analyzer.get_hostname(host),
progress=host["progress"]["current_stage"],
status=host["status"],
role=role,
- boot_mode=inventory.get("boot", {}).get(
+ boot_mode=inventory.get("boot", {}).get(
"current_boot_mode", "N/A"
),
status_info=str(info),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| inventory = json.loads(host["inventory"]) | |
| if host.get("bootstrap", False): | |
| role = "bootstrap" | |
| hosts.append( | |
| OrderedDict( | |
| id=host["id"], | |
| hostname=log_analyzer.get_hostname(host), | |
| progress=host["progress"]["current_stage"], | |
| status=host["status"], | |
| role=role, | |
| boot_mode=inventory.get("boot", {}).get( | |
| "current_boot_mode", "N/A" | |
| ), | |
| try: | |
| inventory = json.loads(host.get("inventory", "{}")) | |
| except json.JSONDecodeError: | |
| inventory = {} | |
| if host.get("bootstrap", False): | |
| role = "bootstrap" | |
| hosts.append( | |
| OrderedDict( | |
| id=host["id"], | |
| hostname=log_analyzer.get_hostname(host), | |
| progress=host["progress"]["current_stage"], | |
| status=host["status"], | |
| role=role, | |
| boot_mode=inventory.get("boot", {}).get( | |
| "current_boot_mode", "N/A" | |
| ), | |
| status_info=str(info), |
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 30 to 43, the code calls
json.loads(host["inventory"]) and indexes host keys directly which will raise on
missing or invalid inventory and abort the signature; wrap the inventory parse
in a try/except (catch json.JSONDecodeError and ValueError) and default
inventory = {} on failure (or use host.get("inventory", "{}") as the input), and
replace direct indexing of host keys with .get(..., default) or safe nested
.get(...) calls (e.g., progress = host.get("progress", {}).get("current_stage",
"N/A"), status = host.get("status", "N/A"), and boot_mode =
inventory.get("boot", {}).get("current_boot_mode", "N/A")); ensure role has a
sensible default when host.get("bootstrap") is missing so a bad inventory does
not cause the whole signature to return None.
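The guarded-parse pattern the prompt above describes can be sketched as a small standalone helper; `parse_inventory` is a hypothetical name, and the host dict shape is assumed from the review context, not taken from the actual module:

```python
import json

def parse_inventory(host: dict) -> dict:
    """Return the host's inventory as a dict, or {} on missing/invalid JSON."""
    try:
        return json.loads(host.get("inventory", "{}"))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {}

# A malformed inventory no longer aborts the whole signature
host = {"id": "h1", "inventory": "not json"}
inventory = parse_inventory(host)
boot_mode = inventory.get("boot", {}).get("current_boot_mode", "N/A")
print(boot_mode)  # N/A
```

Catching `ValueError` (rather than only `json.JSONDecodeError`) also covers non-string inputs that `json.loads` rejects.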
| inventory = json.loads(host["inventory"]) | ||
| hosts.append( | ||
| OrderedDict( | ||
| id=host["id"], | ||
| hostname=inventory["hostname"], | ||
| requested_hostname=host.get("requested_hostname", "N/A"), | ||
| last_contacted=self.format_time(host["checked_in_at"]), | ||
| installation_disk=host.get("installation_disk_path", "N/A"), | ||
| product_name=inventory["system_vendor"].get( | ||
| "product_name", "Unavailable" | ||
| ), | ||
| manufacturer=inventory["system_vendor"].get( | ||
| "manufacturer", "Unavailable" | ||
| ), | ||
| virtual_host=inventory["system_vendor"].get("virtual", False), | ||
| disks_count=len(inventory["disks"]), | ||
| ) |
Harden host extra details: robust inventory parsing and safe lookups
Direct indexing (inventory["hostname"], inventory["system_vendor"], host["checked_in_at"], inventory["disks"]) can KeyError and drop the entire signature.
- inventory = json.loads(host["inventory"])
+ try:
+ inventory = json.loads(host.get("inventory", "{}"))
+ except json.JSONDecodeError:
+ inventory = {}
+ system_vendor = inventory.get("system_vendor", {})
hosts.append(
OrderedDict(
id=host["id"],
- hostname=inventory["hostname"],
+ hostname=log_analyzer.get_hostname(host),
requested_hostname=host.get("requested_hostname", "N/A"),
- last_contacted=self.format_time(host["checked_in_at"]),
+ last_contacted=self.format_time(host.get("checked_in_at", "")),
installation_disk=host.get("installation_disk_path", "N/A"),
- product_name=inventory["system_vendor"].get(
- "product_name", "Unavailable"
- ),
- manufacturer=inventory["system_vendor"].get(
- "manufacturer", "Unavailable"
- ),
- virtual_host=inventory["system_vendor"].get("virtual", False),
- disks_count=len(inventory["disks"]),
+ product_name=system_vendor.get("product_name", "Unavailable"),
+ manufacturer=system_vendor.get("manufacturer", "Unavailable"),
+ virtual_host=system_vendor.get("virtual", False),
+ disks_count=len(inventory.get("disks", [])),
)
)
📝 Committable suggestion
| inventory = json.loads(host["inventory"]) | |
| hosts.append( | |
| OrderedDict( | |
| id=host["id"], | |
| hostname=inventory["hostname"], | |
| requested_hostname=host.get("requested_hostname", "N/A"), | |
| last_contacted=self.format_time(host["checked_in_at"]), | |
| installation_disk=host.get("installation_disk_path", "N/A"), | |
| product_name=inventory["system_vendor"].get( | |
| "product_name", "Unavailable" | |
| ), | |
| manufacturer=inventory["system_vendor"].get( | |
| "manufacturer", "Unavailable" | |
| ), | |
| virtual_host=inventory["system_vendor"].get("virtual", False), | |
| disks_count=len(inventory["disks"]), | |
| ) | |
| try: | |
| inventory = json.loads(host.get("inventory", "{}")) | |
| except json.JSONDecodeError: | |
| inventory = {} | |
| system_vendor = inventory.get("system_vendor", {}) | |
| hosts.append( | |
| OrderedDict( | |
| id=host["id"], | |
| hostname=log_analyzer.get_hostname(host), | |
| requested_hostname=host.get("requested_hostname", "N/A"), | |
| last_contacted=self.format_time(host.get("checked_in_at", "")), | |
| installation_disk=host.get("installation_disk_path", "N/A"), | |
| product_name=system_vendor.get("product_name", "Unavailable"), | |
| manufacturer=system_vendor.get("manufacturer", "Unavailable"), | |
| virtual_host=system_vendor.get("virtual", False), | |
| disks_count=len(inventory.get("disks", [])), | |
| ) | |
| ) |
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 220 to 236, the code
assumes inventory and several nested keys always exist which can raise
KeyError/TypeError and abort the signature; instead, wrap
json.loads(host.get("inventory", "{}")) in a try/except (fall back to {} on
parse error), use .get with sensible defaults for top-level lookups (hostname,
system_vendor, disks) and for nested system_vendor fields (product_name,
manufacturer, virtual), verify types (e.g., ensure disks is a list before taking
len, else 0), and guard host["checked_in_at"] via host.get(...) so format_time
is only called when a value exists (fallback like "N/A"); update the OrderedDict
fields to use these safe lookups and defaults so missing or malformed inventory
data cannot raise exceptions.
| inventory = json.loads(host["inventory"]) | ||
| disks = inventory["disks"] | ||
| disks_details = { | ||
| "type": [], | ||
| "bootable": [], | ||
| "name": [], | ||
| "path": [], | ||
| "by-path": [], | ||
| } | ||
| for d in disks: | ||
| disk_type = d.get("drive_type", "Not available") | ||
| disks_details["type"].append(disk_type) | ||
| disks_details["bootable"].append(str(d.get("bootable", False))) | ||
| disks_details["name"].append(d.get("name", "Not available")) | ||
| disks_details["path"].append(d.get("path", "Not available")) | ||
| disks_details["by-path"].append(d.get("by_path", "Not available")) | ||
Guard missing/invalid inventory and disks list to prevent KeyError
Directly indexing inventory["disks"] can fail; parse safely and default to [].
- inventory = json.loads(host["inventory"])
- disks = inventory["disks"]
+ try:
+ inventory = json.loads(host.get("inventory", "{}"))
+ except json.JSONDecodeError:
+ inventory = {}
+ disks = inventory.get("disks", [])
📝 Committable suggestion
| inventory = json.loads(host["inventory"]) | |
| disks = inventory["disks"] | |
| disks_details = { | |
| "type": [], | |
| "bootable": [], | |
| "name": [], | |
| "path": [], | |
| "by-path": [], | |
| } | |
| for d in disks: | |
| disk_type = d.get("drive_type", "Not available") | |
| disks_details["type"].append(disk_type) | |
| disks_details["bootable"].append(str(d.get("bootable", False))) | |
| disks_details["name"].append(d.get("name", "Not available")) | |
| disks_details["path"].append(d.get("path", "Not available")) | |
| disks_details["by-path"].append(d.get("by_path", "Not available")) | |
| try: | |
| inventory = json.loads(host.get("inventory", "{}")) | |
| except json.JSONDecodeError: | |
| inventory = {} | |
| disks = inventory.get("disks", []) | |
| disks_details = { | |
| "type": [], | |
| "bootable": [], | |
| "name": [], | |
| "path": [], | |
| "by-path": [], | |
| } | |
| for d in disks: | |
| disk_type = d.get("drive_type", "Not available") | |
| disks_details["type"].append(disk_type) | |
| disks_details["bootable"].append(str(d.get("bootable", False))) | |
| disks_details["name"].append(d.get("name", "Not available")) | |
| disks_details["path"].append(d.get("path", "Not available")) | |
| disks_details["by-path"].append(d.get("by_path", "Not available")) |
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 323 to 341, the code
assumes host["inventory"] exists and that inventory["disks"] is present and a
list; this can raise KeyError or TypeError. Safely load inventory using
host.get("inventory", "{}") and wrap json.loads in try/except (or default to {}
on failure), then obtain disks with inventory.get("disks", []) and coerce to an
empty list if it's None or not a list; proceed to iterate over that safe disks
list so missing/invalid inventory or disks won't crash the code.
Actionable comments posted: 5
♻️ Duplicate comments (9)
log_analyzer/log_analyzer.py (2)
90-103: Method name misleading: get_all_cluster_events returns only last partition.
The method is named get_all_cluster_events but line 98 actually returns only the last partition (latest installation attempt), not all events. This contradicts the docstring "Get all the cluster installation events" and could confuse callers expecting all events.
Consider this approach to fix the semantic mismatch:
+ def get_all_cluster_events(self) -> List[Dict[str, Any]]: + """Get all the cluster installation events (all attempts).""" + if self._all_events is None: + try: + events_content = self.logs_archive.get("cluster_events.json") + self._all_events = json.loads(cast(str | bytes, events_content)) + except Exception as e: + logger.error("Failed to load cluster events: %s", e) + self._all_events = [] + return self._all_events + - def get_all_cluster_events(self) -> List[Dict[str, Any]]: - """Get all the cluster installation events.""" + def _get_last_partition_events(self) -> List[Dict[str, Any]]: + """Get events from the last installation partition only.""" if self._cluster_events is None: try: - events_content = self.logs_archive.get("cluster_events.json") - all_events = json.loads(cast(str | bytes, events_content)) + all_events = self.get_all_cluster_events() # Get the last partition (latest installation attempt) self._cluster_events = self.partition_cluster_events(all_events)[-1] except Exception as e: logger.error("Failed to load cluster events: %s", e) self._cluster_events = [] return self._cluster_eventsThen update
get_last_install_cluster_events(line 80) to callself._get_last_partition_events()instead of re-partitioning.
55-75: Guard against missinginstall_started_atfield.Line 58-60 assumes
md["cluster"]["install_started_at"]exists without checking. If this field is missing or null, the code will raise a KeyError or pass None toisoparse, causing a crash.Apply this diff to add a safe fallback:
@staticmethod def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]: """Clean metadata JSON by separating deleted hosts.""" - installation_start_time = dateutil.parser.isoparse( - md["cluster"]["install_started_at"] - ) + installation_started_at = md.get("cluster", {}).get("install_started_at") + if not installation_started_at: + # Cannot partition deleted hosts without installation start time + return md + installation_start_time = dateutil.parser.isoparse(installation_started_at)server.py (1)
1018-1027: Add error handling for robustness.The function lacks error handling around
analyze_cluster, which could cause the MCP tool to fail ungracefully if log download or analysis fails. Consider wrapping in a try-except to return user-friendly error messages.Apply this diff to add error handling:
@track_tool_usage() async def analyze_cluster_logs( cluster_id: Annotated[str, Field(description="The ID of the cluster")], ) -> str: """ Analyze the cluster logs for the given cluster_id and return the results. + + Returns: + str: A formatted string containing signature analysis results. """ - client = InventoryClient(get_access_token()) - results = await analyze_cluster(cluster_id=cluster_id, api_client=client) - return "\n\n".join([str(r) for r in results]) + try: + client = InventoryClient(get_access_token()) + results = await analyze_cluster(cluster_id=cluster_id, api_client=client) + if not results: + return "No issues detected in cluster logs." + return "\n\n".join([str(r) for r in results]) + except Exception as e: + log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True) + return f"Error analyzing cluster logs: {str(e)}"service_client/assisted_service_api.py (1)
160-174: Security concern: Presigned URL logged with credentials.
Line 171 logs the full presigned URL, which includes query parameters containing temporary credentials. This could leak credentials in log aggregation systems.
Apply this diff to redact sensitive query parameters:
+from urllib.parse import urlparse
+
@sanitize_exceptions
async def get_cluster_logs(
self, cluster_id: str
) -> nestedarchive.RemoteNestedArchive:
result = await self._api_call(
self._installer_api().v2_get_presigned_for_cluster_files,
cluster_id=cluster_id,
file_name="logs",
)
logs_url = cast(PresignedUrl, result).url
- log.info("Downloading logs from %s", logs_url)
+ parsed = urlparse(cast(str, logs_url))
+ safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+ log.info("Downloading logs from %s", safe_url)
return nestedarchive.RemoteNestedArchive(
cast(str, logs_url), init_download=True
)
Note: The import for urlparse is already present at line 12.
log_analyzer/signatures/basic_info.py (3)
30-49: Harden host parsing: guard inventory JSON and missing keys
A bad/missing host["inventory"] or missing progress/status will crash the signature. Parse defensively and use .get with defaults.
- inventory = json.loads(host["inventory"]) + try: + inventory = json.loads(host.get("inventory", "{}")) + except (ValueError, json.JSONDecodeError): + inventory = {} if host.get("bootstrap", False): role = "bootstrap" hosts.append( OrderedDict( id=host["id"], hostname=log_analyzer.get_hostname(host), - progress=host["progress"]["current_stage"], - status=host["status"], + progress=host.get("progress", {}).get("current_stage", "N/A"), + status=host.get("status", "N/A"), role=role, boot_mode=inventory.get("boot", {}).get( "current_boot_mode", "N/A" ), status_info=str(info), logs_info=host.get("logs_info", ""), - last_checked_in_at=self.format_time( - host.get("checked_in_at", str(datetime.min)) - ), + last_checked_in_at=self.format_time(host.get("checked_in_at", "")), ) )
218-236: Harden HostsExtraDetailSignature: robust inventory parsing and safe lookups
This block can KeyError/JSONDecodeError; parse inventory safely and guard nested keys.
- inventory = json.loads(host["inventory"]) + try: + inventory = json.loads(host.get("inventory", "{}")) + except (ValueError, json.JSONDecodeError): + inventory = {} + system_vendor = inventory.get("system_vendor", {}) hosts.append( OrderedDict( id=host["id"], - hostname=inventory["hostname"], + hostname=log_analyzer.get_hostname(host), requested_hostname=host.get("requested_hostname", "N/A"), - last_contacted=self.format_time(host["checked_in_at"]), + last_contacted=self.format_time(host.get("checked_in_at", "")), installation_disk=host.get("installation_disk_path", "N/A"), - product_name=inventory["system_vendor"].get( - "product_name", "Unavailable" - ), - manufacturer=inventory["system_vendor"].get( - "manufacturer", "Unavailable" - ), - virtual_host=inventory["system_vendor"].get("virtual", False), - disks_count=len(inventory["disks"]), + product_name=system_vendor.get("product_name", "Unavailable"), + manufacturer=system_vendor.get("manufacturer", "Unavailable"), + virtual_host=system_vendor.get("virtual", False), + disks_count=len(inventory.get("disks", [])), ) )
323-325: Guard missing/invalid inventory and disks list
inventory["disks"] can raise; default to [].
- inventory = json.loads(host["inventory"])
- disks = inventory["disks"]
+ try:
+ inventory = json.loads(host.get("inventory", "{}"))
+ except (ValueError, json.JSONDecodeError):
+ inventory = {}
+ disks = inventory.get("disks", [])
log_analyzer/signatures/platform_specific.py (1)
38-39: Don’t abort signature on bootstrap host; skip it
Returning None drops other valid matches. Use continue.
- if host.get("role") == "bootstrap":
- return None
+ if host.get("role") == "bootstrap":
+ continue
log_analyzer/signatures/advanced_analysis.py (1)
176-193: Guard missing validation-name matches to avoid exceptions
Don’t assume .match() succeeds; skip when it doesn’t.
- succeed_to_failing_counter = Counter(
- cast(
- re.Match[str],
- self.validation_name_regexp.match(event["message"]),
- ).groups()[0]
- for event in events
- if self.succeed_to_failing_regexp.match(event["message"])
- )
+ succeed_to_failing_counter = Counter(
+ m.groups()[0]
+ for event in events
+ if self.succeed_to_failing_regexp.match(event["message"])
+ for m in [self.validation_name_regexp.match(event["message"])]
+ if m is not None
+ )
- now_fixed = Counter(
- cast(
- re.Match[str],
- self.validation_name_regexp.match(event["message"]),
- ).groups()[0]
- for event in events
- if self.now_fixed_regexp.match(event["message"])
- )
+ now_fixed = Counter(
+ m.groups()[0]
+ for event in events
+ if self.now_fixed_regexp.match(event["message"])
+ for m in [self.validation_name_regexp.match(event["message"])]
+ if m is not None
+ )
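The match-guard pattern recommended above can be shown in isolation. The regexes and event messages below are invented stand-ins for the real service log format, chosen only to exercise the guard:

```python
import re
from collections import Counter

# Hypothetical message shapes, not the actual assisted-service event text
validation_name_regexp = re.compile(r"Host .*: validation '(\S+)'")
succeed_to_failing_regexp = re.compile(r".*that used to succeed is now failing")

events = [
    {"message": "Host h1: validation 'ntp-synced' that used to succeed is now failing"},
    # Passes the filter regex but yields no validation name -> skipped, not crashed
    {"message": "unrelated message that used to succeed is now failing"},
]

# Pair the filter with an explicit None check instead of cast()-ing the match
succeed_to_failing = Counter(
    m.group(1)
    for event in events
    if succeed_to_failing_regexp.match(event["message"])
    for m in [validation_name_regexp.match(event["message"])]
    if m is not None
)
print(succeed_to_failing)  # Counter({'ntp-synced': 1})
```

The `for m in [...]` clause is a standard comprehension idiom for binding an intermediate value so it can be both tested and reused.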
🧹 Nitpick comments (3)
log_analyzer/signatures/basic_info.py (1)
291-307: Safer interface extraction; avoid json.dumps in table fields
Parse inventory defensively; present addresses plainly instead of JSON-encoded strings.
- inventory = json.loads(host["inventory"]) + try: + inventory = json.loads(host.get("inventory", "{}")) + except (ValueError, json.JSONDecodeError): + inventory = {} interfaces_details = defaultdict(list) for interface in inventory.get("interfaces", []): name = interface.get("name") if not name: continue interfaces_details["name"].append(name) - interfaces_details["mac_address"].append( - json.dumps(interface.get("mac_address")) - ) - interfaces_details["ipv4_addresses"].append( - json.dumps(interface.get("ipv4_addresses", [])) - ) - interfaces_details["ipv6_addresses"].append( - json.dumps(interface.get("ipv6_addresses", [])) - ) + interfaces_details["mac_address"].append( + str(interface.get("mac_address", "")) + ) + interfaces_details["ipv4_addresses"].append( + ", ".join(interface.get("ipv4_addresses", [])) + ) + interfaces_details["ipv6_addresses"].append( + ", ".join(interface.get("ipv6_addresses", [])) + )log_analyzer/signatures/error_detection.py (1)
23-31: Escape patterns when building combined regex
Without re.escape, metacharacters in patterns can change matching unintentionally.
- combined_regex = re.compile(
- f'({"|".join(fr".*{pattern}.*" for pattern in patterns)})'
- )
+ escaped = [re.escape(p) for p in patterns]
+ combined_regex = re.compile(f'({"|".join(fr".*{p}.*" for p in escaped)})')
log_analyzer/signatures/advanced_analysis.py (1)
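A standalone sketch of the escaping fix suggested above, with made-up error patterns chosen to show why unescaped metacharacters would otherwise break literal matching (an unescaped "(30s)" would be parsed as a capture group matching "30s", not the literal parentheses):

```python
import re

# "(30s)" contains regex metacharacters; re.escape makes the match literal
patterns = ["disk read error", "timeout (30s)"]
combined_regex = re.compile("|".join(f".*{re.escape(p)}.*" for p in patterns))

print(bool(combined_regex.match("xfs: disk read error on sda")))        # True
print(bool(combined_regex.match("operation hit timeout (30s), retry")))  # True
print(bool(combined_regex.match("all clear")))                           # False
```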
259-275: Try both bundle paths before emitting a warning
Currently returns a warning after the first empty nodes.json, skipping the fallback path. Continue to next base and only warn if both have no items.
- if nodes_table: - return SignatureResult( - signature_name=self.name, - title="Collected nodes.json from installer gather", - content=self.generate_table(nodes_table), - severity="info", - ) - return SignatureResult( - signature_name=self.name, - title="Collected nodes.json from installer gather", - content=( - "The nodes.json file doesn't have any node resources in it. You should probably check the kubelet logs for the 2 non-bootstrap control-plane hosts" - ), - severity="warning", - ) - return None + if nodes_table: + return SignatureResult( + signature_name=self.name, + title="Collected nodes.json from installer gather", + content=self.generate_table(nodes_table), + severity="info", + ) + # No nodes found in this file; try the next base path + continue + # Neither path had node items; emit a warning + return SignatureResult( + signature_name=self.name, + title="Collected nodes.json from installer gather", + content=( + "The nodes.json file doesn't have any node resources in it. You should probably check the kubelet logs for the 2 non-bootstrap control-plane hosts" + ), + severity="warning", + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
- .github/workflows/mypy.yaml (1 hunks)
- .github/workflows/pydocstyle.yaml (0 hunks)
- Dockerfile (1 hunks)
- Makefile (1 hunks)
- log_analyzer/__init__.py (1 hunks)
- log_analyzer/log_analyzer.py (1 hunks)
- log_analyzer/main.py (1 hunks)
- log_analyzer/signatures/__init__.py (1 hunks)
- log_analyzer/signatures/advanced_analysis.py (1 hunks)
- log_analyzer/signatures/base.py (1 hunks)
- log_analyzer/signatures/basic_info.py (1 hunks)
- log_analyzer/signatures/error_detection.py (1 hunks)
- log_analyzer/signatures/networking.py (1 hunks)
- log_analyzer/signatures/performance.py (1 hunks)
- log_analyzer/signatures/platform_specific.py (1 hunks)
- pyproject.toml (2 hunks)
- server.py (4 hunks)
- service_client/assisted_service_api.py (2 hunks)
- template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
- template.yaml
- .github/workflows/mypy.yaml
- log_analyzer/signatures/networking.py
- log_analyzer/signatures/__init__.py
- log_analyzer/signatures/performance.py
- Makefile
🧰 Additional context used
🧬 Code graph analysis (9)
log_analyzer/signatures/error_detection.py (2)
log_analyzer/signatures/base.py (5)
ErrorSignature(86-120)SignatureResult(15-48)analyze(59-68)generate_table(71-75)create_result(104-120)log_analyzer/log_analyzer.py (4)
metadata(39-53)get_hostname(218-228)get_controller_logs(199-206)get_host_log_file(134-165)
server.py (3)
log_analyzer/main.py (1)
analyze_cluster(14-71)metrics/metrics.py (1)
track_tool_usage(51-65)service_client/assisted_service_api.py (1)
InventoryClient(27-561)
log_analyzer/signatures/basic_info.py (2)
log_analyzer/signatures/base.py (5)
Signature(51-83)SignatureResult(15-48)analyze(59-68)format_time(78-83)generate_table(71-75)log_analyzer/log_analyzer.py (2)
metadata(39-53)get_hostname(218-228)
log_analyzer/signatures/platform_specific.py (2)
log_analyzer/signatures/base.py (5)
ErrorSignature(86-120)SignatureResult(15-48)analyze(59-68)create_result(104-120)generate_table(71-75)log_analyzer/log_analyzer.py (4)
metadata(39-53)get_hostname(218-228)get_host_log_file(134-165)get_journal_log(167-197)
log_analyzer/signatures/base.py (3)
log_analyzer/signatures/advanced_analysis.py (9)
analyze(80-108)analyze(114-157)analyze(169-222)analyze(228-274)analyze(280-299)analyze(309-329)analyze(342-378)analyze(384-432)analyze(443-461)log_analyzer/signatures/basic_info.py (6)
analyze(20-81)analyze(125-163)analyze(169-206)analyze(212-250)analyze(256-287)analyze(315-367)log_analyzer/signatures/error_detection.py (1)
analyze(37-87)
log_analyzer/signatures/advanced_analysis.py (2)
log_analyzer/signatures/base.py (4)
Signature(51-83)SignatureResult(15-48)analyze(59-68)generate_table(71-75)log_analyzer/log_analyzer.py (8)
get_all_cluster_events(90-103)partition_cluster_events(106-124)get_last_install_cluster_events(77-88)metadata(39-53)get_must_gather(208-215)get_events_by_host(126-132)get_controller_logs(199-206)get_host_log_file(134-165)
log_analyzer/main.py (3)
service_client/assisted_service_api.py (1)
get_cluster_logs(161-174)log_analyzer/log_analyzer.py (1)
LogAnalyzer(22-228)log_analyzer/signatures/base.py (2)
SignatureResult(15-48)analyze(59-68)
service_client/assisted_service_api.py (1)
service_client/exceptions.py (1)
sanitize_exceptions(24-58)
log_analyzer/__init__.py (2)
log_analyzer/log_analyzer.py (1)
LogAnalyzer(22-228)log_analyzer/signatures/base.py (1)
SignatureResult(15-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (14)
pyproject.toml (2)
18-19: LGTM! Dependencies align with established versions. The added dependencies (nestedarchive and tabulate) use versions that are the latest official releases on PyPI. Based on learnings, nestedarchive 0.2.4 (Aug 2022) and tabulate 0.9.0 (Oct 2022) are stable and appropriate for this use case.
81-89: LGTM! Mypy configuration is appropriate for gradual typing adoption. The new mypy configuration excludes log_analyzer/ (as noted in the PR description) while enabling strict type checking for the rest of the codebase. This is a reasonable approach for integrating copied code that may need additional type annotation work.
Dockerfile (1)
17-17: LGTM! Docker layer appropriately includes the new package. The new COPY instruction correctly bundles the log_analyzer directory into the container image, making the log analysis modules available at runtime.
server.py (2)
41-43: LGTM! Sensible default for troubleshooting feature gate. The TROUBLESHOOTING_ENABLED flag defaults to disabled ("0"), which is appropriate for a new troubleshooting feature that should be explicitly opt-in.
1047-1048: LGTM! Appropriate conditional registration of troubleshooting tool. The feature gate correctly enables the log analysis tool only when TROUBLESHOOTING_ENABLED is true, preventing accidental exposure of the troubleshooting feature in production.
log_analyzer/__init__.py (1)
1-15: LGTM! Clean package interface with appropriate public API surface. The __init__.py properly defines the public API by re-exporting key classes and setting __all__. The version string "1.0.0" is appropriate for an initial release.
log_analyzer/main.py (2)
14-72: LGTM! Well-structured log analysis orchestration. The analyze_cluster function properly:
- Downloads logs via the API client
- Handles signature filtering when specific signatures are requested
- Includes per-signature error handling to prevent one failure from blocking others
- Logs appropriate information at each step
74-85: LGTM! Clean result presentation. The print_results function provides a clear, formatted output with appropriate handling of empty results.
15-48: LGTM! Clear and consistent result representation. The SignatureResult class provides a clean interface for signature results with appropriate severity handling in the string representation.
51-83: LGTM! Solid base class with useful utilities. The Signature abstract base class provides helpful utilities:
- generate_table for consistent formatting
- format_time with graceful fallback for parsing errors
86-120: LGTM! Appropriate specialization for error signatures. The ErrorSignature class provides a convenient create_result helper that defaults to "error" severity, which is appropriate for error-focused signatures.
log_analyzer/log_analyzer.py (3)
134-165: LGTM! Robust path fallback for different log formats. The get_host_log_file method properly handles both new and old log path formats with appropriate fallback logic.
167-197: LGTM! Consistent fallback pattern for journal logs. The get_journal_log method follows the same robust pattern as get_host_log_file, handling both new and old path formats.
217-228: LGTM! Hostname resolution with sensible fallback. The get_hostname static method provides a clear fallback chain: requested_hostname → inventory hostname → host ID → "unknown".
| ready = [ | ||
| condition.get("status") == "True" | ||
| for condition in controller_pod.get("status", {}).get( | ||
| "conditions", {} | ||
| ) | ||
| if condition.get("type") == "Ready" | ||
| ][0] | ||
| except Exception: | ||
| ready = False | ||
| conditions_tbl = self.generate_table( | ||
| controller_pod.get("status", {}).get("conditions", []) | ||
| ) | ||
| containers_tbl = self.generate_table( | ||
| controller_pod.get("status", {}).get("containerStatuses", []) | ||
| ) |
Use list default for conditions/containerStatuses to avoid type issues
Defaulting to {} leads to iterating keys and AttributeError on condition.get.
- ready = [
- condition.get("status") == "True"
- for condition in controller_pod.get("status", {}).get(
- "conditions", {}
- )
+ ready = [
+ condition.get("status") == "True"
+ for condition in controller_pod.get("status", {}).get(
+ "conditions", []
+ )
][0]
@@
- conditions_tbl = self.generate_table(
- controller_pod.get("status", {}).get("conditions", [])
- )
- containers_tbl = self.generate_table(
- controller_pod.get("status", {}).get("containerStatuses", [])
- )
+ conditions_tbl = self.generate_table(
+ controller_pod.get("status", {}).get("conditions", [])
+ )
+ containers_tbl = self.generate_table(
+ controller_pod.get("status", {}).get("containerStatuses", [])
+ )
📝 Committable suggestion
| ready = [ | |
| condition.get("status") == "True" | |
| for condition in controller_pod.get("status", {}).get( | |
| "conditions", {} | |
| ) | |
| if condition.get("type") == "Ready" | |
| ][0] | |
| except Exception: | |
| ready = False | |
| conditions_tbl = self.generate_table( | |
| controller_pod.get("status", {}).get("conditions", []) | |
| ) | |
| containers_tbl = self.generate_table( | |
| controller_pod.get("status", {}).get("containerStatuses", []) | |
| ) | |
| ready = [ | |
| condition.get("status") == "True" | |
| for condition in controller_pod.get("status", {}).get( | |
| "conditions", [] | |
| ) | |
| if condition.get("type") == "Ready" | |
| ][0] | |
| except Exception: | |
| ready = False | |
| conditions_tbl = self.generate_table( | |
| controller_pod.get("status", {}).get("conditions", []) | |
| ) | |
| containers_tbl = self.generate_table( | |
| controller_pod.get("status", {}).get("containerStatuses", []) | |
| ) |
🤖 Prompt for AI Agents
In log_analyzer/signatures/advanced_analysis.py around lines 407-421, the code
uses {} as the default for "conditions" (and should also use [] for
"containerStatuses") which causes iteration over dict keys and AttributeError
when calling condition.get; replace the defaults with empty lists (use
.get("conditions", []) and .get("containerStatuses", [])) so the list
comprehension iterates over a list and generate_table calls receive lists; keep
existing try/except behavior for empty results.
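The pitfall the prompt describes is easy to reproduce in isolation. The pod payloads below are hypothetical shapes, not taken from a real cluster bundle; they only illustrate why both the default and the field itself must be a list, since iterating a dict yields its keys (strings), which have no `.get` method.

```python
# Hypothetical pod payloads to illustrate the list-vs-dict default issue.
pod_missing = {"status": {}}                     # no "conditions" key at all
pod_dict = {"status": {"conditions": {"a": 1}}}  # wrong shape: dict, not list

def ready_flags(pod, default):
    # Mirrors the reviewed comprehension: iterate the "conditions" entries.
    return [
        c.get("status") == "True"
        for c in pod.get("status", {}).get("conditions", default)
    ]

# With a list default, a missing key degrades to an empty result.
assert ready_flags(pod_missing, []) == []

# Iterating a dict yields its *keys* (strings), and str has no .get:
raised = False
try:
    ready_flags(pod_dict, [])
except AttributeError:
    raised = True
assert raised
```

The `[]` default keeps the comprehension type-correct even when the archive omits the field entirely.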
```python
some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts)

for host in hosts:
    if host["role"] not in ["master", "bootstrap"]:
        continue
    if host["progress"]["current_stage"] == "Done" or host["status"] != "error":
        continue
```
Guard host summary against missing keys and progress
Direct indexing of host["progress"]["current_stage"], host["role"], host["status"] can raise. Use safe access.
```diff
-    some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts)
+    some_done = any(
+        host.get("progress", {}).get("current_stage") == "Done" for host in hosts
+    )
     for host in hosts:
-        if host["role"] not in ["master", "bootstrap"]:
+        role = host.get("role")
+        if role not in ["master", "bootstrap"]:
             continue
-        if host["progress"]["current_stage"] == "Done" or host["status"] != "error":
+        current_stage = host.get("progress", {}).get("current_stage", "")
+        if current_stage == "Done" or host.get("status") != "error":
             continue
-        host_comment = {
+        host_comment = {
             "Waiting for bootkube": "bootkube.service never completed",
             "Rebooting": "Node never pulled Ignition",
             "Configuring": "Node pulled Ignition, but never started kubelet",
             "Joined": (
                 "The Node k8s resource associated with this host is not Ready"
                 + (
                     " or the Assisted Controller is not running on the cluster"
                     if not some_done
                     else ""
                 )
             ),
             "Waiting for control plane": "Masters never formed 2-node cluster",
             "Waiting for controller": "Assisted installer controller pod never started",
             "Writing image to disk": "Image probably failed to be written on disk",
-        }.get(host["progress"]["current_stage"], "Unknown")
+        }.get(current_stage, "Unknown")
         host_summary.append(
             OrderedDict(
                 hostname=log_analyzer.get_hostname(host),
-                role=host["role"],
+                role=role or "N/A",
                 description=host_comment,
             )
         )
```

Also applies to: 111-117
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 86-93 (and similarly
111-117), direct indexing like host["progress"]["current_stage"], host["role"],
and host["status"] can raise KeyError or TypeError when keys or progress are
missing; update these checks to use safe accessors (e.g., host.get("role"),
host.get("status"), and host.get("progress", {}).get("current_stage")) and guard
against non-dict progress values before comparing, defaulting to a safe value
(None or "") so the conditions become robust and skip hosts without the expected
keys.
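The safe-access pattern the prompt recommends can be sketched in a few lines. The host dicts here are hypothetical shapes; note that `host.get("progress", {})` alone is not enough when `progress` is present but `null`, so the sketch uses `or {}`.

```python
# Hypothetical host records: one complete, one missing keys, one with a
# null "progress" value (which defeats a plain .get(..., {}) default).
hosts = [
    {"role": "master", "status": "error",
     "progress": {"current_stage": "Rebooting"}},
    {"role": "worker"},
    {"role": "master", "progress": None},
]

def current_stage(host):
    progress = host.get("progress") or {}  # tolerates missing AND None
    return progress.get("current_stage", "")

assert [current_stage(h) for h in hosts] == ["Rebooting", "", ""]

# Direct indexing raises on the host with no "progress" key:
try:
    hosts[1]["progress"]["current_stage"]
except KeyError:
    pass
```

Defaulting to `""` keeps downstream comparisons (`== "Done"`, dict lookups) well-typed without extra branching.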
```python
inventory = json.loads(host.get("inventory", "{}"))
if (
    len(inventory.get("disks", [])) == 1
    and "KVM" in inventory.get("system_vendor", {}).get("product_name", "")
    and host.get("progress", {}).get("current_stage") == "Rebooting"
    and host.get("status") == "error"
):
```
Guard inventory JSON and nested lookups
A bad/missing inventory will crash the loop. Parse safely.
```diff
-        inventory = json.loads(host.get("inventory", "{}"))
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except (ValueError, json.JSONDecodeError):
+            inventory = {}
         if (
             len(inventory.get("disks", [])) == 1
             and "KVM" in inventory.get("system_vendor", {}).get("product_name", "")
             and host.get("progress", {}).get("current_stage") == "Rebooting"
             and host.get("status") == "error"
         ):
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        try:
            inventory = json.loads(host.get("inventory", "{}"))
        except (ValueError, json.JSONDecodeError):
            inventory = {}
        if (
            len(inventory.get("disks", [])) == 1
            and "KVM" in inventory.get("system_vendor", {}).get("product_name", "")
            and host.get("progress", {}).get("current_stage") == "Rebooting"
            and host.get("status") == "error"
        ):
```
🤖 Prompt for AI Agents
In log_analyzer/signatures/platform_specific.py around lines 31 to 37, the code
assumes host["inventory"] is valid JSON and that nested keys exist, which will
raise on bad/missing inventory; wrap json.loads in a try/except and fall back to
an empty dict on failure, ensure the parsed inventory is a dict before using
.get, and replace chained index/lookup assumptions with safe .get calls and type
checks (e.g., treat disks only if inventory.get("disks") is a list,
system_vendor as dict) so that on malformed/missing inventory you skip this
branch (continue) instead of crashing the loop.
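The defensive parse the prompt describes can be wrapped in a small helper. The field name mirrors the review; the payloads are hypothetical. One detail worth knowing: `json.JSONDecodeError` is a subclass of `ValueError`, so catching `(TypeError, ValueError)` covers both malformed JSON and non-string values.

```python
import json

def parse_inventory(host):
    """Parse a host's serialized inventory without crashing the caller."""
    try:
        inventory = json.loads(host.get("inventory", "{}"))
    except (TypeError, ValueError):  # TypeError covers inventory=None
        return {}
    # Valid JSON is not necessarily a dict (e.g. "[1, 2]"), so type-check too.
    return inventory if isinstance(inventory, dict) else {}

assert parse_inventory({"inventory": '{"disks": [1]}'}) == {"disks": [1]}
assert parse_inventory({}) == {}                       # key missing entirely
assert parse_inventory({"inventory": "not json"}) == {}
assert parse_inventory({"inventory": "[1, 2]"}) == {}  # valid JSON, wrong type
assert parse_inventory({"inventory": None}) == {}      # json.loads(None) -> TypeError
```

Callers can then use `.get()` on the result unconditionally and skip hosts whose inventory came back empty.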
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pyproject.toml (2)
6-6: Fix Python version mismatch (breaks `uv sync`). The base image runs Python 3.11, but `requires-python` is ">=3.13", so `uv sync` will fail. Align with the runtime:

```diff
-requires-python = ">=3.13"
+requires-python = ">=3.11"
```
uv syncwill fail. Align with the runtime.-requires-python = ">=3.13" +requires-python = ">=3.11"
7-20: Add runtime dependency: python-dateutil.log_analyzer imports dateutil.parser at runtime; only types are in dev. This will crash at import without python-dateutil installed.
dependencies = [ "assisted-service-client>=2.41.0.post3", "mcp>=1.15.0", "netaddr>=1.3.0", "requests>=2.32.3", "retry>=0.9.2", "types-requests>=2.32.4.20250611", "prometheus_client>=0.22.1", "pyyaml>=6", "jinja2>=3.1", "pydantic>=2", "nestedarchive>=0.2.4", "tabulate>=0.9.0", + "python-dateutil>=2.9.0", ]As per retrieved learnings: python-dateutil 2.9.0 is stable and compatible with Python 3.11. [Based on learnings]
♻️ Duplicate comments (4)
server.py (1)
1018-1028: Add error handling and clearer output for `analyze_cluster_logs`. Catch failures from `analyze_cluster` and return a user-friendly string; also handle empty results.
```diff
 @track_tool_usage()
 async def analyze_cluster_logs(
     cluster_id: Annotated[str, Field(description="The ID of the cluster")],
 ) -> str:
     """
-    Analyze the cluster logs for the given cluster_id and return the results.
+    Analyze the cluster logs for the given cluster_id and return the results.
+
+    Returns:
+        str: A formatted string containing signature analysis results with severity indicators.
     """
-    client = InventoryClient(get_access_token())
-    results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
-    return "\n\n".join([str(r) for r in results])
+    try:
+        client = InventoryClient(get_access_token())
+        results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
+        if not results:
+            return "No issues detected in cluster logs."
+        return "\n".join(str(r) for r in results if str(r).strip())
+    except Exception as e:
+        log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True)
+        return f"Error analyzing cluster logs: {e}"
```

service_client/assisted_service_api.py (1)
160-175: Redact presigned URL in logs; keep full URL only for download. Logging the full presigned URL leaks credentials; log only scheme+host+path.
```diff
         result = await self._api_call(
             self._installer_api().v2_get_presigned_for_cluster_files,
             cluster_id=cluster_id,
             file_name="logs",
         )
-        logs_url = cast(PresignedUrl, result).url
-        log.info("Downloading logs from %s", logs_url)
-        return nestedarchive.RemoteNestedArchive(
-            cast(str, logs_url), init_download=True
-        )
+        logs_url = cast(PresignedUrl, result).url
+        parsed = urlparse(cast(str, logs_url))
+        safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+        log.info("Downloading logs from %s", safe_url)
+        return nestedarchive.RemoteNestedArchive(cast(str, logs_url), init_download=True)
```

log_analyzer/log_analyzer.py (2)
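The redaction step above is self-contained and worth seeing in isolation. The URL below is a made-up presigned link: the query string is where the signature and credential parameters live, so dropping everything after the path is enough.

```python
from urllib.parse import urlparse

def redact_presigned(url: str) -> str:
    """Return only scheme://host/path, stripping query-string credentials."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"

# Hypothetical presigned URL; the real one comes from the service API.
url = ("https://s3.example.com/bucket/cluster-123/logs"
       "?X-Amz-Credential=AKIA...&X-Amz-Signature=abc123")

assert redact_presigned(url) == "https://s3.example.com/bucket/cluster-123/logs"
assert "X-Amz-Signature" not in redact_presigned(url)
```

The full URL is still passed to the downloader; only the log line sees the redacted form.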
55-75: Guard missing install_started_at in metadata cleaning. Current code assumes the key exists and may raise KeyError. Fall back safely.
```diff
-    def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
+    def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        installation_start_time = dateutil.parser.isoparse(
-            md["cluster"]["install_started_at"]
-        )
+        installation_started_at = md.get("cluster", {}).get("install_started_at")
+        if not installation_started_at:
+            return md
+        installation_start_time = dateutil.parser.isoparse(installation_started_at)
```
90-104: Fix event loading semantics; return all events here. `get_all_cluster_events` currently returns only the last partition; rename it, or return all events and partition downstream.
```diff
-    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
+    def get_all_cluster_events(self) -> List[Dict[str, Any]]:
         """Get all the cluster installation events."""
-        if self._cluster_events is None:
+        if self._all_events is None:
             try:
                 events_content = self.logs_archive.get("cluster_events.json")
-                all_events = json.loads(cast(str | bytes, events_content))
-
-                # Get the last partition (latest installation attempt)
-                self._cluster_events = self.partition_cluster_events(all_events)[-1]
+                self._all_events = json.loads(cast(str | bytes, events_content))
             except Exception as e:
                 logger.error("Failed to load cluster events: %s", e)
-                self._cluster_events = []
-
-        return self._cluster_events
+                self._all_events = []
+        return self._all_events
```
🧹 Nitpick comments (4)
server.py (1)
41-43: Make feature flag parsing more flexible (optional). Accept common truthy values ("1", "true", "yes") to ease ops.
```diff
-TROUBLESHOOTING_ENABLED = (
-    os.environ.get("ENABLE_TROUBLESHOOTING_TOOLS", "0").lower() == "1"
-)
+TROUBLESHOOTING_ENABLED = os.environ.get(
+    "ENABLE_TROUBLESHOOTING_TOOLS", "0"
+).lower() in ("1", "true", "yes")
```

log_analyzer/log_analyzer.py (1)
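The flag-parsing suggestion generalizes to a tiny helper. This is a sketch, not the server's actual code; the environment-variable name matches the template's `ENABLE_TROUBLESHOOTING_TOOLS` gate, and the accepted truthy set is the one proposed in the review.

```python
import os

def env_flag(name: str, default: str = "0") -> bool:
    """Interpret an env var as a boolean, accepting common truthy spellings."""
    return os.environ.get(name, default).strip().lower() in ("1", "true", "yes")

os.environ["ENABLE_TROUBLESHOOTING_TOOLS"] = "True"
assert env_flag("ENABLE_TROUBLESHOOTING_TOOLS") is True

os.environ["ENABLE_TROUBLESHOOTING_TOOLS"] = "off"
assert env_flag("ENABLE_TROUBLESHOOTING_TOOLS") is False

del os.environ["ENABLE_TROUBLESHOOTING_TOOLS"]
assert env_flag("ENABLE_TROUBLESHOOTING_TOOLS") is False  # falls back to default
```

Evaluating the flag once at import time (as the module does) keeps tool registration deterministic for the process lifetime.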
34-37: Initialize cache for all events. Prepare a separate cache for all events to avoid overloading `_cluster_events` with partitioned data.
```diff
-        self._metadata = None
-        self._cluster_events = None
+        self._metadata = None
+        self._all_events: List[Dict[str, Any]] | None = None
+        self._cluster_events = None
```

log_analyzer/signatures/base.py (1)
92-102: Remove unused attributes or document their purpose. The `_function_impact_label` and `_label` attributes are stored but never referenced in `create_result` or anywhere else in the codebase. Either remove these unused parameters or document how subclasses should use them. Apply this diff to remove the unused attributes:
```diff
-    def __init__(self, function_impact_label=None, label=None):
+    def __init__(self):
         """
         Initialize error signature.
-
-        Args:
-            function_impact_label: Optional function impact classification
-            label: Optional custom label for categorization
         """
         super().__init__()
-        self._function_impact_label = function_impact_label
-        self._label = label
```

log_analyzer/signatures/advanced_analysis.py (1)
309-321: Use safe key access for host ID. Direct indexing of `host["id"]` on line 313 assumes the key always exists. Use `.get("id")` with a fallback to handle missing keys gracefully. Apply this diff:
```diff
     for host in cluster.get("hosts", []):
-        host_id = host["id"]
+        host_id = host.get("id")
+        if not host_id:
+            continue
         try:
             journal_logs = log_analyzer.get_host_log_file(host_id, "journal.logs")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (19)
- .github/workflows/mypy.yaml (1 hunks)
- .github/workflows/pydocstyle.yaml (0 hunks)
- Dockerfile (1 hunks)
- Makefile (1 hunks)
- log_analyzer/__init__.py (1 hunks)
- log_analyzer/log_analyzer.py (1 hunks)
- log_analyzer/main.py (1 hunks)
- log_analyzer/signatures/__init__.py (1 hunks)
- log_analyzer/signatures/advanced_analysis.py (1 hunks)
- log_analyzer/signatures/base.py (1 hunks)
- log_analyzer/signatures/basic_info.py (1 hunks)
- log_analyzer/signatures/error_detection.py (1 hunks)
- log_analyzer/signatures/networking.py (1 hunks)
- log_analyzer/signatures/performance.py (1 hunks)
- log_analyzer/signatures/platform_specific.py (1 hunks)
- pyproject.toml (2 hunks)
- server.py (4 hunks)
- service_client/assisted_service_api.py (2 hunks)
- template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
- Makefile
- log_analyzer/signatures/init.py
- .github/workflows/mypy.yaml
- log_analyzer/main.py
- log_analyzer/signatures/error_detection.py
🧰 Additional context used
🧬 Code graph analysis (9)
service_client/assisted_service_api.py (1)
service_client/exceptions.py (1)
sanitize_exceptions(24-58)
log_analyzer/signatures/networking.py (3)
log_analyzer/signatures/advanced_analysis.py (11)
operator_statuses_from_controller_logs(17-44)filter_operators(57-71)analyze(80-108)analyze(114-157)analyze(169-222)analyze(228-274)analyze(280-299)analyze(309-329)analyze(342-378)analyze(384-432)analyze(443-461)log_analyzer/signatures/base.py (5)
Signature(51-83)ErrorSignature(86-120)SignatureResult(15-48)analyze(59-68)create_result(104-120)log_analyzer/log_analyzer.py (2)
metadata(39-53)get_controller_logs(199-206)
log_analyzer/signatures/advanced_analysis.py (2)
log_analyzer/signatures/base.py (4)
Signature(51-83)SignatureResult(15-48)analyze(59-68)generate_table(71-75)log_analyzer/log_analyzer.py (8)
get_all_cluster_events(90-103)partition_cluster_events(106-124)get_last_install_cluster_events(77-88)metadata(39-53)get_must_gather(208-215)get_events_by_host(126-132)get_controller_logs(199-206)get_host_log_file(134-165)
log_analyzer/signatures/basic_info.py (2)
log_analyzer/signatures/base.py (5)
Signature(51-83)SignatureResult(15-48)analyze(59-68)format_time(78-83)generate_table(71-75)log_analyzer/log_analyzer.py (2)
metadata(39-53)get_hostname(218-228)
log_analyzer/signatures/platform_specific.py (2)
log_analyzer/signatures/base.py (5)
ErrorSignature(86-120)SignatureResult(15-48)analyze(59-68)create_result(104-120)generate_table(71-75)log_analyzer/log_analyzer.py (4)
metadata(39-53)get_hostname(218-228)get_host_log_file(134-165)get_journal_log(167-197)
log_analyzer/signatures/base.py (3)
log_analyzer/signatures/advanced_analysis.py (9)
analyze(80-108)analyze(114-157)analyze(169-222)analyze(228-274)analyze(280-299)analyze(309-329)analyze(342-378)analyze(384-432)analyze(443-461)log_analyzer/signatures/basic_info.py (6)
analyze(20-81)analyze(125-163)analyze(169-206)analyze(212-250)analyze(256-287)analyze(315-367)log_analyzer/signatures/error_detection.py (1)
analyze(37-87)
log_analyzer/signatures/performance.py (2)
log_analyzer/signatures/base.py (6)
Signature(51-83)ErrorSignature(86-120)SignatureResult(15-48)analyze(59-68)generate_table(71-75)create_result(104-120)log_analyzer/log_analyzer.py (4)
get_last_install_cluster_events(77-88)metadata(39-53)get_events_by_host(126-132)get_hostname(218-228)
server.py (3)
log_analyzer/main.py (1)
analyze_cluster(14-71)metrics/metrics.py (1)
track_tool_usage(51-65)service_client/assisted_service_api.py (1)
InventoryClient(27-561)
log_analyzer/__init__.py (2)
log_analyzer/log_analyzer.py (1)
LogAnalyzer(22-228)log_analyzer/signatures/base.py (1)
SignatureResult(15-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (4)
Dockerfile (1)
17-17: Include analyzer in image — LGTM. Copying the log_analyzer package into the image is correct and matches imports in server.py.
template.yaml (1)
46-49: Gating parameter and env wiring — LGTM. Parameter ENABLE_TROUBLESHOOTING_TOOLS and its env passthrough correctly gate tool registration in server.py.
Also applies to: 107-108
log_analyzer/__init__.py (1)
1-15: Public API exports — LGTM. Re-exporting LogAnalyzer, ALL_SIGNATURES, and SignatureResult simplifies usage.
log_analyzer/signatures/platform_specific.py (1)
67-73: Guard against None match result. If `agent_log_regex.search(agent_logs)` returns None, accessing `match.group(1)` on line 73 will raise AttributeError. The conditional on line 73 checks if match is truthy, but only after attempting to parse, not before calling `.group()`.
Apply this diff to fix:

```diff
     def _get_inventory(self, log_analyzer, host_id):
         try:
             agent_logs = log_analyzer.get_host_log_file(host_id, "agent.logs")
         except FileNotFoundError:
             return None
         match = self.agent_log_regex.search(agent_logs)
-        return json.loads(match.group(1).replace('\\"', '"')) if match else None
+        if not match:
+            return None
+        return json.loads(match.group(1).replace('\\"', '"'))
```

Likely an incorrect or invalid review comment.
```python
if (
    bootstrap_node["progress"]["current_stage"]
    not in eligible_bootstrap_stages
):
```
Guard nested progress access.
Direct indexing of bootstrap_node["progress"]["current_stage"] can raise KeyError if the progress key is missing or the value is not a dict.
Apply this diff to use safe access:
```diff
-            if (
-                bootstrap_node["progress"]["current_stage"]
-                not in eligible_bootstrap_stages
-            ):
+            current_stage = bootstrap_node.get("progress", {}).get("current_stage", "")
+            if current_stage not in eligible_bootstrap_stages:
                 return None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
current_stage = bootstrap_node.get("progress", {}).get("current_stage", "")
if current_stage not in eligible_bootstrap_stages:
    return None
```
🤖 Prompt for AI Agents
In log_analyzer/signatures/advanced_analysis.py around lines 133 to 136, the
code directly indexes bootstrap_node["progress"]["current_stage"], which can
raise KeyError or TypeError if progress is missing or not a dict; change the
check to safely access progress by first retrieving progress =
bootstrap_node.get("progress") and confirming it's a dict (or defaulting to an
empty dict), then read current_stage = progress.get("current_stage") (or None)
and use that variable in the if condition so missing or non-dict progress won’t
cause an exception.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin, keitwb

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
Something in the container build is not happy:
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server.py (1)
861-869: Avoid logging full presigned URLs (sensitive).
`result` likely contains the URL; log a redacted form. Apply:
```diff
-    log.info(
-        "Successfully retrieved presigned URL for cluster %s credentials file %s - %s",
-        cluster_id,
-        file_name,
-        result,
-    )
+    # Redact query/credentials when logging
+    try:
+        from urllib.parse import urlparse
+        parsed = urlparse(result.url or "")
+        safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+    except Exception:
+        safe_url = "<redacted>"
+    log.info(
+        "Successfully retrieved presigned URL for cluster %s credentials file %s - %s",
+        cluster_id,
+        file_name,
+        safe_url,
+    )
```
♻️ Duplicate comments (14)
log_analyzer/signatures/advanced_analysis.py (2)
176-192: Handle unmatched validation messages in FlappingValidations
`self.validation_name_regexp.match(event["message"])` may return None; calling `.groups()` on it raises and causes the analyzer to bail out. Filter out unmatched lines before dereferencing the match.

```diff
-        succeed_to_failing_counter = Counter(
-            cast(
-                re.Match[str],
-                self.validation_name_regexp.match(event["message"]),
-            ).groups()[0]
-            for event in events
-            if self.succeed_to_failing_regexp.match(event["message"])
-        )
+        succeed_to_failing_counter = Counter(
+            match.group(1)
+            for event in events
+            if self.succeed_to_failing_regexp.match(event["message"])
+            for match in [self.validation_name_regexp.match(event["message"])]
+            if match is not None
+        )
@@
-        now_fixed = Counter(
-            cast(
-                re.Match[str],
-                self.validation_name_regexp.match(event["message"]),
-            ).groups()[0]
-            for event in events
-            if self.now_fixed_regexp.match(event["message"])
-        )
+        now_fixed = Counter(
+            match.group(1)
+            for event in events
+            if self.now_fixed_regexp.match(event["message"])
+            for match in [self.validation_name_regexp.match(event["message"])]
+            if match is not None
+        )
```
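The None-safe comprehension idiom above (nesting the match in a single-element list so unmatched lines are filtered rather than dereferenced) can be shown standalone. The regex and messages below are illustrative only, not the analyzer's actual patterns.

```python
import re
from collections import Counter

# Hypothetical validation-event pattern; the real regexps live on the signature.
validation_name = re.compile(r"Host .*: validation '([\w-]+)'")

messages = [
    "Host h1: validation 'ntp-synced' that used to succeed is now failing",
    "unrelated log line",
    "Host h2: validation 'ntp-synced' that used to succeed is now failing",
]

counts = Counter(
    match.group(1)
    for msg in messages
    # Bind the match once per message, then filter out the Nones:
    for match in [validation_name.match(msg)]
    if match is not None
)
assert counts == Counter({"ntp-synced": 2})
```

A `:=` walrus assignment inside the `if` clause is an equivalent, slightly terser alternative on Python 3.8+.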
128-136: Guard bootstrap progress lookup before comparing stages
`bootstrap_node["progress"]["current_stage"]` will raise when `progress` is missing/None, which happens in incomplete bundles and aborts the signature. Please fetch the stage via `.get()` (with an empty dict default) before checking membership.

```diff
-            if (
-                bootstrap_node["progress"]["current_stage"]
-                not in eligible_bootstrap_stages
-            ):
+            current_stage = (
+                (bootstrap_node.get("progress") or {}).get("current_stage")
+            )
+            if current_stage not in eligible_bootstrap_stages:
                 return None
```
47-71: Protect MasterFailedToPullIgnitionSignature from missing host fieldsA single host lacking
progress,inventory, orstatusdata will throw and drop the entire signature. Please add safe lookups and reuseget_hostnameso malformed inventory blobs don’t explode the analysis.- if ( - host["role"] == "bootstrap" - and host["progress"]["current_stage"] == "WaitingForControlPlane" - ): + role = host.get("role") + current_stage = (host.get("progress") or {}).get("current_stage") + if ( + role == "bootstrap" + and current_stage == "WaitingForControlPlane" + ): # Probably failed due to other issues return None - inventory = json.loads(host["inventory"]) + try: + inventory = json.loads(host.get("inventory", "{}")) + except json.JSONDecodeError: + inventory = {} boot_mode = inventory.get("boot", {}).get("current_boot_mode", "N/A") if ( - host["role"] == "master" - and host["progress"]["current_stage"] == "Rebooting" - and host["status"] == "error" + role == "master" + and current_stage == "Rebooting" + and host.get("status") == "error" and boot_mode == "bios" ): hosts.append( OrderedDict( - HostID=host["id"], - Hostname=inventory["hostname"], - Progress=host["progress"]["current_stage"], + HostID=host.get("id", "unknown"), + Hostname=log_analyzer.get_hostname(host), + Progress=current_stage or "N/A", ) )log_analyzer/signatures/basic_info.py (4)
27-48: Parse host inventory defensively before building the hosts tableA single host with a missing/invalid
inventory,progress, orstatusfield will raise and abort the entire signature, so the analyzer silently returnsNone. Please harden the loop before appending rows.- info = host["status_info"] - role = host["role"] - inventory = json.loads(host["inventory"]) + info = host.get("status_info", "") + role = host.get("role", "unknown") + try: + inventory = json.loads(host.get("inventory", "{}")) + except json.JSONDecodeError: + inventory = {} if host.get("bootstrap", False): role = "bootstrap" hosts.append( OrderedDict( - id=host["id"], + id=host.get("id", "unknown"), hostname=log_analyzer.get_hostname(host), - progress=host["progress"]["current_stage"], - status=host["status"], + progress=(host.get("progress") or {}).get( + "current_stage", "N/A" + ), + status=host.get("status", "unknown"), role=role,…and keep using the guarded
inventory,info, and.get()defaults for the remaining fields in the OrderedDict.
218-236: Harden host extra details against malformed inventoryDirectly indexing
host["inventory"],inventory["hostname"], orhost["checked_in_at"]raises and drops the whole signature when any field is missing. Please wrap the JSON load and switch to safe lookups.- inventory = json.loads(host["inventory"]) + try: + inventory = json.loads(host.get("inventory", "{}")) + except json.JSONDecodeError: + inventory = {} + system_vendor = inventory.get("system_vendor", {}) or {} + disks = inventory.get("disks", []) + disks_count = len(disks) if isinstance(disks, list) else 0 hosts.append( OrderedDict( - id=host["id"], - hostname=inventory["hostname"], + id=host.get("id", "unknown"), + hostname=log_analyzer.get_hostname(host), requested_hostname=host.get("requested_hostname", "N/A"), - last_contacted=self.format_time(host["checked_in_at"]), + last_contacted=self.format_time(host.get("checked_in_at", "")), installation_disk=host.get("installation_disk_path", "N/A"), - product_name=inventory["system_vendor"].get( - "product_name", "Unavailable" - ), - manufacturer=inventory["system_vendor"].get( - "manufacturer", "Unavailable" - ), - virtual_host=inventory["system_vendor"].get("virtual", False), - disks_count=len(inventory["disks"]), + product_name=system_vendor.get("product_name", "Unavailable"), + manufacturer=system_vendor.get("manufacturer", "Unavailable"), + virtual_host=system_vendor.get("virtual", False), + disks_count=disks_count, ) )
323-340: Do not assumedisksexists or is well-formedSome bundles omit
inventory["disks"]or send non-list values; the current code raises and the entire storage report disappears. Guard the JSON load and defaultdisksto an empty list.- inventory = json.loads(host["inventory"]) - disks = inventory["disks"] + try: + inventory = json.loads(host.get("inventory", "{}")) + except json.JSONDecodeError: + inventory = {} + disks = inventory.get("disks", []) + if not isinstance(disks, list): + disks = []
86-117: Guard host progress/role lookups in the summary builder
host["progress"]["current_stage"]and similar subscripts will throw when the key is absent orprogressis not a dict, which is common in partially collected bundles. Please switch to.get()with sensible defaults before building the summary.- host_summary = [] - some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts) + host_summary = [] + some_done = any( + (host.get("progress") or {}).get("current_stage") == "Done" + for host in hosts + ) for host in hosts: - if host["role"] not in ["master", "bootstrap"]: + role = host.get("role") + if role not in ["master", "bootstrap"]: continue - if host["progress"]["current_stage"] == "Done" or host["status"] != "error": + current_stage = (host.get("progress") or {}).get("current_stage") + if current_stage == "Done" or host.get("status") != "error": continue @@ - }.get(host["progress"]["current_stage"], "Unknown") + }.get(current_stage or "", "Unknown") host_summary.append( OrderedDict( hostname=log_analyzer.get_hostname(host), - role=host["role"], + role=role or "N/A", description=host_comment, ) )service_client/assisted_service_api.py (2)
14-14: Import looks correct.Using the top-level
nestedarchiveimport resolves prior mis-import.
170-174: Redact presigned URL when logging (omit query/credentials).Log only scheme+host+path; return the full URL to the client unchanged.
Apply:
- logs_url = cast(PresignedUrl, result).url - log.info("Downloading logs from %s", logs_url) - return nestedarchive.RemoteNestedArchive( - cast(str, logs_url), init_download=True - ) + logs_url = cast(PresignedUrl, result).url + parsed = urlparse(cast(str, logs_url)) + safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}" + log.info("Downloading logs from %s", safe_url) + return nestedarchive.RemoteNestedArchive(cast(str, logs_url), init_download=True)server.py (1)
1018-1027: Handle errors and improve empty-results formatting in analyze_cluster_logs.Add try/except and return a friendly message on empty results.
Apply:
@track_tool_usage() async def analyze_cluster_logs( cluster_id: Annotated[str, Field(description="The ID of the cluster")], ) -> str: """ Analyze the cluster logs for the given cluster_id and return the results. """ - client = InventoryClient(get_access_token()) - results = await analyze_cluster(cluster_id=cluster_id, api_client=client) - return "\n\n".join([str(r) for r in results]) + try: + client = InventoryClient(get_access_token()) + results = await analyze_cluster(cluster_id=cluster_id, api_client=client) + if not results: + return "No issues detected in cluster logs." + return "\n".join(str(r) for r in results if str(r).strip()) + except Exception as e: + log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True) + return f"Error analyzing cluster logs: {e}"log_analyzer/log_analyzer.py (3)
90-104: Fix get_all_cluster_events to return all events (not just last partition).Current implementation returns only the last partition, conflicting with name.
Apply:
- def get_all_cluster_events(self) -> List[Dict[str, Any]]: - """Get all the cluster installation events.""" - if self._cluster_events is None: - try: - events_content = self.logs_archive.get("cluster_events.json") - all_events = json.loads(cast(str | bytes, events_content)) - - # Get the last partition (latest installation attempt) - self._cluster_events = self.partition_cluster_events(all_events)[-1] - except Exception as e: - logger.error("Failed to load cluster events: %s", e) - self._cluster_events = [] - - return self._cluster_events + def get_all_cluster_events(self) -> List[Dict[str, Any]]: + """Get all cluster installation events (unpartitioned).""" + if self._all_events is None: + try: + events_content = self.logs_archive.get("cluster_events.json") + self._all_events = json.loads(cast(str | bytes, events_content)) + except Exception as e: + logger.error("Failed to load cluster events: %s", e) + self._all_events = [] + return self._all_events
55-61: Guard missing install_started_at to avoid KeyError.Clusters without this field will crash current code.
Apply:
- installation_start_time = dateutil.parser.isoparse( - md["cluster"]["install_started_at"] - ) + installation_started_at = md.get("cluster", {}).get("install_started_at") + if not installation_started_at: + return md + installation_start_time = dateutil.parser.isoparse(installation_started_at)
25-37: Maintain separate caches for all events vs. last attempt.Prevents double-partitioning and clarifies semantics.
Apply:
- _metadata: dict[str, Any] | None + _metadata: dict[str, Any] | None + _all_events: list[dict[str, Any]] | None @@ - self._metadata = None - self._cluster_events = None + self._metadata = None + self._all_events = None + self._cluster_events = Nonelog_analyzer/signatures/networking.py (1)
36-44: Guard machine_networks and inventory JSON (duplicate of prior feedback).Indexing cluster["machine_networks"][0]["cidr"] and unguarded JSON load on host["inventory"] can raise; handle gracefully instead of relying on the broad except. Also consider strict=False for ip_network.
Apply this diff:
```diff
-        if not cluster.get("hosts"):
-            return None
-        host = cluster["hosts"][0]
-        inventory = json.loads(host["inventory"])
-
-        # The first one is the machine_cidr that will be used for network configuration
-        machine_cidr = ipaddress.ip_network(cluster["machine_networks"][0]["cidr"])
+        hosts = cluster.get("hosts") or []
+        if not hosts:
+            return None
+        host = hosts[0]
+
+        inv_raw = host.get("inventory")
+        if not inv_raw:
+            return None
+        try:
+            inventory = json.loads(inv_raw)
+        except json.JSONDecodeError:
+            return None
+
+        # The first one is the machine_cidr that will be used for network configuration
+        mn = cluster.get("machine_networks") or []
+        if not mn or not mn[0].get("cidr"):
+            return None
+        machine_cidr = ipaddress.ip_network(mn[0]["cidr"], strict=False)
```
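The `strict=False` recommendation is worth a concrete demonstration: with the default `strict=True`, `ipaddress.ip_network` rejects a CIDR whose host bits are set, which is exactly the shape a machine network recorded from a host address can take. The CIDR below is a made-up value.

```python
import ipaddress

cidr = "10.1.2.3/24"  # hypothetical machine-network value with host bits set

# Default strict=True refuses it:
raised = False
try:
    ipaddress.ip_network(cidr)
except ValueError:
    raised = True
assert raised

# strict=False masks the host bits and yields the containing network:
net = ipaddress.ip_network(cidr, strict=False)
assert str(net) == "10.1.2.0/24"
assert ipaddress.ip_address("10.1.2.50") in net
```

Using `strict=False` therefore makes the machine-CIDR membership checks tolerant of either form of the recorded value.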
🧹 Nitpick comments (8)
server.py (1)
17-21: CI build failure: pydantic-core doesn’t build on Python 3.14 with PyO3 0.24.1.

Fix options:
- Prefer: use Python <= 3.13 in the build image.
- Or: upgrade pydantic/pydantic-core to a release that supports 3.14.
- Or (interim): set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 during build (Dockerfile/CI env) before
uv sync.

Given the failure log, the minimal unblock is the env var; long-term, pin a compatible Python/runtime or bump deps.
log_analyzer/signatures/__init__.py (1)
9-14: Avoid import * and narrow the discovery scope for clarity.

Import modules explicitly and inspect only those to reduce namespace noise.
Apply:
```diff
-from .basic_info import *  # noqa
-from .error_detection import *  # noqa
-from .performance import *  # noqa
-from .networking import *  # noqa
-from .advanced_analysis import *  # noqa
-from .platform_specific import *  # noqa
+from . import basic_info, error_detection, performance, networking, advanced_analysis, platform_specific
@@
-ALL_SIGNATURES = []
-
-current_module = sys.modules[__name__]
-for name, obj in inspect.getmembers(current_module):
+ALL_SIGNATURES = []
+for module in (
+    basic_info,
+    error_detection,
+    performance,
+    networking,
+    advanced_analysis,
+    platform_specific,
+):
+    for name, obj in inspect.getmembers(module):
         if (
             inspect.isclass(obj)
             and issubclass(obj, Signature)
             and obj is not Signature
             and obj is not ErrorSignature
             and obj is not SignatureResult
         ):
             ALL_SIGNATURES.append(obj)
```

Also applies to: 19-31
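The discovery loop boils down to inspect.isclass plus an issubclass filter over a namespace. A minimal standalone sketch of the pattern (class names here are illustrative, not the real signature set):

```python
import inspect

class Signature:
    """Base class; excluded from discovery."""

class ErrorSignature(Signature):
    """Intermediate base; also excluded."""

class HostsStatusSignature(Signature):
    pass

class FlappingValidations(ErrorSignature):
    pass

# Collect every concrete Signature subclass visible in this namespace,
# skipping the abstract bases themselves.
ALL_SIGNATURES = [
    obj
    for obj in list(globals().values())
    if inspect.isclass(obj)
    and issubclass(obj, Signature)
    and obj not in (Signature, ErrorSignature)
]
print(sorted(cls.__name__ for cls in ALL_SIGNATURES))
# ['FlappingValidations', 'HostsStatusSignature']
```

The real module applies the same filter to inspect.getmembers of each signature module instead of globals().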
log_analyzer/signatures/networking.py (2)
88-110: Avoid substring false-positives when matching VIPs in ip-addr output.

Checking if vip in ip_addr can match partial IPs (e.g., 10.0.0.1 in 10.0.0.10). Use a word-boundary-like regex per VIP.

Apply this diff:
```diff
-        vips = [vip["ip"] for vip in cluster.get("api_vips", [])]
+        vips = [vip["ip"] for vip in cluster.get("api_vips", [])]
+        vip_patterns = {vip: re.compile(rf"(?<!\d){re.escape(vip)}(?!\d)") for vip in vips}
@@
-            node_ip = os.path.basename(node_dir)
+            node_ip = getattr(node_dir, "name", os.path.basename(str(node_dir)))
@@
-            for vip in vips:
-                if vip in ip_addr:
+            for vip in vips:
+                if vip_patterns[vip].search(ip_addr):
                     collisions.append((vip, node_ip))
@@
-            for vip in vips:
-                if vip in bootstrap_ip_addr:
+            for vip in vips:
+                if vip_patterns[vip].search(bootstrap_ip_addr):
                     collisions.append((vip, "bootstrap"))
@@
-            for vip in vips:
-                if vip in ip_addr:
+            for vip in vips:
+                if vip_patterns[vip].search(ip_addr):
                     collisions.append((vip, node_ip))
@@
-            for vip in vips:
-                if vip in bootstrap_ip_addr:
+            for vip in vips:
+                if vip_patterns[vip].search(bootstrap_ip_addr):
                     collisions.append((vip, "bootstrap"))
```

Also, please confirm what type iterdir() yields from logs_archive.get(...) (Path-like vs custom object). If not Path-like, prefer str(node_dir) for basename extraction.

Also applies to: 123-151, 158-169
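To make the false-positive concrete, here is the lookaround pattern in isolation (the helper name is hypothetical):

```python
import re

def find_vip_collisions(vips, ip_addr_output):
    """Report VIPs that appear in `ip addr` output as whole dotted quads."""
    # (?<!\d) / (?!\d) forbid adjacent digits, so "10.0.0.1" cannot match
    # inside "10.0.0.10" the way a plain substring check would.
    patterns = {v: re.compile(rf"(?<!\d){re.escape(v)}(?!\d)") for v in vips}
    return [v for v, pat in patterns.items() if pat.search(ip_addr_output)]

output = "inet 10.0.0.10/24 brd 10.0.0.255 scope global eth0"
print("10.0.0.1" in output)                                    # True  (substring false-positive)
print(find_vip_collisions(["10.0.0.1", "10.0.0.10"], output))  # ['10.0.0.10']
```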
181-201: Tighten nameserver parsing to ignore comments and trailing text.

The current regex captures the entire line; IP parsing then relies on exceptions. Capture only the token.
```diff
-    NAMESERVER_PATTERN = re.compile(r"^nameserver (.*)$")
+    NAMESERVER_PATTERN = re.compile(r"^\s*nameserver\s+([^\s#;]+)")
```

log_analyzer/signatures/performance.py (4)
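Behaviour of the tightened pattern on a typical resolv.conf (the sample data is invented for illustration):

```python
import re

NAMESERVER_PATTERN = re.compile(r"^\s*nameserver\s+([^\s#;]+)")

resolv_conf = """\
# Generated by NetworkManager
nameserver 192.168.1.1
nameserver 8.8.8.8  # upstream
  nameserver fd00::1
"""

# The comment line is skipped; inline comments and trailing text are dropped
# because the capture group stops at whitespace, '#', or ';'.
servers = [
    m.group(1)
    for line in resolv_conf.splitlines()
    if (m := NAMESERVER_PATTERN.match(line))
]
print(servers)  # ['192.168.1.1', '8.8.8.8', 'fd00::1']
```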
56-66: Use search instead of match for robustness.

Event messages may have prefixes; search is safer than the start-anchored match.

```diff
-    def get_image_download_info(event):
-        match = cls.image_download_regex.match(event["message"])
+    def get_image_download_info(event):
+        match = cls.image_download_regex.search(event["message"])
         if match:
             return match.groupdict()
         return None
```
90-96: Guard missing/empty hosts in metadata.

Avoid KeyError if cluster["hosts"] is missing/empty.

```diff
-        host_entries = []
-        for host in cluster["hosts"]:
+        host_entries = []
+        hosts = cluster.get("hosts") or []
+        if not hosts:
+            return None
+        for host in hosts:
             host_id = host["id"]
             host_events = events_by_host.get(host_id, [])
             entry = self._create_host_entry(host, host_events, log_analyzer)
             host_entries.append(entry)
```
246-250: Avoid double regex evaluation in the generator.

get_duration(event) runs twice per event. Use the walrus operator.

```diff
-        return (
-            (event, get_duration(event))
-            for event in events
-            if get_duration(event) is not None
-        )
+        return (
+            (event, duration)
+            for event in events
+            if (duration := get_duration(event)) is not None
+        )
```
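The walrus pattern in isolation (the regex and events here are made up for the demo):

```python
import re

DURATION_RE = re.compile(r"took (\d+)s")

def get_duration(event):
    m = DURATION_RE.search(event["message"])
    return int(m.group(1)) if m else None

events = [
    {"message": "fio took 42s"},
    {"message": "no timing here"},
    {"message": "fio took 7s"},
]

# := binds the result inside the filter clause, so get_duration runs exactly
# once per event instead of once for the filter and again for the tuple.
pairs = [(event, d) for event in events if (d := get_duration(event)) is not None]
print([d for _, d in pairs])  # [42, 7]
```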
6-12: CI failure: Python 3.14 + pydantic-core (PyO3 0.24.1) workaround.

The container build fails compiling pydantic-core because PyO3 supports Python only up to 3.13. Options:
- Prefer: pin base Python to 3.13 for now.
- Or: set ENV PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 before running uv/pip to use stable ABI.
- Or: upgrade to a pydantic/pydantic-core release with wheels/pyO3 that support 3.14 (if available).
Example Dockerfile tweak:
```diff
-RUN uv sync
+ENV PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
+RUN uv sync
```

Please verify which route you prefer and whether newer pydantic-core wheels for cp314 are available in your build environment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
- .github/workflows/mypy.yaml (1 hunks)
- .github/workflows/pydocstyle.yaml (0 hunks)
- Dockerfile (1 hunks)
- Makefile (1 hunks)
- log_analyzer/__init__.py (1 hunks)
- log_analyzer/log_analyzer.py (1 hunks)
- log_analyzer/main.py (1 hunks)
- log_analyzer/signatures/__init__.py (1 hunks)
- log_analyzer/signatures/advanced_analysis.py (1 hunks)
- log_analyzer/signatures/base.py (1 hunks)
- log_analyzer/signatures/basic_info.py (1 hunks)
- log_analyzer/signatures/error_detection.py (1 hunks)
- log_analyzer/signatures/networking.py (1 hunks)
- log_analyzer/signatures/performance.py (1 hunks)
- log_analyzer/signatures/platform_specific.py (1 hunks)
- pyproject.toml (2 hunks)
- server.py (4 hunks)
- service_client/assisted_service_api.py (2 hunks)
- template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
- template.yaml
- log_analyzer/main.py
- .github/workflows/mypy.yaml
- log_analyzer/signatures/platform_specific.py
- Makefile
- Dockerfile
🧰 Additional context used
🧬 Code graph analysis (10)
log_analyzer/signatures/base.py (3)
- log_analyzer/signatures/advanced_analysis.py (9): analyze (80-108), analyze (114-157), analyze (169-222), analyze (228-274), analyze (280-299), analyze (309-329), analyze (342-378), analyze (384-432), analyze (443-461)
- log_analyzer/signatures/basic_info.py (6): analyze (20-81), analyze (125-163), analyze (169-206), analyze (212-250), analyze (256-287), analyze (315-367)
- log_analyzer/signatures/error_detection.py (1): analyze (37-87)

log_analyzer/signatures/networking.py (3)
- log_analyzer/signatures/advanced_analysis.py (11): operator_statuses_from_controller_logs (17-44), filter_operators (57-71), analyze (80-108), analyze (114-157), analyze (169-222), analyze (228-274), analyze (280-299), analyze (309-329), analyze (342-378), analyze (384-432), analyze (443-461)
- log_analyzer/signatures/base.py (5): Signature (51-83), ErrorSignature (86-120), SignatureResult (15-48), analyze (59-68), create_result (104-120)
- log_analyzer/log_analyzer.py (2): metadata (39-53), get_controller_logs (199-206)

log_analyzer/signatures/error_detection.py (2)
- log_analyzer/signatures/base.py (5): ErrorSignature (86-120), SignatureResult (15-48), analyze (59-68), generate_table (71-75), create_result (104-120)
- log_analyzer/log_analyzer.py (4): metadata (39-53), get_hostname (218-228), get_controller_logs (199-206), get_host_log_file (134-165)

log_analyzer/signatures/basic_info.py (2)
- log_analyzer/signatures/base.py (5): Signature (51-83), SignatureResult (15-48), analyze (59-68), format_time (78-83), generate_table (71-75)
- log_analyzer/log_analyzer.py (2): metadata (39-53), get_hostname (218-228)

log_analyzer/signatures/__init__.py (1)
- log_analyzer/signatures/base.py (3): Signature (51-83), ErrorSignature (86-120), SignatureResult (15-48)

server.py (3)
- log_analyzer/main.py (1): analyze_cluster (14-71)
- metrics/metrics.py (1): track_tool_usage (51-65)
- service_client/assisted_service_api.py (1): InventoryClient (27-561)

service_client/assisted_service_api.py (1)
- service_client/exceptions.py (1): sanitize_exceptions (24-58)

log_analyzer/signatures/performance.py (2)
- log_analyzer/signatures/base.py (6): Signature (51-83), ErrorSignature (86-120), SignatureResult (15-48), analyze (59-68), generate_table (71-75), create_result (104-120)
- log_analyzer/log_analyzer.py (4): get_last_install_cluster_events (77-88), metadata (39-53), get_events_by_host (126-132), get_hostname (218-228)

log_analyzer/__init__.py (2)
- log_analyzer/log_analyzer.py (1): LogAnalyzer (22-228)
- log_analyzer/signatures/base.py (1): SignatureResult (15-48)

log_analyzer/signatures/advanced_analysis.py (2)
- log_analyzer/signatures/base.py (4): Signature (51-83), SignatureResult (15-48), analyze (59-68), generate_table (71-75)
- log_analyzer/log_analyzer.py (8): get_all_cluster_events (90-103), partition_cluster_events (106-124), get_last_install_cluster_events (77-88), metadata (39-53), get_must_gather (208-215), get_events_by_host (126-132), get_controller_logs (199-206), get_host_log_file (134-165)
🔇 Additional comments (2)
log_analyzer/__init__.py (1)
7-15: Public surface looks good.

Re-exporting LogAnalyzer, ALL_SIGNATURES, and SignatureResult is appropriate.
log_analyzer/log_analyzer.py (1)
77-84: Derive the last attempt from partitioning of all events.

After fixing get_all_cluster_events, this remains correct; keep as-is or clarify via local var names.
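A minimal sketch of what "partitioning" means here, assuming a hypothetical marker message that opens each install attempt (the real logic lives in partition_cluster_events):

```python
def partition_cluster_events(events, marker="reset"):
    """Split an event stream into install attempts at each marker event."""
    partitions = [[]]
    for event in events:
        # Start a new partition whenever a marker event begins a fresh attempt.
        if marker in event["message"] and partitions[-1]:
            partitions.append([])
        partitions[-1].append(event)
    return partitions

events = [
    {"message": "reset cluster"},
    {"message": "host booted"},
    {"message": "reset cluster"},
    {"message": "host booted again"},
    {"message": "install done"},
]
parts = partition_cluster_events(events)
print(len(parts), len(parts[-1]))  # 2 3  -- parts[-1] is the latest attempt
```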
```python
    def _get_interfaces(self, host):
        """Extract interface information from host."""
        inventory = json.loads(host["inventory"])
        interfaces_details = defaultdict(list)

        for interface in inventory.get("interfaces", []):
            name = interface.get("name")
            if not name:
                continue
            interfaces_details["name"].append(name)
            interfaces_details["mac_address"].append(
                json.dumps(interface.get("mac_address"))
            )
            interfaces_details["ipv4_addresses"].append(
                json.dumps(interface.get("ipv4_addresses", []))
            )
            interfaces_details["ipv6_addresses"].append(
                json.dumps(interface.get("ipv6_addresses", []))
            )
```
Make _get_interfaces resilient to bad inventory blobs
Parsing host["inventory"] without guards can still throw and short-circuit the parent signature. Load via .get() and catch JSONDecodeError before iterating interfaces.
```diff
-        inventory = json.loads(host["inventory"])
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except json.JSONDecodeError:
+            inventory = {}
```
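The guard pattern in isolation (the function name is illustrative, not project API):

```python
import json

def load_inventory(host):
    """Parse a host's inventory blob, tolerating missing or malformed JSON."""
    try:
        # host.get(...) or "{}" also covers inventory being present but None.
        return json.loads(host.get("inventory") or "{}")
    except (json.JSONDecodeError, TypeError):
        return {}

print(load_inventory({"inventory": '{"interfaces": [{"name": "eth0"}]}'}))
print(load_inventory({"inventory": "not json"}))  # {}
print(load_inventory({}))                         # {}
```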
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 289 to 307,
_get_interfaces currently calls json.loads(host["inventory"]) directly which can
raise KeyError or JSONDecodeError and abort the signature; change it to use
host.get("inventory") with a safe default (e.g., "{}" or empty dict), attempt
json.loads inside a try/except that catches json.JSONDecodeError (and TypeError
if non-string) and fall back to an empty dict, then proceed to iterate
inventory.get("interfaces", []) so the function is resilient to missing or
malformed inventory blobs.
```toml
    "pydantic>=2.11.9",
    "nestedarchive>=0.2.4",
    "tabulate>=0.9.0",
```
Add runtime dependency on python-dateutil
log_analyzer/signatures/base.py imports dateutil.parser, but only the typing stub (types-python-dateutil) was added. Without the real python-dateutil wheel the analyzer aborts with ModuleNotFoundError. Please add the runtime package next to the other new dependencies.
```diff
     "pydantic>=2.11.9",
+    "python-dateutil>=2.9.0.post0",
     "nestedarchive>=0.2.4",
```

Based on learnings
🤖 Prompt for AI Agents
In pyproject.toml around lines 17 to 19, the code imports dateutil.parser but
only the typing stub was added; add the runtime dependency "python-dateutil"
(pin to a compatible minimum version, e.g. >=2.8.2) to the same dependencies
list alongside pydantic, nestedarchive and tabulate so the real package is
installed at runtime and prevents ModuleNotFoundError.
Fixed it by upgrading pydantic to latest.
- Copied it in from https://github.com/carbonin/assisted-log-analyzer
- Merged API Client method into existing api client
- Got rid of cli main method
- Disabled pydocstyle checker as it is very low value
- Skipping mypy check for the new log_analyzer package as there are too many errors
- Moving mypy config to pyproject.toml
- Adding tool to call the analyze logs function and return the string dump of the signatures.
Actionable comments posted: 0
♻️ Duplicate comments (14)
log_analyzer/signatures/advanced_analysis.py (3)
176-192: Make FlappingValidations resilient to unexpected message formats.

Lines 176-183 and 185-192 use cast(re.Match[str], self.validation_name_regexp.match(...)), which assumes the match always succeeds. If validation_name_regexp doesn't match a message (even though succeed_to_failing_regexp did), the .groups() call on the resulting None raises AttributeError and crashes the signature (cast itself is a no-op at runtime).
cast(re.Match[str], self.validation_name_regexp.match(...))which assumes the match always succeeds. Ifvalidation_name_regexpdoesn't match a message (even thoughsucceed_to_failing_regexpdid), the cast will fail and crash the signature.Apply this diff to filter out non-matching messages:
```diff
         succeed_to_failing_counter = Counter(
-            cast(
-                re.Match[str],
-                self.validation_name_regexp.match(event["message"]),
-            ).groups()[0]
+            m.groups()[0]
             for event in events
             if self.succeed_to_failing_regexp.match(event["message"])
+            for m in [self.validation_name_regexp.match(event["message"])]
+            if m is not None
         )
         now_fixed = Counter(
-            cast(
-                re.Match[str],
-                self.validation_name_regexp.match(event["message"]),
-            ).groups()[0]
+            m.groups()[0]
             for event in events
             if self.now_fixed_regexp.match(event["message"])
+            for m in [self.validation_name_regexp.match(event["message"])]
+            if m is not None
         )
```

Based on past review comments.
133-136: Guard nested progress access to avoid KeyError.

Line 134 directly indexes bootstrap_node["progress"]["current_stage"] without checking if the progress key exists or is a dict. This will raise KeyError if progress is missing or TypeError if it's not a dict.
bootstrap_node["progress"]["current_stage"]without checking if theprogresskey exists or is a dict. This will raiseKeyErrorifprogressis missing orTypeErrorif it's not a dict.Apply this diff:
```diff
-        if (
-            bootstrap_node["progress"]["current_stage"]
-            not in eligible_bootstrap_stages
-        ):
+        current_stage = bootstrap_node.get("progress", {}).get("current_stage", "")
+        if current_stage not in eligible_bootstrap_stages:
             return None
```

Based on past review comments.
407-421: Use list defaults for conditions/containerStatuses to avoid iteration errors.

Lines 410 and 417 use {} as the default for "conditions" and "containerStatuses". Since these are lists in the Kubernetes API, defaulting to {} will cause the code to iterate over dict keys instead of list items, leading to AttributeError when calling condition.get(...).
{}as the default for"conditions"and"containerStatuses". Since these are lists in the Kubernetes API, defaulting to{}will cause the code to iterate over dict keys instead of list items, leading toAttributeErrorwhen callingcondition.get(...).Apply this diff:
```diff
         ready = [
             condition.get("status") == "True"
             for condition in controller_pod.get("status", {}).get(
-                "conditions", {}
+                "conditions", []
             )
             if condition.get("type") == "Ready"
         ][0]
@@
         conditions_tbl = self.generate_table(
             controller_pod.get("status", {}).get("conditions", [])
         )
         containers_tbl = self.generate_table(
             controller_pod.get("status", {}).get("containerStatuses", [])
         )
```

Based on past review comments.
server.py (1)
1018-1028: Add error handling and improve result formatting.

The function has several issues flagged in past reviews:
- No error handling: If analyze_cluster raises an exception (e.g., network error, invalid cluster ID), it will crash the MCP tool call with an unhandled exception.
str()representation produces output like[SignatureResult(...), ...]instead of formatted analysis results.Apply this diff:
@track_tool_usage() async def analyze_cluster_logs( cluster_id: Annotated[str, Field(description="The ID of the cluster")], ) -> str: """ Analyze the cluster logs for the given cluster_id and return the results. + + Returns: + str: Formatted string containing signature analysis results with severity indicators. """ - client = InventoryClient(get_access_token()) - results = await analyze_cluster(cluster_id=cluster_id, api_client=client) - return "\n\n".join([str(r) for r in results]) + try: + client = InventoryClient(get_access_token()) + results = await analyze_cluster(cluster_id=cluster_id, api_client=client) + if not results: + return "No issues detected in cluster logs." + return "\n\n".join(str(result) for result in results) + except Exception as e: + log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True) + return f"Error analyzing cluster logs: {str(e)}"Based on past review comments.
log_analyzer/log_analyzer.py (2)
90-103: Fix incorrect event loading semantics.

The method get_all_cluster_events should return all events, but line 98 filters to only the last partition. This creates confusion:
get_all_cluster_eventsshould return all events, but line 98 filters to only the last partition. This creates confusion:
- The method name says "all" but returns only the latest attempt
- Line 83 in
get_last_install_cluster_eventscalls this method and partitions again (unnecessary)- The cache variable
_cluster_eventsstores only the last partition, not all eventsApply this diff to fix the semantics:
+ _all_events: List[Dict[str, Any]] | None + _cluster_events: List[Dict[str, Any]] | None + def __init__(self, logs_archive: nestedarchive.RemoteNestedArchive): """ Initialize the log analyzer. Args: logs_archive: RemoteNestedArchive containing the cluster logs """ self.logs_archive = logs_archive self._metadata = None + self._all_events = None self._cluster_events = None @@ def get_all_cluster_events(self) -> List[Dict[str, Any]]: """Get all the cluster installation events.""" - if self._cluster_events is None: + if self._all_events is None: try: events_content = self.logs_archive.get("cluster_events.json") - all_events = json.loads(cast(str | bytes, events_content)) - - # Get the last partition (latest installation attempt) - self._cluster_events = self.partition_cluster_events(all_events)[-1] + self._all_events = json.loads(cast(str | bytes, events_content)) except Exception as e: logger.error("Failed to load cluster events: %s", e) - self._cluster_events = [] - - return self._cluster_events + self._all_events = [] + return self._all_eventsBased on past review comments.
55-75: Guard missing install_started_at to prevent crashes.

Lines 58-60 assume install_started_at exists in the metadata. If this field is missing (which can happen with incomplete or corrupted metadata), KeyError will crash the metadata loading.
install_started_atexists in the metadata. If this field is missing (which can happen with incomplete or corrupted metadata),KeyErrorwill crash the metadata loading.Apply this diff:
```diff
     @staticmethod
     def _clean_metadata_json(md: Dict[str, Any]) -> Dict[str, Any]:
         """Clean metadata JSON by separating deleted hosts."""
-        installation_start_time = dateutil.parser.isoparse(
-            md["cluster"]["install_started_at"]
-        )
+        install_started_at = md.get("cluster", {}).get("install_started_at")
+        if not install_started_at:
+            # If no installation start time, we can't partition deleted hosts
+            return md
+
+        installation_start_time = dateutil.parser.isoparse(install_started_at)
```

Based on past review comments.
log_analyzer/signatures/error_detection.py (1)
49-72: Harden host parsing to prevent crashes on malformed data.

Lines 49-72 have multiple unsafe accesses that can crash on missing keys or invalid JSON:
- Lines 50-51: Direct indexing of host["role"] and host["progress"]["current_stage"] without .get()
- Line 56: json.loads(host["inventory"]) without try/except for JSONDecodeError
- Line 68: Direct indexing of inventory["hostname"]

Apply this diff to handle missing/malformed data gracefully:
```diff
         for host in cluster_hosts:
             if (
-                host["role"] == "bootstrap"
-                and host["progress"]["current_stage"] == "WaitingForControlPlane"
+                host.get("role") == "bootstrap"
+                and host.get("progress", {}).get("current_stage") == "WaitingForControlPlane"
             ):
                 # Probably failed due to other issues
                 return None

-            inventory = json.loads(host["inventory"])
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (ValueError, json.JSONDecodeError):
+                logger.warning("Invalid inventory for host %s", host.get("id"))
+                continue
             boot_mode = inventory.get("boot", {}).get("current_boot_mode", "N/A")

             if (
-                host["role"] == "master"
-                and host["progress"]["current_stage"] == "Rebooting"
-                and host["status"] == "error"
+                host.get("role") == "master"
+                and host.get("progress", {}).get("current_stage") == "Rebooting"
+                and host.get("status") == "error"
                 and boot_mode == "bios"
             ):
                 hosts.append(
                     OrderedDict(
                         HostID=host["id"],
-                        Hostname=inventory["hostname"],
-                        Progress=host["progress"]["current_stage"],
+                        Hostname=log_analyzer.get_hostname(host),
+                        Progress=host.get("progress", {}).get("current_stage", "N/A"),
                     )
                 )
```

Based on past review comments.
log_analyzer/signatures/platform_specific.py (1)
31-49: Guard JSON parsing to prevent signature crashes.

Line 31 parses host.get("inventory", "{}") without error handling. If inventory contains invalid JSON (which can happen with corrupted or incomplete data), json.loads will raise JSONDecodeError and abort the entire signature analysis, preventing other signatures from running.
host.get("inventory", "{}")without error handling. If inventory contains invalid JSON (which can happen with corrupted or incomplete data),json.loadswill raiseJSONDecodeErrorand abort the entire signature analysis, preventing other signatures from running.Apply this diff to handle malformed inventory gracefully:
```diff
         hosts = []
         for host in cluster.get("hosts", []):
-            inventory = json.loads(host.get("inventory", "{}"))
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (ValueError, json.JSONDecodeError):
+                logger.warning("Invalid inventory JSON for host %s, skipping", host.get("id"))
+                continue

             if (
                 len(inventory.get("disks", [])) == 1
```

Based on past review comments that explicitly requested this fix.
log_analyzer/signatures/basic_info.py (5)
26-50: Don’t let a single bad host abort the whole signature; parse inventory and keys defensively.

Guard JSON decoding and missing keys so one malformed host doesn’t drop the entire HostsStatusSignature result.
Apply:
```diff
-        for host in cluster["hosts"]:
-            info = host["status_info"]
-            role = host["role"]
-            inventory = json.loads(host["inventory"])
-            if host.get("bootstrap", False):
-                role = "bootstrap"
+        for host in cluster.get("hosts", []):
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, TypeError):
+                inventory = {}
+            info = host.get("status_info", "")
+            role = "bootstrap" if host.get("bootstrap", False) else host.get("role")
@@
-                    id=host["id"],
+                    id=host.get("id", "unknown"),
                     hostname=log_analyzer.get_hostname(host),
-                    progress=host["progress"]["current_stage"],
-                    status=host["status"],
-                    role=role,
+                    progress=host.get("progress", {}).get("current_stage", "N/A"),
+                    status=host.get("status", "N/A"),
+                    role=role or "N/A",
                     boot_mode=inventory.get("boot", {}).get(
                         "current_boot_mode", "N/A"
                     ),
                     status_info=str(info),
                     logs_info=host.get("logs_info", ""),
-                    last_checked_in_at=self.format_time(
-                        host.get("checked_in_at", str(datetime.min))
-                    ),
+                    last_checked_in_at=self.format_time(host.get("checked_in_at", "")),
```

Based on learnings
86-117: Harden the hosts summary: safe progress/role/status access to avoid KeyError.

Use .get for nested keys and compute current_stage once.
```diff
-        some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts)
+        some_done = any(
+            host.get("progress", {}).get("current_stage") == "Done" for host in hosts
+        )
-        for host in hosts:
-            if host["role"] not in ["master", "bootstrap"]:
+        for host in hosts:
+            role = host.get("role")
+            if role not in ["master", "bootstrap"]:
                 continue
-            if host["progress"]["current_stage"] == "Done" or host["status"] != "error":
+            current_stage = host.get("progress", {}).get("current_stage", "")
+            if current_stage == "Done" or host.get("status") != "error":
                 continue
@@
-            }.get(host["progress"]["current_stage"], "Unknown")
+            }.get(current_stage, "Unknown")
@@
-                    role=host["role"],
+                    role=role or "N/A",
```
218-237: Harden host extra details: robust inventory parsing and safe lookups.

Avoid KeyError/JSON errors; prefer get_hostname and defaults.
```diff
-        for host in cluster["hosts"]:
-            inventory = json.loads(host["inventory"])
+        for host in cluster.get("hosts", []):
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, TypeError):
+                inventory = {}
+            system_vendor = inventory.get("system_vendor", {})
             hosts.append(
                 OrderedDict(
-                    id=host["id"],
-                    hostname=inventory["hostname"],
+                    id=host.get("id", "unknown"),
+                    hostname=log_analyzer.get_hostname(host),
                     requested_hostname=host.get("requested_hostname", "N/A"),
-                    last_contacted=self.format_time(host["checked_in_at"]),
+                    last_contacted=self.format_time(host.get("checked_in_at", "")),
                     installation_disk=host.get("installation_disk_path", "N/A"),
-                    product_name=inventory["system_vendor"].get(
-                        "product_name", "Unavailable"
-                    ),
-                    manufacturer=inventory["system_vendor"].get(
-                        "manufacturer", "Unavailable"
-                    ),
-                    virtual_host=inventory["system_vendor"].get("virtual", False),
-                    disks_count=len(inventory["disks"]),
+                    product_name=system_vendor.get("product_name", "Unavailable"),
+                    manufacturer=system_vendor.get("manufacturer", "Unavailable"),
+                    virtual_host=system_vendor.get("virtual", False),
+                    disks_count=len(inventory.get("disks", [])),
                 )
             )
```
289-309: Make _get_interfaces resilient; avoid quoting the MAC address.

Guard the JSON parse and avoid json.dumps for scalars to prevent extra quotes.
```diff
     def _get_interfaces(self, host):
         """Extract interface information from host."""
-        inventory = json.loads(host["inventory"])
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except (json.JSONDecodeError, TypeError):
+            inventory = {}
         interfaces_details = defaultdict(list)
         for interface in inventory.get("interfaces", []):
             name = interface.get("name")
             if not name:
                 continue
             interfaces_details["name"].append(name)
-            interfaces_details["mac_address"].append(
-                json.dumps(interface.get("mac_address"))
-            )
+            interfaces_details["mac_address"].append(str(interface.get("mac_address", "")))
             interfaces_details["ipv4_addresses"].append(
                 json.dumps(interface.get("ipv4_addresses", []))
             )
             interfaces_details["ipv6_addresses"].append(
                 json.dumps(interface.get("ipv6_addresses", []))
             )
```
321-350: Guard inventory/disks; default to an empty list.

Prevent KeyError/TypeError on missing or malformed inventory.
```diff
-        for host in cluster["hosts"]:
-            inventory = json.loads(host["inventory"])
-            disks = inventory["disks"]
+        for host in cluster.get("hosts", []):
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, TypeError):
+                inventory = {}
+            disks = inventory.get("disks", []) or []
```

log_analyzer/signatures/networking.py (1)
36-45: Guard machine_networks and inventory parsing to avoid IndexError/KeyError.

Handle empty hosts/machine_networks and malformed inventory gracefully.
```diff
         if not cluster.get("hosts"):
             return None
         host = cluster["hosts"][0]
-        inventory = json.loads(host["inventory"])
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except (json.JSONDecodeError, TypeError):
+            inventory = {}

-        # The first one is the machine_cidr that will be used for network configuration
-        machine_cidr = ipaddress.ip_network(cluster["machine_networks"][0]["cidr"])
+        # The first one is the machine_cidr that will be used for network configuration
+        mn = cluster.get("machine_networks") or []
+        if not mn or not mn[0].get("cidr"):
+            return None
+        machine_cidr = ipaddress.ip_network(mn[0]["cidr"])
```
🧹 Nitpick comments (6)
log_analyzer/signatures/base.py (1)
40-46: Minor: Simplify the redundant else branch.

Line 46 assigns header to itself unnecessarily. The else branch can be removed since lines 41-44 already handle the error and warning cases.

Apply this diff:
headerto itself unnecessarily. The else branch can be removed since lines 41-44 already handle the error and warning cases.Apply this diff:
header = f"=== {self.title} ===" if self.severity == "error": header = f"ERROR: {header}" elif self.severity == "warning": header = f"WARNING: {header}" - else: - header = f"{header}"log_analyzer/signatures/basic_info.py (1)
175-190: Optional: use safe accessors to avoid aborting on missing fieldsFailureDescription will crash on missing created_at/status/status_info. Prefer .get defaults.
- "Cluster ID": cluster["id"], + "Cluster ID": cluster.get("id", "N/A"), @@ - "Created At": self.format_time(cluster["created_at"]), + "Created At": self.format_time(cluster.get("created_at", "")), @@ - "Status": cluster["status"], - "Status Info": cluster["status_info"], + "Status": cluster.get("status", "N/A"), + "Status Info": cluster.get("status_info", "N/A"),log_analyzer/signatures/networking.py (1)
6-12: CI build fix for Python 3.14 + pydantic-core/PyO3.

The container build fails compiling pydantic-core because PyO3 0.24.1 supports Python only up to 3.13. Options:
- Set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 in the build stage (uv sync step).
- Or pin Python base image to 3.13.
- Or upgrade pydantic-core/pydantic to a version with Python 3.14 wheels.
log_analyzer/signatures/performance.py (3)
33-35: Avoid ValueError on non-numeric download_rate.

Guard the float conversion so a single odd event doesn’t drop the signature.
```diff
-        for image_info in image_info_list:
-            if float(image_info["download_rate"]) < self.minimum_download_rate_mb:
-                abnormal_image_info.append(image_info)
+        for image_info in image_info_list:
+            try:
+                rate = float(image_info["download_rate"])
+            except (TypeError, ValueError):
+                continue
+            if rate < self.minimum_download_rate_mb:
+                abnormal_image_info.append(image_info)
```
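The same guard extracted into a tiny helper (the name is invented for the demo):

```python
def parse_rate(value):
    """Return value as a float, or None when it isn't numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

rates = ["12.5", "n/a", None, "0.3"]
print([parse_rate(v) for v in rates])  # [12.5, None, None, 0.3]
```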
126-135: Per-host resilience: don’t abort on timestamp parsing errors.

Catch broad exceptions when deriving start/end timestamps so the other hosts’ results are kept.
```diff
         try:
             start_time = cls._get_start_event_timestamp(host_events)
             end_time = cls._get_end_event_timestamp(host_events)
             total_duration_seconds = (end_time - start_time).total_seconds()
-        except cls.NoEventFound:
+        except Exception:
             return entry
```

(Note: except Exception already subsumes cls.NoEventFound, so there is no need to list both.)
246-250: Avoid double regex work in _get_fio_events.

Compute the duration once with the walrus operator.
```diff
-        return (
-            (event, get_duration(event))
-            for event in events
-            if get_duration(event) is not None
-        )
+        return (
+            (event, duration)
+            for event in events
+            if (duration := get_duration(event)) is not None
+        )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
- .github/workflows/mypy.yaml (1 hunks)
- .github/workflows/pydocstyle.yaml (0 hunks)
- Dockerfile (1 hunks)
- Makefile (1 hunks)
- log_analyzer/__init__.py (1 hunks)
- log_analyzer/log_analyzer.py (1 hunks)
- log_analyzer/main.py (1 hunks)
- log_analyzer/signatures/__init__.py (1 hunks)
- log_analyzer/signatures/advanced_analysis.py (1 hunks)
- log_analyzer/signatures/base.py (1 hunks)
- log_analyzer/signatures/basic_info.py (1 hunks)
- log_analyzer/signatures/error_detection.py (1 hunks)
- log_analyzer/signatures/networking.py (1 hunks)
- log_analyzer/signatures/performance.py (1 hunks)
- log_analyzer/signatures/platform_specific.py (1 hunks)
- pyproject.toml (2 hunks)
- server.py (4 hunks)
- service_client/assisted_service_api.py (2 hunks)
- template.yaml (2 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
- log_analyzer/signatures/__init__.py
- pyproject.toml
- log_analyzer/__init__.py
- service_client/assisted_service_api.py
- .github/workflows/mypy.yaml
- Dockerfile
🧰 Additional context used
🧬 Code graph analysis (9)
log_analyzer/main.py (3)
- service_client/assisted_service_api.py (1): `get_cluster_logs` (161-174)
- log_analyzer/log_analyzer.py (1): `LogAnalyzer` (22-228)
- log_analyzer/signatures/base.py (2): `SignatureResult` (15-48), `analyze` (59-68)

log_analyzer/signatures/base.py (3)
- log_analyzer/signatures/advanced_analysis.py (9): `analyze` (80-108, 114-157, 169-222, 228-274, 280-299, 309-329, 342-378, 384-432, 443-461)
- log_analyzer/signatures/basic_info.py (6): `analyze` (20-81, 125-163, 169-206, 212-250, 256-287, 315-367)
- log_analyzer/signatures/error_detection.py (1): `analyze` (37-87)

log_analyzer/signatures/error_detection.py (2)
- log_analyzer/signatures/base.py (5): `ErrorSignature` (86-120), `SignatureResult` (15-48), `analyze` (59-68), `generate_table` (71-75), `create_result` (104-120)
- log_analyzer/log_analyzer.py (4): `metadata` (39-53), `get_hostname` (218-228), `get_controller_logs` (199-206), `get_host_log_file` (134-165)

log_analyzer/signatures/performance.py (2)
- log_analyzer/signatures/base.py (6): `Signature` (51-83), `ErrorSignature` (86-120), `SignatureResult` (15-48), `analyze` (59-68), `generate_table` (71-75), `create_result` (104-120)
- log_analyzer/log_analyzer.py (4): `get_last_install_cluster_events` (77-88), `metadata` (39-53), `get_events_by_host` (126-132), `get_hostname` (218-228)

log_analyzer/signatures/basic_info.py (2)
- log_analyzer/signatures/base.py (5): `Signature` (51-83), `SignatureResult` (15-48), `analyze` (59-68), `format_time` (78-83), `generate_table` (71-75)
- log_analyzer/log_analyzer.py (2): `metadata` (39-53), `get_hostname` (218-228)

log_analyzer/signatures/networking.py (3)
- log_analyzer/signatures/advanced_analysis.py (11): `operator_statuses_from_controller_logs` (17-44), `filter_operators` (57-71), `analyze` (80-108, 114-157, 169-222, 228-274, 280-299, 309-329, 342-378, 384-432, 443-461)
- log_analyzer/signatures/base.py (5): `Signature` (51-83), `ErrorSignature` (86-120), `SignatureResult` (15-48), `analyze` (59-68), `create_result` (104-120)
- log_analyzer/log_analyzer.py (2): `metadata` (39-53), `get_controller_logs` (199-206)

server.py (2)
- log_analyzer/main.py (1): `analyze_cluster` (14-71)
- service_client/assisted_service_api.py (1): `InventoryClient` (27-561)

log_analyzer/signatures/advanced_analysis.py (2)
- log_analyzer/signatures/base.py (4): `Signature` (51-83), `SignatureResult` (15-48), `analyze` (59-68), `generate_table` (71-75)
- log_analyzer/log_analyzer.py (8): `get_all_cluster_events` (90-103), `partition_cluster_events` (106-124), `get_last_install_cluster_events` (77-88), `metadata` (39-53), `get_must_gather` (208-215), `get_events_by_host` (126-132), `get_controller_logs` (199-206), `get_host_log_file` (134-165)

log_analyzer/signatures/platform_specific.py (2)
- log_analyzer/signatures/base.py (5): `ErrorSignature` (86-120), `SignatureResult` (15-48), `analyze` (59-68), `create_result` (104-120), `generate_table` (71-75)
- log_analyzer/log_analyzer.py (4): `metadata` (39-53), `get_hostname` (218-228), `get_host_log_file` (134-165), `get_journal_log` (167-197)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (5)
Makefile (1)

57-64: Good call aligning the mypy and format targets.

Pointing mypy at `pyproject.toml` and extending `format` with Ruff autofixes keeps the workflow consistent with the new configuration.

template.yaml (1)

46-48: LGTM!

The new parameter and environment variable configuration correctly implements the troubleshooting-tools feature flag, defaulting to disabled ("0") as requested in past reviews.

Also applies to: 107-108

log_analyzer/main.py (2)

14-71: LGTM!

The `analyze_cluster` function is well structured, with:
- proper error handling at both the outer level (lines 69-71) and the per-signature level (lines 62-65)
- clear logging for debugging
- graceful handling of unknown signatures
- a clean async/await pattern

74-85: LGTM!

The `print_results` function provides clear, user-friendly output formatting with appropriate handling of empty results.

log_analyzer/signatures/basic_info.py (1)

125-159: LGTM: versions/release_tag access is wrap-aware and robust.

Using `metadata.get(...)` with a `cluster_md` fallback is correct.
Actionable comments posted: 3
♻️ Duplicate comments (15)
service_client/assisted_service_api.py (1)

160-174: Redact credentials from the presigned URL before logging.

Line 171 logs the full presigned URL, which typically includes temporary credentials in the query string (e.g. `X-Amz-Signature`, tokens). Logging these credentials can expose them in log aggregation systems.

Apply this diff to log only the safe parts of the URL:

```diff
+from urllib.parse import urlparse
+
     @sanitize_exceptions
     async def get_cluster_logs(
         self, cluster_id: str
     ) -> nestedarchive.RemoteNestedArchive:
         result = await self._api_call(
             self._installer_api().v2_get_presigned_for_cluster_files,
             cluster_id=cluster_id,
             file_name="logs",
         )
         logs_url = cast(PresignedUrl, result).url
-        log.info("Downloading logs from %s", logs_url)
+        parsed = urlparse(cast(str, logs_url))
+        safe_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+        log.info("Downloading logs from %s", safe_url)
         return nestedarchive.RemoteNestedArchive(
             cast(str, logs_url), init_download=True
         )
```

pyproject.toml (1)
17-19: Add runtime dependency on `python-dateutil`.

`log_analyzer/signatures/base.py` imports `dateutil.parser`, but only the typing stub (`types-python-dateutil` in dev dependencies) is present. Without the runtime package, the analyzer will fail with `ModuleNotFoundError`.

Based on learnings

Apply this diff:

```diff
     "pydantic>=2.12.1",
+    "python-dateutil>=2.9.0.post0",
     "nestedarchive>=0.2.4",
```

log_analyzer/signatures/error_detection.py (1)
37-87: Harden host parsing with safe JSON decoding and key access.

Lines 50-51, 56, 60-63, and 68-69 directly access dictionaries and parse JSON without guards. This will crash the signature on missing keys or invalid inventory.

Apply this diff:

```diff
         hosts = []
         for host in cluster_hosts:
             if (
-                host["role"] == "bootstrap"
-                and host["progress"]["current_stage"] == "WaitingForControlPlane"
+                host.get("role") == "bootstrap"
+                and host.get("progress", {}).get("current_stage") == "WaitingForControlPlane"
             ):
                 # Probably failed due to other issues
                 return None
-            inventory = json.loads(host["inventory"])
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, ValueError):
+                inventory = {}
             boot_mode = inventory.get("boot", {}).get("current_boot_mode", "N/A")
             if (
-                host["role"] == "master"
-                and host["progress"]["current_stage"] == "Rebooting"
-                and host["status"] == "error"
+                host.get("role") == "master"
+                and host.get("progress", {}).get("current_stage") == "Rebooting"
+                and host.get("status") == "error"
                 and boot_mode == "bios"
             ):
                 hosts.append(
                     OrderedDict(
                         HostID=host["id"],
-                        Hostname=inventory["hostname"],
-                        Progress=host["progress"]["current_stage"],
+                        Hostname=log_analyzer.get_hostname(host),
+                        Progress=host.get("progress", {}).get("current_stage", "N/A"),
                     )
                 )
```

server.py (1)
1018-1027: Add error handling and improve the docstring.

The function lacks exception handling for `analyze_cluster`, which can cause the MCP tool to fail with an unhandled exception. The docstring should also describe the return format.

Apply this diff:

```diff
     @track_tool_usage()
     async def analyze_cluster_logs(
         cluster_id: Annotated[str, Field(description="The ID of the cluster")],
     ) -> str:
         """
         Analyze the cluster logs for the given cluster_id and return the results.
+
+        Returns:
+            str: A formatted string containing signature analysis results with severity
+                indicators, or an error message if analysis fails.
         """
-        client = InventoryClient(get_access_token())
-        results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
-        return "\n\n".join([str(r) for r in results])
+        try:
+            client = InventoryClient(get_access_token())
+            results = await analyze_cluster(cluster_id=cluster_id, api_client=client)
+            if not results:
+                return "No issues detected in cluster logs."
+            return "\n\n".join(str(r) for r in results)
+        except Exception as e:
+            log.error("Error analyzing cluster logs for %s: %s", cluster_id, e, exc_info=True)
+            return f"Error analyzing cluster logs: {str(e)}"
```

log_analyzer/signatures/basic_info.py (4)
289-309: Guard JSON parsing in the _get_interfaces helper.

Line 291 directly parses JSON without error handling. A malformed inventory will cause the helper to crash.

Apply this diff:

```diff
     def _get_interfaces(self, host):
         """Extract interface information from host."""
-        inventory = json.loads(host["inventory"])
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except (json.JSONDecodeError, ValueError):
+            inventory = {}
         interfaces_details = defaultdict(list)
```
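Because this try/except pattern recurs across nearly every signature touched in this review, it could be hoisted into a shared helper. A hypothetical sketch (not part of the PR):

```python
import json


def safe_json_loads(raw, default=None):
    """Parse JSON, returning `default` instead of raising on absent or malformed input."""
    try:
        return json.loads(raw)
    except (TypeError, ValueError):  # TypeError: raw is None; ValueError: bad JSON
        return default


host = {"inventory": "{not valid json"}
inventory = safe_json_loads(host.get("inventory"), default={})
print(inventory.get("boot", {}).get("current_boot_mode", "N/A"))  # → N/A
```

Each call site then collapses to a one-liner, and the chained `.get(..., {})` lookups stay safe regardless of what the archive contained.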
83-119: Use safe dictionary access in host summary generation.

Lines 86, 89, 91, 109, and 114 directly index dictionaries without checking for key existence. This can raise `KeyError` when keys are missing.

Apply this diff:

```diff
     def _generate_hosts_summary(self, hosts, log_analyzer):
         """Generate a summary of host issues."""
         host_summary = []
-        some_done = any(host["progress"]["current_stage"] == "Done" for host in hosts)
+        some_done = any(
+            host.get("progress", {}).get("current_stage") == "Done" for host in hosts
+        )
         for host in hosts:
-            if host["role"] not in ["master", "bootstrap"]:
+            role = host.get("role")
+            if role not in ["master", "bootstrap"]:
                 continue
-            if host["progress"]["current_stage"] == "Done" or host["status"] != "error":
+            current_stage = host.get("progress", {}).get("current_stage", "")
+            if current_stage == "Done" or host.get("status") != "error":
                 continue
             host_comment = {
                 "Waiting for bootkube": "bootkube.service never completed",
                 "Rebooting": "Node never pulled Ignition",
                 "Configuring": "Node pulled Ignition, but never started kubelet",
                 "Joined": (
                     "The Node k8s resource associated with this host is not Ready"
                     + (
                         " or the Assisted Controller is not running on the cluster"
                         if not some_done
                         else ""
                     )
                 ),
                 "Waiting for control plane": "Masters never formed 2-node cluster",
                 "Waiting for controller": "Assisted installer controller pod never started",
                 "Writing image to disk": "Image probably failed to be written on disk",
-            }.get(host["progress"]["current_stage"], "Unknown")
+            }.get(current_stage, "Unknown")
             host_summary.append(
                 OrderedDict(
                     hostname=log_analyzer.get_hostname(host),
-                    role=host["role"],
+                    role=role or "N/A",
                     description=host_comment,
                 )
             )
```
209-250: Add defensive parsing in HostsExtraDetailSignature.

Lines 220-236 directly parse JSON and access nested keys without error handling. Missing or malformed inventory will crash the signature.

Apply this diff:

```diff
             hosts = []
             for host in cluster["hosts"]:
-                inventory = json.loads(host["inventory"])
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except (json.JSONDecodeError, ValueError):
+                    inventory = {}
+                system_vendor = inventory.get("system_vendor", {})
                 hosts.append(
                     OrderedDict(
                         id=host["id"],
-                        hostname=inventory["hostname"],
+                        hostname=log_analyzer.get_hostname(host),
                         requested_hostname=host.get("requested_hostname", "N/A"),
-                        last_contacted=self.format_time(host["checked_in_at"]),
+                        last_contacted=self.format_time(host.get("checked_in_at", "")),
                         installation_disk=host.get("installation_disk_path", "N/A"),
-                        product_name=inventory["system_vendor"].get(
-                            "product_name", "Unavailable"
-                        ),
-                        manufacturer=inventory["system_vendor"].get(
-                            "manufacturer", "Unavailable"
-                        ),
-                        virtual_host=inventory["system_vendor"].get("virtual", False),
-                        disks_count=len(inventory["disks"]),
+                        product_name=system_vendor.get("product_name", "Unavailable"),
+                        manufacturer=system_vendor.get("manufacturer", "Unavailable"),
+                        virtual_host=system_vendor.get("virtual", False),
+                        disks_count=len(inventory.get("disks", [])),
                     )
                 )
```
312-367: Add error handling in StorageDetailSignature.

Lines 323-324 directly parse JSON and access the disks array without guards. Missing or invalid inventory will crash the signature.

Apply this diff:

```diff
             hosts = []
             for host in cluster["hosts"]:
-                inventory = json.loads(host["inventory"])
-                disks = inventory["disks"]
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except (json.JSONDecodeError, ValueError):
+                    inventory = {}
+                disks = inventory.get("disks", [])
```

log_analyzer/signatures/platform_specific.py (1)
22-56: Add JSON parse error handling in LibvirtRebootFlagSignature.

Line 31 calls `json.loads` without a try/except. If the inventory contains invalid JSON, the signature will crash instead of gracefully handling the error.

Apply this diff:

```diff
         hosts = []
         for host in cluster.get("hosts", []):
-            inventory = json.loads(host.get("inventory", "{}"))
+            try:
+                inventory = json.loads(host.get("inventory", "{}"))
+            except (json.JSONDecodeError, ValueError):
+                inventory = {}
             if (
```

log_analyzer/signatures/advanced_analysis.py (3)
407-413: Use a list default for pod conditions to prevent type issues.

Defaulting to `{}` leads to iterating dict keys and failures on `condition.get`.

Apply this diff:

```diff
             ready = [
                 condition.get("status") == "True"
                 for condition in controller_pod.get("status", {}).get(
-                    "conditions", {}
+                    "conditions", []
                 )
                 if condition.get("type") == "Ready"
             ][0]
```
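The failure mode is easy to reproduce: with a `{}` default, iterating yields dict keys (strings), which have no `.get`. A minimal sketch of the guarded lookup, using a made-up pod dict and additionally guarding the trailing `[0]` index, which the diff above would still hit on an empty list:

```python
def controller_ready(pod):
    # A list default keeps the comprehension well-typed when conditions are absent.
    matches = [
        condition.get("status") == "True"
        for condition in pod.get("status", {}).get("conditions", [])
        if condition.get("type") == "Ready"
    ]
    return matches[0] if matches else False


pod = {"status": {"conditions": [
    {"type": "PodScheduled", "status": "True"},
    {"type": "Ready", "status": "True"},
]}}
print(controller_ready(pod))  # → True
print(controller_ready({}))   # → False
```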
176-193: Make FlappingValidations robust to unexpected message formats.

`validation_name_regexp` may not match; the current code assumes a match and will raise.

Apply this diff:

```diff
@@
-            succeed_to_failing_counter = Counter(
-                cast(
-                    re.Match[str],
-                    self.validation_name_regexp.match(event["message"]),
-                ).groups()[0]
-                for event in events
-                if self.succeed_to_failing_regexp.match(event["message"])
-            )
+            succeed_to_failing_counter = Counter(
+                m.groups()[0]
+                for event in events
+                if self.succeed_to_failing_regexp.match(event["message"])
+                for m in [self.validation_name_regexp.match(event["message"])]
+                if m is not None
+            )
@@
-            now_fixed = Counter(
-                cast(
-                    re.Match[str],
-                    self.validation_name_regexp.match(event["message"]),
-                ).groups()[0]
-                for event in events
-                if self.now_fixed_regexp.match(event["message"])
-            )
+            now_fixed = Counter(
+                m.groups()[0]
+                for event in events
+                if self.now_fixed_regexp.match(event["message"])
+                for m in [self.validation_name_regexp.match(event["message"])]
+                if m is not None
+            )
```
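The `for m in [...] if m is not None` trick in the diff binds each regex match exactly once inside a comprehension. A standalone sketch with hypothetical patterns (stand-ins for the signature's real regexes):

```python
import re
from collections import Counter

# Hypothetical patterns standing in for the signature's real regexes.
succeed_to_failing_re = re.compile(r"that used to succeed is now failing")
validation_name_re = re.compile(r"validation '([\w-]+)'")

events = [
    {"message": "validation 'ntp-sync' that used to succeed is now failing"},
    {"message": "validation 'ntp-sync' that used to succeed is now failing"},
    {"message": "validation <unnamed> that used to succeed is now failing"},
    {"message": "unrelated event"},
]

# The single-element inner list binds each match exactly once; the trailing
# `if m is not None` silently drops messages the name regex cannot parse.
counter = Counter(
    m.group(1)
    for event in events
    if succeed_to_failing_re.search(event["message"])
    for m in [validation_name_re.search(event["message"])]
    if m is not None
)
print(counter)  # → Counter({'ntp-sync': 2})
```

The third event matches the trigger regex but not the name regex, so it is skipped instead of raising `AttributeError` on `None`.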
133-136: Guard bootstrap progress access in MissingMustGatherLogs.

Direct indexing of `bootstrap_node["progress"]["current_stage"]` can raise `KeyError`.

Apply this diff:

```diff
-        if (
-            bootstrap_node["progress"]["current_stage"]
-            not in eligible_bootstrap_stages
-        ):
+        current_stage = bootstrap_node.get("progress", {}).get("current_stage", "")
+        if current_stage not in eligible_bootstrap_stages:
             return None
```

log_analyzer/log_analyzer.py (2)
34-37: Cache all events separately; don't conflate with "last attempt".

The current `get_all_cluster_events` returns only the last partition; this breaks callers (e.g., EventsInstallationAttempts) and contradicts the docstring. Also add a dedicated cache for all events.

Apply this diff:

```diff
-        self.logs_archive = logs_archive
-        self._metadata = None
-        self._cluster_events = None
+        self.logs_archive = logs_archive
+        self._metadata = None
+        self._all_events = None
+        self._cluster_events = None
```

And update `get_all_cluster_events`:

```diff
     def get_all_cluster_events(self) -> List[Dict[str, Any]]:
         """Get all the cluster installation events."""
-        if self._cluster_events is None:
+        if self._all_events is None:
             try:
                 events_content = self.logs_archive.get("cluster_events.json")
-                all_events = json.loads(cast(str | bytes, events_content))
-
-                # Get the last partition (latest installation attempt)
-                self._cluster_events = self.partition_cluster_events(all_events)[-1]
+                self._all_events = json.loads(cast(str | bytes, events_content))
             except Exception as e:
                 logger.error("Failed to load cluster events: %s", e)
-                self._cluster_events = []
-
-        return self._cluster_events
+                self._all_events = []
+        return self._all_events
```
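The intended split — all events cached once, with the last-attempt view derived from them — can be sketched in isolation. The partition heuristic below is hypothetical (the real `partition_cluster_events` lives in `log_analyzer.py`):

```python
import json


class EventsCache:
    """Sketch of the two-level cache: all events vs. last-attempt events."""

    def __init__(self, raw_json: str):
        self._raw = raw_json
        self._all_events = None
        self._last_attempt = None

    def get_all_cluster_events(self):
        if self._all_events is None:
            try:
                self._all_events = json.loads(self._raw)
            except ValueError:
                self._all_events = []
        return self._all_events

    def get_last_install_cluster_events(self):
        if self._last_attempt is None:
            partitions = self._partition(self.get_all_cluster_events())
            self._last_attempt = partitions[-1] if partitions else []
        return self._last_attempt

    @staticmethod
    def _partition(events):
        # Hypothetical rule: start a new attempt at each "installation started" event.
        parts = []
        for event in events:
            if "installation started" in event.get("message", "") or not parts:
                parts.append([])
            parts[-1].append(event)
        return parts


raw = json.dumps([
    {"message": "cluster installation started"},
    {"message": "host registered"},
    {"message": "cluster installation started"},
    {"message": "install succeeded"},
])
cache = EventsCache(raw)
print(len(cache.get_all_cluster_events()))           # → 4
print(len(cache.get_last_install_cluster_events()))  # → 2
```

With the two caches separated, callers that need the full history and callers that only care about the latest attempt no longer step on each other.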
55-75: Guard missing install_started_at in metadata cleaning.

Accessing `md["cluster"]["install_started_at"]` will raise `KeyError` when absent; the entire analysis fails.

Apply this diff:

```diff
@@
-        installation_start_time = dateutil.parser.isoparse(
-            md["cluster"]["install_started_at"]
-        )
+        installation_started_at = md.get("cluster", {}).get("install_started_at")
+        if not installation_started_at:
+            return md
+        installation_start_time = dateutil.parser.isoparse(installation_started_at)
@@
-        all_hosts = md["cluster"]["hosts"]
+        all_hosts = md["cluster"].get("hosts", [])
         md["cluster"]["deleted_hosts"] = [
             h for h in all_hosts if host_deleted_before_installation_started(h)
         ]
         md["cluster"]["hosts"] = [
             h for h in all_hosts if not host_deleted_before_installation_started(h)
         ]
```

log_analyzer/signatures/networking.py (1)
27-66: Add guards for hosts, machine_networks, and inventory JSON.

Indexing `cluster["machine_networks"][0]["cidr"]` and `host["inventory"]` can raise; handle both gracefully.

Apply this diff:

```diff
@@
         if not cluster.get("hosts"):
             return None
         host = cluster["hosts"][0]
-        inventory = json.loads(host["inventory"])
+        try:
+            inventory = json.loads(host.get("inventory", "{}"))
+        except json.JSONDecodeError:
+            return None
@@
-        # The first one is the machine_cidr that will be used for network configuration
-        machine_cidr = ipaddress.ip_network(cluster["machine_networks"][0]["cidr"])
+        # The first entry is the machine_cidr used for network configuration
+        mn = cluster.get("machine_networks") or []
+        if not mn or not mn[0].get("cidr"):
+            return None
+        machine_cidr = ipaddress.ip_network(mn[0]["cidr"])
```
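The membership test that motivates the guard is plain `ipaddress` arithmetic; a standalone sketch with made-up addresses:

```python
import ipaddress

machine_cidr = ipaddress.ip_network("192.168.127.0/24")

# Hypothetical NIC addresses as they would appear in an inventory (CIDR notation).
host_addresses = ["192.168.127.10/24", "10.0.0.5/16"]

in_machine_net = [
    addr for addr in host_addresses
    if ipaddress.ip_interface(addr).ip in machine_cidr
]
print(in_machine_net)  # → ['192.168.127.10/24']
```

If `machine_networks` is empty or missing `cidr`, there is simply nothing to compare against, which is why returning `None` (no finding) is the sensible fallback.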
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (19)
- `.github/workflows/mypy.yaml` (1 hunks)
- `.github/workflows/pydocstyle.yaml` (0 hunks)
- `Dockerfile` (1 hunks)
- `Makefile` (1 hunks)
- `log_analyzer/__init__.py` (1 hunks)
- `log_analyzer/log_analyzer.py` (1 hunks)
- `log_analyzer/main.py` (1 hunks)
- `log_analyzer/signatures/__init__.py` (1 hunks)
- `log_analyzer/signatures/advanced_analysis.py` (1 hunks)
- `log_analyzer/signatures/base.py` (1 hunks)
- `log_analyzer/signatures/basic_info.py` (1 hunks)
- `log_analyzer/signatures/error_detection.py` (1 hunks)
- `log_analyzer/signatures/networking.py` (1 hunks)
- `log_analyzer/signatures/performance.py` (1 hunks)
- `log_analyzer/signatures/platform_specific.py` (1 hunks)
- `pyproject.toml` (2 hunks)
- `server.py` (4 hunks)
- `service_client/assisted_service_api.py` (2 hunks)
- `template.yaml` (2 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/pydocstyle.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
- log_analyzer/__init__.py
- template.yaml
- .github/workflows/mypy.yaml
- Makefile
- log_analyzer/signatures/base.py
- log_analyzer/main.py
🧰 Additional context used
🧬 Code graph analysis (9)
service_client/assisted_service_api.py (1)
- service_client/exceptions.py (1): `sanitize_exceptions` (24-58)

log_analyzer/signatures/error_detection.py (2)
- log_analyzer/signatures/base.py (5): `ErrorSignature` (86-120), `SignatureResult` (15-48), `analyze` (59-68), `generate_table` (71-75), `create_result` (104-120)
- log_analyzer/log_analyzer.py (4): `metadata` (39-53), `get_hostname` (218-228), `get_controller_logs` (199-206), `get_host_log_file` (134-165)

log_analyzer/signatures/platform_specific.py (2)
- log_analyzer/signatures/base.py (5): `ErrorSignature` (86-120), `SignatureResult` (15-48), `analyze` (59-68), `create_result` (104-120), `generate_table` (71-75)
- log_analyzer/log_analyzer.py (4): `metadata` (39-53), `get_hostname` (218-228), `get_host_log_file` (134-165), `get_journal_log` (167-197)

server.py (3)
- log_analyzer/main.py (1): `analyze_cluster` (14-71)
- metrics/metrics.py (1): `track_tool_usage` (51-65)
- service_client/assisted_service_api.py (1): `InventoryClient` (27-561)

log_analyzer/signatures/__init__.py (1)
- log_analyzer/signatures/base.py (3): `Signature` (51-83), `ErrorSignature` (86-120), `SignatureResult` (15-48)

log_analyzer/signatures/basic_info.py (2)
- log_analyzer/signatures/base.py (5): `Signature` (51-83), `SignatureResult` (15-48), `analyze` (59-68), `format_time` (78-83), `generate_table` (71-75)
- log_analyzer/log_analyzer.py (2): `metadata` (39-53), `get_hostname` (218-228)

log_analyzer/signatures/performance.py (2)
- log_analyzer/signatures/base.py (6): `Signature` (51-83), `ErrorSignature` (86-120), `SignatureResult` (15-48), `analyze` (59-68), `generate_table` (71-75), `create_result` (104-120)
- log_analyzer/log_analyzer.py (4): `get_last_install_cluster_events` (77-88), `metadata` (39-53), `get_events_by_host` (126-132), `get_hostname` (218-228)

log_analyzer/signatures/networking.py (3)
- log_analyzer/signatures/advanced_analysis.py (11): `operator_statuses_from_controller_logs` (17-44), `filter_operators` (57-71), `analyze` (80-108, 114-157, 169-222, 228-274, 280-299, 309-329, 342-378, 384-432, 443-461)
- log_analyzer/signatures/base.py (5): `Signature` (51-83), `ErrorSignature` (86-120), `SignatureResult` (15-48), `analyze` (59-68), `create_result` (104-120)
- log_analyzer/log_analyzer.py (2): `metadata` (39-53), `get_controller_logs` (199-206)

log_analyzer/signatures/advanced_analysis.py (2)
- log_analyzer/signatures/base.py (4): `Signature` (51-83), `SignatureResult` (15-48), `analyze` (59-68), `generate_table` (71-75)
- log_analyzer/log_analyzer.py (8): `get_all_cluster_events` (90-103), `partition_cluster_events` (106-124), `get_last_install_cluster_events` (77-88), `metadata` (39-53), `get_must_gather` (208-215), `get_events_by_host` (126-132), `get_controller_logs` (199-206), `get_host_log_file` (134-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-service-mcp-saas-main-on-pull-request
🔇 Additional comments (2)
Dockerfile (1)
17-17: LGTM!

The addition correctly includes the log_analyzer directory in the Docker image, following the established pattern for other directories.
log_analyzer/signatures/__init__.py (1)
1-33: LGTM!

The signature collection logic correctly uses introspection to gather all `Signature` subclasses while filtering out the base types. The deterministic sorting ensures a consistent execution order.
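The collection pattern the comment describes can be sketched standalone (the class names below are hypothetical, not the PR's actual signatures):

```python
import inspect


class Signature:
    """Base type, excluded from collection."""


class ErrorSignature(Signature):
    """Intermediate base type, also excluded."""


class HostStatusSignature(Signature):
    pass


class WrongBootOrderSignature(ErrorSignature):
    pass


def collect_signatures(namespace):
    """Gather concrete Signature subclasses, sorted by name for determinism."""
    bases = {Signature, ErrorSignature}
    return sorted(
        (
            obj for obj in namespace.values()
            if inspect.isclass(obj) and issubclass(obj, Signature) and obj not in bases
        ),
        key=lambda cls: cls.__name__,
    )


namespace = {
    "Signature": Signature,
    "ErrorSignature": ErrorSignature,
    "HostStatusSignature": HostStatusSignature,
    "WrongBootOrderSignature": WrongBootOrderSignature,
    "not_a_class": 42,
}
print([cls.__name__ for cls in collect_signatures(namespace)])
# → ['HostStatusSignature', 'WrongBootOrderSignature']
```

Sorting by class name is what makes every run of the analyzer execute signatures in the same order, which keeps output diffable across runs.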
```python
        if bootstrap[0]["progress"]["current_stage"] != "Waiting for controller":
            return None
        for base in (NEW_LOG_BUNDLE_PATH, OLD_LOG_BUNDLE_PATH):
```
Guard bootstrap progress access in ControllerFailedToStart.
Same risk here; avoid KeyError/TypeError on missing progress/current_stage.
Apply this diff:
```diff
@@
-        if bootstrap[0]["progress"]["current_stage"] != "Waiting for controller":
+        progress_stage = bootstrap[0].get("progress", {}).get("current_stage", "")
+        if progress_stage != "Waiting for controller":
             return None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        progress_stage = bootstrap[0].get("progress", {}).get("current_stage", "")
        if progress_stage != "Waiting for controller":
            return None
        for base in (NEW_LOG_BUNDLE_PATH, OLD_LOG_BUNDLE_PATH):
```
🤖 Prompt for AI Agents
In log_analyzer/signatures/advanced_analysis.py around lines 389 to 391, the
code directly indexes bootstrap[0]["progress"]["current_stage"] which can raise
KeyError/TypeError if bootstrap is empty or keys are missing; update the guard
to first verify bootstrap is a non-empty sequence and bootstrap[0] is a dict,
then obtain progress = bootstrap[0].get("progress") and current_stage =
progress.get("current_stage") if isinstance(progress, dict) (or use
bootstrap[0].get("progress", {}) and .get("current_stage")) and only compare
current_stage to "Waiting for controller"; if any check fails or current_stage
is not the expected string, return None to avoid exceptions.
```python
    def analyze(self, log_analyzer) -> Optional[SignatureResult]:
        """Analyze host status."""
        try:
            metadata = log_analyzer.metadata
            cluster = metadata["cluster"]

            hosts = []
            for host in cluster["hosts"]:
                info = host["status_info"]
                role = host["role"]
                inventory = json.loads(host["inventory"])
                if host.get("bootstrap", False):
                    role = "bootstrap"

                hosts.append(
                    OrderedDict(
                        id=host["id"],
                        hostname=log_analyzer.get_hostname(host),
                        progress=host["progress"]["current_stage"],
                        status=host["status"],
                        role=role,
                        boot_mode=inventory.get("boot", {}).get(
                            "current_boot_mode", "N/A"
                        ),
                        status_info=str(info),
                        logs_info=host.get("logs_info", ""),
                        last_checked_in_at=self.format_time(
                            host.get("checked_in_at", str(datetime.min))
                        ),
                    )
                )
```
Guard against missing keys and invalid JSON in host status parsing.
Lines 30, 38-40 directly parse JSON and access nested dictionaries without error handling. This will crash the signature on malformed inventory or missing keys.
Apply this diff:
```diff
             hosts = []
             for host in cluster["hosts"]:
                 info = host["status_info"]
-                role = host["role"]
-                inventory = json.loads(host["inventory"])
+                role = host.get("role")
+                try:
+                    inventory = json.loads(host.get("inventory", "{}"))
+                except (json.JSONDecodeError, ValueError):
+                    inventory = {}
                 if host.get("bootstrap", False):
                     role = "bootstrap"
                 hosts.append(
                     OrderedDict(
                         id=host["id"],
                         hostname=log_analyzer.get_hostname(host),
-                        progress=host["progress"]["current_stage"],
-                        status=host["status"],
+                        progress=host.get("progress", {}).get("current_stage", "N/A"),
+                        status=host.get("status", "N/A"),
                         role=role,
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def analyze(self, log_analyzer) -> Optional[SignatureResult]:
        """Analyze host status."""
        try:
            metadata = log_analyzer.metadata
            cluster = metadata["cluster"]

            hosts = []
            for host in cluster["hosts"]:
                info = host["status_info"]
                role = host.get("role")
                try:
                    inventory = json.loads(host.get("inventory", "{}"))
                except (json.JSONDecodeError, ValueError):
                    inventory = {}
                if host.get("bootstrap", False):
                    role = "bootstrap"

                hosts.append(
                    OrderedDict(
                        id=host["id"],
                        hostname=log_analyzer.get_hostname(host),
                        progress=host.get("progress", {}).get("current_stage", "N/A"),
                        status=host.get("status", "N/A"),
                        role=role,
                        boot_mode=inventory.get("boot", {}).get(
                            "current_boot_mode", "N/A"
                        ),
                        status_info=str(info),
                        logs_info=host.get("logs_info", ""),
                        last_checked_in_at=self.format_time(
                            host.get("checked_in_at", str(datetime.min))
                        ),
                    )
                )
```
🤖 Prompt for AI Agents
In log_analyzer/signatures/basic_info.py around lines 20 to 50, the host parsing
assumes keys exist and json.loads always succeeds; wrap inventory parsing in a
try/except and default to {} on JSONDecodeError or non-string input, replace
direct indexing like cluster["hosts"], host["status_info"], host["role"],
host["inventory"], host["id"], host["progress"]["current_stage"],
host["status"], host.get("logs_info") with safe .get calls and sensible defaults
(e.g., empty list/dict, "N/A", or datetime.min), and protect nested lookups
(progress/current_stage, inventory.get("boot") etc.) by using .get with defaults
so malformed or missing fields don't raise exceptions; ensure role is determined
via host.get("role") before overriding with bootstrap, and last_checked_in_at
uses the safe fallback when formatting.
```python
    def _get_inventory(self, log_analyzer, host_id):
        try:
            agent_logs = log_analyzer.get_host_log_file(host_id, "agent.logs")
        except FileNotFoundError:
            return None
        match = self.agent_log_regex.search(agent_logs)
        return json.loads(match.group(1).replace('\\"', '"')) if match else None
```
Guard JSON parsing in _get_inventory helper.
Line 73 parses JSON from a regex match without error handling. Invalid JSON will cause the method to crash instead of returning None.
Apply this diff:
```diff
     def _get_inventory(self, log_analyzer, host_id):
         try:
             agent_logs = log_analyzer.get_host_log_file(host_id, "agent.logs")
         except FileNotFoundError:
             return None
         match = self.agent_log_regex.search(agent_logs)
-        return json.loads(match.group(1).replace('\\"', '"')) if match else None
+        if not match:
+            return None
+        try:
+            return json.loads(match.group(1).replace('\\"', '"'))
+        except (json.JSONDecodeError, ValueError):
+            return None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def _get_inventory(self, log_analyzer, host_id):
        try:
            agent_logs = log_analyzer.get_host_log_file(host_id, "agent.logs")
        except FileNotFoundError:
            return None
        match = self.agent_log_regex.search(agent_logs)
        if not match:
            return None
        try:
            return json.loads(match.group(1).replace('\\"', '"'))
        except (json.JSONDecodeError, ValueError):
            return None
```
🤖 Prompt for AI Agents
In log_analyzer/signatures/platform_specific.py around lines 67 to 73, the
helper _get_inventory calls json.loads on the regex match content without
guarding for invalid JSON; update the function to catch JSONDecodeError (and
ValueError for older Python) around the json.loads call and return None if
parsing fails, so the method doesn't crash on malformed JSON; ensure the
try/except only wraps the parsing step (not the file read) and still returns
None when no regex match is found.
@keitwb: The following test failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/lgtm |
Merged commit f25ad94 into openshift-assisted:master
Summary by CodeRabbit
New Features
Chores