[Observability Onboarding] Add data detection & loading indicators to onboarding flows #257870
Conversation

Pinging @elastic/obs-onboarding-team (Team:obs-onboarding)
Caution: Review failed. An error occurred during the review process. Please try again later.

📝 Walkthrough

Replaces fixed timeouts with event-driven data detection across onboarding flows; adds server-side has-data routes querying Elasticsearch; introduces a time-window detection hook and progress UI components; propagates onboardingId into install commands; and updates tests to assert data-received indicators.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User/Browser
    participant UI as Onboarding UI
    participant Hook as useTimeWindowDataDetection
    participant Server as Onboarding Server Route
    participant ES as Elasticsearch
    User->>UI: Blur window (leave tab)
    UI->>Hook: trigger monitoring (sessionStartTime)
    Hook->>Server: GET /internal/.../has-data?start=...
    Server->>ES: search logs/metrics since start (terminate_after:1, size:0)
    ES-->>Server: search result (hits>0 / 0)
    Server-->>Hook: { hasData: boolean, hasLogs?, hasMetrics? }
    alt hasData == true
        Hook->>UI: hasData = true
        UI->>UI: render GetStartedPanel (action links)
    else hasData == false
        Hook->>Hook: if elapsed > troubleshootingDelay -> isTroubleshootingVisible = true
        UI->>UI: render ProgressIndicator (and troubleshooting if true)
    end
    Note over Hook,UI: Polling repeats at FETCH_INTERVAL until hasData
```
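The server-side check the diagram describes (a search with `size: 0` and `terminate_after: 1`, filtered to documents ingested after the session started) can be sketched as a small query builder. This is a hypothetical illustration; the function name, field name, and request shape are assumptions, not the PR's actual code.

```typescript
// Hypothetical sketch of the has-data request body: we only need to know
// whether ANY matching document exists, so we ask for zero hits back and
// let each shard stop after its first match.
interface HasDataQuery {
  size: number;
  terminate_after: number;
  query: { bool: { filter: Array<Record<string, unknown>> } };
}

function buildHasDataQuery(start: string): HasDataQuery {
  return {
    size: 0, // no documents returned, only the hit count matters
    terminate_after: 1, // stop after the first matching document per shard
    query: {
      bool: {
        // restrict the check to data ingested after the onboarding session began
        filter: [{ range: { '@timestamp': { gte: start } } }],
      },
    },
  };
}
```

The response's `hits.total` being greater than zero is then mapped to the `hasData: boolean` payload the hook polls for.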
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
🧹 Nitpick comments (8)
x-pack/solutions/observability/plugins/observability_onboarding/server/routes/cloudforwarder/route.ts (2)
70-74: Avoid duplicating `logType` keys across schema and index map.
`logType` is declared inline in the route and also encoded in `CLOUDFORWARDER_INDEX_PATTERNS`. If they drift, `indexPattern` can become `undefined` at runtime. Prefer a shared runtime key source. Also applies to: line 86.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/server/routes/cloudforwarder/route.ts` around lines 70 - 74, The route duplicates the logType keys inline in the params schema and in CLOUDFORWARDER_INDEX_PATTERNS which can drift; extract a shared constant (e.g., LOG_TYPE_KEYS or CLOUDFORWARDER_LOG_TYPES) that lists the allowed keys ("vpcflow", "elbaccess", "cloudtrail") and use that constant both when building the t.keyof(...) in the params schema and when constructing CLOUDFORWARDER_INDEX_PATTERNS, so the source of truth for log types is a single symbol referenced from the route's params and the index pattern map.
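The "single source of truth" the prompt suggests can be sketched as follows. The constant names follow the comment's examples and the index patterns come from this review; the surrounding code is an assumption, not the actual route module.

```typescript
// Single source of truth for the allowed CloudForwarder log types.
const CLOUDFORWARDER_LOG_TYPES = ['vpcflow', 'elbaccess', 'cloudtrail'] as const;
type LogType = (typeof CLOUDFORWARDER_LOG_TYPES)[number];

// Index-pattern map keyed by the shared type: adding a new log type to the
// array above forces a compile error here until a pattern is provided.
const CLOUDFORWARDER_INDEX_PATTERNS = {
  vpcflow: 'logs-aws.vpcflow.otel-*',
  elbaccess: 'logs-aws.elbaccess.otel-*',
  cloudtrail: 'logs-aws.cloudtrail.otel-*',
} satisfies Record<LogType, string>;

// The io-ts keyof codec in the params schema can be derived from the same
// array, e.g. (sketch, assuming io-ts is in scope):
//   t.keyof(Object.fromEntries(CLOUDFORWARDER_LOG_TYPES.map((k) => [k, null])))
```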
9-9: Replace wildcard `io-ts` import with explicit imports.
This keeps the module surface explicit and aligns with the TS import guideline.

♻️ Proposed refactor

```diff
-import * as t from 'io-ts';
+import { keyof as ioKeyof, string as ioString, type as ioType } from 'io-ts';

-      params: t.type({
-        query: t.type({
-          logType: t.keyof({ vpcflow: null, elbaccess: null, cloudtrail: null }),
-          start: t.string,
+      params: ioType({
+        query: ioType({
+          logType: ioKeyof({ vpcflow: null, elbaccess: null, cloudtrail: null }),
+          start: ioString,
         }),
       }),
```

As per coding guidelines, "Prefer explicit import/exports over wildcard imports/exports in TypeScript."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/server/routes/cloudforwarder/route.ts` at line 9, Replace the wildcard import "import * as t from 'io-ts';" with explicit named imports of only the symbols used in this file: scan for all occurrences of "t." in route.ts (e.g., t.type, t.string, t.union, etc.), collect those identifiers, and change the import to "import { <Identifiers> } from 'io-ts';" so the module surface is explicit and aligns with the TS import guideline.

x-pack/solutions/observability/plugins/observability_onboarding/common/aws_cloudforwarder.ts (1)
31-35: Prefer an immutable typed map for index patterns.
`Record<LogType, string>` widens the object and leaves properties mutable. Consider `as const` + `satisfies Readonly<Record<LogType, string>>` for safer config semantics.

♻️ Proposed refactor

```diff
-export const CLOUDFORWARDER_INDEX_PATTERNS: Record<LogType, string> = {
+export const CLOUDFORWARDER_INDEX_PATTERNS = {
   vpcflow: 'logs-aws.vpcflow.otel-*',
   elbaccess: 'logs-aws.elbaccess.otel-*',
   cloudtrail: 'logs-aws.cloudtrail.otel-*',
-};
+} as const satisfies Readonly<Record<LogType, string>>;
```

As per coding guidelines, "Prefer `readonly` and `as const` for immutable structures in TypeScript."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/common/aws_cloudforwarder.ts` around lines 31 - 35, CLOUDFORWARDER_INDEX_PATTERNS is declared as a mutable Record<LogType, string>; make it an immutable, fully-typed map by appending as const and using the satisfies Readonly<Record<LogType, string>> pattern so the object literal is inferred with literal types and its properties are readonly; update the declaration for CLOUDFORWARDER_INDEX_PATTERNS (referencing the LogType type) accordingly.

x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts (1)
57-68: Consider typing the dynamic endpoint more safely.
The `as any` cast on line 60 bypasses type checking for the dynamic endpoint. While this is a pragmatic workaround for routes not known at compile time, it reduces type safety. Consider defining a union type for known endpoints or using a more specific type that captures the expected response shape:

```typescript
type HasDataEndpoint =
  | '/internal/observability_onboarding/otel_host/has-data'
  | '/internal/observability_onboarding/otel_apm/has-data'
  | '/internal/observability_onboarding/cloudforwarder/has-data';
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts` around lines 57 - 68, The dynamic endpoint passed into callApi inside the useFetcher in use_time_window_data_detection.ts is cast with `as any`, losing type safety; define a union type (e.g., HasDataEndpoint) listing the allowed endpoint string literals and use that type for the `endpoint` variable (and the callApi argument) so the call becomes type-checked against known routes, and update any function signatures or generics around useFetcher/callApi to accept HasDataEndpoint (and the expected response shape) instead of using `any`.

x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/index.tsx (2)
45-52: Consider memoizing `logsLocatorParams` to maintain referential stability.
The `logsLocatorParams` object is recreated on every render since it depends on `useWiredStreams`. If `logsLocator?.getRedirectUrl(logsLocatorParams)` or downstream components perform equality checks, this could cause unnecessary recalculations.

♻️ Suggested improvement

```diff
+ import React, { useEffect, useState, useMemo } from 'react';
  ...
  const useWiredStreams = ingestionMode === 'wired';
- const logsLocatorParams = useWiredStreams ? { dataViewSpec: WIRED_ECS_DATA_VIEW_SPEC } : {};
+ const logsLocatorParams = useMemo(
+   () => (useWiredStreams ? { dataViewSpec: WIRED_ECS_DATA_VIEW_SPEC } : {}),
+   [useWiredStreams]
+ );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/index.tsx` around lines 45 - 52, logsLocatorParams is recreated each render causing referential instability; wrap its creation in a memo so it only changes when useWiredStreams or WIRED_ECS_DATA_VIEW_SPEC change. Replace the inline object creation with a memoized value (using React.useMemo) keyed on useWiredStreams and WIRED_ECS_DATA_VIEW_SPEC so calls like logsLocator?.getRedirectUrl(logsLocatorParams) receive a stable reference and avoid unnecessary recalculations.
75-109: Consider memoizing `kubernetesActionLinks` to prevent unnecessary re-renders.
This array is recreated on every render. Since it depends on `metricsOnboardingEnabled`, `dashboardLocator`, `logsLocator`, and `logsLocatorParams`, memoizing it would improve performance and prevent unnecessary re-renders of child components.

♻️ Suggested improvement

```diff
+ const kubernetesActionLinks: ActionLink[] = useMemo(
+   () => [
      ...(metricsOnboardingEnabled
        ? [
            {
              id: CLUSTER_OVERVIEW_DASHBOARD_ID,
              label: i18n.translate(
                'xpack.observability_onboarding.kubernetesPanel.exploreDashboard',
                { defaultMessage: 'Explore Kubernetes cluster' }
              ),
              title: i18n.translate(
                'xpack.observability_onboarding.kubernetesPanel.monitoringCluster',
                {
                  defaultMessage: 'Overview your Kubernetes cluster with this pre-made dashboard',
                }
              ),
              requires: 'metrics' as const,
              href:
                dashboardLocator?.getRedirectUrl({
                  dashboardId: CLUSTER_OVERVIEW_DASHBOARD_ID,
                }) ?? '',
            },
          ]
        : []),
      {
        id: 'logs',
        title: i18n.translate('xpack.observability_onboarding.kubernetesPanel.logsTitle', {
          defaultMessage: 'View and analyze your logs:',
        }),
        label: i18n.translate('xpack.observability_onboarding.kubernetesPanel.logsLabel', {
          defaultMessage: 'Explore logs',
        }),
        requires: 'logs' as const,
        href: logsLocator?.getRedirectUrl(logsLocatorParams) ?? '',
      },
+   ],
+   [metricsOnboardingEnabled, dashboardLocator, logsLocator, logsLocatorParams]
+ );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/index.tsx` around lines 75 - 109, The kubernetesActionLinks array is rebuilt on every render; wrap its creation in React's useMemo to memoize it and avoid unnecessary re-renders, using [metricsOnboardingEnabled, dashboardLocator, logsLocator, logsLocatorParams] as the dependency array so kubernetesActionLinks is recomputed only when those values change (keep the existing structure and values and reference the same symbols: kubernetesActionLinks, metricsOnboardingEnabled, dashboardLocator, logsLocator, logsLocatorParams).

x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx (1)
93-97: Consider ensuring `onDataReceived` callback stability to prevent repeated invocations.
The effect depends on `onDataReceived`, which is passed as an inline arrow function from the parent (`() => setDataReceived(true)`). Each parent render creates a new function reference, causing this effect to re-run. While functionally harmless (setting state to the same value is a no-op), it's inefficient.

♻️ Suggested fix in the parent component (index.tsx)

```diff
+ import React, { useEffect, useState, useCallback } from 'react';
  ...
  const [dataReceived, setDataReceived] = useState(false);
+ const handleDataReceived = useCallback(() => setDataReceived(true), []);
  ...
  <DataIngestStatus
    onboardingId={data.onboardingId}
    onboardingFlowType="kubernetes"
    dataset="kubernetes"
    integration="kubernetes"
    actionLinks={kubernetesActionLinks}
-   onDataReceived={() => setDataReceived(true)}
+   onDataReceived={handleDataReceived}
  />
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx` around lines 93 - 97, The effect in useEffect in data_ingest_status.tsx depends on onDataReceived which is passed from the parent as an inline arrow, causing a new function reference each render; fix by stabilizing the callback in the parent (where you currently pass () => setDataReceived(true)): wrap that handler with useCallback (e.g., const handleDataReceived = useCallback(() => setDataReceived(true), [setDataReceived]) ) and pass handleDataReceived into the child so useEffect (in data_ingest_status.tsx) no longer retriggers due to a changing onDataReceived reference.

x-pack/solutions/observability/plugins/observability_onboarding/server/routes/kubernetes/route.ts (1)
141-192: Split the scripted fallback out of the main polling query.
The `_rt` clause is runtime-scripted, and this handler polls broad `logs-*`/`logs.*` and `metrics-*`/`metrics.*` patterns without a time window. That makes the "no data yet" path much more expensive than it needs to be. Consider keeping the indexed terms in the first search and moving the runtime fallback to a second query scoped to the passthrough streams.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/server/routes/kubernetes/route.ts` around lines 141 - 192, The current combined search builds query (variable query) with both indexed terms and the scripted runtime field 'resource.attributes.onboarding.id._rt' via runtimeMappings and then polls broad index patterns in the two searches (logsResult, metricsResult), which is inefficient; change it to a two-step approach: 1) call elasticsearch.client.asCurrentUser.search for logs and metrics using the existing query but remove runtime_mappings and drop any should-clause term that references 'resource.attributes.onboarding.id._rt' so the first fast check only uses indexed fields (termQuery results); 2) only if those first searches return no hits, run a second targeted search scoped to passthrough streams (narrow index pattern or filter for logs.otel/passthrough streams) that includes runtime_mappings (runtimeMappings) and a query containing just the runtime '_rt' should-clause; update the code around runtimeMappings, query, and the two search calls (logsResult, metricsResult) accordingly.
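The two-step search the comment proposes can be sketched as a pair of query builders: a cheap first pass on indexed fields, and a runtime-field fallback that only runs when the first pass finds nothing. The field names (`resource.attributes.onboarding.id` and its scripted `_rt` variant) come from the review; the builder functions themselves are hypothetical.

```typescript
type EsQuery = Record<string, unknown>;

// Step 1: fast check using only indexed fields, bounded by the session start
// so the "no data yet" polling path stays cheap.
function buildIndexedQuery(onboardingId: string, start: string): EsQuery {
  return {
    bool: {
      filter: [{ range: { '@timestamp': { gte: start } } }],
      must: [{ term: { 'resource.attributes.onboarding.id': onboardingId } }],
    },
  };
}

// Step 2: runtime-scripted fallback, run only when step 1 returned no hits,
// and scoped to the passthrough streams rather than broad logs-*/metrics-*.
// The runtime mapping for the '_rt' field would accompany this query.
function buildRuntimeFallbackQuery(onboardingId: string): EsQuery {
  return {
    bool: {
      should: [{ term: { 'resource.attributes.onboarding.id._rt': onboardingId } }],
      minimum_should_match: 1,
    },
  };
}
```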
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/host_otel.spec.ts`:
- Around line 72-76: Replace the fixed 2-minute sleep (page.waitForTimeout(2 *
60000)) with a condition-based wait that polls the actual signals that OTEL data
has landed for this branch; for example, use page.waitForSelector to wait for a
specific dashboard panel or Discover hit-count element to appear, or use
page.waitForResponse/ page.waitForFunction to wait for the API that returns
documents/metrics (replace the timeout call in host_otel.spec.ts where
page.waitForTimeout is used). Ensure the new wait targets branch-specific UI
elements or network endpoints that indicate data readiness (e.g., Discover
document count, a saved search result, or a specific visualization DOM node) and
include a reasonable timeout for the wait operation instead of a fixed 2-minute
delay.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/kubernetes_otel.spec.ts`:
- Around line 105-109: Remove the unconditional await page.waitForTimeout(2 *
60000); and replace it with a deterministic polling-based readiness check:
implement a retry loop (or use page.waitForFunction / page.waitForResponse) that
periodically verifies the specific readiness signal (e.g., the presence of a
dashboard/Discover element, expected network response, or API endpoint returning
data) until success or a max timeout, keeping the existing overall timeout but
eliminating the fixed 120s sleep; target the call site by replacing the
page.waitForTimeout invocation in kubernetes_otel.spec.ts and reference
page.waitForFunction, page.waitForResponse, or a helper like
waitForCondition/checkDataReady to perform the checks.
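The `waitForCondition`-style helper referenced in the prompt above can be sketched generically. This is a hedged, framework-agnostic sketch, not code from the PR; in Playwright itself, `page.waitForFunction` or `expect(locator).toBeVisible({ timeout })` achieve the same without a hand-rolled loop.

```typescript
// Poll an async readiness check until it passes or a deadline expires,
// instead of sleeping for a fixed duration.
async function waitForCondition(
  check: () => Promise<boolean>,
  { intervalMs = 1_000, timeoutMs = 120_000 } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  // Always run the check at least once, then keep retrying until the deadline.
  for (;;) {
    if (await check()) return; // readiness signal observed
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The test-suite benefit is that fast environments finish as soon as data lands, while slow ones still get the full timeout budget.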
In
`@x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/pom/pages/otel_host_flow.page.ts`:
- Around line 67-72: The assertDataReceivedIndicator helper currently uses the
default expect timeout and has no explicit return type; update the method
signature to declare an explicit return type (Promise<void>) and call
toBeVisible with an ingestion-safe timeout option (e.g., await
expect(this.dataReceivedIndicator, '...').toBeVisible({ timeout:
INGESTION_SAFE_TIMEOUT_MS })) or use the project's shared ingestion timeout
constant, ensuring you reference the assertDataReceivedIndicator method and the
dataReceivedIndicator locator when making the change.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/pom/pages/otel_kubernetes_flow.page.ts`:
- Around line 89-94: The assertDataReceivedIndicator method uses Playwright's
default expect timeout which is too short for ingestion waits and lacks an
explicit return type; update the public method signature to return Promise<void>
and call expect(this.dataReceivedIndicator, ...) with an increased timeout
option (e.g., { timeout: 30000 } or a shared constant) so the assertion waits
longer for onboarding data to arrive while keeping the descriptive message
intact.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/server/lib/handle_has_data_search_error.ts`:
- Around line 15-30: Change both function signatures to accept a typed
Elasticsearch error (e.g., an Error object with optional statusCode/status and
meta.body fields) instead of any; update isNoShardsAvailableError to use that
typed parameter when inspecting meta.body.error.type and
meta.body.error.root_cause[0].type, and modify throwHasDataSearchError to
rethrow the original status code by boomifying the original error (use
Boom.boomify or equivalent) with statusCode taken from error.statusCode ||
error.status || error.meta?.statusCode before falling back to Boom.internal, and
include the original error message; reference the functions
isNoShardsAvailableError and throwHasDataSearchError when applying these
changes.
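The status-code propagation described in the prompt above can be sketched as a small resolver. The error shape (`statusCode`/`status`/`meta`) follows the comment's wording; the Boom.boomify call is omitted so the sketch stays dependency-free, and the function name is an assumption.

```typescript
// Minimal shape of an Elasticsearch-client-like error, per the review comment.
interface EsLikeError {
  message: string;
  statusCode?: number;
  status?: number;
  meta?: { statusCode?: number };
}

// Resolve the original HTTP status to rethrow: prefer the transport-level
// statusCode, then a generic status, then the client metadata, else 500.
// In the real handler this value would feed Boom.boomify(error, { statusCode }).
function resolveStatusCode(error: EsLikeError): number {
  return error.statusCode ?? error.status ?? error.meta?.statusCode ?? 500;
}
```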
In
`@x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_apm/route.ts`:
- Around line 62-65: The route's input schema (the params object with params:
t.type({ query: t.type({ start: t.string }) })) accepts any string for
query.start but that value is immediately used as a date/timestamp; update the
route boundary to validate start as a timestamp (e.g., use an io-ts number type
or a branded/refined string that matches an epoch/ms timestamp/ISO date) and add
explicit error handling to return a typed 400 Bad Request when validation fails;
adjust the params schema and the route handler code that reads
params.query.start so it only proceeds when the validated/parsed timestamp is
present, and return a clear bad-request response (rather than letting the
downstream search error) when parsing/validation fails.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_host/route.ts`:
- Around line 87-90: The route's query codec currently allows any string for
`start` (params -> query -> start) so malformed timestamps aren't rejected;
update the codec for `params` (the query.start field in route.ts) to validate
that `start` is a proper timestamp (e.g., an ISO8601 string or numeric epoch)
using a refined/branded io-ts type or a custom validator instead of plain
t.string, and ensure the handler returns a 400 typed error when validation fails
so malformed `start` values are rejected at the route boundary.
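The `start` validation asked for in the two prompts above can be sketched as a plain predicate; in io-ts it would become a branded/refined string codec. The accepted formats (ISO-8601 or epoch-millis string) are an assumption about the route's contract, not taken from the PR.

```typescript
// Illustrative validator for the `start` query parameter: accept a purely
// numeric epoch-millis string, or any date string Date.parse understands
// (ISO-8601 in practice). Returns false for garbage like 'not-a-date',
// letting the route respond with a typed 400 instead of a search error.
function isValidStart(start: string): boolean {
  if (/^\d+$/.test(start)) return true; // epoch milliseconds
  return !Number.isNaN(Date.parse(start)); // ISO-8601 / parseable date
}
```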
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 0794ac4b-2162-41a9-9bc2-7fada33ebb33
📒 Files selected for processing (20)

- x-pack/solutions/observability/plugins/observability_onboarding/common/aws_cloudforwarder.ts
- x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/host_otel.spec.ts
- x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/kubernetes_otel.spec.ts
- x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/pom/pages/otel_host_flow.page.ts
- x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/pom/pages/otel_kubernetes_flow.page.ts
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/cloudforwarder/index.tsx
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/index.tsx
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_apm/index.tsx
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_kubernetes/build_install_stack_command.test.ts
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_kubernetes/build_install_stack_command.ts
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_kubernetes/otel_kubernetes_panel.tsx
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_logs/index.tsx
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/get_started_panel.tsx
- x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts
- x-pack/solutions/observability/plugins/observability_onboarding/server/lib/handle_has_data_search_error.ts
- x-pack/solutions/observability/plugins/observability_onboarding/server/routes/cloudforwarder/route.ts
- x-pack/solutions/observability/plugins/observability_onboarding/server/routes/kubernetes/route.ts
- x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_apm/route.ts
- x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_host/route.ts
```ts
/**
 * Additional buffer to ensure data has propagated
 * to dashboards and Discover before navigating.
 */
await page.waitForTimeout(2 * 60000);
```
Avoid fixed 2-minute buffer; wait on branch-specific readiness instead.
Line 76 adds a static delay that slows all runs and is still less reliable than condition-based waiting.
Proposed fix
```diff
- await page.waitForTimeout(2 * 60000);
+ const readinessLinkTestId =
+   useWiredStreams || isLogsEssentialsMode
+     ? 'observabilityOnboardingDataIngestStatusActionLink-logs'
+     : 'observabilityOnboardingDataIngestStatusActionLink-metrics';
+ await page.getByTestId(readinessLinkTestId).waitFor({
+   state: 'visible',
+   timeout: 2 * 60_000,
+ });
```
```ts
/**
 * Additional buffer to ensure data has propagated
 * to dashboards and Discover before navigating.
 */
await page.waitForTimeout(2 * 60000);
```
Remove unconditional 2-minute sleep.
Line 109 introduces a fixed 120s delay on every run, which inflates suite time and still isn’t a deterministic readiness check.
Proposed fix
```diff
- await page.waitForTimeout(2 * 60000);
+ const readinessLinkTestId =
+   useWiredStreams || isLogsEssentialsMode
+     ? 'observabilityOnboardingDataIngestStatusActionLink-logs'
+     : 'observabilityOnboardingDataIngestStatusActionLink-kubernetes_otel-cluster-overview';
+ await page.getByTestId(readinessLinkTestId).waitFor({
+   state: 'visible',
+   timeout: 2 * 60_000,
+ });
```
```ts
public async assertDataReceivedIndicator() {
  await expect(
    this.dataReceivedIndicator,
    'Data received indicator should be visible'
  ).toBeVisible();
}
```
Use an ingestion-safe timeout and explicit return type in the new assertion helper.
Line 71 relies on default expect timeout, which can be too short for real data arrival in stateful onboarding tests.
Proposed fix
```diff
- public async assertDataReceivedIndicator() {
+ public async assertDataReceivedIndicator(timeoutMs: number = 5 * 60_000): Promise<void> {
    await expect(
      this.dataReceivedIndicator,
      'Data received indicator should be visible'
-   ).toBeVisible();
+   ).toBeVisible({ timeout: timeoutMs });
  }
```

As per coding guidelines, "Prefer explicit return types for public APIs and exported functions in TypeScript."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
public async assertDataReceivedIndicator(timeoutMs: number = 5 * 60_000): Promise<void> {
  await expect(
    this.dataReceivedIndicator,
    'Data received indicator should be visible'
  ).toBeVisible({ timeout: timeoutMs });
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/pom/pages/otel_host_flow.page.ts`
around lines 67 - 72, The assertDataReceivedIndicator helper currently uses the
default expect timeout and has no explicit return type; update the method
signature to declare an explicit return type (Promise<void>) and call
toBeVisible with an ingestion-safe timeout option (e.g., await
expect(this.dataReceivedIndicator, '...').toBeVisible({ timeout:
INGESTION_SAFE_TIMEOUT_MS })) or use the project's shared ingestion timeout
constant, ensuring you reference the assertDataReceivedIndicator method and the
dataReceivedIndicator locator when making the change.
```ts
public async assertDataReceivedIndicator() {
  await expect(
    this.dataReceivedIndicator,
    'Data received indicator should be visible'
  ).toBeVisible();
}
```
Increase assertion timeout for ingestion waits and type the method return.
Line 93 currently uses the default Playwright expect timeout, which is often too short for onboarding data ingestion. This can fail before data arrives.
Proposed fix

```diff
-  public async assertDataReceivedIndicator() {
+  public async assertDataReceivedIndicator(timeoutMs: number = 5 * 60_000): Promise<void> {
     await expect(
       this.dataReceivedIndicator,
       'Data received indicator should be visible'
-    ).toBeVisible();
+    ).toBeVisible({ timeout: timeoutMs });
   }
```

As per coding guidelines, "Prefer explicit return types for public APIs and exported functions in TypeScript."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/pom/pages/otel_kubernetes_flow.page.ts`
around lines 89 - 94, The assertDataReceivedIndicator method uses Playwright's
default expect timeout which is too short for ingestion waits and lacks an
explicit return type; update the public method signature to return Promise<void>
and call expect(this.dataReceivedIndicator, ...) with an increased timeout
option (e.g., { timeout: 30000 } or a shared constant) so the assertion waits
longer for onboarding data to arrive while keeping the descriptive message
intact.
```ts
export function isNoShardsAvailableError(error: any): boolean {
  const errorType = error?.meta?.body?.error?.type;
  const rootCauseType = error?.meta?.body?.error?.root_cause?.[0]?.type;

  if (
    errorType === 'search_phase_execution_exception' &&
    rootCauseType === 'no_shard_available_action_exception'
  ) {
    return true;
  }

  return false;
}

export function throwHasDataSearchError(error: any): never {
  throw Boom.internal(`Elasticsearch responded with an error. ${error.message}`);
```
🧩 Analysis chain

🏁 Script executed:

```shell
cat -n x-pack/solutions/observability/plugins/observability_onboarding/server/lib/handle_has_data_search_error.ts
```

Repository: elastic/kibana
Length of output: 2046

🏁 Script executed:

```shell
rg -l "throwHasDataSearchError|isNoShardsAvailableError" x-pack/solutions/observability --type ts --type tsx
```

Repository: elastic/kibana
Length of output: 2046

🏁 Script executed:

```shell
rg "@elastic/elasticsearch" x-pack/solutions/observability/plugins/observability_onboarding/server --type ts -A 2 | head -30
```

Repository: elastic/kibana
Length of output: 2046
Type error parameter and preserve Elasticsearch error status codes.
Both functions currently accept error: any and throwHasDataSearchError unconditionally wraps failures in Boom.internal(), which always returns HTTP 500. This erases upstream Elasticsearch error status codes—401/403/400 responses become indistinguishable from actual internal errors in the UI. Type the error parameter to match the Elasticsearch client's error contract and preserve the original status code when available.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/server/lib/handle_has_data_search_error.ts`
around lines 15 - 30, Change both function signatures to accept a typed
Elasticsearch error (e.g., an Error object with optional statusCode/status and
meta.body fields) instead of any; update isNoShardsAvailableError to use that
typed parameter when inspecting meta.body.error.type and
meta.body.error.root_cause[0].type, and modify throwHasDataSearchError to
rethrow the original status code by boomifying the original error (use
Boom.boomify or equivalent) with statusCode taken from error.statusCode ||
error.status || error.meta?.statusCode before falling back to Boom.internal, and
include the original error message; reference the functions
isNoShardsAvailableError and throwHasDataSearchError when applying these
changes.
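The status-code preservation the prompt asks for can be sketched without the Elasticsearch client. A dependency-free illustration, where the simplified `EsError` shape and the helper name are assumptions:

```typescript
// Assumed (simplified) shape of an Elasticsearch client error; the real
// client's errors carry more fields.
interface EsError {
  message: string;
  statusCode?: number;
  meta?: { statusCode?: number };
}

// Surface the upstream status code when one exists, falling back to 500 only
// for genuinely unknown failures, so 401/403/400 stay distinguishable from
// internal errors.
function resolveStatusCode(error: EsError): number {
  return error.statusCode ?? error.meta?.statusCode ?? 500;
}
```

In the route handler this value would feed something like `Boom.boomify(error, { statusCode })` instead of an unconditional `Boom.internal`.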
```ts
params: t.type({
  query: t.type({
    start: t.string,
  }),
```
Validate start as a timestamp at the route boundary.
t.string is too loose for a value that is immediately used in a date range query. A malformed start currently goes down the generic search-error path instead of returning a clear bad-request response.
As per coding guidelines, **/*.{ts,tsx}: Handle errors explicitly; return typed errors from APIs when possible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_apm/route.ts`
around lines 62 - 65, The route's input schema (the params object with params:
t.type({ query: t.type({ start: t.string }) })) accepts any string for
query.start but that value is immediately used as a date/timestamp; update the
route boundary to validate start as a timestamp (e.g., use an io-ts number type
or a branded/refined string that matches an epoch/ms timestamp/ISO date) and add
explicit error handling to return a typed 400 Bad Request when validation fails;
adjust the params schema and the route handler code that reads
params.query.start so it only proceeds when the validated/parsed timestamp is
present, and return a clear bad-request response (rather than letting the
downstream search error) when parsing/validation fails.
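A dependency-free sketch of the refinement such a codec could wrap, accepting either ISO 8601 strings or epoch milliseconds (the helper name is illustrative; the actual route would express this as an io-ts refinement):

```typescript
// Accept only strings that parse to a real point in time: an all-digit
// epoch-milliseconds value, or anything Date.parse understands (ISO 8601).
function isValidStartTimestamp(start: string): boolean {
  if (/^\d+$/.test(start)) {
    return true; // epoch milliseconds
  }
  return !Number.isNaN(Date.parse(start));
}
```

A route guard would reject the request with a 400 when this returns `false`, before the value ever reaches the Elasticsearch range query.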
```ts
params: t.type({
  query: t.type({
    start: t.string,
  }),
```
Validate start as a timestamp at the route boundary.
Right now any string gets through the API contract, so a malformed start falls into the generic search-error path instead of being rejected as a bad request. Tightening the query codec here will keep this endpoint deterministic.
As per coding guidelines, **/*.{ts,tsx}: Handle errors explicitly; return typed errors from APIs when possible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_host/route.ts`
around lines 87 - 90, The route's query codec currently allows any string for
`start` (params -> query -> start) so malformed timestamps aren't rejected;
update the codec for `params` (the query.start field in route.ts) to validate
that `start` is a proper timestamp (e.g., an ISO8601 string or numeric epoch)
using a refined/branded io-ts type or a custom validator instead of plain
t.string, and ensure the handler returns a 400 typed error when validation fails
so malformed `start` values are rejected at the route boundary.
Actionable comments posted: 2
🧹 Nitpick comments (6)
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_kubernetes/build_install_stack_command.ts (1)
19-28: Consider adding a runtime guard for an empty `onboardingId`. The `onboardingId` parameter is required at the type level, but an empty string could still be passed at runtime, resulting in `onboarding.id=` being set in the Helm command. If the caller can guarantee non-empty values, this is fine; otherwise, consider adding a guard.

Optional defensive validation

```diff
 }: {
   isMetricsOnboardingEnabled: boolean;
   isManagedOtlpServiceAvailable: boolean;
   managedOtlpEndpointUrl: string;
   elasticsearchUrl: string;
   apiKeyEncoded: string;
   agentVersion: string;
   useWiredStreams?: boolean;
   onboardingId: string;
 }): string {
+  if (!onboardingId) {
+    throw new Error('onboardingId is required');
+  }
   const ingestEndpointUrl = isManagedOtlpServiceAvailable
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_kubernetes/build_install_stack_command.ts` around lines 19 - 28, The code accepts onboardingId but doesn't guard against an empty string; update the command-building logic in build_install_stack_command (where onboardingId is appended to the Helm/CLI args) to only include the onboarding.id flag when onboardingId is a non-empty trimmed string (e.g., check onboardingId && onboardingId.trim().length > 0) so you don't emit `--set onboarding.id=`; reference the onboardingId parameter and the function that constructs the Helm command to apply this conditional guard and ensure any whitespace-only values are treated as empty.x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts (2)
25-34: Add explicit return type for the exported function. As per coding guidelines, public APIs should have explicit return types.

♻️ Add explicit return type

```diff
+interface UseTimeWindowDataDetectionResult {
+  hasData: boolean;
+  isTroubleshootingVisible: boolean;
+}
+
 export function useTimeWindowDataDetection({
   isMonitoringActive,
   sessionStartTime,
   fetchInterval,
   troubleshootingDelay,
   flowType,
   onboardingId,
   endpoint,
   extraQueryParams,
-}: UseTimeWindowDataDetectionOptions) {
+}: UseTimeWindowDataDetectionOptions): UseTimeWindowDataDetectionResult {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts` around lines 25 - 34, The exported function useTimeWindowDataDetection currently lacks an explicit return type; update its declaration to include the precise return type that the hook returns (e.g., ReturnType or a specific interface describing the returned values) to satisfy the public API guideline. Locate the function definition useTimeWindowDataDetection(...) and add the explicit return type annotation after the parameter list, ensuring you import or declare any return type interface (or use ReturnType<typeof useTimeWindowDataDetectionImplementation> if applicable) so TypeScript infers no any types for the exported API.
81-91: The telemetry step is hardcoded to 'logs-ingest'. This may not be accurate for all flow types using this hook (e.g., metrics-only or traces flows). Consider making the step configurable via the options interface if different flows need distinct telemetry labels.

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts` around lines 81 - 91, The telemetry step string is hardcoded to 'logs-ingest' in the useEffect inside the useTimeWindowDataDetection hook; update the hook signature (useTimeWindowDataDetection) and its options interface to accept a telemetryStep (or derive it from flowType) and replace the hardcoded 'logs-ingest' with that value when calling analytics.reportEvent; ensure callers that invoke useTimeWindowDataDetection (and any tests) supply the appropriate telemetryStep for metrics/traces-only flows and keep existing behavior by defaulting telemetryStep to 'logs-ingest' if no option is provided, leaving other symbols (hasDataResponse, dataReceivedTelemetrySent, analytics, OBSERVABILITY_ONBOARDING_TELEMETRY_EVENT) unchanged.x-pack/solutions/observability/plugins/observability_onboarding/server/lib/handle_has_data_search_error.ts (1)
29-31: Consider preserving the original error for debugging. The current implementation only includes `error.message` in the Boom error. For better debugging in server logs, consider preserving the original error.

♻️ Optional: Preserve original error for debugging

```diff
 export function throwHasDataSearchError(error: any): never {
-  throw Boom.internal(`Elasticsearch responded with an error. ${error.message}`);
+  throw Boom.internal(`Elasticsearch responded with an error. ${error.message}`, error);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/server/lib/handle_has_data_search_error.ts` around lines 29 - 31, The thrown Boom error in throwHasDataSearchError only includes error.message; update throwHasDataSearchError to preserve the original error by attaching it to the Boom error (e.g., set the original error on the Boom object's data or cause) so logs/debuggers can access full error details while keeping the existing message produced by Boom.internal.x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_logs/index.tsx (1)
98-114: Consider deferring `sessionStartTime` initialization for consistency. Unlike `otel_apm` (which sets `sessionStartTime` when monitoring becomes active to narrow the detection window), this flow initializes it on mount. For APM, this reduces false positives from other services already ingesting data.

For host monitoring, this may be less critical since existing OTel host data is less common, but the pattern inconsistency across flows could be worth aligning if false positives become an issue.

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_logs/index.tsx` around lines 98 - 114, sessionStartTime is created on mount but should be set when monitoring actually becomes active to match the otel_apm flow and avoid false positives; change the initialization of sessionStartTime so it is set when isMonitoringStepActive turns true (useState to hold a nullable string and a useEffect that sets sessionStartTime = new Date().toISOString() when isMonitoringStepActive becomes true), keep the useWindowBlurDataMonitoringTrigger usage and ensure useTimeWindowDataDetection receives the updated sessionStartTime (or a safe empty/undefined value until set) so the detection window only starts once monitoring is active.x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx (1)
34-41: Prefer a const-arrow exported component with an explicit return type. This exported React component currently uses a function declaration with an inferred return type.

As per coding guidelines: "Prefer const arrow functions in TypeScript." and "Prefer explicit return types for public APIs and exported functions in TypeScript."

Proposed refactor

```diff
-export function DataIngestStatus({
+export const DataIngestStatus = ({
   onboardingId,
   onboardingFlowType,
   dataset,
   integration,
   actionLinks,
   onDataReceived,
-}: Props) {
+}: Props): JSX.Element => {
@@
-}
+};
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx` around lines 34 - 41, Replace the exported function declaration DataIngestStatus(...) with a const arrow component and add an explicit return type: declare "export const DataIngestStatus: (props: Props) => JSX.Element" or "export const DataIngestStatus: React.FC<Props>" and implement it as "const DataIngestStatus = ({ onboardingId, onboardingFlowType, dataset, integration, actionLinks, onDataReceived }: Props) => { return ( ... ); }"; keep the same prop names and body, ensure Props type is used in the signature and the export remains the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx`:
- Around line 61-63: The readiness/filtering currently only checks for requires
=== 'metrics' and defaults unknown requires to hasData, which misses traces-only
links; add explicit handling for requires === 'traces' by introducing a
needsTraces boolean (e.g., const needsTraces = actionLinks.some(al =>
al.requires === 'traces')) and update isReady to consider traces (const isReady
= needsMetrics ? hasMetrics : needsTraces ? hasTraces : hasData), and likewise
adjust any filtering logic that maps requires -> readiness so that 'traces'
checks use hasTraces while preserving the existing default behavior for unknown
values.
- Around line 93-97: The effect currently depends on both isReady and
onDataReceived so when the parent passes a new inline callback the effect can
re-run and call onDataReceived again; to fix, keep the effect dependent only on
isReady and call a stable ref to the latest callback: create a ref (e.g.,
latestOnDataReceivedRef) and update it whenever onDataReceived prop changes,
then in the useEffect for readiness (useEffect with [isReady]) call
latestOnDataReceivedRef.current?.() when isReady is true; update references to
onDataReceived in this component to use the ref so repeated identity changes
don’t retrigger the readiness side effect.
---
Nitpick comments:
In
`@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx`:
- Around line 34-41: Replace the exported function declaration
DataIngestStatus(...) with a const arrow component and add an explicit return
type: declare "export const DataIngestStatus: (props: Props) => JSX.Element" or
"export const DataIngestStatus: React.FC<Props>" and implement it as "const
DataIngestStatus = ({ onboardingId, onboardingFlowType, dataset, integration,
actionLinks, onDataReceived }: Props) => { return ( ... ); }"; keep the same
prop names and body, ensure Props type is used in the signature and the export
remains the same.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_kubernetes/build_install_stack_command.ts`:
- Around line 19-28: The code accepts onboardingId but doesn't guard against an
empty string; update the command-building logic in build_install_stack_command
(where onboardingId is appended to the Helm/CLI args) to only include the
onboarding.id flag when onboardingId is a non-empty trimmed string (e.g., check
onboardingId && onboardingId.trim().length > 0) so you don't emit `--set
onboarding.id=`; reference the onboardingId parameter and the function that
constructs the Helm command to apply this conditional guard and ensure any
whitespace-only values are treated as empty.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_logs/index.tsx`:
- Around line 98-114: sessionStartTime is created on mount but should be set
when monitoring actually becomes active to match the otel_apm flow and avoid
false positives; change the initialization of sessionStartTime so it is set when
isMonitoringStepActive turns true (useState to hold a nullable string and a
useEffect that sets sessionStartTime = new Date().toISOString() when
isMonitoringStepActive becomes true), keep the
useWindowBlurDataMonitoringTrigger usage and ensure useTimeWindowDataDetection
receives the updated sessionStartTime (or a safe empty/undefined value until
set) so the detection window only starts once monitoring is active.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts`:
- Around line 25-34: The exported function useTimeWindowDataDetection currently
lacks an explicit return type; update its declaration to include the precise
return type that the hook returns (e.g., ReturnType or a specific interface
describing the returned values) to satisfy the public API guideline. Locate the
function definition useTimeWindowDataDetection(...) and add the explicit return
type annotation after the parameter list, ensuring you import or declare any
return type interface (or use ReturnType<typeof
useTimeWindowDataDetectionImplementation> if applicable) so TypeScript infers no
any types for the exported API.
- Around line 81-91: The telemetry step string is hardcoded to 'logs-ingest' in
the useEffect inside the useTimeWindowDataDetection hook; update the hook
signature (useTimeWindowDataDetection) and its options interface to accept a
telemetryStep (or derive it from flowType) and replace the hardcoded
'logs-ingest' with that value when calling analytics.reportEvent; ensure callers
that invoke useTimeWindowDataDetection (and any tests) supply the appropriate
telemetryStep for metrics/traces-only flows and keep existing behavior by
defaulting telemetryStep to 'logs-ingest' if no option is provided, leaving
other symbols (hasDataResponse, dataReceivedTelemetrySent, analytics,
OBSERVABILITY_ONBOARDING_TELEMETRY_EVENT) unchanged.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/server/lib/handle_has_data_search_error.ts`:
- Around line 29-31: The thrown Boom error in throwHasDataSearchError only
includes error.message; update throwHasDataSearchError to preserve the original
error by attaching it to the Boom error (e.g., set the original error on the
Boom object's data or cause) so logs/debuggers can access full error details
while keeping the existing message produced by Boom.internal.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: e3b66f59-4623-4721-8dd9-6d0e055d9328
📒 Files selected for processing (20)
x-pack/solutions/observability/plugins/observability_onboarding/common/aws_cloudforwarder.ts
x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/host_otel.spec.ts
x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/kubernetes_otel.spec.ts
x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/pom/pages/otel_host_flow.page.ts
x-pack/solutions/observability/plugins/observability_onboarding/e2e/playwright/stateful/pom/pages/otel_kubernetes_flow.page.ts
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/cloudforwarder/index.tsx
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/index.tsx
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_apm/index.tsx
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_kubernetes/build_install_stack_command.test.ts
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_kubernetes/build_install_stack_command.ts
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_kubernetes/otel_kubernetes_panel.tsx
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_logs/index.tsx
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/get_started_panel.tsx
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts
x-pack/solutions/observability/plugins/observability_onboarding/server/lib/handle_has_data_search_error.ts
x-pack/solutions/observability/plugins/observability_onboarding/server/routes/cloudforwarder/route.ts
x-pack/solutions/observability/plugins/observability_onboarding/server/routes/kubernetes/route.ts
x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_apm/route.ts
x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_host/route.ts
```ts
const needsMetrics = actionLinks.some((actionLink) => actionLink.requires === 'metrics');
const isReady = needsMetrics ? hasMetrics : hasData;
```
requires: 'traces' is not handled explicitly in readiness/filtering.
Line 103 defaults unknown requires values to hasData, so a traces-only link can appear before traces are confirmed. Line 62 also ignores traces in isReady.
Proposed fix

```diff
 const hasData = data?.hasData ?? false;
 const hasLogs = data?.hasLogs ?? hasData;
 const hasMetrics = data?.hasMetrics ?? hasData;
+const hasTraces = data?.hasTraces ?? false;

 const needsMetrics = actionLinks.some((actionLink) => actionLink.requires === 'metrics');
-const isReady = needsMetrics ? hasMetrics : hasData;
+const needsTraces = actionLinks.some((actionLink) => actionLink.requires === 'traces');
+const isReady = needsMetrics ? hasMetrics : needsTraces ? hasTraces : hasData;
@@
 if (requires === 'metrics') {
   return hasMetrics;
 }
+if (requires === 'traces') {
+  return hasTraces;
+}
+
 return hasData;
```

Also applies to: 102-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx`
around lines 61 - 63, The readiness/filtering currently only checks for requires
=== 'metrics' and defaults unknown requires to hasData, which misses traces-only
links; add explicit handling for requires === 'traces' by introducing a
needsTraces boolean (e.g., const needsTraces = actionLinks.some(al =>
al.requires === 'traces')) and update isReady to consider traces (const isReady
= needsMetrics ? hasMetrics : needsTraces ? hasTraces : hasData), and likewise
adjust any filtering logic that maps requires -> readiness so that 'traces'
checks use hasTraces while preserving the existing default behavior for unknown
values.
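The `requires` → readiness mapping described above can be expressed as a small pure function. A sketch with illustrative type and helper names (not the component's actual symbols):

```typescript
type RequiredSignal = 'metrics' | 'traces' | undefined;

interface DataFlags {
  hasData: boolean;
  hasMetrics: boolean;
  hasTraces: boolean;
}

// Each explicitly-typed link waits for its own signal; links without a
// `requires` value keep the existing default of the generic hasData flag.
function isLinkReady(requires: RequiredSignal, flags: DataFlags): boolean {
  if (requires === 'metrics') {
    return flags.hasMetrics;
  }
  if (requires === 'traces') {
    return flags.hasTraces;
  }
  return flags.hasData;
}
```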
```ts
useEffect(() => {
  if (isReady) {
    onDataReceived?.();
  }
}, [isReady, onDataReceived]);
```
onDataReceived can fire repeatedly after readiness.
At Line 93, the effect re-runs whenever onDataReceived identity changes. With inline parent callbacks, this can trigger duplicate side effects after isReady becomes true.
Proposed fix

```diff
 const [checkDataStartTime] = useState(Date.now());
 const [dataReceivedTelemetrySent, setDataReceivedTelemetrySent] = useState(false);
+const [dataReceivedNotified, setDataReceivedNotified] = useState(false);
@@
 useEffect(() => {
-  if (isReady) {
-    onDataReceived?.();
-  }
-}, [isReady, onDataReceived]);
+  if (!isReady || dataReceivedNotified) {
+    return;
+  }
+
+  onDataReceived?.();
+  setDataReceivedNotified(true);
+}, [isReady, onDataReceived, dataReceivedNotified]);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
useEffect(() => {
  if (!isReady || dataReceivedNotified) {
    return;
  }

  onDataReceived?.();
  setDataReceivedNotified(true);
}, [isReady, onDataReceived, dataReceivedNotified]);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx`
around lines 93 - 97, The effect currently depends on both isReady and
onDataReceived so when the parent passes a new inline callback the effect can
re-run and call onDataReceived again; to fix, keep the effect dependent only on
isReady and call a stable ref to the latest callback: create a ref (e.g.,
latestOnDataReceivedRef) and update it whenever onDataReceived prop changes,
then in the useEffect for readiness (useEffect with [isReady]) call
latestOnDataReceivedRef.current?.() when isReady is true; update references to
onDataReceived in this component to use the ref so repeated identity changes
don’t retrigger the readiness side effect.
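The ref-based fix boils down to a "fire once" latch that is indifferent to callback identity. A framework-free sketch of that behavior (the factory name is an illustrative assumption, not the component's API):

```typescript
// Returns a notifier that invokes whichever callback it is handed exactly
// once, the first time readiness is observed; later calls — including ones
// with a new callback identity — are no-ops, mirroring the ref pattern.
function createReadinessNotifier(): (isReady: boolean, onDataReceived?: () => void) => void {
  let notified = false;
  return (isReady, onDataReceived) => {
    if (!isReady || notified) {
      return;
    }
    notified = true;
    onDataReceived?.();
  };
}
```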
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_logs/index.tsx`:
- Around line 113-132: The osType currently passed to useTimeWindowDataDetection
is read from the live selectedTab on each poll, so change the component to
"freeze" the OS when monitoring starts: add a new state like sessionOsType and
set it inside the same effect that sets sessionStartTime (useEffect that checks
isMonitoringStepActive && sessionStartTime === null), initializing it from
OS_TYPE_MAP[selectedTab]; then pass that frozen value to
useTimeWindowDataDetection via extraQueryParams (e.g., extraQueryParams: {
osType: sessionOsType ?? OS_TYPE_MAP[selectedTab] }) so subsequent polls keep
the original OS filter even if selectedTab changes. Ensure symbols touched are
sessionStartTime, sessionOsType (new), selectedTab, OS_TYPE_MAP, and
useTimeWindowDataDetection.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts`:
- Around line 41-52: The hook currently only initializes checkDataStartTime once
and never resets noDataPollCount, causing subsequent monitoring sessions to
reuse prior timers and counters; update the useEffect that watches
isMonitoringActive (and the similar effect around lines 84-96) to reset both
setCheckDataStartTime(Date.now()) and setNoDataPollCount(0) whenever a new
sessionStartTime/isMonitoringActive transition begins (i.e., when starting a new
detection window), and ensure the effect dependencies include the session start
identifier so the state is reinitialized for each new session.
In
`@x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_host/route.ts`:
- Around line 111-139: The current handler uses Promise.all to run the two
searches so a no_shard_available error from either call causes the whole
operation to be treated as "no data"; change the logic so logs and metrics
probes are handled independently (e.g., use Promise.allSettled for the two
searches or run them in separate try/catch blocks) and compute hasLogs from the
logsResult outcome and hasMetrics from the metricsResult outcome, treating only
the failing probe with isNoShardsAvailableError as { hasX: false } while
rethrowing non-shard errors via throwHasDataSearchError; update references to
logsResult, metricsResult, isNoShardsAvailableError, and throwHasDataSearchError
in the route handler accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 4e20b41b-512c-4e8e-a584-a07c7618bfaa
📒 Files selected for processing (3)
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/otel_logs/index.tsx
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/use_time_window_data_detection.ts
x-pack/solutions/observability/plugins/observability_onboarding/server/routes/otel_host/route.ts
// Set sessionStartTime when monitoring begins (first blur) rather than on
// mount, to narrow the time-window and reduce false positives from other
// OTel collectors already ingesting data on the same cluster.
const [sessionStartTime, setSessionStartTime] = useState<string | null>(null);
useEffect(() => {
  if (isMonitoringStepActive && sessionStartTime === null) {
    setSessionStartTime(new Date().toISOString());
  }
}, [isMonitoringStepActive, sessionStartTime]);

const { hasData, isTroubleshootingVisible } = useTimeWindowDataDetection({
  isMonitoringActive: isMonitoringStepActive && sessionStartTime !== null,
  sessionStartTime: sessionStartTime ?? '',
  fetchInterval: FETCH_INTERVAL,
  troubleshootingDelay: SHOW_TROUBLESHOOTING_DELAY,
  flowType: 'otel_logs',
  onboardingId: setupData?.onboardingId ?? '',
  endpoint: '/internal/observability_onboarding/otel_host/has-data',
  extraQueryParams: { osType: OS_TYPE_MAP[selectedTab] },
});
Freeze the OS filter when monitoring starts.
Line 131 recomputes osType from the live tab selection on every poll. If the user starts the Linux flow, blurs the window, and then clicks another platform tab while waiting, the query switches to that new OS and the success state is delayed until the shared hook drops the filter after 30 misses.
💡 Suggested fix
- const [sessionStartTime, setSessionStartTime] = useState<string | null>(null);
+ const [sessionStartTime, setSessionStartTime] = useState<string | null>(null);
+ const [monitoringOsType, setMonitoringOsType] = useState<string | null>(null);
useEffect(() => {
if (isMonitoringStepActive && sessionStartTime === null) {
setSessionStartTime(new Date().toISOString());
+ setMonitoringOsType(OS_TYPE_MAP[selectedTab]);
}
- }, [isMonitoringStepActive, sessionStartTime]);
+ }, [isMonitoringStepActive, sessionStartTime, selectedTab]);
const { hasData, isTroubleshootingVisible } = useTimeWindowDataDetection({
isMonitoringActive: isMonitoringStepActive && sessionStartTime !== null,
sessionStartTime: sessionStartTime ?? '',
fetchInterval: FETCH_INTERVAL,
troubleshootingDelay: SHOW_TROUBLESHOOTING_DELAY,
flowType: 'otel_logs',
onboardingId: setupData?.onboardingId ?? '',
endpoint: '/internal/observability_onboarding/otel_host/has-data',
- extraQueryParams: { osType: OS_TYPE_MAP[selectedTab] },
+ extraQueryParams: monitoringOsType ? { osType: monitoringOsType } : undefined,
});
I ran into this behavior; it's kinda weird. Should we even do the osType filter? It's another thing that can go wrong
@flash1293 That's a fair point. The sessionStartTime already handles the main false-positive risk — any data that arrived before the user started the flow is excluded. The osType filter only guards against the narrow edge case of another OTel collector from a different OS that began ingesting after the session start on the same cluster. That edge case barely justifies the extra complexity, especially since the fallback in useTimeWindowDataDetection already silently drops the filter after FALLBACK_POLL_THRESHOLD misses anyway (meaning in bad cases it self-heals, but delays detection in the meantime).
Dropping osType entirely would:
- Eliminate the tab-switching race I flagged
- Remove the fallback dependency for this flow
- Keep the logic consistent with the other flows (OTel APM, Cloud Forwarder) that don't use an OS-scoped filter
So yes, I'd suggest removing extraQueryParams here:
const { hasData, isTroubleshootingVisible } = useTimeWindowDataDetection({
isMonitoringActive: isMonitoringStepActive && sessionStartTime !== null,
sessionStartTime: sessionStartTime ?? '',
fetchInterval: FETCH_INTERVAL,
troubleshootingDelay: SHOW_TROUBLESHOOTING_DELAY,
flowType: 'otel_logs',
onboardingId: setupData?.onboardingId ?? '',
endpoint: '/internal/observability_onboarding/otel_host/has-data',
- extraQueryParams: { osType: OS_TYPE_MAP[selectedTab] },
});

And the `OS_TYPE_MAP` constant can be removed as well if no other caller needs it.
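For reference, the self-healing fallback mentioned above can be sketched as a pure function. This is a plain-TypeScript sketch with hypothetical names (the real hook's internals differ); the threshold of 30 matches the "30 misses" behavior described earlier: once the miss count reaches it, the OS filter is dropped and the query goes unfiltered.

```typescript
// Hypothetical sketch: after FALLBACK_POLL_THRESHOLD consecutive polls with
// no data, drop the osType filter entirely so a wrong filter can only delay
// detection, never block it forever.
const FALLBACK_POLL_THRESHOLD = 30;

function buildHasDataQueryParams(
  noDataPollCount: number,
  osType: string | undefined
): Record<string, string> {
  if (osType === undefined || noDataPollCount >= FALLBACK_POLL_THRESHOLD) {
    return {}; // unfiltered query
  }
  return { osType };
}
```

Removing the osType filter altogether, as suggested, makes this fallback path unnecessary for the flow.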
const [checkDataStartTime, setCheckDataStartTime] = useState<number | null>(null);
const [dataReceivedTelemetrySent, setDataReceivedTelemetrySent] = useState(false);
const [noDataPollCount, setNoDataPollCount] = useState(0);
const {
  services: { analytics },
} = useKibana<ObservabilityOnboardingContextValue>();

useEffect(() => {
  if (isMonitoringActive && checkDataStartTime === null) {
    setCheckDataStartTime(Date.now());
  }
}, [isMonitoringActive, checkDataStartTime]);
Reset retry state for each monitoring session.
Lines 48-52 only initialize checkDataStartTime once, and noDataPollCount is never cleared. When a caller starts a second detection window with a new sessionStartTime, this shared hook reuses the previous troubleshooting timer and fallback counter, so retries can show troubleshooting immediately and skip straight to the unfiltered query.
💡 Suggested fix
const [checkDataStartTime, setCheckDataStartTime] = useState<number | null>(null);
const [dataReceivedTelemetrySent, setDataReceivedTelemetrySent] = useState(false);
const [noDataPollCount, setNoDataPollCount] = useState(0);
@@
- useEffect(() => {
- if (isMonitoringActive && checkDataStartTime === null) {
- setCheckDataStartTime(Date.now());
- }
- }, [isMonitoringActive, checkDataStartTime]);
+ useEffect(() => {
+ if (!isMonitoringActive) {
+ setCheckDataStartTime(null);
+ setNoDataPollCount(0);
+ return;
+ }
+
+ setCheckDataStartTime(Date.now());
+ setNoDataPollCount(0);
+ }, [isMonitoringActive, sessionStartTime]);
@@
useEffect(() => {
+ if (!isMonitoringActive) {
+ return;
+ }
+
const pendingStatusList = [FETCH_STATUS.LOADING, FETCH_STATUS.NOT_INITIATED];
if (pendingStatusList.includes(hasDataStatus) || hasDataResponse?.hasData === true) {
return;
}
@@
- }, [hasDataResponse?.hasData, refetchHasData, hasDataStatus, fetchInterval]);
+ }, [isMonitoringActive, hasDataResponse?.hasData, refetchHasData, hasDataStatus, fetchInterval]);Also applies to: 84-96
const [logsResult, metricsResult] = await Promise.all([
  elasticsearch.client.asCurrentUser.search({
    index: ['logs-*.otel-*', 'logs.otel', 'logs.otel.*'],
    ignore_unavailable: true,
    allow_partial_search_results: true,
    size: 0,
    terminate_after: 1,
    query,
  }),
  elasticsearch.client.asCurrentUser.search({
    index: ['metrics-*.otel-*'],
    ignore_unavailable: true,
    allow_partial_search_results: true,
    size: 0,
    terminate_after: 1,
    query,
  }),
]);

const hasLogs = (logsResult.hits.total as estypes.SearchTotalHits).value > 0;
const hasMetrics = (metricsResult.hits.total as estypes.SearchTotalHits).value > 0;

return { hasData: hasLogs || hasMetrics };
} catch (error) {
  if (isNoShardsAvailableError(error)) {
    return { hasData: false };
  }

  throwHasDataSearchError(error);
Handle the log and metric probes independently.
Line 111 uses Promise.all(), so a no_shard_available_action_exception from either search makes the whole handler return { hasData: false }, even when the other index set already has documents. For an "any data?" route, this should be Promise.allSettled() or per-branch catches so shard issues only zero out the failing probe.
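The failure mode is easy to reproduce in isolation. A minimal sketch in plain TypeScript (hypothetical probe functions, not the actual Elasticsearch calls): `Promise.all` rejects as soon as either probe fails and discards the other result, while `Promise.allSettled` reports each outcome independently.

```typescript
// Two probes: logs succeeds with one hit, metrics fails (simulating a
// no_shard_available rejection). These are stand-ins, not the real ES calls.
const probeLogs = async (): Promise<number> => 1;
const probeMetrics = async (): Promise<number> => {
  throw new Error('no_shard_available_action_exception');
};

// Promise.all: the metrics rejection masks the successful logs probe.
async function hasDataWithAll(): Promise<boolean> {
  try {
    const [logs, metrics] = await Promise.all([probeLogs(), probeMetrics()]);
    return logs > 0 || metrics > 0;
  } catch {
    return false; // whole handler degrades to "no data"
  }
}

// Promise.allSettled: each probe is judged on its own outcome.
async function hasDataWithAllSettled(): Promise<boolean> {
  const [logs, metrics] = await Promise.allSettled([probeLogs(), probeMetrics()]);
  const hasLogs = logs.status === 'fulfilled' && logs.value > 0;
  const hasMetrics = metrics.status === 'fulfilled' && metrics.value > 0;
  return hasLogs || hasMetrics;
}
```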
That's a good point; we've seen this happening in telemetry.
@flash1293, here's a fix using Promise.allSettled so a shard failure on one index set doesn't mask data in the other:
const [logsResult, metricsResult] = await Promise.allSettled([
elasticsearch.client.asCurrentUser.search({
index: ['logs-*.otel-*', 'logs.otel', 'logs.otel.*'],
ignore_unavailable: true,
allow_partial_search_results: true,
size: 0,
terminate_after: 1,
query,
}),
elasticsearch.client.asCurrentUser.search({
index: ['metrics-*.otel-*'],
ignore_unavailable: true,
allow_partial_search_results: true,
size: 0,
terminate_after: 1,
query,
}),
]);
const resolveProbe = (
result: PromiseSettledResult<estypes.SearchResponse>
): boolean => {
if (result.status === 'fulfilled') {
return (result.value.hits.total as estypes.SearchTotalHits).value > 0;
}
if (isNoShardsAvailableError(result.reason)) {
return false;
}
throwHasDataSearchError(result.reason);
};
const hasLogs = resolveProbe(logsResult);
const hasMetrics = resolveProbe(metricsResult);
return { hasData: hasLogs || hasMetrics };

The outer try/catch can be removed (or kept as a last-resort guard) since each probe is now handled individually: a shard error on metrics won't hide already-ingested logs, which matches what you've been seeing in telemetry.
Timestamp: 2026-03-06T12:00:53.329Z
Learning: Applies to x-pack/solutions/observability/plugins/observability_agent_builder/server/routes/ai_insights/**/*.{ts,tsx,js} : Follow AI Insights architecture guidelines including prefetched context, hardcoded prompts, prompt design, UI integration, conversation handoff, and testing as documented in server/routes/ai_insights/AGENTS.md
Thanks, Robert. I'm not sure whether we should do it even for wired streams in that case. If it's that slow, then we probably should fall back to a non-onboarding ID keyed strategy for looking for data flowing or not flowing to avoid that performance issue. If we do this for wired streams, then we will still have that issue if the user is using wired streams, most likely it will time out for them, which is not really a good experience. Once we improve the querying performance on the wired streams side, we can still add it back. |
…r wired streams

Replace the Painless runtime mapping (which timed out on clusters with real data volume) with a time-range-only fallback for wired stream indices. Classic data streams still use fast indexed onboarding ID fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```ts
if (result.status === 'fulfilled') {
  return (result.value.hits.total as estypes.SearchTotalHits).value > 0;
}
```
🟢 Low kubernetes/resolve_has_data_probes.ts:19
When result.value.hits.total is a number rather than a SearchTotalHits object, the type assertion (result.value.hits.total as estypes.SearchTotalHits).value produces undefined, and undefined > 0 evaluates to false. This causes resolveProbe to return false (no data) when documents were actually found. Consider handling both cases: if total is a number, use it directly; otherwise use total.value.
```diff
 if (result.status === 'fulfilled') {
-  return (result.value.hits.total as estypes.SearchTotalHits).value > 0;
+  const total = result.value.hits.total;
+  const count = typeof total === 'number' ? total : (total as estypes.SearchTotalHits).value;
+  return count > 0;
 }
```

🤖 Copy this AI Prompt to have your agent fix this:
In file x-pack/solutions/observability/plugins/observability_onboarding/server/routes/kubernetes/resolve_has_data_probes.ts around lines 19-21:
When `result.value.hits.total` is a `number` rather than a `SearchTotalHits` object, the type assertion `(result.value.hits.total as estypes.SearchTotalHits).value` produces `undefined`, and `undefined > 0` evaluates to `false`. This causes `resolveProbe` to return `false` (no data) when documents were actually found. Consider handling both cases: if `total` is a number, use it directly; otherwise use `total.value`.
Evidence trail:
1. x-pack/solutions/observability/plugins/observability_onboarding/server/routes/kubernetes/resolve_has_data_probes.ts:19-20 (REVIEWED_COMMIT) - shows the problematic code using `(result.value.hits.total as estypes.SearchTotalHits).value > 0`
2. src/platform/packages/shared/kbn-search-connectors/utils/fetch_with_pagination.ts:52-67 (REVIEWED_COMMIT) - shows the `totalToPaginateTotal` function that explicitly handles `number | SearchTotalHits | undefined` as input type, proving `hits.total` can be a number
3. Various other Kibana files use defensive patterns like `(body.hits.total as estypes.SearchTotalHits).value ?? body.hits.total ?? 0` (see git_grep results for `hits.total as estypes.SearchTotalHits`)
@flash1293 Yeah, right now it might work fine but on clusters with high data volume the runtime mapping would time out. I dropped it and switched wired stream detection to the same time-window approach used by the OTel host and APM flows. |
```diff
@@ -67,34 +77,88 @@ export function DataIngestStatus({ onboardingId, ingestionMode }: Props) {
   }, FETCH_INTERVAL);

   return () => clearTimeout(timeout);
-}, [data?.hasData, refetch, status]);
+}, [isReady, hasPreExistingData, refetch, status]);
```
🟠 High kubernetes/data_ingest_status.tsx:68
When hasPreExistingData is true but hasMetrics is false while needsMetrics is true, the polling useEffect at line 71 incorrectly stops refetching because hasPreExistingData triggers the early return. The UI shows "Waiting for metrics to be shipped" with a loading spinner that never updates, since the polling is disabled while the component is still waiting for metrics. Consider removing hasPreExistingData from the early return condition so polling continues until isReady is true.
```diff
 const pendingStatusList = [FETCH_STATUS.LOADING, FETCH_STATUS.NOT_INITIATED];
-if (pendingStatusList.includes(status) || isReady || hasPreExistingData) {
+if (pendingStatusList.includes(status) || isReady) {
   return;
 }
```

🤖 Copy this AI Prompt to have your agent fix this:
In file x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx around lines 68-80:
When `hasPreExistingData` is true but `hasMetrics` is false while `needsMetrics` is true, the polling useEffect at line 71 incorrectly stops refetching because `hasPreExistingData` triggers the early return. The UI shows "Waiting for metrics to be shipped" with a loading spinner that never updates, since the polling is disabled while the component is still waiting for metrics. Consider removing `hasPreExistingData` from the early return condition so polling continues until `isReady` is true.
Evidence trail:
x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx:
- Lines 64-65: `isReady = needsMetrics ? hasMetrics : hasData`
- Lines 68-78: useEffect polling with early return condition `if (... || isReady || hasPreExistingData)`
- Lines 114-117: Progress title shows "Waiting for metrics" when `hasData && needsMetrics && !hasMetrics`
- Line 139: ProgressIndicator `isLoading={needsMetrics ? !hasMetrics : !hasData}`
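For context, the polling pattern being debated here, refetch on an interval until a ready flag flips, with cleanup so the timer cannot leak, can be sketched outside React. Names mirror the component but the `refetch`/`isReady` wiring is illustrative, not the PR's actual code.

```typescript
// Keep scheduling refetches until isReady() reports true, and return a
// cleanup function (the analogue of the useEffect return value).
function startPolling(
  refetch: () => void,
  isReady: () => boolean,
  intervalMs: number
): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const tick = () => {
    if (isReady()) return; // data arrived: stop scheduling further fetches
    refetch();
    timer = setTimeout(tick, intervalMs);
  };
  timer = setTimeout(tick, intervalMs);
  return () => {
    if (timer !== undefined) clearTimeout(timer); // cleanup on unmount
  };
}
```

Whether `hasPreExistingData` should also short-circuit this loop is exactly the design question discussed above.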
This is intentional by design. When hasPreExistingData is true, time-range-based detection is unreliable (it would produce false positives from data that was already flowing before the user started onboarding). So we deliberately stop polling and skip the progress indicator entirely.
@rStelmach Some fair points that came out of automated code review:

- Troubleshooting UX when `hasPreExistingData && !hasData`: in `useTimeWindowDataDetection`, `isTroubleshootingVisible` still treats `hasData === false` as "no data" (`noDataConfirmed`) and does not exclude `hasPreExistingData`.
- Telemetry

Another point: if we fall back to plain data availability without the onboarding ID, then we don't show the link to the next page right away but require the window blur. That feels a little weird: we only give the user the link to the next page if they perform this fairly hidden action. Maybe it's fine... Thoughts @LucaWintergerst ? |
```ts
if (hasData === true) {
  setDataReceivedTelemetrySent(true);
  analytics.reportEvent(OBSERVABILITY_ONBOARDING_TELEMETRY_EVENT.eventType, {
```
🟢 Low kubernetes/data_ingest_status.tsx:87
analytics.reportEvent(...) is called without optional chaining on lines 87 and 95, but analytics can be undefined (as shown by the analytics?. usage in GetStartedPanel). When the effect runs while analytics is undefined, it throws a TypeError. Consider adding optional chaining (analytics?.reportEvent(...)) to both calls.
```diff
-analytics.reportEvent(OBSERVABILITY_ONBOARDING_TELEMETRY_EVENT.eventType, {
+analytics?.reportEvent(OBSERVABILITY_ONBOARDING_TELEMETRY_EVENT.eventType, {
```

🤖 Copy this AI Prompt to have your agent fix this:
In file x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx around line 87:
`analytics.reportEvent(...)` is called without optional chaining on lines 87 and 95, but `analytics` can be `undefined` (as shown by the `analytics?.` usage in `GetStartedPanel`). When the effect runs while `analytics` is undefined, it throws a `TypeError`. Consider adding optional chaining (`analytics?.reportEvent(...)`) to both calls.
Evidence trail:
1. x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/kubernetes/data_ingest_status.tsx lines 87, 95: `analytics.reportEvent(...)` without optional chaining
2. x-pack/solutions/observability/plugins/observability_onboarding/public/application/quickstart_flows/shared/get_started_panel.tsx line 58: `analytics?.reportEvent(...)` with optional chaining
3. src/platform/plugins/shared/kibana_react/public/context/types.ts line 15: `export type KibanaServices = Partial<CoreStart>;`
4. src/platform/plugins/shared/kibana_react/public/context/context.tsx lines 17-20: default context value is `services: {}`
@flash1293 I addressed your comments, including the window blur one |
Sorry for finding new edge cases here, but I just realized that using the session start as the beginning of the time range can be a bit problematic. For example, say I already have APM data streaming in, but with a ~30-second delay, which isn't unusual. I open the APM onboarding page, the session start is recorded, and since no data falls in the window yet, polling begins. After a while, data from the existing shippers arrives and the UI says "your data is flowing now", even though that data has nothing to do with the new API key I just created; it's just the regular data arriving with a slight delay. So maybe we shouldn't use session start but something like a couple of minutes before session start to catch these false positives, and in that case fall back to the "data is already flowing" case. |
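One way to sketch the proposed lookback, with all names illustrative rather than the PR's actual API: widen the query window by a couple of minutes and classify any hit older than the session start as pre-existing data rather than proof that the new onboarding worked.

```typescript
// Hypothetical lookback window: query from a couple of minutes before the
// session start, then use the first hit's timestamp to decide whether data
// was already flowing before onboarding began.
const LOOKBACK_MS = 2 * 60 * 1000;

interface DetectionWindow {
  start: string; // ISO timestamp the has-data query should use as its lower bound
  isPreExisting: (firstHitTimestamp: string, sessionStart: Date) => boolean;
}

function makeDetectionWindow(sessionStart: Date): DetectionWindow {
  const start = new Date(sessionStart.getTime() - LOOKBACK_MS).toISOString();
  return {
    start,
    // A hit older than sessionStart means data was flowing before onboarding began.
    isPreExisting: (firstHitTimestamp, session) =>
      new Date(firstHitTimestamp).getTime() < session.getTime(),
  };
}
```

Under this scheme, delayed data from existing shippers lands in the lookback portion of the window and triggers the "data is already flowing" path instead of a false "your data arrived" signal.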
@rStelmach Just tested on the serverless instance from above, and the comment #257870 (comment) is still a problem, see recording: apm_issue.mov |
Looks like the APM flow is not checking for pre-existing data at all |
💚 Build Succeeded
flash1293
left a comment
LGTM. I'm not comfortable with the cloudforwarder bit, though; could we reach out to the team owning the forwarder itself so they can check and confirm? If something's wrong it can go in via a bugfix PR.
Things I'm not sure about:
- Should it check WAF data streams as well?
- Are the ones in the list right now the correct ones?
- Some general issue I'm not seeing right now
## Summary

PR #257870 introduced a `checkPreExistingData` mechanism to the shared `DataIngestStatus` component that conditionally hides the `ProgressIndicator` when pre-existing data is detected but no new data has arrived yet. This broke the Ensemble nightly Playwright tests for the Kubernetes EA, OTel Kubernetes, and OTel Host onboarding flows, which were asserting the visibility of the progress indicator text ("We are monitoring your cluster" / "We are monitoring your host") as proof that data ingestion was detected. The fix updates each flow's page object to assert the "Explore logs" action link instead, an element that is always rendered once data (or pre-existing data) is detected, regardless of whether the progress indicator is shown or hidden.




closes: elastic/observability-dev#5296
Summary 📚
This PR adds data detection and loading indicators to all observability onboarding flows. Previously, flows either had no data detection (OTel Host, OTel APM, Cloud Forwarder) or lacked separation between logs and metrics arrival (OTel K8s). Now there is a progress indicator while data is being ingested, troubleshooting guidance if data doesn't arrive, and contextual action links once the relevant data types are confirmed.
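The "contextual action links once the relevant data types are confirmed" behavior can be sketched as a simple filter over a `requires` field. The types and helper below are illustrative, not the PR's exact code.

```typescript
// Each action link declares which data type must have arrived before it shows.
type Requires = 'any' | 'logs' | 'metrics' | 'traces';

interface ActionLink {
  label: string;
  requires: Requires;
}

interface DataState {
  hasLogs: boolean;
  hasMetrics: boolean;
  hasTraces: boolean;
}

// Return only the links whose data requirement is satisfied.
function visibleLinks(links: ActionLink[], state: DataState): ActionLink[] {
  const satisfied: Record<Requires, boolean> = {
    any: state.hasLogs || state.hasMetrics || state.hasTraces,
    logs: state.hasLogs,
    metrics: state.hasMetrics,
    traces: state.hasTraces,
  };
  return links.filter((link) => satisfied[link.requires]);
}
```

This is why an "Explore logs" link can appear as soon as logs arrive while a dashboard link keeps waiting for metrics.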
Detection strategies per flow
| Flow | Indices checked | Detection strategy |
| --- | --- | --- |
| OTel Host | `logs-*.otel-*`, `logs.otel`, `metrics-*.otel-*` | `@timestamp >= sessionStart` |
| OTel APM | `traces-apm*`, `logs-apm*`, `metrics-apm*`, `*-*.otel-*`, `apm-*` | `@timestamp >= sessionStart` |
| Cloud Forwarder | `logs-aws.{vpcflow,elbaccess,cloudtrail}.otel-*` | `@timestamp >= sessionStart` |
| OTel Kubernetes | `logs-*`, `logs.*`, `metrics-*`, `metrics.*` | `onboarding.id` via Helm `--set` processor |
| Kubernetes EA | `logs-*`, `logs.*`, `metrics-*`, `metrics.*` | `fields.onboarding_id` (existing) |

What's new
Client-side
- `useTimeWindowDataDetection` hook: a shared hook replacing ~150 lines of duplicated polling/state/telemetry logic across the OTel Host, OTel APM, and Cloud Forwarder flows. Handles a configurable polling interval, telemetry on data received, and troubleshooting visibility after a delay. Treats fetch failures as "no data" to prevent an infinite spinner.
- `DataIngestStatus` generalized: now accepts `actionLinks` with a `requires` field (`'any' | 'logs' | 'metrics' | 'traces'`) for conditional link visibility. Separates telemetry (fires on any data) from step completion (fires when all required data types arrive). Drives both the K8s EA and OTel K8s flows.
- `ActionLink.requires` on `GetStartedPanel`: action links declare which data type they need, so Dashboard links wait for metrics while Explore Logs links appear as soon as logs arrive.
- Step status moves from `current` to `complete` based on actual data arrival.

Server-side
- New `has-data` routes: `/cloudforwarder/has-data`, `/otel_apm/has-data`, `/otel_host/has-data`, plus an enhanced `/kubernetes/has-data` (now returns `hasLogs`/`hasMetrics` separately).
- Shared search error helpers (`isNoShardsAvailableError` / `throwHasDataSearchError`): deduplicate identical catch blocks across all has-data routes.
- Wired streams support: queries the `logs`/`metrics` root indices (per [Observability Onboarding] Wired streams onboarding #249705), adds a runtime field with a `._rt` suffix to avoid shadowing indexed fields on classic streams, and queries both `resource.attributes.onboarding.id` and `labels.onboarding_id` for broad correlation coverage.
- `logType` validation: `t.keyof({ vpcflow, elbaccess, cloudtrail })` replaces the unsafe `t.string` + `as LogType` cast.

OTel K8s Helm command
- Injects a `resource/onboarding_id` processor via `--set` flags to tag data with the `onboarding.id` resource attribute.
- The chart's built-in processors occupy `processors[0..7]`; custom processors start at index 8.

E2E Ensemble tests
- Updated `data-test-subj` selectors for the new `GetStartedPanel`-based action links.

Notes 📓
Why not inject `onboarding.id` into every flow for exact correlation?
onboarding.id) as an OTel resource attribute is permanent, it stays on all events from that collector for its entire lifetime, not just during onboarding. For K8s OTel this trade-off is acceptable because the Helm chart is an explicit deployment that can be reconfigured, and multi-cluster disambiguation benefits from a stable ID. For OTel Host, OTel APM, and Cloud Forwarder, we chose time-window detection (@timestamp >= start) to avoid persisting a non-functional field in production data.Known edge cases of the detection strategies used in this PR:
- Time-window detection: if OTel data streams already exist (e.g. `logs-*.otel-*`, `logs.otel`) from a previous setup, the `@timestamp >= sessionStart` query will match pre-existing data and report "ready" before the new collector actually sends anything.
- OTel K8s: the `onboarding.id` resource attribute injected via Helm is permanent; it stays on all events from that collector for its entire lifetime, not just during onboarding. This was an intentional trade-off for exact correlation (no false positives), but it means a non-functional field persists in production data.
- K8s EA, `fields.onboarding_id` (existing, unchanged): same persistence concern; the Elastic Agent integration injects `onboarding_id` into all events permanently.

How to test 🧪
Use this guide to test each flow: https://github.com/elastic/observability-dev/tree/main/docs/obs-onboarding/onboarding-flows
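For reference while testing, the time-window detection described in the summary reduces to an existence check: a `size: 0` search with `terminate_after: 1` over the flow's indices, filtered to `@timestamp >= sessionStart`. A rough sketch of the request shape follows; the exact request the routes build may differ.

```typescript
// Build an existence-only search request: no hits returned (size: 0) and the
// shard search stops after the first match (terminate_after: 1), so the route
// only learns *whether* data arrived, not what it is.
function buildHasDataQuery(indices: string[], sessionStart: string) {
  return {
    index: indices.join(','),
    size: 0,
    terminate_after: 1,
    track_total_hits: 1, // counting beyond "at least one" is unnecessary
    query: {
      range: { '@timestamp': { gte: sessionStart } },
    },
  };
}
```

A `hits.total` of at least 1 in the response is then mapped to `hasData: true` by the route.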
Demos 🎥
otel-k8s-demo.mov
otel-demo.mov