
[Infra UI] Stabilize Infra Custom Dashboards stateful API tests#265679

Merged
awahab07 merged 2 commits into
elastic:mainfrom
awahab07:appex-qa-bots-obs-presentation-test-failures
Apr 27, 2026

Conversation

@awahab07
Contributor

@awahab07 awahab07 commented Apr 26, 2026

Summary

The deployment-agnostic Infra Custom Dashboards API tests have recently been failing frequently in the scheduled stateful runs.

The most probable cause is #262618, which introduced a caching layer around uiSettings. As a result, the Infra Custom Dashboards stateful API test suite, which frequently updates and retrieves dashboard settings, ran into server-side eventual consistency for settings in multi-node setups.

The suite was updating observability:enableInfrastructureAssetCustomDashboards and immediately asserting route behavior. In scheduled cloud runs, the follow-up request can observe a stale cached value on another Kibana node, which flips expected status codes.
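The stale read described above can be reproduced with a small model. This is only a sketch under assumptions: `SharedStore`, `CachingNode`, the per-node TTL cache, and the 5000 ms TTL are hypothetical stand-ins, not the actual caching code from #262618.

```typescript
// Hypothetical model: every node keeps its own TTL cache in front of a shared store.
type Clock = () => number;

class SharedStore {
  private values = new Map<string, boolean>();

  get(key: string): boolean {
    return this.values.get(key) ?? false;
  }

  set(key: string, value: boolean): void {
    this.values.set(key, value);
  }
}

class CachingNode {
  private readonly store: SharedStore;
  private readonly ttlMs: number;
  private readonly now: Clock;
  private cache = new Map<string, { value: boolean; expiresAt: number }>();

  constructor(store: SharedStore, ttlMs: number, now: Clock) {
    this.store = store;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  read(key: string): boolean {
    const hit = this.cache.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return hit.value; // served from this node's cache, possibly stale
    }
    const value = this.store.get(key);
    this.cache.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  write(key: string, value: boolean): void {
    this.store.set(key, value);
    // Assumption: only the node that handled the write refreshes its cache.
    this.cache.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

function simulateStaleRead(ttlMs: number): { immediate: boolean; afterTtl: boolean } {
  let t = 0;
  const now: Clock = () => t;
  const store = new SharedStore();
  const nodeA = new CachingNode(store, ttlMs, now);
  const nodeB = new CachingNode(store, ttlMs, now);
  const key = 'observability:enableInfrastructureAssetCustomDashboards';

  nodeB.read(key); // node B caches the disabled value
  nodeA.write(key, true); // the test enables the setting through node A
  const immediate = nodeB.read(key); // follow-up request lands on node B

  t += ttlMs; // node B's cache entry expires
  const afterTtl = nodeB.read(key);
  return { immediate, afterTtl };
}

// simulateStaleRead(5000) returns { immediate: false, afterTtl: true }
```

In this model the immediate follow-up on the second node still sees the disabled value, which is the kind of read that would flip an expected status code.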

Example failure: (screenshot)

This PR:

  • stabilizes the suite by reducing advanced-setting flips.
  • groups disabled and enabled scenarios so the suite only transitions into the enabled state once.
  • waits for the enabled setting to propagate before running the enabled-path assertions (following ref).
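The effect of grouping the scenarios can be illustrated by counting setting transitions. This is a sketch only: `Scenario`, `countSettingFlips`, `groupDisabledFirst`, and the scenario names are hypothetical and do not come from the test file.

```typescript
// Hypothetical description of a test case and the setting state it needs.
interface Scenario {
  name: string;
  requiresEnabled: boolean;
}

// Counts how many times the setting must change while running scenarios in order.
function countSettingFlips(scenarios: Scenario[], initiallyEnabled = false): number {
  let current = initiallyEnabled;
  let flips = 0;
  for (const scenario of scenarios) {
    if (scenario.requiresEnabled !== current) {
      flips += 1;
      current = scenario.requiresEnabled;
    }
  }
  return flips;
}

// Runs every disabled-path scenario first, then every enabled-path scenario.
function groupDisabledFirst(scenarios: Scenario[]): Scenario[] {
  return [
    ...scenarios.filter((s) => !s.requiresEnabled),
    ...scenarios.filter((s) => s.requiresEnabled),
  ];
}

const interleavedScenarios: Scenario[] = [
  { name: 'GET while disabled', requiresEnabled: false },
  { name: 'GET while enabled', requiresEnabled: true },
  { name: 'POST while disabled', requiresEnabled: false },
  { name: 'POST while enabled', requiresEnabled: true },
];

// countSettingFlips(interleavedScenarios) returns 3
// countSettingFlips(groupDisabledFirst(interleavedScenarios)) returns 1
```

With a single transition there is only one point in the suite where propagation has to be waited for.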

…ing flips and wait for propagation so stateful cloud runs do not read stale cached values.
@awahab07 awahab07 added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting Team:obs-presentation Focus: APM UI, Infra UI, Hosts UI, Universal Profiling, Obs Overview and left Navigation labels Apr 26, 2026
@awahab07 awahab07 marked this pull request as ready for review April 26, 2026 21:51
@awahab07 awahab07 requested a review from a team as a code owner April 26, 2026 21:51
@elasticmachine
Contributor

Pinging @elastic/obs-presentation-team (Team:obs-presentation)

@macroscopeapp
Contributor

macroscopeapp Bot commented Apr 26, 2026

Catch flakiness early (recommended)

Recommended before merge: run the flaky test runner against this PR to catch flakiness early.

The new waitForEnabledCustomDashboardsSetting retry/timing logic (retry.tryForTime with a 12 s propagation delay and 20 s timeout) and the restructured shared before/after hooks introduce non-determinism that a single CI pass may not reliably catch.

Trigger a run with the Flaky Test Runner UI or post this comment on the PR:

/flaky ftrConfig:x-pack/solutions/observability/test/api_integration_deployment_agnostic/configs/stateful/oblt.stateful.config.ts:30 ftrConfig:x-pack/solutions/observability/test/api_integration_deployment_agnostic/configs/serverless/oblt.serverless.config.ts:30

Share feedback in the #appex-qa channel.

Posted via Macroscope — Flaky Test Runner nudge

@macroscopeapp
Contributor

macroscopeapp Bot commented Apr 26, 2026

Approvability

Verdict: Needs human review

Test-only changes reorganizing API integration tests and adding retry logic for multi-node environment stability. However, the modified file is owned by @elastic/obs-presentation-team and the author is not a designated owner, so designated reviewers should verify the test reorganization.

You can customize Macroscope's approvability policy. Learn more.

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

@kibanamachine
Contributor

Flaky Test Runner Stats

🎉 All tests passed! - kibana-flaky-test-suite-runner#11925

[✅] x-pack/solutions/observability/test/api_integration_deployment_agnostic/configs/stateful/oblt.stateful.config.ts: 30/30 tests passed.

see run history

@awahab07 awahab07 enabled auto-merge (squash) April 27, 2026 01:11
Comment on lines +60 to +62
if (Date.now() - startedAt < CUSTOM_DASHBOARDS_SETTING_PROPAGATION_DELAY_MS) {
  throw new Error('Waiting for custom dashboards setting propagation');
}
Member

Can we avoid having conditions in the tests?

Contributor Author

Yes, that makes sense. This shouldn't be the way to do it, but it gives us a probe into the multi-node environment failures.

I've created a follow-up PR to remove this, and hopefully after that we'll have a deterministic and robust way to run such a probe.
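For readers following this thread, the behavior of the time gate inside a retry loop can be traced with a synchronous simulation. This is a sketch under assumptions: `tryForTimeSim` and `gatedCheck` are hypothetical helpers with a simulated clock, not the FTR retry service; the 12 s delay and 20 s timeout are the values quoted in the bot comment on this PR; the status codes are placeholders.

```typescript
const PROPAGATION_DELAY_MS = 12000; // delay quoted in the bot comment
const RETRY_TIMEOUT_MS = 20000; // timeout quoted in the bot comment

// Hypothetical synchronous stand-in for a retry helper, driven by a simulated clock.
function tryForTimeSim(
  timeoutMs: number,
  intervalMs: number,
  attempt: (now: number) => void
): { attempts: number; succeededAt: number } {
  let now = 0;
  let attempts = 0;
  let lastError: unknown;
  while (now <= timeoutMs) {
    attempts += 1;
    try {
      attempt(now);
      return { attempts, succeededAt: now };
    } catch (err) {
      lastError = err;
      now += intervalMs; // advance the simulated clock instead of sleeping
    }
  }
  throw new Error(`Timed out after ${timeoutMs} ms: ${String(lastError)}`);
}

// Mirrors the gate under review: fail until the delay has elapsed, then assert.
function gatedCheck(
  now: number,
  startedAt: number,
  observedStatus: number,
  expectedStatus: number
): void {
  if (now - startedAt < PROPAGATION_DELAY_MS) {
    throw new Error('Waiting for custom dashboards setting propagation');
  }
  if (observedStatus !== expectedStatus) {
    throw new Error(`Expected ${expectedStatus}, got ${observedStatus}`);
  }
}

const gateRun = tryForTimeSim(RETRY_TIMEOUT_MS, 1000, (now) =>
  gatedCheck(now, 0, 200, 200)
);
// gateRun is { attempts: 13, succeededAt: 12000 }
```

Even when the observed status already matches, the gate forces every attempt before the 12 s mark to fail, so the assertion itself only runs once the delay has passed.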

? `/api/infra/${assetType}/custom-dashboards/${dashboardSavedObjectId}`
: `/api/infra/${assetType}/custom-dashboards`;

// TODO: Ideally we should have a deterministic way to know when settings updates are propagated to the UI.
Member

Q: Is there any way this could be done here instead of leaving the comment? If not, maybe we should link an issue so we don't forget about it.

Contributor Author

Yes, makes sense.

I've created a follow-up PR to remove this and opened #265720.

@awahab07
Contributor Author

@jennypavlova thanks for the review. Does it make sense to let this PR merge so we can test whether it mitigates the recurring test failures? Once confirmed, we can move forward with the follow-up PR and a robust solution tracked by #265720.

@awahab07 awahab07 requested a review from jennypavlova April 27, 2026 10:20
@awahab07 awahab07 merged commit 87e0b6a into elastic:main Apr 27, 2026
35 checks passed
awahab07 added a commit that referenced this pull request Apr 30, 2026
… Dashboards stateful API test suite (#265717)

Follow-up to [#265679](#265679).

Remove the inline resolution TODO and cache-TTL checkpoint from Custom
Dashboards stateful API test suite.

Labels

backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:obs-presentation Focus: APM UI, Infra UI, Hosts UI, Universal Profiling, Obs Overview and left Navigation v9.5.0

4 participants