[9.4] ci(cypress): default stateful ES to snapshot on CI, docker locally (#264218) #267726
Merged
kibanamachine merged 1 commit into elastic:9.4 on May 5, 2026
…lastic#264218)

## Summary

Default Cypress stateful Elasticsearch provisioning to `snapshot` on CI and keep `docker` for local development.

The earlier switch to Docker as the universal default (elastic#254306) was motivated by:

- making local dev match shipped artifacts,
- multi-arch support for Apple Silicon,
- avoiding per-spec snapshot extraction,
- faster warm starts on developer machines.

All four are genuine wins **for local dev**. On CI they either don't apply, are neutral, or are actively counter-productive. After gathering empirical data from Buildkite, the right default on CI is `snapshot`; on workstations the right default stays `docker`.

## Why snapshot on CI

1. **No version-skew race.** Kibana CI already resolves an ES snapshot manifest once per build in [`.buildkite/scripts/lifecycle/pre_build.sh`](https://github.com/elastic/kibana/blob/main/.buildkite/scripts/lifecycle/pre_build.sh) against `kibana-ci-es-snapshots-daily`, Kibana's own daily-verified bucket, version-locked to Kibana by construction. The post-version-bump window (`9.5.0`, `9.6.0`, …) that my earlier auto-detect probe tried to guard against doesn't actually exist for stateful Cypress on CI: the tar.gz is already there, or `pre_build.sh` has already failed the build before any Cypress agent starts. A Docker image for that same version is _not_ guaranteed to exist at the same moment, which is the exact failure mode we kept running into.
2. **Docker-on-CI is not meaningfully faster on the same hardware.** I pulled job durations from Buildkite for `kibana-on-merge` Security Solution Cypress jobs before and after elastic#254306 and reconciled them against the Buildkite agent machine-type change (`n2-standard-4` → `n2-highmem-4`) that landed in the same window. Controlling for that hardware change, ES start-up on a warm CI agent is ~5s different between snapshot tar.gz and Docker, within noise for a 20–40 minute Cypress group. The speedups originally attributed to Docker were largely a hardware upgrade.
3. **ES starts once per FTR config group, not per spec.** `parallel.ts` provisions ES once for each group in `specGroups`, runs all specs in that group against the same cluster, then shuts down (see [`runSpecGroup`](https://github.com/elastic/kibana/blob/main/x-pack/solutions/security/plugins/security_solution/scripts/run_cypress/parallel.ts)). Only retry runs go per-spec. So the "Docker avoids per-spec extraction on CI" argument is mostly about retries, which are a tiny fraction of total runtime.
4. **Fewer moving parts on CI.** No Docker registry auth, no Docker pull on every agent, no fallback logic between Docker and snapshot, no GCS probe script. Snapshot tar.gz is already pre-fetched/cached by the standard Kibana CI lifecycle.

## Why keep Docker for local dev

1. Matches shipped artifacts byte-for-byte.
2. Native multi-arch (Apple Silicon) without a separate tar.gz pipeline.
3. Warm starts are fast once the image is cached on the workstation.
4. `CYPRESS_ES_FROM=snapshot` (or `docker`) still works as an explicit override for both environments.

## Change

```ts
const defaultEsFrom = process.env.CI ? 'snapshot' : 'docker';
const esFrom = configEsFrom === 'serverless' ? 'serverless' : esFromEnv || defaultEsFrom;
```

Also drops the earlier `detect_cypress_es_from.sh` probe and its hook in `setup_job_env.sh`; `pre_build.sh` already covers the version-skew concern at a better layer.

The serverless routing fix (`configEsFrom === 'serverless'` wins over `CYPRESS_ES_FROM`) is retained from the first commit and is independent of the default flip: it prevents stateful `CYPRESS_ES_FROM=snapshot` from accidentally booting serverless suites against a stateful snapshot tar.gz and blowing up with `unknown setting [xpack.security.authc.native_roles.enabled]`.

## Test plan

- [ ] Green `kibana-on-merge` Security Solution Cypress jobs (stateful + serverless).
- [ ] Green `kibana-pull-request` Security Solution Cypress jobs with no `CYPRESS_ES_FROM` set.
- [ ] Local: `yarn cypress:run ...` still uses Docker by default.
- [ ] Local: `CYPRESS_ES_FROM=snapshot yarn cypress:run ...` uses snapshot.
- [ ] Serverless suites remain on `serverless` regardless of `CYPRESS_ES_FROM`.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

(cherry picked from commit 66c8e08)
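For reviewers who want to sanity-check the precedence described in the Change section, here is a minimal sketch of the same two-line rule as a pure function. The name `resolveEsFrom`, the `ResolveEsFromInput` shape, and the `isCi` flag are hypothetical and introduced only for illustration; the actual change reads `process.env.CI` inline and may be structured differently.

```ts
// Hypothetical helper, not part of the PR: isolates the precedence rule so it
// can be exercised without touching process.env.
interface ResolveEsFromInput {
  configEsFrom?: string; // value declared by the FTR config
  esFromEnv?: string; // value of CYPRESS_ES_FROM, if any
  isCi: boolean; // stands in for the truthiness of process.env.CI
}

function resolveEsFrom({ configEsFrom, esFromEnv, isCi }: ResolveEsFromInput): string {
  // Environment-dependent default: snapshot on CI, docker on workstations.
  const defaultEsFrom = isCi ? 'snapshot' : 'docker';

  // A serverless config always wins, even over an explicit CYPRESS_ES_FROM.
  if (configEsFrom === 'serverless') {
    return 'serverless';
  }

  // An explicit, non-empty override beats the default in both environments.
  return esFromEnv || defaultEsFrom;
}

// The cases from the test plan, expressed as inputs.
const onCi = resolveEsFrom({ isCi: true });
const local = resolveEsFrom({ isCi: false });
const localOverride = resolveEsFrom({ isCi: false, esFromEnv: 'snapshot' });
const serverlessWins = resolveEsFrom({
  isCi: true,
  configEsFrom: 'serverless',
  esFromEnv: 'snapshot',
});
```

Because the fallback uses `||`, an empty `CYPRESS_ES_FROM` is treated the same as an unset one and falls through to the environment default.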
💚 Build Succeeded
Backport
This will backport the following commits from `main` to `9.4`:

Questions? Please refer to the Backport tool documentation.