[9.4] ci(cypress): default stateful ES to snapshot on CI, docker locally (#264218) #267726

Merged: kibanamachine merged 1 commit into `elastic:9.4` from `kibanamachine:backport/9.4/pr-264218` on May 5, 2026
@kibanamachine (Contributor)

Backport

This will backport the following commits from main to 9.4:

Questions?

Please refer to the Backport tool documentation

…lastic#264218)

## Summary

Default Cypress stateful Elasticsearch provisioning to `snapshot` on CI
and keep `docker` for local development.

The earlier switch to Docker as the universal default (elastic#254306) was
motivated by:

- making local dev match shipped artifacts,
- multi-arch support for Apple Silicon,
- avoiding per-spec snapshot extraction,
- faster warm starts on developer machines.

All four are genuine wins **for local dev**. On CI they either don't
apply, are neutral, or are actively counter-productive. Empirical data
from Buildkite indicates that the right default on CI is `snapshot`; on
workstations the right default stays `docker`.

## Why snapshot on CI

1. **No version-skew race.** Kibana CI already resolves an ES snapshot
manifest once per build in
[`.buildkite/scripts/lifecycle/pre_build.sh`](https://github.com/elastic/kibana/blob/main/.buildkite/scripts/lifecycle/pre_build.sh)
against `kibana-ci-es-snapshots-daily` — Kibana's own daily-verified
bucket, version-locked to Kibana by construction. The post-version-bump
window (`9.5.0`, `9.6.0`, …) that my earlier auto-detect probe tried to
guard against doesn't actually exist for stateful Cypress on CI: the
tar.gz is already there, or `pre_build.sh` has already failed the build
before any Cypress agent starts. A Docker image for that same version is
_not_ guaranteed to exist at the same moment — which is the exact
failure mode we kept running into.

2. **Docker-on-CI is not meaningfully faster on the same hardware.** I
pulled job durations from Buildkite for `kibana-on-merge` Security
Solution Cypress jobs before and after elastic#254306 and reconciled them
against the Buildkite agent machine-type change (`n2-standard-4` →
`n2-highmem-4`) that landed in the same window. Controlling for that
hardware change, ES start-up on a warm CI agent differs by ~5s between
snapshot tar.gz and Docker, which is within noise for a 20–40 minute
Cypress group (roughly 0.2–0.4% of its runtime). The speedups originally
attributed to Docker were largely due to the hardware upgrade.

3. **ES starts once per FTR config group, not per spec.** `parallel.ts`
provisions ES once for each group in `specGroups`, runs all specs in
that group against the same cluster, then shuts down (see
[`runSpecGroup`](https://github.com/elastic/kibana/blob/main/x-pack/solutions/security/plugins/security_solution/scripts/run_cypress/parallel.ts)).
Only retry runs go per-spec. So the "Docker avoids per-spec extraction
on CI" argument is mostly about retries, which are a tiny fraction of
total runtime.

4. **Fewer moving parts on CI.** No Docker registry auth, no Docker pull
on every agent, no fallback logic between Docker and snapshot, no GCS
probe script. Snapshot tar.gz is already pre-fetched/cached by the
standard Kibana CI lifecycle.
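
To make point 3 concrete, here is a minimal sketch of group-level provisioning. It is not the actual `parallel.ts` code: `SpecGroup`, `runGroups`, `startEs`, and `runSpec` are hypothetical names, and the loop is synchronous for brevity. It only illustrates that the number of ES starts tracks the number of groups, not the number of specs.

```ts
// Hypothetical shape of a spec group: one FTR config, many specs.
interface SpecGroup {
  configPath: string;
  specs: string[];
}

interface RunStats {
  esStarts: number;
  specsRun: number;
}

// startEs returns a "stop" callback; both callbacks are injected so the
// sketch stays independent of any real ES or Cypress API.
function runGroups(
  groups: SpecGroup[],
  startEs: (configPath: string) => () => void,
  runSpec: (spec: string) => void
): RunStats {
  let esStarts = 0;
  let specsRun = 0;

  for (const group of groups) {
    // ES is provisioned once per group...
    const stopEs = startEs(group.configPath);
    esStarts++;
    try {
      // ...and every spec in the group runs against that same cluster.
      for (const spec of group.specs) {
        runSpec(spec);
        specsRun++;
      }
    } finally {
      // The cluster is shut down once the group finishes (or fails).
      stopEs();
    }
  }

  return { esStarts, specsRun };
}
```

With two groups of three and two specs, this sketch starts ES twice and runs five specs, so any per-start cost (tar.gz extraction or Docker pull) is paid per group.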

## Why keep Docker for local dev

1. Matches shipped artifacts byte-for-byte.
2. Native multi-arch (Apple Silicon) without a separate tar.gz pipeline.
3. Warm starts are fast once the image is cached on the workstation.
4. `CYPRESS_ES_FROM=snapshot` (or `docker`) still works as an explicit
override for both environments.

## Change

```ts
const defaultEsFrom = process.env.CI ? 'snapshot' : 'docker';
const esFrom =
  configEsFrom === 'serverless' ? 'serverless' : esFromEnv || defaultEsFrom;
```

Also drops the earlier `detect_cypress_es_from.sh` probe and its hook in
`setup_job_env.sh` — `pre_build.sh` already covers the version-skew
concern at a better layer.

The serverless routing fix (`configEsFrom === 'serverless'` wins over
`CYPRESS_ES_FROM`) is retained from the first commit and is independent
of the default flip — it prevents stateful `CYPRESS_ES_FROM=snapshot`
from accidentally booting serverless suites against a stateful snapshot
tar.gz and blowing up with `unknown setting
[xpack.security.authc.native_roles.enabled]`.
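
The precedence described in this section can be sketched as a pure function. This is an illustration under stated assumptions rather than the code in the PR: `resolveEsFrom` and `ResolveEsFromInput` are hypothetical names, and `isCi` stands in for the truthiness of `process.env.CI` so the sketch has no dependency on Node globals.

```ts
// Hypothetical input shape; the real code reads these from the FTR config
// and the environment.
interface ResolveEsFromInput {
  configEsFrom?: string; // esFrom value declared by the FTR config
  esFromEnv?: string; // value of CYPRESS_ES_FROM, if set
  isCi: boolean; // stands in for Boolean(process.env.CI)
}

function resolveEsFrom({ configEsFrom, esFromEnv, isCi }: ResolveEsFromInput): string {
  // Default flips on environment: snapshot on CI, docker locally.
  const defaultEsFrom = isCi ? 'snapshot' : 'docker';

  // A serverless config always wins, so CYPRESS_ES_FROM cannot redirect a
  // serverless suite to a stateful source. Otherwise the explicit override
  // applies, and an unset or empty override falls back to the default.
  return configEsFrom === 'serverless' ? 'serverless' : esFromEnv || defaultEsFrom;
}

// Usage:
resolveEsFrom({ isCi: true }); // 'snapshot'
resolveEsFrom({ isCi: false }); // 'docker'
resolveEsFrom({ isCi: true, esFromEnv: 'docker' }); // 'docker'
resolveEsFrom({ isCi: true, configEsFrom: 'serverless', esFromEnv: 'snapshot' }); // 'serverless'
```

Because the override is combined with `||`, an empty `CYPRESS_ES_FROM` behaves the same as an unset one.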

## Test plan

- [ ] Green `kibana-on-merge` Security Solution Cypress jobs (stateful +
serverless).
- [ ] Green `kibana-pull-request` Security Solution Cypress jobs with no
`CYPRESS_ES_FROM` set.
- [ ] Local: `yarn cypress:run ...` still uses Docker by default.
- [ ] Local: `CYPRESS_ES_FROM=snapshot yarn cypress:run ...` uses
snapshot.
- [ ] Serverless suites remain on `serverless` regardless of
`CYPRESS_ES_FROM`.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 66c8e08)
@kibanamachine added the `backport` label (This PR is a backport of another PR) on May 5, 2026
@kibanamachine enabled auto-merge (squash) on May 5, 2026 at 12:49
@kibanamachine (Contributor, Author)

💚 Build Succeeded

Metrics [docs]

✅ unchanged

cc @patrykkopycinski

@kibanamachine merged commit 07a1e70 into elastic:9.4 on May 5, 2026
28 checks passed