[Saved Objects] Fix scroll_count test 502 on ECH by reducing import batch size #264965

Merged
gsoldevila merged 1 commit into elastic:main from gsoldevila:fix/scroll-count-ech-502
Apr 22, 2026

@gsoldevila
Member

Summary

Fixes #262663

The `scroll_count - more than 10k objects` test was failing on ECH (Elastic Cloud Hosted) with a 502 during `beforeAll` setup, while passing on Serverless.

The failure was **not** in the `scroll_count` route itself, but in the test setup step that imports 12,000 visualizations to populate the dataset. The setup used 2 batches of 6,000 objects each via `POST /api/saved_objects/_import`.

On ECH, traffic flows through HAProxy, which enforces a client-side timeout (typically 60 s). Importing 6,000 objects in a single call requires streaming and parsing a ~1 MB NDJSON multipart payload, issuing multiple bulk-index calls to Elasticsearch, and building a response whose `successResults` contains one entry per object (~600 KB JSON). If Kibana takes longer than the HAProxy timeout, the proxy closes the connection and returns 502.

**Fix:** Reduce the import batch size from 6,000 to 1,000 objects per call (12 batches total instead of 2). Each batch is ~165 KB request / ~100 KB response, well within ECH proxy limits and timeout. The total number of objects imported (12,000) and the test assertion remain unchanged.
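
To illustrate the batching described above, here is a minimal sketch, not the code from this PR. The `ImportObject` shape, the helper names (`buildVisualizations`, `chunk`, `toNdjson`, `scaleKb`), and the ids and titles are all hypothetical; only the counts (12,000 objects, 1,000 per batch) and the payload sizes come from the description. The size helper assumes payload size scales linearly with object count.

```typescript
// Hypothetical minimal shape of one saved object in an import payload.
interface ImportObject {
  type: string;
  id: string;
  attributes: { title: string };
}

// Build `total` placeholder visualizations (ids and titles are made up).
function buildVisualizations(total: number): ImportObject[] {
  return Array.from({ length: total }, (_, i) => ({
    type: 'visualization',
    id: `test-vis-${i}`,
    attributes: { title: `Test visualization ${i}` },
  }));
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error(`batch size must be a positive integer, got ${size}`);
  }
  const result: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    result.push(items.slice(i, i + size));
  }
  return result;
}

// Serialize one batch as NDJSON: one JSON document per line.
function toNdjson(batch: readonly ImportObject[]): string {
  return batch.map((obj) => JSON.stringify(obj)).join('\n');
}

// Scale a payload size linearly with the object count (an assumption).
function scaleKb(sizeKb: number, fromCount: number, toCount: number): number {
  return (sizeKb * toCount) / fromCount;
}

const TOTAL_OBJECTS = 12000;
const BATCH_SIZE = 1000;
const batches = chunk(buildVisualizations(TOTAL_OBJECTS), BATCH_SIZE);

console.log(batches.length); // 12
console.log(Math.round(scaleKb(1000, 6000, BATCH_SIZE))); // 167, in line with the ~165 KB request figure
console.log(scaleKb(600, 6000, BATCH_SIZE)); // 100, matching the ~100 KB response figure
```

Each element of `batches` would then be serialized with `toNdjson` and sent as the file part of one `POST /api/saved_objects/_import` call; the HTTP call itself is left out because it depends on the test harness.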

Test plan

  • The existing test `returns the correct count for each included types` validates the fix end-to-end (still asserts `{ visualization: 12000 }`)
  • Verify the test passes on cloud-stateful-classic (ECH) in CI

Made with Cursor

…atch size

Reduces the import batch size in beforeAll from 6,000 to 1,000 objects
per call to avoid ECH HAProxy timeout when importing large datasets.

Fixes elastic#262663

Made-with: Cursor
@gsoldevila gsoldevila added the Team:Core (Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc), t//, release_note:skip (Skip the PR/issue when compiling release notes), and backport:skip (This PR does not require backporting) labels Apr 22, 2026
@gsoldevila gsoldevila marked this pull request as ready for review April 22, 2026 09:49
@gsoldevila gsoldevila requested a review from a team as a code owner April 22, 2026 09:49
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@gsoldevila gsoldevila enabled auto-merge (squash) April 22, 2026 11:00
@gsoldevila gsoldevila merged commit cad8778 into elastic:main Apr 22, 2026
38 checks passed
tiansivive pushed a commit to tiansivive/kibana that referenced this pull request Apr 23, 2026
…atch size (elastic#264965)

SoniaSanzV pushed a commit to SoniaSanzV/kibana that referenced this pull request Apr 27, 2026
…atch size (elastic#264965)

gsoldevila added a commit that referenced this pull request May 6, 2026
…ning import batches concurrently (#266628)

## Summary

Fixes #262663 (re-opened after #264965)

### What happened

PR #264965 fixed the per-request 502 errors on ECH by reducing the
import batch size from 6,000 → 1,000 objects per call, keeping each
request safely under HAProxy's ~60 s per-connection timeout. However,
the 12 batches now run **sequentially**, and their cumulative time
exceeds the `beforeAll` hook's 120 s limit:

```
"beforeAll" hook timeout of 120000ms exceeded.
```

Each 1,000-object `_import` call on ECH takes ~10–15 s (NDJSON multipart
parsing + Kibana bulk-index + response building). 12 × 10–15 s = 120–180
s.
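
The arithmetic above can be checked with a short sketch. The constant and function names are hypothetical; the figures (12 batches, 10 to 15 s per batch, a 120 s hook limit) are the ones quoted in this description.

```typescript
const BATCH_COUNT = 12;
const HOOK_TIMEOUT_S = 120;

// Total wall-clock time when batches run one after another.
function sequentialSeconds(batchCount: number, perBatchSeconds: number): number {
  return batchCount * perBatchSeconds;
}

const bestCase = sequentialSeconds(BATCH_COUNT, 10); // 120
const worstCase = sequentialSeconds(BATCH_COUNT, 15); // 180

// Even the best case leaves no headroom under the hook limit.
const headroom = HOOK_TIMEOUT_S - bestCase; // 0

console.log({ bestCase, worstCase, headroom });
```

With zero headroom in the best case and a 60 s overshoot in the worst case, sequential setup cannot reliably finish inside the 120 s hook limit.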

### Fix

Run the import batches **concurrently** (`CONCURRENCY = 3`) rather than
sequentially, and increase the hook timeout to 5 minutes as a generous
safety net.

```
12 batches ÷ 3 concurrent = 4 rounds × ~15 s/round ≈ 60 s total
```

Even if server load slows concurrent batches to ~60 s each, 4 rounds ×
60 s = 240 s — still well within the 300 s limit. Each individual
request remains at 1,000 objects, so no per-request HAProxy concern.
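
One way to bound concurrency like this is a small worker pool. The sketch below is an assumption about the shape of such a helper, not the code from the PR: `runWithConcurrency`, `estimateSeconds`, and `demo` are hypothetical names, and the fake import task only sleeps briefly in place of a real `_import` request. Only `CONCURRENCY = 3`, the 12 batches, and the timing figures come from the description.

```typescript
const CONCURRENCY = 3;

// Run `task` over all items with at most `limit` tasks in flight at once.
// Results are returned in the same order as the input items.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Rough duration estimate: number of rounds times the time per round.
function estimateSeconds(batchCount: number, limit: number, perRoundSeconds: number): number {
  return Math.ceil(batchCount / limit) * perRoundSeconds;
}

// Demo with a fake import that sleeps instead of calling Kibana.
async function demo(): Promise<{ imported: number; maxInFlight: number }> {
  let inFlight = 0;
  let maxInFlight = 0;

  const fakeImportBatch = async (batchIndex: number): Promise<number> => {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    inFlight--;
    return batchIndex;
  };

  const batchIndices = Array.from({ length: 12 }, (_, i) => i);
  const results = await runWithConcurrency(batchIndices, CONCURRENCY, fakeImportBatch);
  return { imported: results.length, maxInFlight };
}

console.log(estimateSeconds(12, CONCURRENCY, 15)); // 60
console.log(estimateSeconds(12, CONCURRENCY, 60)); // 240
demo().then((stats) => console.log(stats));
```

A pool keeps three requests in flight continuously instead of waiting for the slowest request in each round, so the "4 rounds" figure is an upper-bound style estimate for it. Running fixed rounds with `Promise.all` over groups of three would match the estimate more literally at the cost of some idle time.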

This keeps the full Kibana `_import` pipeline in the setup path (proper
SO migration, correct index routing, correct namespace handling),
avoiding any coupling to internal storage format details.

### Why not insert directly via `esClient.bulk`?

A direct ES bulk insert was considered but rejected: it would bypass
Kibana's saved-objects migration pipeline, requiring the test to
manually track the correct index name (`ANALYTICS_SAVED_OBJECT_INDEX`),
document format, and migration version fields. Any change to the
`visualization` type registration or storage model would silently break
the test setup without a compile-time or schema error.

## Test plan

- [ ] Verify `returns the correct count for each included types` passes
on `cloud-stateful-classic` (ECH) in CI

Made with [Cursor](https://cursor.com)
Labels

backport:skip (This PR does not require backporting), release_note:skip (Skip the PR/issue when compiling release notes), Team:Core (Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc), t//, v9.5.0

Development

Successfully merging this pull request may close these issues.

Failing test: scroll_count - more than 10k objects - returns the correct count for each included types

4 participants