[Streams] Add classic-stream field mapping performance journey #270636

Draft
rStelmach wants to merge 10 commits into elastic:main from rStelmach:streams-perf-classic-field-mapping-journey

Contributor

@rStelmach rStelmach commented May 22, 2026

Summary

Adds a Streams performance journey that exercises the schema editor on a classic stream carrying 10,000 field_overrides, mirroring the existing wired-stream streams_field_mapping journey.

Follow-up to #252288 (review):

> About the mapping, for classic streams we should test with more fields (up to 10k), it's something that does happen in practice (but can also happen on a separate PR).

Known limitation

While wiring this journey, I found that classic stream _ingest updates validate large field_overrides through PUT /_data_stream/{name}/_mappings?dry_run=true. That dry-run resolves index.mapping.total_fields.limit from the matching composable index template for the next backing index, not from settings raised only on the current live backing index.

This PR keeps the scope narrowed to the performance journey and handles the limit through logs@custom, which participates in the normal logs template composition. The server-side behavior is tracked in elastic/streams-program#958, with details in this comment: https://github.com/elastic/streams-program/issues/958#issuecomment-4543936511
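The `logs@custom` approach could look roughly like the sketch below. It only builds the request object; the helper name is hypothetical, the limit of 20000 is borrowed from a later commit on this PR, and passing the result to a client call such as `cluster.putComponentTemplate` is an assumption about how the journey setup applies it.

```typescript
// Hypothetical request shape for a component template that only carries settings.
interface ComponentTemplateRequest {
  name: string;
  template: { settings: Record<string, number | string> };
}

// Hypothetical helper: builds a `logs@custom` component template request that
// raises index.mapping.total_fields.limit for streams composed from the logs template.
function buildLogsCustomTemplate(totalFieldsLimit: number): ComponentTemplateRequest {
  if (!Number.isInteger(totalFieldsLimit) || totalFieldsLimit <= 0) {
    throw new Error(`Invalid total fields limit: ${totalFieldsLimit}`);
  }
  return {
    name: 'logs@custom',
    template: {
      settings: { 'index.mapping.total_fields.limit': totalFieldsLimit },
    },
  };
}

// Assumption: the setup phase would hand this object to the Elasticsearch client
// before the classic stream is created.
const logsCustomRequest = buildLogsCustomTemplate(20000);
```

Because the limit lives in a template that takes part in composition, the next backing index inherits it, which is the index the dry-run is described as validating against.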

Validation

https://buildkite.com/elastic/kibana-streams-performance/builds/20

Mirrors the existing wired schema TTFMP journey against a classic
stream with 1,000 field_overrides set via the public Streams API,
so the kibana-streams-performance pipeline measures schema editor
load on the classic mapping path.

Follow-up to elastic#252288 (review by @flash1293). Dashboard panel and
alerting rule deferred until 4 green main-branch runs produce a
calibration baseline, matching the Phase 2 §9.2 process.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rStelmach and others added 3 commits May 22, 2026 17:17
Matches the upper bound called out in elastic#252288 review. Buildkite
streams-performance pipeline will be triggered manually against this
branch before merge, so we will see at 10k whether the schema editor
loads within journey timeouts.
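For illustration, generating the overrides at this scale might look like the sketch below. The field name prefix, the `keyword` type, and the `FieldOverrides` shape are assumptions made for the example, not the journey's actual fixture.

```typescript
// Assumed shape: field name mapped to a minimal mapping definition.
type FieldOverrides = Record<string, { type: 'keyword' }>;

// Hypothetical helper: builds `count` synthetic keyword overrides with stable names.
function buildFieldOverrides(count: number, prefix = 'perf_field_'): FieldOverrides {
  const overrides: FieldOverrides = {};
  for (let i = 0; i < count; i++) {
    overrides[`${prefix}${i}`] = { type: 'keyword' };
  }
  return overrides;
}

// 10000 matches the upper bound called out in the review.
const perfOverrides = buildFieldOverrides(10000);
```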

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build elastic#11 of kibana-streams-performance ran all 6 existing journeys
but silently skipped the new streams_classic_field_mapping because the
JOURNEYS_GROUP=streams filter in run_performance_cli.ts is an explicit
allow-list. Add the new journey there and to the FTR manifest so
both pipelines discover it.
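A simplified sketch of why an explicit allow-list drops a new journey without any error. The function and its arguments are hypothetical and do not reproduce the actual code in run_performance_cli.ts; only the two journey names come from this PR.

```typescript
// Hypothetical filter: keeps only journeys named in the allow-list.
// Anything not listed is dropped without a warning, which is the "silent skip".
function selectJourneys(discovered: string[], allowList: string[]): string[] {
  const allowed = new Set(allowList);
  return discovered.filter((journey) => allowed.has(journey));
}

const discoveredJourneys = ['streams_field_mapping', 'streams_classic_field_mapping'];

// Before the fix: the new journey is not in the list, so it never runs.
const selectedBefore = selectJourneys(discoveredJourneys, ['streams_field_mapping']);

// After the fix: the new journey is listed explicitly.
const selectedAfter = selectJourneys(discoveredJourneys, [
  'streams_field_mapping',
  'streams_classic_field_mapping',
]);
```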

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ES default index.mapping.total_fields.limit is 1000. Our journey
deliberately maps 10000 fields to stress the schema editor, so the
backing data stream's limit must be raised before the _ingest PUT.
Without this, setup fails with HTTP 400 "Limit of total fields [1000]
has been exceeded" (seen in kibana-streams-performance build elastic#12).
rStelmach and others added 2 commits May 25, 2026 15:31
Build elastic#15 still failed with `Limit of total fields [1000] has been exceeded`
even though the previous fix called `indices.putSettings` against the data
stream name. The Streams classic ingest update validates via
`PUT /_data_stream/{name}/_mappings?dry_run=true`, which resolves the limit
through the data stream's settings (and the rollover template), not via the
live backing index that `putSettings` happened to touch.

Switch to `indices.putDataStreamSettings`, which is the API the Streams
server itself uses for allowlisted classic-stream settings. It applies to
current backing indices and to the next rollover, so the mapping dry-run
sees the raised limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build elastic#16 still failed with `Limit of total fields [1000] has been exceeded`
after switching to `putDataStreamSettings`. The Streams `_ingest` PUT
validates field_overrides via `PUT /_data_stream/{name}/_mappings?dry_run=true`,
which reads the limit from the current write index's live settings.
`putDataStreamSettings` applied the override at the data-stream level but
not down to the existing `.ds-*` backing index, so the validation kept
seeing the default 1000.

Resolve the backing indices via `getDataStream` and set
`index.mapping.total_fields.limit` directly on them with `putSettings`.
That guarantees the dry-run sees the raised limit.
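A sketch of the per-backing-index approach described in this commit, reduced to building the `putSettings` requests from a `getDataStream`-style response. The response interface is a trimmed assumption of what the client returns, the sample index name is invented, and the helper is hypothetical. The later commits on this PR report that this step alone did not change the limit the dry-run validated against.

```typescript
// Assumed minimal subset of a getDataStream response.
interface DataStreamsResponse {
  data_streams: Array<{ name: string; indices: Array<{ index_name: string }> }>;
}

interface PutSettingsRequest {
  index: string;
  settings: { 'index.mapping.total_fields.limit': number };
}

// Hypothetical helper: one putSettings request per backing index of the data stream.
function buildBackingIndexRequests(
  response: DataStreamsResponse,
  dataStream: string,
  limit: number
): PutSettingsRequest[] {
  const match = response.data_streams.find((ds) => ds.name === dataStream);
  if (!match) {
    throw new Error(`Data stream not found: ${dataStream}`);
  }
  return match.indices.map(({ index_name }) => ({
    index: index_name,
    settings: { 'index.mapping.total_fields.limit': limit },
  }));
}

// Invented sample response; real backing index names include a date and generation.
const sampleResponse: DataStreamsResponse = {
  data_streams: [
    {
      name: 'logs-perf-classic-mapping',
      indices: [{ index_name: '.ds-logs-perf-classic-mapping-000001' }],
    },
  ],
};

const backingIndexRequests = buildBackingIndexRequests(
  sampleResponse,
  'logs-perf-classic-mapping',
  20000
);
```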

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rStelmach added a commit that referenced this pull request May 25, 2026
## Summary

Set `connect.timeout = 60s` on the undici `Agent` used by
`KbnClientRequester` (https path only).

## Why

#268531 migrated `KbnClient` from axios to native fetch but did not
override undici's 10s `connect.timeout` default. Axios had no equivalent
cutoff, so FTR callers talking to a busy local Kibana started failing
once that PR landed.

The `kibana-streams-performance` weekly pipeline went red in builds #9,
#11, #12, and #13 with:

```
ConnectTimeoutError: Connect Timeout Error (attempted address: localhost:5620, timeout: 10000ms)
```

The `10000ms` is undici's default. Bisect: build #8 last green
(2026-05-11) → #9 first red (2026-05-18), with #268531 in the window.

## What changed


`src/platform/packages/shared/kbn-kbn-client/src/kbn_client/kbn_client_requester.ts`:
one constant, one option on the https `Agent`. http branch unchanged.
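As a rough sketch of that change, assuming the options object is passed to undici's `Agent` constructor; the constant and helper names here are hypothetical and the real file may structure this differently.

```typescript
// 60s connect timeout, replacing undici's 10s default.
const CONNECT_TIMEOUT_MS = 60_000;

// Trimmed view of the Agent options this sketch cares about.
interface AgentConnectOptions {
  connect: { timeout: number };
}

// Hypothetical helper: returns the options for the https Agent only.
function buildHttpsAgentOptions(): AgentConnectOptions {
  return { connect: { timeout: CONNECT_TIMEOUT_MS } };
}

// Assumption: the requester would then do something like
//   new Agent(buildHttpsAgentOptions())
// with Agent imported from 'undici'.
const httpsAgentOptions = buildHttpsAgentOptions();
```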

## Related

Regression introduced in #268531. Companion streams perf PR: #270636.

## Validation

https://buildkite.com/elastic/kibana-streams-performance/builds/14
rStelmach and others added 4 commits May 25, 2026 19:15
…ation

Builds elastic#15-elastic#17 kept failing the classic_field_mapping setup with
`Limit of total fields [1000] has been exceeded`, even after switching
from `putSettings` to `putDataStreamSettings` to per-backing-index
`putSettings`. The Streams `_ingest` PUT applies `field_overrides` via
`PUT /_data_stream/{name}/_mappings` followed by a lazy rollover, so the
dry-run validation resolves `total_fields.limit` from the matching
index template (the next-rollover index), not from any live backing
index settings - which is why none of those `putSettings` calls moved
the validated number.

Install a narrow, high-priority data-stream index template that matches
only `logs-perf-classic-mapping` (priority 500, beats the built-in
`logs` template) and sets `total_fields.limit=20000` before
`_create_classic`. The data stream is then born with the raised limit
baked into its template, and the dry-run accepts 10000 field_overrides.
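The template described above might be built along these lines. The template name and helper are hypothetical; the pattern, priority, and limit are the values stated in this commit. A real template may also need a `composed_of` list to keep the built-in logs behavior, which this commit message does not spell out, so it is left out here.

```typescript
interface IndexTemplateRequest {
  name: string;
  index_patterns: string[];
  data_stream: Record<string, never>;
  priority: number;
  template: { settings: { 'index.mapping.total_fields.limit': number } };
}

// Hypothetical helper: a narrow data-stream template for the perf stream only.
function buildPerfIndexTemplate(): IndexTemplateRequest {
  return {
    name: 'logs-perf-classic-mapping-template', // hypothetical template name
    index_patterns: ['logs-perf-classic-mapping'], // matches only the perf stream
    data_stream: {}, // marks the template as a data stream template
    priority: 500, // per the commit, beats the built-in `logs` template
    template: {
      settings: { 'index.mapping.total_fields.limit': 20000 },
    },
  };
}

// Assumption: setup would pass this to an index template PUT before `_create_classic`.
const perfIndexTemplate = buildPerfIndexTemplate();
```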

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build elastic#18 confirmed the index-template fix from f50b926 works:
setup applied all 10000 field_overrides cleanly (visible in the log as
`10000 field_overrides set on logs-perf-classic-mapping`). The journey
then advanced through `Go to schema page`, `Open add field flyout`,
`Configure new field mapping`, and `Add field mapping`, all green.

It now fails on the final `Review and submit field mapping` step. After
clicking the modal submit button, the test waits up to 30s for it to
detach. The Streams server applies the mapping update across all 10001
overrides (10000 existing plus the one the journey adds), which the
setup phase already showed takes around 23s in CI. The submit button
stays disabled for the duration of the in-flight request, so the
30s `detached` wait races the server and times out.

Raise that final wait to 120s so the step measures the actual submit
latency at scale instead of failing on an arbitrarily tight UI timeout.
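A minimal sketch of the raised wait, written against a small interface that mirrors the `waitFor({ state, timeout })` shape of a Playwright locator. The interface, constant, and helper names are assumptions for illustration; the journey's real selector and step code are not shown in this commit message.

```typescript
// Assumed minimal locator surface, modeled on Playwright's locator.waitFor.
interface DetachableLocator {
  waitFor(options: { state: 'detached'; timeout: number }): Promise<void>;
}

// 120s gives headroom over the roughly 23s the setup phase took in CI.
const SUBMIT_DETACH_TIMEOUT_MS = 120_000;

// Hypothetical helper: waits for the modal submit button to leave the DOM.
function waitForSubmitDetached(button: DetachableLocator): Promise<void> {
  return button.waitFor({ state: 'detached', timeout: SUBMIT_DETACH_TIMEOUT_MS });
}
```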

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rStelmach added the release_note:skip, backport:skip, Team:obs-onboarding, and Feature:Streams labels May 26, 2026