[StorageIndexAdapter] Set auto_expand_replicas to fix yellow health on single-node ES clusters #263096

Merged
flash1293 merged 7 commits into elastic:main from flash1293:flash1293/fix-replicas
Apr 18, 2026
Conversation

@flash1293 (Contributor) commented Apr 14, 2026

Summary

Fixes #263048

StorageIndexAdapter did not include index settings in its template, causing all 24 managed indices (.kibana_streams, .chat-conversations, kibana-evaluation-datasets, etc.) to default to number_of_replicas: 1. On single-node Elasticsearch clusters, the replica shard cannot be allocated, leaving cluster health yellow indefinitely.
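
To make the effect of `auto_expand_replicas` concrete, here is a simplified sketch of how a range such as `'0-1'` resolves to a replica count. The helper name `resolveReplicas` is hypothetical, and the model ignores allocation filtering and awareness rules that Elasticsearch also takes into account; it only captures that a replica needs a data node other than the one holding its primary.

```typescript
// Simplified model, not Elasticsearch's implementation: a replica can never
// share a node with its primary, so at most (dataNodes - 1) replicas can be
// allocated. `resolveReplicas` is a hypothetical helper name.
function resolveReplicas(autoExpand: string, dataNodes: number): number {
  const match = /^(\d+)-(\d+|all)$/.exec(autoExpand);
  if (match === null) {
    throw new Error(`Unsupported auto_expand_replicas value: ${autoExpand}`);
  }
  const min = Number(match[1]);
  const max = match[2] === 'all' ? Number.MAX_SAFE_INTEGER : Number(match[2]);
  const allocatable = Math.max(dataNodes - 1, 0);
  // Clamp the allocatable count into the configured [min, max] range.
  return Math.max(min, Math.min(max, allocatable));
}

// With a fixed number_of_replicas: 1, a single-node cluster keeps one replica
// that can never be assigned (yellow). With '0-1' the count drops to 0.
const replicasOnSingleNode = resolveReplicas('0-1', 1); // 0
const replicasOnThreeNodes = resolveReplicas('0-1', 3); // 1
```

Under this model the same setting still yields one replica as soon as a second data node joins, which is why the range is preferred over hard-coding `number_of_replicas: 0`.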

This is the same class of issue as #261933 (.workflows-events), but affecting all indices managed by StorageIndexAdapter.

Changes

  • Added settings: { auto_expand_replicas: '0-1', number_of_shards: 1 } to the index template in createOrUpdateIndexTemplate() — this is the standard pattern used by all other Kibana system indices (.kibana, .kibana_task_manager, event log, lock manager, blob storage, etc.)
  • Added updateSettingsOfExistingIndex() method that checks the current auto_expand_replicas value on an existing write index and updates it to '0-1' if it differs — this fixes existing deployments that already have indices with number_of_replicas: 1
  • Wired updateSettingsOfExistingIndex() into validateComponentsBeforeWriting() so it runs on every write to an existing index
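
The first two changes can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `TEMPLATE_SETTINGS`, `buildTemplate`, `settingsUpdateFor`, and the `${indexName}*` index pattern are hypothetical, and the template shape is reduced to the fields discussed here.

```typescript
// Settings named in the PR description; the constant name is hypothetical.
const TEMPLATE_SETTINGS = {
  auto_expand_replicas: '0-1',
  number_of_shards: 1,
} as const;

interface IndexTemplateBody {
  index_patterns: string[];
  template: {
    settings: { auto_expand_replicas: string; number_of_shards: number };
    mappings: Record<string, unknown>;
  };
}

// Build a reduced index template body that carries the settings, so every
// index created from the template starts with auto_expand_replicas: '0-1'.
// The index pattern used here is an assumption made for the example.
function buildTemplate(
  indexName: string,
  mappings: Record<string, unknown>
): IndexTemplateBody {
  return {
    index_patterns: [`${indexName}*`],
    template: { settings: { ...TEMPLATE_SETTINGS }, mappings },
  };
}

// Decide whether an existing index needs a settings update. Returns the
// payload to send, or undefined when the value already matches (no-op).
function settingsUpdateFor(
  current: string | undefined
): { auto_expand_replicas: string } | undefined {
  return current === TEMPLATE_SETTINGS.auto_expand_replicas
    ? undefined
    : { auto_expand_replicas: TEMPLATE_SETTINGS.auto_expand_replicas };
}
```

Only `auto_expand_replicas` appears in the update payload because `number_of_shards` is fixed when an index is created and cannot be changed through a settings update.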

Affected indices (all 24 automatically benefit)

| Plugin | Indices |
|--------|---------|
| streams (10) | .chat-memory, .chat-memhistory, .kibana_streams, .kibana_streams_settings, .kibana_streams_features, .kibana_streams_assets, .kibana_streams_attachments, .kibana_streams_insights, .kibana_streams_tasks, .kibana_streams_content_packs |
| agent_builder (10) | .chat-conversations, .chat-skills, .chat-tools, .chat-tool-health, .chat-plugins, .chat-agent-executions, .chat-agents, .chat-sml-data, .chat-sml-crawler-state, .chat-user-prompts |
| evals (2) | kibana-evaluation-datasets, kibana-evaluation-dataset-examples |
| automatic_import (1) | .kibana-automatic-import-samples |
| workflows_management (1) | .workflows-workflows |

Test plan

  • Unit tests: 9 passing (3 new tests for template settings, settings update, and no-op when already correct)
  • Integration tests: 20 passing (1 new test verifying existing index gets auto_expand_replicas updated on next write)
  • Type check passes

…n single-node ES clusters

StorageIndexAdapter did not include index settings in its template, causing
all managed indices to default to number_of_replicas: 1. On single-node
Elasticsearch clusters, the replica shard cannot be allocated, leaving cluster
health yellow indefinitely.

This adds auto_expand_replicas: '0-1' and number_of_shards: 1 to the index
template and updates existing indices on write if their settings differ.

Fixes elastic#263048
Comment thread src/platform/packages/shared/kbn-storage-adapter/src/index_adapter/index.ts Outdated
…gIndex

With flat_settings: true, Elasticsearch returns dot-notation keys like
'index.auto_expand_replicas' instead of nested objects. This caused
currentAutoExpandReplicas to always be undefined, making putSettings
run on every write even when the setting was already correct.
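
A minimal sketch of the difference described in this commit message, assuming a settings object reduced to the relevant keys. `readAutoExpandReplicas` is a hypothetical helper; it reads the value from either the flat (dot-notation) shape or the nested shape, so a caller that requested `flat_settings: true` does not end up with `undefined`.

```typescript
// Hypothetical helper: accept both shapes a settings object can take.
//   flat:   { 'index.auto_expand_replicas': '0-1' }
//   nested: { index: { auto_expand_replicas: '0-1' } }
function readAutoExpandReplicas(
  settings: Record<string, unknown> | undefined
): string | undefined {
  if (settings === undefined) return undefined;

  // Shape returned when flat_settings: true is requested.
  const flat = settings['index.auto_expand_replicas'];
  if (typeof flat === 'string') return flat;

  // Nested shape returned otherwise.
  const index = settings['index'];
  if (typeof index === 'object' && index !== null) {
    const nested = (index as Record<string, unknown>)['auto_expand_replicas'];
    if (typeof nested === 'string') return nested;
  }
  return undefined;
}
```

Reading only the nested path against a flat response always yields `undefined`, which is what made the comparison fail and the update run on every write.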
@flash1293 (Contributor, Author)

I tested this locally with the streams index, including the upgrade scenario, and it seemed to work fine. However, it only changes the settings on the first write call, so it does not auto-heal existing problems. Do you think that's the right approach?

@flash1293 flash1293 added the Team:Core, release_note:fix, backport:version, v9.4.0, and v9.5.0 labels Apr 14, 2026
@flash1293 flash1293 marked this pull request as ready for review April 14, 2026 14:56
@flash1293 flash1293 requested review from a team as code owners April 14, 2026 14:56
@elasticmachine (Contributor)

Pinging @elastic/kibana-core (Team:Core)

this.logger.debug(`Updating mappings of existing index due to schema version mismatch`);
await this.updateMappingsOfExistingIndex({
} else {
await this.updateSettingsOfExistingIndex({

Contributor

We already get the index on line 308, which includes the index settings. Is it necessary to get the index again inside updateSettingsOfExistingIndex?

If it's necessary to fix the write index, isn't it necessary to fix all backing indices or can we safely assume no consumers have rolled over to a new index?

Contributor Author

Good question, ties into the one I posted above about how far we should go with proactively fixing the existing configurations or whether we should just fix it forward.

I don't have a strong opinion on it, we can also just make it a thing for newly created backing indices and ignore existing ones, wdyt?

Contributor Author

I'm leaning towards just fixing it for new indices, not for existing ones.

Contributor

yeah my assumption is that single node clusters are test/demo/dev clusters only. So it's unlikely that we have production customers impacted and fixing for the write index/new indices would be sufficient.

I think it's still worth reusing the settings we already have

@flash1293 (Contributor, Author)

Ralph applied changes for: simplify the implementation so the auto_expand_replicas as rudolf suggested by reusing the settings we already have


Updated by Ralph Engine.

@flash1293 (Contributor, Author)

@rudolf simplified to just changing this for future indices - less moving parts and as you say it shouldn't have an impact on production systems.

@flash1293 flash1293 requested a review from rudolf April 17, 2026 15:34
@elasticmachine (Contributor)

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #87 / Serverless Observability - Deployment-agnostic Synthetics API integration tests SyntheticsAPITests PrivateLocationCreateMonitor handles auto upgrading policies

Metrics [docs]

✅ unchanged


@flash1293 flash1293 merged commit b805e2e into elastic:main Apr 18, 2026
11 checks passed
@kibanamachine (Contributor)

Starting backport for target branches: 9.4

https://github.com/elastic/kibana/actions/runs/24599997646

kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Apr 18, 2026
…n single-node ES clusters (elastic#263096)


Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit b805e2e)
kibanamachine added a commit that referenced this pull request Apr 18, 2026
…alth on single-node ES clusters (#263096) (#264262)

# Backport

This will backport the following commits from `main` to `9.4`:
- [StorageIndexAdapter] Set auto_expand_replicas to fix yellow health on
single-node ES clusters (#263096) (b805e2e)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Co-authored-by: Joe Reuter <johannes.reuter@elastic.co>
kapral18 added a commit to kapral18/kibana that referenced this pull request Apr 19, 2026
* main: (114 commits)
  Fix observability_ai_assistant_tool_call EBT error when connector is an inference endpoint (elastic#263334)
  init on install (elastic#263732)
  [One Workflow] fail-fast TaskRecovery for interrupted runs (elastic#261275)
  [Entity Store] Reset state error after successful task run (elastic#263087)
  [api-docs] 2026-04-19 Daily api_docs build (elastic#264280)
  [UII] Fix integration card row height calculation (elastic#264212)
  [scout] migrate FTR logstash api tests (elastic#262953)
  [StorageIndexAdapter] Set auto_expand_replicas to fix yellow health on single-node ES clusters (elastic#263096)
  [api-docs] 2026-04-18 Daily api_docs build (elastic#264260)
  [Scout] Update test config manifests (elastic#264257)
  [Security Solution][Detection Engine] enables AI rule creation feature flag (elastic#264036)
  [dashboards as code] only validate id on PUT route when creating new dashboard (elastic#264161)
  chore(NA): bump version to 9.5.0 (elastic#262165)
  skip failing test suite (elastic#263649)
  skip failing test suite (elastic#264236)
  [Discover] Convert remaining Enzyme tests to RTL (elastic#259676)
  auto-implement: Labels in model endpoints table of the model details flyout look misaligned (elastic#263770)
  [ci] Promote ES docker image after verification (elastic#263890)
  [Observability:Onboarding] Remove suppress global announcements that was breaking ensemble tests (elastic#264169)
  [Cases][AttachmentV2] Migrate persistable state part 2 - ML and AIOps charts (elastic#262597)
  ...
viduni94 added a commit that referenced this pull request Apr 22, 2026
…64760)

Closes #264845

## Summary

Fixes index template creation on Serverless for indices
(`kibana-evaluation-datasets`, `kibana-evaluation-dataset-examples`).

PR #263096 added `auto_expand_replicas` and `number_of_shards` to index
templates in `StorageIndexAdapter`. Serverless ES rejects these settings
on non-system indices with an `illegal_argument_exception`, while hidden
indices (e.g.: used by Streams) are unaffected because Kibana manages
them as system indices.

### Dataset upsert error for Kibana evaluation runs

<img width="1247" height="473" alt="image"
src="https://github.com/user-attachments/assets/10e75668-7a1d-462e-9594-37fbee0f08e3"
/>

### Error in logs:
```
Failed to upsert evaluation dataset: ResponseError: illegal_argument_exception
	Root causes:
		illegal_argument_exception: Settings [index.auto_expand_replicas,index.number_of_shards] are not available when running in serverless mode
```

## Fix

The adapter now detects serverless environments in three tiers before
applying index template settings:

- Explicit detection - Introduced a new `isServerless` option in
`StorageIndexAdapterOptions`. When provided, the adapter skips or
includes settings without any extra calls.
- Proactive - if `isServerless` is not provided, the adapter calls
`esClient.info()` on the first write and checks `version.build_flavor`.
The result is cached for the adapter's lifetime.
- Reactive - if both above are unavailable (e.g.: `info()` fails due to
insufficient privileges), the adapter catches the
`illegal_argument_exception` on the first write, retries without
settings, and caches the result.
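
The three tiers can be sketched as a small class. This is a minimal illustration under stated assumptions, not the adapter's code: `ServerlessAwareTemplateWriter`, `InfoClient`, and the `put` callback are hypothetical stand-ins, the `'serverless'` value compared against `version.build_flavor` is an assumption, and so is recognising the rejection by the `illegal_argument_exception` text in the error message.

```typescript
// Hypothetical, reduced client surface; the real Elasticsearch client is larger.
interface InfoClient {
  info(): Promise<{ version: { build_flavor?: string } }>;
}

interface TemplateSettings {
  auto_expand_replicas: string;
  number_of_shards: number;
}

const SETTINGS: TemplateSettings = { auto_expand_replicas: '0-1', number_of_shards: 1 };

// Assumption: the rejection can be recognised by its message text.
function isServerlessSettingsError(error: unknown): boolean {
  return error instanceof Error && error.message.includes('illegal_argument_exception');
}

class ServerlessAwareTemplateWriter {
  private isServerless: boolean | undefined;
  private readonly client: InfoClient;

  constructor(client: InfoClient, isServerless?: boolean) {
    this.client = client;
    this.isServerless = isServerless; // Tier 1: explicit option, no extra calls.
  }

  // `put` stands in for the real template write; it receives the settings to
  // include, or undefined when they must be omitted.
  async write(put: (settings: TemplateSettings | undefined) => Promise<void>): Promise<void> {
    if (this.isServerless === undefined) {
      try {
        // Tier 2: ask the cluster once and cache the answer.
        const info = await this.client.info();
        this.isServerless = info.version.build_flavor === 'serverless';
      } catch {
        // info() may fail, e.g. for lack of privileges; fall through to tier 3.
      }
    }
    if (this.isServerless === true) {
      return put(undefined);
    }
    try {
      await put(SETTINGS);
    } catch (error) {
      if (!isServerlessSettingsError(error)) throw error;
      // Tier 3: the cluster rejected the settings; remember and retry without.
      this.isServerless = true;
      await put(undefined);
    }
  }
}
```

Caching the result in a field means later writes through the same instance skip both the `info()` call and the failed first attempt.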

The Evals plugin passes `isServerless` explicitly because the evals
route handler creates `StorageIndexAdapter` with
`esClient.asCurrentUser`, which is scoped to the caller's API key. This
API key may lack the `monitor` cluster privilege needed for
`esClient.info()`, making tier 2 unreliable. There, `buildFlavor` is
passed from the plugin context.

## Test Plan

- [x] Deploy the fix to a serverless project from this PR
- [x] Create a config file (e.g.: `config.testcluster.json`) and add the
serverless project URL as the dataset target
- [x] Run evals with `node scripts/evals start --suite
significant-events --project eis-anthropic-claude-4-6-sonnet --judge
eis-google-gemini-3-1-pro --export-profile local --datasets-profile
testcluster`

### With this fix, the dataset upsert works as expected
<img width="1531" height="877" alt="image"
src="https://github.com/user-attachments/assets/84c2a5cd-138b-457e-85d3-bd87bff4867c"
/>

<img width="1710" height="556" alt="image"
src="https://github.com/user-attachments/assets/bbfeb03a-405f-4551-8326-e12b0192d332"
/>

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.
smith pushed a commit to smith/kibana that referenced this pull request Apr 23, 2026
…astic#264760)

rbrtj pushed a commit to walterra/kibana that referenced this pull request Apr 27, 2026
…astic#264760)

SoniaSanzV pushed a commit to SoniaSanzV/kibana that referenced this pull request Apr 27, 2026
…astic#264760)

Labels

backport:version, release_note:fix, Team:Core, v9.4.0, v9.5.0

Development

Successfully merging this pull request may close these issues.

[Streams] .kibana_streams index causes yellow health on single-node ES clusters

5 participants