[OTel Tracing] HTTP instrumentation #258663

Merged
afharo merged 8 commits into elastic:main from afharo:otel/http-instrumentation
Mar 24, 2026

Conversation

@afharo
Member

@afharo afharo commented Mar 19, 2026

Summary

This PR instruments Kibana's HTTP server with OTel Traces.

It does this by combining the @opentelemetry/instrumentation-http auto-instrumentation with custom spans that identify the pre-handlers, the handler, and the post-handlers.

It removes the @opentelemetry/instrumentation-hapi package because it was overly noisy: it reported a span for each pre/post handler. It also made it harder to access the HTTP span to add our custom labels to it.

The goal is parity with the existing Elastic APM instrumentation:

[Screenshots: Elastic APM trace view (left) and OTel Traces view (right)]

Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

  • Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
  • This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The release_note:breaking label should be applied in these situations.
  • Flaky Test Runner was used on any tests changed
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines
  • Review the backport guidelines and apply applicable backport:* labels.

Identify risks

  • Auto-instrumentations are now enabled as soon as OTel Tracing is enabled. There is a risk of a performance hit, because they use require-in-the-middle. As long as we make sure that Elastic APM and OTel Tracing are not both enabled at the same time, it should be fine.

@afharo afharo self-assigned this Mar 19, 2026
@afharo afharo added the Team:Core (Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc) label Mar 19, 2026
@afharo afharo requested a review from a team as a code owner March 19, 2026 22:17
@afharo afharo added the release_note:skip Skip the PR/issue when compiling release notes label Mar 19, 2026
@afharo afharo requested review from a team as code owners March 19, 2026 22:17
@afharo afharo added the backport:skip This PR does not require backporting label Mar 19, 2026
@afharo afharo requested a review from a team as a code owner March 19, 2026 22:17
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@afharo afharo linked an issue Mar 19, 2026 that may be closed by this pull request
@afharo afharo force-pushed the otel/http-instrumentation branch from 1552470 to 9c6ec95 Compare March 19, 2026 22:20
* Read incoming traceparent headers and create a new context with the traceparent set.
* This allows OpenTelemetry spans created in the next context to re-use the traceparent
* headers (and thus belonging to the same trace). It does not interfere with Elastic APM,
* and is temporary until we fully migrate to [OpenTelemetry
Member Author

Based on the comment, it looks like we don't need this anymore.

cc @trentm, @david-luna can you help us confirm that we're implementing this the correct way?

this.server!.ext('onPreResponse', (request, responseToolkit) => {
const stop = (request.app as KibanaRequestState).measureElu;
const app = request.app as KibanaRequestState;
app.httpSpan?.updateName(`${request.route.method.toUpperCase()} ${request.route.path}`);
Member Author

Updating it during onPreResponse because the route has not been resolved yet in onRequest.

We need to update the span name because otherwise it is set to just the HTTP method.
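
As a minimal sketch of the renaming described here, assuming a hypothetical `RouteInfo` shape that mirrors the `request.route.method` and `request.route.path` fields used in the diff, the span name could be derived as follows. `buildHttpSpanName` is an illustrative helper, not a function from this PR:

```typescript
// Hypothetical shape mirroring the `request.route` fields used in the diff.
interface RouteInfo {
  method: string;
  path: string;
}

// Illustrative helper (not part of the PR): builds the span name from the
// route template rather than from the raw URL, so names stay low-cardinality.
function buildHttpSpanName(route: RouteInfo): string {
  return `${route.method.toUpperCase()} ${route.path}`;
}

console.log(buildHttpSpanName({ method: 'get', path: '/api/status/{id}' }));
```

The resulting string is what would be passed to `updateName` on the HTTP span once the route is known.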

Comment on lines +697 to +702
const span = trace.getTracer('kibana.http').startSpan(name);
context.with(context.active(), () => {
// Hold the active context until the otelSubSpan ends
context.bind(context.active(), span);
});
return span;
Member Author

@trentm, @david-luna, any suggestions on how to keep the context active in these listener-based methods?
If you look at the image in the description of the PR, the pre and post route handlers look like sibling spans to the underlying db operations (instead of being their parents).

I assume that this is due to the context from this span not wrapping the logic of the middlewares.

Any idea how to solve this?

Member Author

It looks like CodeRabbitAI is flagging a potential solution, but I cannot make it work.

Can you advise, please? 😇

...defaultConfig.kbnTestServer,
env: {
...defaultConfig.kbnTestServer.env,
...(shouldEnableTracing ? { KBN_OTEL_AUTO_INSTRUMENTATIONS: 'true' } : {}),
Member Author

The env var is no longer needed.

Comment on lines -32 to -33
// and ensures context propagation for request-scoped correlation (like eval run ids).
new HapiInstrumentation(),
Member Author

Not using Hapi's auto-instrumentation. Reasoning in the description.

Comment on lines +38 to +41
// require a parent so we don't create a new trace per request.
new UndiciInstrumentation({
requireParentforSpans: true,
}),
Member Author

I'm still not sure whether we need Undici's instrumentation.

@afharo afharo added the Team:Operations, Team:QA, and Team:obs-ai-team labels Mar 19, 2026
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@elasticmachine
Contributor

Pinging @elastic/appex-qa (Team:QA)

@coderabbitai
Contributor

coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 0eb9a507-6ba1-4b58-b800-24281a7baf53

📥 Commits

Reviewing files that changed from the base of the PR and between f564c91 and 4481ec9.

📒 Files selected for processing (1)
  • .buildkite/scripts/steps/security/third_party_packages.txt
💤 Files with no reviewable changes (1)
  • .buildkite/scripts/steps/security/third_party_packages.txt

📝 Walkthrough

This PR removes the @opentelemetry/instrumentation-hapi dependency and its allowlist/renovate entries, adds @kbn/tracing-utils to project references, and refactors HTTP tracing: the router now uses withActiveSpan instead of manual propagation extraction; the server creates and manages request-scoped OpenTelemetry spans and sub-spans across pre-route/route/post-route phases and records ELU metrics as span attributes; request state types were extended for OTEL spans. Auto-instrumentation now registers unconditionally and omits HapiInstrumentation, with HttpInstrumentation configured to ignore certain Elasticsearch-related outgoing requests. initTracing no longer calls setGlobalPropagator.
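
To illustrate the "ignore certain Elasticsearch-related outgoing requests" part, here is a hedged sketch of a predicate with the shape that `HttpInstrumentation`'s `ignoreOutgoingRequestHook` option expects (request options in, boolean out). The host list, the `OutgoingRequestOptions` interface (a small subset of Node's `http.RequestOptions`), and the matching rule are assumptions for illustration; the PR's actual criteria are not shown in this conversation.

```typescript
// Minimal subset of Node's http.RequestOptions, declared locally so the
// sketch has no dependencies.
interface OutgoingRequestOptions {
  hostname?: string | null;
  host?: string | null;
  port?: number | string | null;
}

// Hypothetical list of Elasticsearch hosts; a real setup would read this
// from configuration.
const ELASTICSEARCH_HOSTS = new Set(['localhost:9200']);

// Returns true when no span should be created for the outgoing request.
function shouldIgnoreOutgoingRequest(request: OutgoingRequestOptions): boolean {
  const hostname = request.hostname ?? request.host ?? 'localhost';
  const port = request.port ?? 80;
  return ELASTICSEARCH_HOSTS.has(`${hostname}:${port}`);
}

console.log(shouldIgnoreOutgoingRequest({ hostname: 'localhost', port: 9200 }));
```

A predicate like this could then be passed as `ignoreOutgoingRequestHook` when constructing the instrumentation.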


@coderabbitai
Contributor

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

This pull request removes the @opentelemetry/instrumentation-hapi package dependency across configuration and manifest files. It refactors the HTTP request router to use explicit span management via withActiveSpan instead of manual OpenTelemetry context extraction. The HTTP server instrumentation is enhanced to track OTEL spans across request lifecycle phases (pre-route, route, post-route), bind spans to request state, and record event-loop utilization as span attributes. Auto-instrumentation initialization is updated to register unconditionally and filter Elasticsearch outgoing requests. TypeScript and Moon project configurations are updated to reference the new tracing utilities.
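
As a rough sketch of recording event-loop utilization for a request, the following uses Node's `performance.eventLoopUtilization()` to take a start sample and compute the delta later. The attribute names are hypothetical and chosen only for illustration; the names used in the PR are not shown in this conversation.

```typescript
import { performance } from 'node:perf_hooks';

// Starts an ELU measurement and returns a function that stops it and
// returns the delta as a flat attribute map.
function startEluMeasurement(): () => Record<string, number> {
  const start = performance.eventLoopUtilization();
  return () => {
    // Passing the start sample yields the delta since that sample.
    const { active, idle, utilization } = performance.eventLoopUtilization(start);
    return {
      // Hypothetical attribute names, for illustration only.
      'example.elu.active_ms': active,
      'example.elu.idle_ms': idle,
      'example.elu.utilization': utilization,
    };
  };
}

const stop = startEluMeasurement();
const attributes = stop();
console.log(attributes);
```

The returned map has the flat key/value shape that a span's `setAttributes` call accepts, which is how values like these would typically be attached to the HTTP span.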


Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
src/core/packages/http/server-internal/src/http_server.ts (1)

623-623: ⚠️ Potential issue | 🔴 Critical

createSubspan() never makes the new span current.

Line 700 binds the Span object itself, not a Context carrying that span, so the later Hapi lifecycle callbacks still run under the previous active context. Because Lines 623 and 688 rely on this helper, the new pre/post middleware spans won't become parents of the work they are supposed to group, which leaves the HTTP trace hierarchy incorrect.

In `@opentelemetry/api` for Node.js, does `context.bind(context.active(), span)` make `span` the active span for future async callbacks, or must the code propagate a `Context` created with `trace.setSpan(...)` / `context.with(...)` around the actual callback execution?

Also applies to: 688-688, 696-703

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/packages/http/server-internal/src/http_server.ts` at line 623, The
new span created by createSubspan(...) is being bound incorrectly as a Span
object instead of a Context, so it never becomes the active span for subsequent
async callbacks; change the code to create/propagate a Context carrying the span
(use trace.setSpan(context.active(), span) or context.with(...) to run callbacks
in that context) and store/attach that Context (e.g., app.otelSubSpanContext)
rather than the raw Span so downstream Hapi lifecycle callbacks become children
of the pre/post middleware spans; update createSubspan, the assignment at
app.otelSubSpan, and any places that call context.bind(...) (lines referenced by
createSubspan, and usages around 623, 688, 696-703) to use the Context-based
propagation approach.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 29253b87-0c54-438a-b91e-74fa40f3124a

📥 Commits

Reviewing files that changed from the base of the PR and between 627e146 and f2ab5f3.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (10)
  • .buildkite/scripts/steps/security/third_party_packages.txt
  • package.json
  • renovate.json
  • src/core/packages/http/router-server-internal/moon.yml
  • src/core/packages/http/router-server-internal/src/router.ts
  • src/core/packages/http/router-server-internal/tsconfig.json
  • src/core/packages/http/server-internal/src/http_server.ts
  • src/core/packages/http/server/src/router/request.ts
  • src/platform/packages/shared/kbn-scout/src/servers/configs/config_sets/evals_tracing/stateful/classic.stateful.config.ts
  • src/platform/packages/shared/kbn-telemetry/src/init_autoinstrumentations.ts
💤 Files with no reviewable changes (4)
  • .buildkite/scripts/steps/security/third_party_packages.txt
  • src/platform/packages/shared/kbn-scout/src/servers/configs/config_sets/evals_tracing/stateful/classic.stateful.config.ts
  • renovate.json
  • package.json

Contributor

@dmlemeshko dmlemeshko left a comment

src/platform/packages/shared/kbn-scout/src/servers/configs/config_sets/evals_tracing/stateful/classic.stateful.config.ts changes LGTM

Co-authored-by: Ahmad Bamieh <ahmadbamieh@gmail.com>
Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
src/core/packages/http/server-internal/src/http_server.ts (1)

696-702: ⚠️ Potential issue | 🟠 Major

createSubspan() does not make the subspan current.

Line 700 binds the Span object itself, but OpenTelemetry's context.bind() only binds functions/EventEmitters and otherwise returns the target unchanged. So this helper never activates the subspan for later Hapi callbacks; if downstream spans are supposed to nest under these custom spans, the callback execution itself needs context.with(trace.setSpan(...), ...) or equivalent binding on the actual listener/emitter. (raw.githubusercontent.com)
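
The parenting behavior described here can be modeled without any OpenTelemetry dependency. The sketch below is a toy model, not the `@opentelemetry/api` types: it uses Node's `AsyncLocalStorage` (which, to my understanding, the default OTel context manager for Node builds on) to show that a span only becomes the parent of later work when that work runs inside a scope where the span is active. `ToySpan`, `startSpan`, and `withSpan` are hypothetical names.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Toy span model; this is NOT the @opentelemetry/api Span type.
interface ToySpan {
  name: string;
  parent?: string;
}

// Stand-in for the active context: holds the name of the current span.
const activeSpan = new AsyncLocalStorage<string>();

// Creates a span whose parent is whatever span is active at call time.
function startSpan(name: string): ToySpan {
  return { name, parent: activeSpan.getStore() };
}

// Analogue of `context.with(trace.setSpan(context.active(), span), fn)`:
// the span is only current while `fn` and the async work it starts run.
function withSpan<T>(span: ToySpan, fn: () => T): T {
  return activeSpan.run(span.name, fn);
}

async function handleRequest(): Promise<ToySpan[]> {
  const http = startSpan('GET /api/example');
  return withSpan(http, async () => {
    // Created but never activated: later work does not nest under it.
    const preHandlers = startSpan('pre-handlers');
    const siblingQuery = startSpan('db.query (sibling)');

    // Activated around the callback: work started inside nests under it.
    const handler = startSpan('handler');
    const childQuery = await withSpan(handler, async () => {
      await Promise.resolve(); // the active span survives the async hop
      return startSpan('db.query (child)');
    });

    return [http, preHandlers, siblingQuery, handler, childQuery];
  });
}

handleRequest().then((spans) => {
  for (const span of spans) {
    console.log(`${span.name} <- ${span.parent ?? '(root)'}`);
  }
});
```

In this model the "sibling" query ends up parented to the HTTP span, which matches the sibling layout reported in the PR screenshots, while the query started inside `withSpan` nests under the handler span.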

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/packages/http/server-internal/src/http_server.ts` around lines 696 -
702, createSubspan() currently calls context.bind() with the Span itself, which
doesn't make the span the active/current context; change the helper to activate
the span using context.with(trace.setSpan(context.active(), span), ...) and
ensure any callbacks/listeners created while the subspan should be active are
either executed inside that context.with block or have context.bind applied to
the function/emitter (not the span) so downstream Hapi callbacks nest under this
span; update createSubspan to wrap activation with trace.setSpan and return the
span while ensuring returned/created listeners are bound to the active context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 9823449a-6dcc-47af-a347-83cc228912fe

📥 Commits

Reviewing files that changed from the base of the PR and between f2ab5f3 and 796538b.

📒 Files selected for processing (1)
  • src/core/packages/http/server-internal/src/http_server.ts

@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Jest Integration Tests #8 / Knowledge Base End-to-End Integration Test should save and retrieve knowledge base content through the complete flow

Metrics [docs]

Unknown metric groups

API count

| id | before | after | diff |
|--------|--------|--------|--------|
| @kbn/core-http-server | 581 | 584 | +3 |

History

cc @afharo

@afharo afharo enabled auto-merge (squash) March 24, 2026 15:09
@afharo
Member Author

afharo commented Mar 24, 2026

/sync-ci

@afharo afharo merged commit a64caae into elastic:main Mar 24, 2026
20 checks passed
@afharo afharo deleted the otel/http-instrumentation branch March 24, 2026 15:39
mbondyra added a commit to mbondyra/kibana that referenced this pull request Mar 24, 2026
…ra/kibana into dashboard_align_attachment_to_api

* 'dashboard_align_attachment_to_api' of github.com:mbondyra/kibana: (45 commits)
  [OTel Tracing] HTTP instrumentation (elastic#258663)
  Replace deprecated EUI icons in files owned by @elastic/ml-ui (elastic#255624)
  [Codeowners] add missing codeowners for security_solution_api_integration tests (elastic#259223)
  [CI] fix bad imports that came from a merge-race (elastic#259383)
  Add `.claude/worktrees/` to `.gitignore` (elastic#259192)
  Improve unknown-key validation error message in @kbn/config-schema (elastic#258633)
  [ML] Update Security ML jobs to use entity analytics fields for host and user fields (elastic#255339)
  [Table sweep] Update table columns responsiveness in Index Management and Dashboards (elastic#259340)
  skip failing test suite (elastic#258790)
  skip failing test suite (elastic#259261)
  chore: util to clean cached images (elastic#259335)
  [Entity Store] Use last_seen for automated resolution watermark (elastic#258574)
  [One Workflow] Fix flaky alert trigger Scout test by removing order-dependent assertions (elastic#259299)
  Skip serverless Discover request counts tests for MKI (elastic#259333)
  [Security Solution] render header title in new document flyout in Security Solution and Discover (elastic#258166)
  [Agent Builder] register inference endpoint feature (elastic#259259)
  [Agent Builder] Skills Command Menu - Add descriptions and scope options to agent (elastic#258964)
  [Streams][Streamlang][API] Fully use meta({id}) to reuse schema partials in OAS output (elastic#259275)
  fix(files_example): add tableCaption to EuiInMemoryTable for a11y (elastic#258289)
  [Entity Store] Adding list endpoint with query filter (elastic#258320)
  ...
viduni94 added a commit that referenced this pull request Mar 25, 2026
…ector resolution (#259446)

Closes #259472

## Summary

Fixes two issues breaking `kbn-evals` runs (both local and CI):

### 1. APM / OpenTelemetry tracing conflict
A recent validation in `initTelemetry`
(#258303,
#258663) throws when Elastic APM
and OpenTelemetry tracing are both active. The `evals_tracing` Scout
config enables OTel tracing but didn't explicitly disable APM, causing
Kibana (and the Playwright worker) to crash on startup.

Fix:
- Added a `coerceCliValue` helper in `applyConfigOverrides`
(`kbn-apm-config-loader`) that converts 'true'/'false' to booleans and
numeric strings to numbers before they're set in the config object.
- Added `--elastic.apm.active=false` and
`--elastic.apm.contextPropagationOnly=false` to the `evals_tracing`
Scout server config and to `require_init_apm.js` (for the Playwright
worker when `TRACING_EXPORTERS` is set).
- Updated the `kbn-evals` README to document the required APM settings
when configuring tracing in `kibana.dev.yml`.
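
As an illustration of the coercion described in the first bullet, here is a hypothetical re-implementation of a `coerceCliValue`-style helper. Only the name and the described behavior ('true'/'false' to booleans, numeric strings to numbers) come from the text above; the body is an assumption and the real helper in `kbn-apm-config-loader` may differ.

```typescript
// Hypothetical sketch; the real helper may handle more cases or differ in details.
function coerceCliValue(value: string): string | number | boolean {
  if (value === 'true') return true;
  if (value === 'false') return false;
  // Treat only non-empty strings that parse to a finite number as numeric.
  if (value.trim() !== '' && Number.isFinite(Number(value))) {
    return Number(value);
  }
  return value;
}

console.log(coerceCliValue('false'), coerceCliValue('0.5'), coerceCliValue('http://localhost'));
```

Without a step like this, a CLI override such as `--elastic.apm.active=false` would arrive as the string `'false'`, which is truthy in JavaScript.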

### 2. Inference endpoint connector resolution
#258530 consolidated LLM connector
listing through the inference plugin's `getConnectorList()`, which now
returns inference endpoint IDs (e.g.:
`.anthropic-claude-4.6-opus-chat_completion`) instead of Kibana stack
connector keys (e.g.: `elastic-llm-claude-46-opus`). `kbn-evals` was
still passing the stack connector key to the inference API, which then
tried to execute it as a Kibana action - resulting in "Saved object
`[action/.anthropic-claude-4.6-opus-chat_completion]` not found".

Fix:
- `createConnectorFixture` now detects `.inference-type` connectors and
extracts their `inferenceId` from the config, using the ES inference
endpoint ID directly. This bypasses the Kibana actions framework and
aligns with the unified connector model from
[#258530](#258530).

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.
jeramysoucy pushed a commit to jeramysoucy/kibana that referenced this pull request Mar 26, 2026
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Ahmad Bamieh <ahmadbamieh@gmail.com>
jeramysoucy pushed a commit to jeramysoucy/kibana that referenced this pull request Mar 26, 2026
…ector resolution (elastic#259446)
markov00 pushed a commit to markov00/kibana that referenced this pull request Mar 26, 2026
…ector resolution (elastic#259446)
@afharo afharo mentioned this pull request Mar 26, 2026
3 tasks
jeramysoucy pushed a commit to jeramysoucy/kibana that referenced this pull request Apr 1, 2026
…ector resolution (elastic#259446)

Closes elastic#259472

## Summary

Fixes two issues breaking `kbn-evals` runs (both local and CI):

### 1. APM / OpenTelemetry tracing conflict
A recent validation in `initTelemetry`
(elastic#258303,
elastic#258663) throws when Elastic APM
and OpenTelemetry tracing are both active. The `evals_tracing` Scout
config enables OTel tracing but didn't explicitly disable APM, causing
Kibana (and the Playwright worker) to crash on startup.

Fix:
- Added a `coerceCliValue` helper in `applyConfigOverrides`
(`kbn-apm-config-loader`) that converts 'true'/'false' to booleans and
numeric strings to numbers before they're set in the config object.
- Added `--elastic.apm.active=false` and
`--elastic.apm.contextPropagationOnly=false` to the `evals_tracing`
Scout server config and to `require_init_apm.js` (for the Playwright
worker when `TRACING_EXPORTERS` is set).
- Updated the `kbn-evals` README to document the required APM settings
when configuring tracing in `kibana.dev.yml`.

### 2. Inference endpoint connector resolution
elastic#258530 consolidated LLM connector
listing through the inference plugin's `getConnectorList()`, which now
returns inference endpoint IDs (e.g.,
`.anthropic-claude-4.6-opus-chat_completion`) instead of Kibana stack
connector keys (e.g., `elastic-llm-claude-46-opus`). `kbn-evals` was
still passing the stack connector key to the inference API, which then
tried to execute it as a Kibana action, resulting in "Saved object
`[action/.anthropic-claude-4.6-opus-chat_completion]` not found".

Fix:
- `createConnectorFixture` now detects `.inference`-type connectors and
extracts their `inferenceId` from the config, using the ES inference
endpoint ID directly. This bypasses the Kibana actions framework and
aligns with the unified connector model from
[elastic#258530](elastic#258530).
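
The resolution step can be sketched as follows. This is a hedged illustration rather than the code in `createConnectorFixture`: the `ConnectorLike` shape, the `resolveConnectorId` name, and the `.inference` type identifier are assumptions, and only the idea of reading `inferenceId` from the connector config comes from the description above.

```typescript
// Hypothetical, simplified connector shape for illustration.
interface ConnectorLike {
  id: string;
  actionTypeId: string;
  config?: { inferenceId?: string };
}

// Assumed type identifier for inference connectors.
const INFERENCE_CONNECTOR_TYPE = '.inference';

// Returns the ES inference endpoint ID for inference connectors,
// and falls back to the connector's own ID otherwise.
function resolveConnectorId(connector: ConnectorLike): string {
  const inferenceId = connector.config?.inferenceId;
  if (connector.actionTypeId === INFERENCE_CONNECTOR_TYPE && inferenceId) {
    return inferenceId;
  }
  return connector.id;
}
```

Falling back to `connector.id` keeps other connector types on their existing path, so only inference connectors bypass the Kibana actions framework.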

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.
paulinashakirova pushed a commit to paulinashakirova/kibana that referenced this pull request Apr 2, 2026
…ector resolution (elastic#259446)


Labels

- backport:skip This PR does not require backporting
- OpenTelemetry
- release_note:skip Skip the PR/issue when compiling release notes
- Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc
- Team:obs-ai-team
- Team:Operations Kibana-Operations Team
- Team:QA Platform QA
- v9.4.0

Development

Successfully merging this pull request may close these issues.

[Server-side OTel] HTTP auto-instrumentation: default and performance

6 participants