Native cli launcher #143712

Merged: mark-vieira merged 41 commits into elastic:main from mark-vieira:native-cli-launcher on Mar 15, 2026

mark-vieira (Contributor) commented Mar 5, 2026

Server launcher CLI: split preparer/launcher and optional GraalVM native image

Summary

The Elasticsearch server startup path is refactored into a preparer (existing server-cli) and a launcher (new server-launcher). The startup script runs either a Linux native binary (when present) or falls back to running the launcher on the JVM. This enables a small, JDK-only native launcher on Linux while keeping a single code path and fallback on all platforms.

Motivation

Previously, the CLI launcher was a long-lived JVM process: it started the preparer, then started the Elasticsearch server JVM and stayed running for the life of the node (pumping stderr, handling shutdown, etc.). That meant two long-lived JVMs, launcher and server, each with its own heap and metaspace. On Linux, the launcher is now a native binary instead of a JVM. It still does the same job (run preparer, then server, pump output, wait for exit), but as a small native process. So on Linux there is now only one long-lived JVM, the Elasticsearch server itself, which reduces the memory footprint of the startup path and of the running node.

Design decisions

  1. Split into preparer vs launcher (GraalVM-friendly)
    Preparer (server-cli): does all heavy work (options, secure settings, auto-config, plugin sync, JVM options, etc.) and keeps full Elasticsearch dependencies. Launcher (server-launcher): only spawns the preparer, reads its output, then starts the server JVM. It depends only on JDK + server-launcher-common (no ES libs), so it stays GraalVM-native-image friendly. Shared data is a small LaunchDescriptor (JDK-only serialization in server-launcher-common).
  2. Preparer → launcher over stdout
    The launcher runs the preparer with stdout redirected to a pipe and ES_REDIRECT_STDOUT_TO_STDERR=true. The preparer writes the binary LaunchDescriptor to its stdout (the pipe); user-facing output goes to stderr. The launcher reads the descriptor from the preparer’s stdout, then starts the server process. No temp files; simple and scriptable.
  3. Docker-based GraalVM native-image
    Native launcher builds use Docker (NativeImageBuildTask): a fixed GraalVM OL8 image and --platform linux/amd64 or linux/aarch64. This gives reproducible builds and cross-architecture builds (e.g. build x86_64 on aarch64 and vice versa via Docker emulation) without requiring a host GraalVM install.
  4. Linux-only for native images
    Native binaries are built and shipped only for Linux (x86_64 and aarch64). Windows and Darwin native builds are out of scope for now to avoid platform-specific native-image and distribution complexity.
  5. Fallback when no native binary
    The startup script checks for an executable native launcher (e.g. $ES_HOME/lib/tools/server-launcher/server-launcher). If it’s not present or not executable (Windows, Darwin, or no native build), it falls back to java … org.elasticsearch.server.launcher.ServerLauncher with the same arguments. Behavior is identical; only the entry point (native vs JVM) changes.
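
The fallback in decision 5 amounts to an executable check followed by a choice of entry point. Below is a minimal Python sketch of that logic, for illustration only: the real check lives in the startup script, and `launcher_command` and `jvm_args` are hypothetical names not taken from this PR. `jvm_args` stands in for the options elided as `java …` above, so none are assumed here.

```python
import os
import stat
import tempfile

# Relative location quoted in the PR description as an example path.
NATIVE_LAUNCHER = os.path.join("lib", "tools", "server-launcher", "server-launcher")
LAUNCHER_MAIN = "org.elasticsearch.server.launcher.ServerLauncher"


def launcher_command(es_home, args, jvm_args=()):
    """Return the argv used to start the launcher (hypothetical helper).

    jvm_args is a placeholder for whatever options the real script passes
    to `java`; the PR description elides them, so the default is empty.
    """
    native = os.path.join(es_home, NATIVE_LAUNCHER)
    # Prefer the native binary only if it exists and is executable.
    if os.path.isfile(native) and os.access(native, os.X_OK):
        return [native, *args]
    # Otherwise run the same launcher on the JVM with the same arguments.
    return ["java", *jvm_args, LAUNCHER_MAIN, *args]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as es_home:
        # No native binary present: the JVM entry point is selected.
        print(launcher_command(es_home, ["--version"]))
        # Create an executable stand-in: the native path is selected.
        native = os.path.join(es_home, NATIVE_LAUNCHER)
        os.makedirs(os.path.dirname(native))
        with open(native, "w") as f:
            f.write("#!/bin/sh\n")
        os.chmod(native, os.stat(native).st_mode | stat.S_IXUSR)
        print(launcher_command(es_home, ["--version"]))
```

Only the first element of the command differs between the two branches, which mirrors the statement that behavior is identical and only the entry point changes.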

Other changes

• ServerProcessBuilder removed; process construction lives in the launcher and tests.
• Windows service and packaging updated to use the new launcher path and to assert Linux distributions include the native launcher where applicable.
• RedirectedStdoutTerminal in cli-launcher: when ES_REDIRECT_STDOUT_TO_STDERR is set, user-facing output goes to stderr and the real stdout carries only binary data (e.g. the launch descriptor).
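
The stdout handoff and the redirect behavior described above can be sketched as follows. This is a hedged illustration in Python, not the PR's Java code. The child script is a stand-in for the preparer, and the wire format used here (a 4-byte big-endian length followed by UTF-8 JSON with `command` and `args` fields) is invented for the example. The real LaunchDescriptor layout is not shown in this conversation.

```python
import io
import json
import os
import struct
import subprocess
import sys

# Stand-in "preparer": when the redirect variable is set, user-facing text
# goes to stderr and stdout carries only the binary descriptor.
# The descriptor fields below are hypothetical.
PREPARER = r"""
import json, os, struct, sys
redirect = os.environ.get("ES_REDIRECT_STDOUT_TO_STDERR") == "true"
user_out = sys.stderr if redirect else sys.stdout
print("preparing node...", file=user_out)
user_out.flush()
payload = json.dumps({"command": "java", "args": ["-Xmx1g"]}).encode("utf-8")
sys.stdout.buffer.write(struct.pack(">I", len(payload)) + payload)
sys.stdout.buffer.flush()
"""


def read_descriptor(stream):
    """Read one length-prefixed descriptor (hypothetical wire format)."""
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("preparer exited before writing a descriptor")
    (length,) = struct.unpack(">I", header)
    body = stream.read(length)
    if len(body) != length:
        raise ValueError("truncated descriptor")
    return json.loads(body.decode("utf-8"))


def run_preparer():
    """Spawn the stand-in preparer and return (descriptor, user_output)."""
    env = dict(os.environ, ES_REDIRECT_STDOUT_TO_STDERR="true")
    proc = subprocess.run(
        [sys.executable, "-c", PREPARER],
        stdout=subprocess.PIPE,  # descriptor channel (a pipe, no temp file)
        stderr=subprocess.PIPE,  # captured here; a launcher would forward it
        env=env,
        check=True,
    )
    descriptor = read_descriptor(io.BytesIO(proc.stdout))
    return descriptor, proc.stderr.decode("utf-8", errors="replace")


if __name__ == "__main__":
    descriptor, user_output = run_preparer()
    print("descriptor:", descriptor)
    print("user output:", user_output.strip())
```

Keeping stdout free of user-facing text is what lets the parent parse the pipe without any framing beyond the descriptor itself. It also explains why commands such as `--version` ended up printing to stderr after this change.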

mark-vieira added the :Delivery/Packaging, >refactoring, :Core/Infra/CLI, and test-arm labels Mar 5, 2026
coderabbitai bot (Contributor) commented Mar 5, 2026

Walkthrough

Adds Docker-based GraalVM native-image support via a new NativeImageBuildTask and native-image tasks/artifacts for Linux x86_64 and aarch64. Introduces a server-launcher component: LaunchDescriptor (binary format), ServerLauncher runtime, and shared utilities (ProcessUtil, ErrorPumpThread, ServerProcess). ServerCli now emits launch descriptors instead of starting processes; ServerProcessBuilder was removed. Docker resolution and per-run docker executable propagation were added. Distribution packaging and startup scripts were updated to prefer native launchers with a Java fallback. Multiple tests and build configurations were added or updated.

mark-vieira force-pushed the native-cli-launcher branch 2 times, most recently from f212711 to b237730, on March 6, 2026 18:10
mark-vieira force-pushed the native-cli-launcher branch from 3a3c06f to bc7e013 on March 9, 2026 21:03

mark-vieira (Contributor, Author) commented:

All the existing CI failures are unrelated to this change, including the ones tracked in #144072.

rjernst (Member) left a review comment:

Overall this looks good. I have a slight dislike for the name "launcher" because when -d isn't used it lives for the lifetime of ES. What about "runner"? That could be in a followup.

/**
* Test CliToolProvider that supplies {@link RedirectTestCommand} for redirect tests.
*/
public class RedirectTestCliToolProvider implements CliToolProvider {

Member (inline review comment):

did you mean for this to be in production code?

mark-vieira (Contributor, Author) replied:

No, this is a test fixture.

rjernst (Member) left a review comment:

my comments are mostly optional, as long as we didn't lose the filtering of those log messages

mark-vieira merged commit 8c783ab into elastic:main Mar 15, 2026
43 of 49 checks passed
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Mar 16, 2026
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 16, 2026

wwang500 (Contributor) commented Mar 16, 2026

@mark-vieira 👋, could you please let us know whether this is expected behaviour?

before this PR:
when I ran "~/elasticsearch/bin/elasticsearch --version", the version was printed to stdout,
es_bin_output: CompletedProcess(args=['/Users/weiwang/.qaf/data/distributions/MSB/9.3.2/9.3-80/elasticsearch/bin/elasticsearch', '--version'], returncode=0, stdout='Version: 9.3.2-SNAPSHOT, Build: tar/43a703737aab6baefa748bc7b69e4054926f2b2c/2026-03-16T10:33:08.997464325Z, JVM: 25.0.2\n', stderr='')

after this PR:
when I ran "~/elasticsearch/bin/elasticsearch --version", the version was printed to stderr,
CompletedProcess(args=['/Users/weiwang/.qaf/data/distributions/MSB/9.4.0/main-3725/elasticsearch/bin/elasticsearch', '--version'], returncode=0, stdout='', stderr='Version: 9.4.0-SNAPSHOT, Build: tar/08badde5669624e3a53e9a2f13c0a4e83caf3114/2026-03-16T10:58:09.531920866Z, JVM: 25.0.2\n')

mark-vieira (Contributor, Author) replied:

@wwang500 this is a side-effect of some of the changes and is indeed expected behavior. Does this cause issues anywhere?

wwang500 (Contributor) replied:

> @wwang500 this is a side-effect of some of the change and is indeed expected behavior. Does this cause issues anywhere?

Right now it causes some internal ML automated tests to fail (on_prem only), as we use that stdout to determine the ES version. If this is expected behavior, we will adjust our code. Thanks for the confirmation, @mark-vieira.
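
For automation that parses `--version`, one way to tolerate either stream is to search both. This is a rough Python sketch only: `extract_version` and `elasticsearch_version` are hypothetical helpers, and the regex assumes the `Version: …, Build: …` line format seen in the two outputs quoted earlier in this thread.

```python
import re
import subprocess

# Matches the leading "Version: <value>," part of the --version line.
VERSION_RE = re.compile(r"^Version: ([^,\s]+),", re.MULTILINE)


def extract_version(stdout, stderr):
    """Return the version string from whichever stream carries it."""
    for stream in (stdout, stderr):
        match = VERSION_RE.search(stream or "")
        if match:
            return match.group(1)
    raise ValueError("no 'Version: ...' line found on stdout or stderr")


def elasticsearch_version(es_bin):
    """Run `<es_bin> --version` and parse the result.

    es_bin is a caller-supplied path to bin/elasticsearch.
    """
    result = subprocess.run(
        [es_bin, "--version"], capture_output=True, text=True, check=True
    )
    return extract_version(result.stdout, result.stderr)
```

Checking stdout first keeps the pre-PR behavior as the preferred source and only falls back to stderr when stdout is empty or has no version line.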

mark-vieira (Contributor, Author) replied:

> Right now it causes some internal ml automated tests failing (on_prem only), as we use that stdout to determine the es version. If it is expected behavior, we will adjust our code. Thanks for confirmation, @mark-vieira.

Hmm. I can see how this would be weird with automation. I'll look at potentially forwarding this stuff to stdout to keep the existing behavior.

mark-vieira (Contributor, Author) commented:

@wwang500 I've opened #144356 to restore the original behavior of things like --version.

mamazzol added a commit to mamazzol/elasticsearch that referenced this pull request Mar 17, 2026
GalLalouche pushed a commit to GalLalouche/elasticsearch that referenced this pull request Mar 18, 2026

wwang500 (Contributor) replied:

> @wwang500 I've opened #144356 to restore the original behavior of things like --version.

I can confirm the version is back on stdout after PR #144356, thanks.

delanni pushed a commit to elastic/kibana that referenced this pull request Mar 19, 2026
Skips the optional elasticsearch native launcher build step introduced
in elastic/elasticsearch#143712. We're running
this build using docker-in-docker and required host filesystem paths are
not available.

As a follow-up, we can look into splitting the docker build (the DinD
portion) out of the native build.

Fixes
https://buildkite.com/elastic/kibana-elasticsearch-snapshot-build/builds/7896
Test build
https://buildkite.com/elastic/kibana-elasticsearch-snapshot-build/builds/7899

michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
jeramysoucy pushed a commit to jeramysoucy/kibana that referenced this pull request Mar 26, 2026
Labels

:Core/Infra/CLI, :Delivery/Packaging, >refactoring, serverless-linked, Team:Core/Infra, Team:Delivery, test-arm, test-windows, v9.4.0
