Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Docker-based GraalVM native-image support via a new NativeImageBuildTask and native-image tasks/artifacts for Linux x86_64 and aarch64. Introduces a server-launcher component: LaunchDescriptor (binary format), ServerLauncher runtime, and shared utilities (ProcessUtil, ErrorPumpThread, ServerProcess). ServerCli now emits launch descriptors instead of starting processes; ServerProcessBuilder was removed. Docker resolution and per-run docker executable propagation were added. Distribution packaging and startup scripts were updated to prefer native launchers with a Java fallback. Multiple tests and build configurations were added or updated. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip You can enable review details to help with troubleshooting, context usage and more.Enable the |
f212711 to
b237730
Compare
3a3c06f to
bc7e013
Compare
|
All the existing CI failures are either unrelated or #144072, which is also unrelated. |
rjernst
left a comment
There was a problem hiding this comment.
Overall this looks good. I have a slight dislike for the name "launcher" because when -d isn't used it lives for the lifetime of ES. What about "runner"? That could be in a followup.
...n/tools/server-launcher/src/main/java/org/elasticsearch/server/launcher/ErrorPumpThread.java
Show resolved
Hide resolved
| /** | ||
| * Test CliToolProvider that supplies {@link RedirectTestCommand} for redirect tests. | ||
| */ | ||
| public class RedirectTestCliToolProvider implements CliToolProvider { |
There was a problem hiding this comment.
did you mean for this to be in production code?
There was a problem hiding this comment.
No, this is a test fixture.
rjernst
left a comment
There was a problem hiding this comment.
my comments are mostly optional, as long as we didn't lose the filtering of those log messages
…elocations * upstream/main: (33 commits) Unmute InferenceRestIT and DefaultEndPointsIT (elastic#144217) feat: add keep_alive to async task status (elastic#144010) Add explicit isNoOpUpdate() method to MapperService (elastic#144113) Always attach APM Agent (elastic#144120) Fix random_score nightly tests (elastic#144176) Add nested query checks for disabled sequence numbers (elastic#144185) Return sentinel values from Fetch when sequence numbers are disabled (elastic#144212) [Test] Test peer-recovery with sequence numbers pruning (elastic#144116) Remove `scaled-*` field assertions from mixed cluster downsampling test (elastic#144295) Refactor: Use range syntax in ES|QL exponential histogram tests (elastic#144110) Move resolve aliases to IndexAbstractionOptions (elastic#143953) unmute test (elastic#144299) Fix approximation csvtests (elastic#144233) fix test (elastic#144171) Add int4 vector scoring benchmarks (elastic#144105) Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeIT test elastic#143023 Mute org.elasticsearch.test.apmintegration.MetricsApmIT testApmIntegration {withOTel=false} elastic#144282 Native cli launcher (elastic#143712) Mute org.elasticsearch.xpack.esql.qa.multi_node.GenerativeIT test elastic#143023 Mute org.elasticsearch.xpack.esql.heap_attack.HeapAttackSubqueryIT testManyRandomKeywordFieldsInSubqueryIntermediateResults elastic#144274 ...
|
@mark-vieira 👋 , could you please let us know if it is expected behaviour? before this PR: after this PR: |
|
@wwang500 this is a side-effect of some of the change and is indeed expected behavior. Does this cause issues anywhere? |
Right now it causes some internal ml automated tests failing (on_prem only), as we use that |
Hmm. I can see how this would be weird with automation. I'll look at potentially forwarding this stuff to stdout to keep the existing behavior. |
This reverts commit 8c783ab.
Skips the optional elasticsearch native launcher build step introduced in elastic/elasticsearch#143712. We're running this build using docker-in-docker and required host filesystem paths are not available. As a follow up, we can look into splitting the docker build (the DinD portion) out from of the native build. Fixes https://buildkite.com/elastic/kibana-elasticsearch-snapshot-build/builds/7896 Test build https://buildkite.com/elastic/kibana-elasticsearch-snapshot-build/builds/7899 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Optimized build process for distribution and cloud image packages by streamlining compilation steps. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Skips the optional elasticsearch native launcher build step introduced in elastic/elasticsearch#143712. We're running this build using docker-in-docker and required host filesystem paths are not available. As a follow up, we can look into splitting the docker build (the DinD portion) out from of the native build. Fixes https://buildkite.com/elastic/kibana-elasticsearch-snapshot-build/builds/7896 Test build https://buildkite.com/elastic/kibana-elasticsearch-snapshot-build/builds/7899 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Optimized build process for distribution and cloud image packages by streamlining compilation steps. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Server launcher CLI: split preparer/launcher and optional GraalVM native image
Summary
The Elasticsearch server startup path is refactored into a preparer (existing server-cli) and a launcher (new server-launcher). The startup script runs either a Linux native binary (when present) or falls back to running the launcher on the JVM. This enables a small, JDK-only native launcher on Linux while keeping a single code path and fallback on all platforms.
Motivation
The CLI launcher was a long-lived process: it started the preparer, then started the Elasticsearch server JVM and stayed running for the life of the node (pumping stderr, handling shutdown, etc.). That meant two long-lived JVMs—launcher and server—each with its own heap and metaspace. On Linux, the launcher is now a native binary instead of a JVM. It still does the same job (run preparer, then server, pump output, wait for exit), but as a small native process. So on Linux there is now only one long-lived JVM—the Elasticsearch server itself—reducing the memory footprint of the startup path and of the running node.
Design decisions
Preparer (server-cli): does all heavy work (options, secure settings, auto-config, plugin sync, JVM options, etc.) and keeps full Elasticsearch dependencies. Launcher (server-launcher): only spawns the preparer, reads its output, then starts the server JVM. It depends only on JDK + server-launcher-common (no ES libs), so it stays GraalVM-native-image friendly. Shared data is a small LaunchDescriptor (JDK-only serialization in server-launcher-common).
The launcher runs the preparer with stdout redirected to a pipe and ES_REDIRECT_STDOUT_TO_STDERR=true. The preparer writes the binary LaunchDescriptor to its stdout (the pipe); user-facing output goes to stderr. The launcher reads the descriptor from the preparer’s stdout, then starts the server process. No temp files; simple and scriptable.
Native launcher builds use Docker (NativeImageBuildTask): a fixed GraalVM OL8 image and --platform linux/amd64 or linux/aarch64. This gives reproducible builds and cross-architecture builds (e.g. build x86_64 on aarch64 and vice versa via Docker emulation) without requiring a host GraalVM install.
Native binaries are built and shipped only for Linux (x86_64 and aarch64). Windows and Darwin native builds are out of scope for now to avoid platform-specific native-image and distribution complexity.
The startup script checks for an executable native launcher (e.g. $ES_HOME/lib/tools/server-launcher/server-launcher). If it’s not present or not executable (Windows, Darwin, or no native build), it falls back to java … org.elasticsearch.server.launcher.ServerLauncher with the same arguments. Behavior is identical; only the entry point (native vs JVM) changes.
Other changes
• ServerProcessBuilder removed; process construction lives in the launcher and tests.
• Windows service and packaging updated to use the new launcher path and to assert Linux distributions include the native launcher where applicable.
• RedirectedStdoutTerminal in cli-launcher: when ES_REDIRECT_STDOUT_TO_STDERR is set, user output goes to stderr and the real stdout is used for binary (e.g. descriptor) only.