[9.3] ESQL: Fix incorrectly optimized fork with nullify unmapped_fields (#143030) by kanoshiou · Pull Request #144200 · elastic/elasticsearch

kanoshiou · 2026-03-13T14:18:28Z

Backport

This will backport the following commits from main to 9.3:

ESQL: Fix incorrectly optimized fork with nullify unmapped_fields (#143030)

Questions ?

Please refer to the Backport tool documentation

…lastic#143568) * Adds alerting v2 views permissions. * Change alerting v2 view privileges to create_view only. * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>

…143379)

…ic#143662)

…lastic#143728) Updated `.coderabbit.yml` to add `review_status: false`, preventing unnecessary "Review skipped" comments on pull requests without matching labels.

Move TSDBSyntheticIdUpgradeIT to x-pack/plugin/logsdb/qa/rolling-upgrade This module is shadowed in serverless which means the test will also test rolling upgrade in serverless. * Use TIME_SERIES_SYNTHETIC_ID node feature in upgrade test, remove IndexVersion * Avoid analyse disk size in serverless, because that api is not available * Add logging to track failure * Add writing to all existing indices throughout the rolling upgrade --------- Co-authored-by: Cursor <cursoragent@cursor.com>

* IndexRoutingTests with and without synthetic id Time_series index routing tests now randomly run with index.mapping.synthetic_id true or false when the feature flag allow it. The setting is set explicitly (when supported) so behaviour stays consistent if the default changes. (This is preparatory work for enabling synthetic id by default) - Introduce TimeSeriesRoutingFixture (routing + useSyntheticId) and pass it into assertIndexShard; - idForReadPath (called from assertIndexShard) is used to generate id based on IndexRouting and useSyntheticId - BWC tests keep deterministic routing by passing useSyntheticId=false. and introduce new BWC tests for useSyntheticId=true - testRerouteToTargetTsid randomly enables synthetic id; - testRerouteToTargetTsidBeforeWrittenToRouting doesn't set synthetic id because indexVersion is too low. also: - cleanup throws IOException Written together with Cursor

…43601)

## Summary This PR fixes an issue where valid ESQL `ROW` commands failed when a field referenced a previously defined field within the same command (e.g., `ROW x = 4, y = 2, z = x + y`). ## Changes - **LogicalPlanBuilder**: Removed the call to `mergeOutputExpressions` in `visitRowCommand`. We now intentionally defer duplicate field handling and reference resolution to the Analyzer. - **AnalyzerRules**: Updated `ResolveRefs` to specifically **not** skip `Row` nodes, even if they flag as `resolved()` (which happens when they only contain literals). - **Analyzer**: Implemented `resolveRow` within the `ResolveRefs` rule. This logic mimics `resolveEval` but for literals: - It iterates through the fields sequentially. - It allows later fields to resolve references to earlier fields by substituting them with their defined expressions. - It handles field shadowing by removing earlier definitions if a duplicate name is encountered. ## Related Issue Closes elastic#140119

These tests require a feature flag to be enabled to pass, which is not available in nightly builds.

…ic#143460) This PR fixes a bug in the behavior of SET UNMAPPED_FIELDS=LOAD where a potentially unmapped field would be pushed down to Lucene, and, when it doesn't exist in the mapping, would cause wrong results. Resolves: elastic#141920, elastic#141925.

This PR wires ES|QL views into index authorization and view resolution so view expansion preserves the same security/error semantics as direct index access. - Adds `EsqlResolveViewAction` plus `ViewResolutionService` to resolve views from project metadata using `ResolvedIndexExpressions`, including strict handling for unauthorized and missing concrete targets. - Refactors `ViewResolver` to resolve views asynchronously, recurse through nested/wildcard view references, detect circular references/max depth, and keep unresolved non-view targets in the rewritten plan rather than dropping them. - Updates `EsqlSession` to run view replacement asynchronously and carry `viewQueries` in `Configuration`, so source/view context survives later planning and serialization. - Updates view CRUD/read request handling to resolve views via indices options (`resolveViews=true`) and reuse the same resolution path for `GET view`. - Aligns security behavior by persisting resolved expressions for view-resolving requests and by handling view abstractions correctly in index-permission resource accounting (including selector/failure-store edge cases).

…3724)

The default int8_hnsw index type quantizes float32 vectors to int8, introducing enough scoring error to non-deterministically reorder documents with close cosine similarities. With only 4 dimensions the quantization is particularly coarse. Use explicit hnsw index type on test dense_vector mappings to get exact float scoring and deterministic KNN result ordering. Update expected results to reflect exact cosine ordering. Closes elastic#143430 Closes elastic#143609

…ces (elastic#143703) Reduces overhead on the hot path for external datasource I/O across columnar formats (ORC, Parquet) and text formats (CSV, NDJSON). **I/O SPI**: Adds positional `readBytes(long, ByteBuffer)` to StorageObject so providers can use native buffer I/O instead of stream-based reads. Local files use FileChannel, GCS uses ReadChannel, others get a stream-based default. **ORC**: Replaces per-element Block.Builder loops with bulk array operations via `ColumnBlockConversions`. The no-null path does `Arrays.copyOf` + vector wrap; nulls use a BitSet from the boolean mask. **Parquet**: Replaces `GroupRecordConverter` row-at-a-time with `ColumnReadStoreImpl` column-at-a-time decoding. Eliminates Group allocation, per-row dispatch, and the `getString`/`getBytes` round-trip for binary columns. **Parallel text parsing**: Introduces `SegmentableFormatReader` SPI and `ParallelParsingCoordinator` for intra-file parallel parsing of line-oriented formats. A file is split into byte-range segments at record boundaries, each segment is parsed on a separate thread, and results are reassembled in order. CSV boundary detection is quote-aware (RFC 4180, handles escaped `""`). The coordinator handles executor rejection, uses volatile signaling for cross-thread close, and applies a bounded timeout on shutdown. The parallel parsing path was iteratively optimized: first for correctness and safety, then by replacing byte-by-byte boundary scanning with 8 KB buffered reads, and finally by eliminating the `InputStream → Reader → String → BytesRef` chain in favor of direct `byte[]` scanning with zero-copy `BytesRef` views. ### JMH results (`ParallelParsingBenchmark`, 200K lines) | Threads | Baseline | After | Improvement | |---|---|---|---| | 1 | 10.99 ms | 4.02 ms | **-63%** | | 2 | 12.06 ms | 4.27 ms | **-65%** | | 4 | 10.91 ms | 3.60 ms | **-67%** | Developed using AI-assisted tooling

…43673) * Enhance monitoring template by adding 'mode' and 'codec' fields to monitoring mapping * Added changelog entry * Fixed tests * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>

…FragmentExec (elastic#143696) Wrap ExternalRelation in FragmentExec (like EsRelation) so that pipeline breakers (Aggregate, Limit, TopN) above external sources are naturally distributed to data nodes via ExchangeExec. Previously, ExternalRelation was mapped directly to ExternalSourceExec on the coordinator, bypassing the FragmentExec-based distribution that EsRelation uses. This meant all three data-node optimization stages were skipped: LocalLogicalPlanOptimizer, LocalMapper, and LocalPhysicalPlanOptimizer (including filter pushdown via FilterPushdownRegistry). The filter pushdown is especially critical since PushFiltersToSource creates an opaque pushedFilter on ExternalSourceExec that is not serializable and must be created on the same JVM where the operator runs. With this change, on data nodes localPlan() expands the FragmentExec through LocalMapper into ExternalSourceExec, enabling local optimizations. The coordinator-only path correctly discovers and injects splits after localPlan expansion. AdaptiveStrategy now also recognizes TopN as a pipeline breaker for distribution decisions. ReplaceFieldWithConstantOrNull is taught to retain fields from ExternalRelation (like it already does for lookup index fields) since SearchStats is empty for external sources. ExchangeExec nodes wrapping external FragmentExec are collapsed and TopNExec InputOrdering is reset when falling back to coordinator-only execution. Developed using AI-assisted tooling

Integrates `RecyclerBytesStreamOutput` with an optional `CircuitBreaker` that tracks the pages in use as they are obtained from the recycler.

…traction (elastic#143702) * JSON_EXTRACT: zero-copy byte slicing for object, array, and number extraction Replace copyCurrentStructure() re-serialization with zero-copy byte slicing for JSON input. When the extracted value is an object, array, or number, slice bytes directly from the input buffer using XContentLocation.byteOffset() offsets (exposed in elastic#143501). Also refactors navigation from recursive descent to iterative loop, confining raw byte access to the extraction point. Adds JMH benchmarks for JSON_EXTRACT through the full eval pipeline. * Add changelog for elastic#143702 * [CI] Auto commit changes from spotless * Clean up navigation helpers to avoid threading unused parameters Navigation methods now only position the parser — they no longer carry builder, segments, depth, rawBytes, or rawOffset. * Use full variable names instead of abbreviations --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>

In elastic#143252 we saw a case where a chunked response appears to have computed far too many chunks at once, with some indication that it might relate to client cancellation. We have a test for this case already but it's a little weak. This commit reworks the test to use a Netty HTTP client giving much deeper control over the client's behaviour, allowing to close the connection with either a FIN or a RST and to do so after starting to receive the response body. It also randomizes the chunk size and allows for a delay to simulate computation happening while yielding the chunk.

If sequence numbers are trimmed from an index, then it should not be possible to search or sort against the metadata field. Related to elastic#136305

…tic#143743) This fixes debug log messages added in elastic#143567 where the endpoint id is not correctly included.

…coringTests testAccumulateSearchLoad elastic#143750

…-avg-over-time.Avg_over_time_of_double_no_grouping} elastic#143757

…-avg-over-time.Avg_over_time_of_double_no_grouping_promql} elastic#143758

)

ESQL: Add some easy error tests

* Document ES|QL SORT by expression Update SORT command docs to state that sort keys can be any expression (column, function, or arithmetic), not only column names. Add syntax, parameter description, and a 'Sort by expression' example. Align limitations.md wording with expression terminology. Fixes elastic#115011 Made-with: Cursor * Update docs/reference/query-languages/esql/_snippets/commands/layout/sort.md Co-authored-by: Iván Cea Fontenla <ivancea96@outlook.com> * Use DATE_EXTRACT for year in SORT expression examples Replace year(date) with DATE_EXTRACT("year", date) so examples match the documented ES|QL date functions. Made-with: Cursor --------- Co-authored-by: Iván Cea Fontenla <ivancea96@outlook.com>

The `ExpressionEvaluator` interface lived inside of `EvalOperator` and that bothered me. It's used in `FilterOperator` and `StringExtractOperator` and, well, lots of places. This moves it to a top level class. All things used in 123093432143214 places should be. Relates to elastic#100634

…eSerializationTests testSerialization elastic#144079

…eSerializationTests testConcurrentSerialization elastic#144080

…eSerializationTests testConcurrentEquals elastic#144081

Add a new histogram metric `es.repositories.snapshots.shards.queue_time.histogram` that reports how long each shard snapshot spent waiting in queues before its actual operation began. The creation time is stored as a new field. Queue time is computed when the shard is dequeued in `BlobStoreRepository.doSnapshotShard`. This new field is an internal stat and not exposed to end users, i.e. not available in `IndexShardSnapshotStatus.Copy`. Closes ES-14292

…lastic#143995)

…led (elastic#143935) lookupPrimaryTerm is called by FollowingEngine when a duplicate operation is detected on the primary (i.e., an operation with a seq_no that was already processed). It retrieves the primary term of the existing operation on the follower primary shard so that TransportBulkShardOperationsAction can rewrite the duplicate with the correct term before replicating it to followers replicas, ensuring consistency between primary and replicas. But the method currently only fetch the _seq_no indexed with SeqNoIndexOptions.POINTS_AND_DOC_VALUES option, not DOC_VALUES_ONLY. This is now fixed. Also, if the _seq_no cannot be fetched because it has been merged away on the follower primary, the method now returns OptionalLong.empty() as it would for duplicate that are past the global checkpoint. Since _seq_no should be retained for operations beyond the global checkpoint this is not a production issue, rather a safety net to avoid throwing the IllegalStateException. The only test we have is FollowEnginTests.testProcessOnPrimary but making it work for the DOC_VALUES_ONLY is already a pain, even more with DISABLE_SEQUENCE_NUMBERS.

…#144013) This commit adds tests to verify that CCR works correctly with pruned sequence numbers. The test is inspired by SeqNoPruningIT. Relates elastic#136305

* Randomly nullify also in multi-node spec tests * Move child output removal into collectUnresolved * Add more reproducers (original reproducer also got fixed by another change)

…astic#144083)

This test is really only testing the task framework itself. There is also a race condition that is causing sporadic failures. As the test does not really offer much, this commit removes it completely. Closes elastic#137911

* ES|QL: Validate TOP_SNIPPETS query argument is foldable at verification time Implements compile-time validation for the TOP_SNIPPETS function to ensure that its query parameter is foldable (constant). This follows the same pattern as other full-text functions like Match. Changes: - TopSnippets.java: Implements PostOptimizationVerificationAware interface and adds postOptimizationVerification() method using Foldables.resolveTypeQuery() - TopSnippetsValidationTests.java: New unit tests for validation logic - TopSnippetsTests.java: Updated assertion style per code review feedback - LogicalPlanOptimizerTests.java: Added post-optimization validation tests - top-snippets.csv-spec: Added integration tests for foldable query validation - docs/changelog/142462.yaml: Added changelog entry Fixes elastic#142462 * Update docs/changelog/142763.yaml * [CI] Auto commit changes from spotless * [CI] Auto commit changes from spotless * Update 142763.yaml * Delete docs/changelog/142462.yaml * Removed validation test # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/TopSnippetsValidationTests.java * Removed the constantQuery Test * Moved validation to verification level * tests added * modified the test cases * Update TopSnippets.java * [CI] Auto commit changes from spotless * fixed bug * [CI] Auto commit changes from spotless * Update docs/changelog/142763.yaml * Add foldable validation to TOP_SNIPPETS query parameter * [CI] Auto commit changes from spotless * Fix YAML tests * Fix tests * Remove protected access * Use postOptimizationVerification to let optimizer to fold constants before checking for Literals * Move test to optimization tests as folding check happens there --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> Co-authored-by: carlosdelest <carlos.delgado@elastic.co> Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>

…es (elastic#143944) This updates the Generative IT infra to keep track of fields that come from indices, helpful especially for full-text functions which, for now, can work only on fields that come from real indices. Also, there is some cleanup of errors that shouldn't be valid anymore, as the corresponding issues have been closed. AI-assisted PR.

…e disabled (elastic#144046) We can't fall back to using sequence numbers as a source for random scores when they are disabled, so instead a random score function defined with no seed and no source field will throw an error in this situation.

Fix nullify/load for commands like `STATS VALUES(field) WHERE field IS NOT NULL` where a missing field leads to unresolved references twice for the same field name.

The release-tests CI pipeline (`.buildkite/scripts/release-tests.sh`) runs with `-Dbuild.snapshot=false`, simulating a release build. This causes `Build.current().isSnapshot()` to return `false`, which makes `EsqlConfig.isDevVersion()` return `false`, disabling the `EXTERNAL` command in the ESQL ANTLR grammar (`{this.isDevVersion()}? externalCommand`). The **ndjson** QA module was missing the `enabled = buildParams.snapshotBuild` guard that all other datasource QA modules already had, so its `EXTERNAL`-based tests ran in release-tests and failed with `mismatched input 'EXTERNAL'`. This PR adds the snapshot guard, consistent with csv, parquet, orc, iceberg, grpc, and gcs. Also fixes `StoragePathTests.testFileUriFunction` to be platform-agnostic by dynamically constructing expected URIs instead of hardcoding Unix paths (fixes elastic#143979). Closes elastic#143979

…tion IT (elastic#143885) After elastic#143866, this is a better attempt at fixing those tests. ensureGreen calls the cluster _health API. With providing the index pattern, this returns green when no matching index exists. We actually want to ensure the inference indices are created. Calling the GET _inference API does that. Closes elastic#140849

…astic#143030) This PR fixes a bug where `Fork.withSubPlans()` incorrectly reassigned new `NameId`s to its output attributes, breaking references in the upper plan. This issue specifically manifests when using `FORK` alongside the `SET unmapped_fields="nullify"` mode. By design, a `FORK` assigns new `NameId`s to its output attributes via `refreshOutput()` to decouple them from the internal branches. This isolation is necessary to prevent unintended side effects during plan optimizations, such as aggressive constant folding leaking across branches. However, the previous implementation unconditionally re-minted these `NameId`s every time `withSubPlans()` was called. Because of this, any node sitting above the `FORK` (like `EVAL` or `STATS`) that already held a reference to the initial `NameId`s would suddenly point to a nonexistent ID. Downstream analysis rules would then fail to resolve these orphaned references, causing the plan execution to fail with an *"optimized incorrectly due to missing references"* error. Fixes elastic#142762

adcoelho and others added 30 commits March 6, 2026 09:56

Use templates for native arm and x64 f32 methods (elastic#143684)

e13dfbe

Implement queryDSL range support for exponential histograms (elastic#…

4312134

…143379)

Cleanup ES|QL T-Digest code duplication, add memory accounting (elast…

cea9eb7

…ic#143662)

Disable "Review skipped" comments for PRs without specified labels (e…

e864533

…lastic#143728) Updated `.coderabbit.yml` to add `review_status: false`, preventing unnecessary "Review skipped" comments on pull requests without matching labels.

ES|QL: Remove implicit limit in FORK branches in CSV tests (elastic#1…

d19e113

…43601)

Don't run seq_no pruning tests in release CI (elastic#143725)

f25ff47

These tests require a feature flag to be enabled to pass, which is not available in nightly builds.

ES|QL codegen: check builder arguments for vector support (elastic#14…

2ad21ee

…3724)

Track recycler pages in circuit breaker (elastic#143738)

5fdbb49

Integrates `RecyclerBytesStreamOutput` with an optional `CircuitBreaker` that tracks the pages in use as they are obtained from the recycler.

Bar searching or sorting on _seq_no when disabled (elastic#143600)

58250da

If sequence numbers are trimmed from an index, then it should not be possible to search or sort against the metadata field. Related to elastic#136305

Correctly include endpoint id in log msg in AuthorizationPoller (elas…

9db3d75

…tic#143743) This fixes debug log messages added in elastic#143567 where the endpoint id is not correctly included.

ES|QL: Update docs for TOP_SNIPPETS and DECAY (elastic#143739)

3c675d5

Mute org.elasticsearch.compute.lucene.query.LuceneTopNSourceOperatorS…

293907d

…coringTests testAccumulateSearchLoad elastic#143750

Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:k8s-timeseries…

dac2ad2

…-avg-over-time.Avg_over_time_of_double_no_grouping} elastic#143757

Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:k8s-timeseries…

fc7c851

…-avg-over-time.Avg_over_time_of_double_no_grouping_promql} elastic#143758

Add slicing mode and fix operation-level reindex metrics (elastic#143415

1a56af0

)

ESQL: Add some easy error tests (elastic#143535)

69aa419

ESQL: Add some easy error tests

elasticsearchmachine and others added 22 commits March 12, 2026 04:15

Mute org.elasticsearch.xpack.esql.expression.function.vector.Magnitud…

3b8eb8d

…eSerializationTests testSerialization elastic#144079

Mute org.elasticsearch.xpack.esql.expression.function.vector.Magnitud…

5e0abbd

…eSerializationTests testConcurrentSerialization elastic#144080

Mute org.elasticsearch.xpack.esql.expression.function.vector.Magnitud…

a37a996

…eSerializationTests testConcurrentEquals elastic#144081

SQL: better management of allowPartialSearchResults defaults for CPS (e…

cafdc49

…lastic#143995)

[Test] Add CCR integration tests for pruned sequence numbers (elastic…

4e84ac5

…#144013) This commit adds tests to verify that CCR works correctly with pruned sequence numbers. The test is inspired by SeqNoPruningIT. Relates elastic#136305

ESQL: Small nullify fix cleanups (elastic#143998)

0af3229

* Randomly nullify also in multi-node spec tests * Move child output removal into collectUnresolved * Add more reproducers (original reproducer also got fixed by another change)

Restoring docs for REGISTERED_DOMAIN and URI_PARTS ES|QL commands (el…

31cf1d2

…astic#144083)

Add missing synthetic_id: false in yaml tsdb/26 (elastic#144033)

41b4a9f

ES|QL: Fix MagnitudeSerializationTests release tests (elastic#144088)

1d95a03

ESQL: Fix nullify/load for STATS with WHERE (elastic#144029)

af52ceb

Fix nullify/load for commands like `STATS VALUES(field) WHERE field IS NOT NULL` where a missing field leads to unresolved references twice for the same field name.

Drop non-finite samples in Prometheus remote write (elastic#144055)

0f956c1

Refactoring UserAgentPlugin (elastic#140712)

c409f7d

Fix MMR release test using subqueries (elastic#144087)

4a1f52c

kanoshiou requested review from a team as code owners March 13, 2026 14:18

kanoshiou mentioned this pull request Mar 13, 2026

ESQL: Fix incorrectly optimized fork with nullify unmapped_fields #143030

Merged

elasticsearchmachine added v9.3.3 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Mar 13, 2026

kanoshiou closed this Mar 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[9.3] ESQL: Fix incorrectly optimized fork with nullify unmapped_fields (#143030)#144200

[9.3] ESQL: Fix incorrectly optimized fork with nullify unmapped_fields (#143030)#144200
kanoshiou wants to merge 2308 commits intoelastic:9.3from
kanoshiou:backport/9.3/pr-143030

kanoshiou commented Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

kanoshiou commented Mar 13, 2026

Backport

Questions ?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants