Skip to content

ESQL: Prevent pushdown of unmapped fields in filters and sorts#143460

Merged
GalLalouche merged 54 commits intoelastic:mainfrom
GalLalouche:prevent_pushdown_cursor
Mar 6, 2026
Merged

ESQL: Prevent pushdown of unmapped fields in filters and sorts#143460
GalLalouche merged 54 commits intoelastic:mainfrom
GalLalouche:prevent_pushdown_cursor

Conversation

@GalLalouche
Copy link
Copy Markdown
Contributor

@GalLalouche GalLalouche commented Mar 3, 2026

This PR fixes a bug in the behavior of SET UNMAPPED_FIELDS=LOAD where a potentially unmapped field would be pushed down to Lucene, and, when it doesn't exist in the mapping, would cause wrong results.

Resolves: #141920, #141925.

GalLalouche and others added 5 commits March 3, 2026 12:10
PotentiallyUnmappedKeywordEsField (used for fields loaded from _source
when unmapped_fields="load") had isAggregatable=true, which caused
isPushableFieldAttribute to short-circuit past the SearchStats check and
push filters down to Lucene. On shards where the field is not indexed,
the Lucene query returns no results instead of letting the compute
engine evaluate the filter on _source-loaded values.

Guard isPushableFieldAttribute against PotentiallyUnmappedKeywordEsField
so these fields are always filtered in the compute engine.

Co-authored-by: Cursor <cursoragent@cursor.com>
The isPushableFieldAttribute fix (rejecting PotentiallyUnmappedKeywordEsField)
already covers sort pushdown since PushTopNToSource uses the same method.
This test verifies correct sort order when a field is mapped in one index
but unmapped in another under unmapped_fields="load".

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove _index from the SORT clause — it prevented the entire sort from
being pushed to Lucene (canPushDownOrders requires all fields pushable),
masking the bug. With only SORT message, the sort is pushed down and
produces wrong order on the unmapped shard without the fix.

Co-authored-by: Cursor <cursoragent@cursor.com>
Capture single-index unmapped behavior for nullify/load and switch GoldenTestCase to read unmapped settings from parsed statements so SET-based golden tests run like type_conflicts.

Co-authored-by: Cursor <cursoragent@cursor.com>
Made-with: Cursor
@GalLalouche GalLalouche added >feature >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Mar 3, 2026
@GalLalouche GalLalouche requested a review from alex-spies March 3, 2026 12:06
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @GalLalouche, I've created a changelog YAML for you.

Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except some minor test suggestions (and rebasing on the other PR that needs to precede this).

Comment on lines +734 to +736
required_capability: unmapped_fields
required_capability: optional_fields
required_capability: field_alias_support
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One capability is probably enough, and it should be a new capability added in this PR to disable bwc tests with earlier versions that can't properly run the added tests. (Serverless + 9.3 bwc tests should fail.)

OPTIONAL_FIELDS_V2, for instance.

fieldAliasAndNonExistent
required_capability: unmapped_fields
required_capability: optional_fields
required_capability: field_alias_support
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think field_alias_support is a hallucination, which means this test doesn't actually run at all.

I thought that @idegtiarenko put in a check that prevents using non-existant caps? Please double check that :)

FROM sample_data, no_mapping_sample_data METADATA _index
| KEEP _index, message
| WHERE message == "Connection error?"
| SORT _index
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly irrelevant sorting, but not wrong

SET unmapped_fields="load"\;
FROM sample_data, no_mapping_sample_data METADATA _index
| KEEP _index, message
| SORT message
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For good measure, let's maybe add a test that sorts by multiple fields, and expressions based on unmapped fields. Also a case where an unmapped/partially mapped field is cast to a long or so.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The location of the query.esql is a little inconsistent. Let's make it consistent :) There should be a separate query.esql for nullify with a separate SET as well.

@@ -0,0 +1,4 @@
SET unmapped_fields="load"; FROM sample_data
| KEEP message, does_not_exist
| WHERE does_not_exist::KEYWORD == "Connection error?"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting to see: a case where a pushable and non-pushable (because unmapped) expression are conjuncted.

Similarly for the expressions in the SORT.

Comment on lines +6 to +7
\_TopNExec[[Order[does_not_exist{r}#1,ASC,LAST]],5[INTEGER],70]
\_EvalExec[[null[NULL] AS does_not_exist#1]]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TopN by a constant can be optimized to a limit, but that's a different issue.

@GalLalouche, could you please open an issue and track it internally so we consider this optimization before nullify's GA?

Comment on lines +2 to +3
| KEEP message
| WHERE message == "Connection error?"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have a version of this that just does the filtering, no sort.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, tests look very nice, but we can try and add a couple more funky cases. E.g. more complex filters (also disjunctions with mixed pushable/non-pushable), more complex sorts; even moreso than in the spec tests.

GalLalouche and others added 13 commits March 3, 2026 20:52
- Add OPTIONAL_FIELDS_V2 capability for pushdown-elimination BWC.
- Remove fieldAliasAndNonExistent test; fix capabilities on pushdown tests.
- Remove SORT from filterOnPartiallyUnmappedField; add fully unmapped
  multi-sort, and cast tests in unmapped-load.csv-spec.
- Add comment in LucenePushdownPredicates for PotentiallyUnmappedKeywordEsField.
- Add cross-refs between PushdownGoldenTests and LocalPhysicalPlanOptimizerTests.
- Fix assumeTrue: use OPTIONAL_FIELDS_V2 for load, remove redundant nullify.
- Write query.esql to nested path in GoldenTestCase. This was actually a
  pretty "noisy" change, since it also affect
  SubstituteRoundToGoldenTests!
- Add filter-only, conjunction pushable/non-pushable golden tests.

Made-with: Cursor
Datasource tests fail on Windows CI and FIPS CI builds due to two
independent issues introduced with the external sources feature.

**Windows:** `StoragePath.of()` cannot parse `file://` URIs with
Windows drive letters. A path like `file://C:\bk\path\file.txt`
causes the colon after the drive letter `C` to be misinterpreted
as a port separator, resulting in `NumberFormatException`. Both the
production code (`LocalStorageProvider.toStoragePath()`) and the tests
construct file URIs via manual string concatenation instead of using
the existing `StoragePath.fileUri()` helper that normalizes Windows
paths correctly.

**FIPS:** `ExternalDistributedSpecIT` starts a test cluster with
`xpack.security.enabled=false`, but FIPS mode requires security to
be enabled. The Elasticsearch process dies during startup before any
test method runs. Since the test relies on plain HTTP S3 fixtures
that are inherently incompatible with FIPS, the test is now skipped
in FIPS mode.

Developed using AI-assisted tooling
…l sources (elastic#143420)

* ESQL: Add extended distribution tests and fault injection for external sources

Extends the test coverage for external source distributed execution with
property tests for weighted round-robin and coalescing, and adds fault
injection infrastructure for testing resilience under storage failures.

- ExtendedDistributionPropertyTests: weighted RR load balancing bounds,
  coalescing preservation, coalesced+distribution integration
- FaultInjectionRetryTests: retry policy behavior under transient and
  persistent fault patterns (503, connection reset, timeout)
- FaultInjectingS3HttpHandler: configurable S3 fault injection with
  countdown, path filtering, and auto-clearing
- FaultInjectingS3HttpHandlerIT: real HTTP server tests for the handler

Developed using AI-assisted tooling

* Update docs/changelog/143420.yaml
* Fix CSV-escaped quotes rendering in generated docs examples

DocsV3Support.renderTableLine() now unescapes RFC 4180 CSV quoting
(strips outer quote delimiters and replaces "" with ") so that JSON
strings in function example tables render correctly — e.g.
{"key":"value"} instead of {""key"":""value""}.

Affects json_extract and to_tdigest doc examples.

* Refine CSV unescaping to only unescape RFC 4180 doubled quotes; add tests

The previous approach stripped outer quotes from ALL quoted values, breaking
simple quoted values like "POINT(...)" and "foo". Now only cells with actual
RFC 4180 doubled-quote escaping ("") are unescaped, leaving simple quoted
values unchanged.

Added tests: testRenderingExampleResultCsvJsonUnescaping verifies JSON
unescaping works, testRenderingExampleResultSimpleQuotesPreserved verifies
simple quoted values are not modified.

Also adds changelog YAML for the PR.
…stic#143402)

* Fix ES|QL function list versioning metadata

Audit all _snippets/lists/ files against Java @FunctionAppliesTo
annotations. Adds missing applies_to tags, corrects wrong versions,
and applies cumulative preview→ga tags where functions graduated.

Also adds missing applies_to front matter to
time-series-aggregation-functions.md landing page.

* update tags on commands lists
Stopping the transform at the end of test, before the reset, can help
other tests running in parallel.

Resolve elastic#122980
Update the pattern_text downgrade test so that it includes adding docs and querying for docs.\
Specifically the test now does the following:
  1. Create data stream with trial license and pattern text field
  2. Index docs, verify they're searchable.
  3. Downgrade to basic license.
  4. Index more docs in same backing index, verify all docs searchable, verify settings unchanged.
  5. Rollover the data stream, verify the new backing index has disable_templating=true.
  6. Index more docs into the new backing index, verify all docs searchable across both indices.
  7. Search with "fields": ["pattern_field"] to verify the valueFetcher() code path works across both backing indices.
@GalLalouche GalLalouche enabled auto-merge (squash) March 3, 2026 22:31
@GalLalouche GalLalouche disabled auto-merge March 4, 2026 08:10
@GalLalouche GalLalouche enabled auto-merge (squash) March 4, 2026 16:01
@GalLalouche GalLalouche disabled auto-merge March 4, 2026 22:01
@GalLalouche GalLalouche force-pushed the prevent_pushdown_cursor branch from 30a7824 to 1db923f Compare March 5, 2026 10:18
@GalLalouche GalLalouche enabled auto-merge (squash) March 5, 2026 10:20
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 6, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🏷️ Required labels (at least one) (2)
  • Team:Delivery
  • Team:Search - Inference

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: accb0fd0-7851-4a04-8b29-55e887ab8ae1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@GalLalouche GalLalouche merged commit 02fb937 into elastic:main Mar 6, 2026
36 checks passed
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Mar 6, 2026
…ic#143460)

This PR fixes a bug in the behavior of SET UNMAPPED_FIELDS=LOAD where a potentially unmapped field would be pushed down to Lucene, and, when it doesn't exist in the mapping, would cause wrong results.

Resolves: elastic#141920, elastic#141925.
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 6, 2026
…locations

* upstream/main: (153 commits)
  ES|QL: Update docs for TOP_SNIPPETS and DECAY (elastic#143739)
  Correctly include endpoint id in log msg in AuthorizationPoller (elastic#143743)
  Bar searching or sorting on _seq_no when disabled (elastic#143600)
  Generalize `testClientCancellation` test (elastic#143586)
  JSON_EXTRACT: zero-copy byte slicing for object, array, and number extraction (elastic#143702)
  Track recycler pages in circuit breaker (elastic#143738)
  [ESQL] Enable distributed pipeline breakers for external sources via FragmentExec (elastic#143696)
  Adding 'mode' and 'codec' fields to ES monitoring template (elastic#143673)
  [ESQL] Columnar I/O and vectorized block conversion for external sources (elastic#143703)
  Fix flaky MMR diversification YAML tests (elastic#143706)
  ES|QL codegen: check builder arguments for vector support (elastic#143724)
  Add Views Security Model (elastic#141050)
  ESQL: Prevent pushdown of unmapped fields in filters and sorts (elastic#143460)
  Don't run seq_no pruning tests in release CI (elastic#143725)
  ESQL: Support intra-row field references in ROW command (elastic#140217)
  ES|QL: Remove implicit limit in FORK branches in CSV tests (elastic#143601)
  IndexRoutingTests with and without synthetic id (elastic#143566)
  Synthetic id upgrade test in serverless (elastic#142471)
  Disable "Review skipped" comments for PRs without specified labels (elastic#143728)
  Cleanup ES|QL T-Digest code duplication, add memory accounting (elastic#143662)
  ...
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Mar 6, 2026
…ic#143460)

This PR fixes a bug in the behavior of SET UNMAPPED_FIELDS=LOAD where a potentially unmapped field would be pushed down to Lucene, and, when it doesn't exist in the mapping, would cause wrong results.

Resolves: elastic#141920, elastic#141925.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >bug >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: unmapped_fields="load" can lead to 0 pages emitted