Skip to content

ES|QL: Remove implicit limit in FORK branches in CSV tests#143601

Merged
ioanatia merged 5 commits intoelastic:mainfrom
ioanatia:random_pragma_fork
Mar 6, 2026
Merged

ES|QL: Remove implicit limit in FORK branches in CSV tests#143601
ioanatia merged 5 commits intoelastic:mainfrom
ioanatia:random_pragma_fork

Conversation

@ioanatia
Copy link
Copy Markdown
Member

@ioanatia ioanatia commented Mar 4, 2026

We only remove the implicit limit we add to FORK branches, when the tests are running under snapshot (with a 50% chance, since we randomize this).
The more interesting case is when we running GenerativeForkIT which reuses all the existing CSV tests to generate new queries using FORK.

When I removed the implicit limit, I noticed that GenerativeForkIT started failing with unbounded SORT errors.
For example:

FROM semantic_text
| KEEP semantic_text_field
| sort semantic_text_field asc;

will get transformed into:

FROM semantic_text
| KEEP semantic_text_field
| sort semantic_text_field asc;
| FORK (WHERE true) (WHERE true)
| WHERE _fork == \"fork1\"
| DROP _fork 

This query would then fail because the SORT is not bounded by a LIMIT and we cannot execute it.

I had to adjust GenerativeForkRestTest to make sure a LIMIT is added immediately after FORK, so the query becomes:

FROM semantic_text
| KEEP semantic_text_field
| sort semantic_text_field asc;
| FORK (WHERE true) (WHERE true)
| LIMIT 300
| WHERE _fork == \"fork1\"
| DROP _fork 

At the moment we don't push filters that are present after FORK into FORK branches (we should), so the implicit LIMIT we add at the end of each ES|QL query does not get to be pushed down into FORK branches.
Adding a LIMIT after FORK, ensured our optimizations will push the LIMIT into FORK branches.
I also had to adjust our existing optimizations, to recognize that when we have an unbounded SORT in the FORK branches, and we have a LIMIT (or a SORT + LIMIT) following FORK, we should still do a push down into the FORK branches, even if another pipeline breaker is present in the FORK branch.

@ioanatia ioanatia added >non-issue Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch :Search Relevance/ES|QL Search functionality in ES|QL labels Mar 4, 2026
if (preference != null) {
pragmaBuilder.put(QueryPragmas.FIELD_EXTRACT_PREFERENCE.getKey(), preference.toString()).build();
}
if (randomBoolean()) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this check here, because we already use randomBoolean in addRandomPragma

@ioanatia ioanatia marked this pull request as ready for review March 5, 2026 18:30
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Copy link
Copy Markdown
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for explaining the problem so thoroughly 👍

@ioanatia ioanatia merged commit d19e113 into elastic:main Mar 6, 2026
35 checks passed
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Mar 6, 2026
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 6, 2026
…locations

* upstream/main: (153 commits)
  ES|QL: Update docs for TOP_SNIPPETS and DECAY (elastic#143739)
  Correctly include endpoint id in log msg in AuthorizationPoller (elastic#143743)
  Bar searching or sorting on _seq_no when disabled (elastic#143600)
  Generalize `testClientCancellation` test (elastic#143586)
  JSON_EXTRACT: zero-copy byte slicing for object, array, and number extraction (elastic#143702)
  Track recycler pages in circuit breaker (elastic#143738)
  [ESQL] Enable distributed pipeline breakers for external sources via FragmentExec (elastic#143696)
  Adding 'mode' and 'codec' fields to ES monitoring template (elastic#143673)
  [ESQL] Columnar I/O and vectorized block conversion for external sources (elastic#143703)
  Fix flaky MMR diversification YAML tests (elastic#143706)
  ES|QL codegen: check builder arguments for vector support (elastic#143724)
  Add Views Security Model (elastic#141050)
  ESQL: Prevent pushdown of unmapped fields in filters and sorts (elastic#143460)
  Don't run seq_no pruning tests in release CI (elastic#143725)
  ESQL: Support intra-row field references in ROW command (elastic#140217)
  ES|QL: Remove implicit limit in FORK branches in CSV tests (elastic#143601)
  IndexRoutingTests with and without synthetic id (elastic#143566)
  Synthetic id upgrade test in serverless (elastic#142471)
  Disable "Review skipped" comments for PRs without specified labels (elastic#143728)
  Cleanup ES|QL T-Digest code duplication, add memory accounting (elastic#143662)
  ...
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Mar 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Search Relevance/ES|QL Search functionality in ES|QL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants