ESQL: Add generative tests for LIMIT BY#144238

Merged: ivancea merged 14 commits into elastic:main from ivancea:esql-limit-by-generative-tests on Apr 8, 2026

@ivancea ivancea commented Mar 13, 2026

Generative test generators for LIMIT BY, which was added in #144069 and #144279.

@ivancea ivancea added the >test, Team:Analytics, :Analytics/ES|QL, and v9.4.0 labels Mar 13, 2026

Copilot AI left a comment

Pull request overview

Adds generative test support for the new LIMIT BY command in ESQL, which retains at most N rows per group defined by grouping key expressions.

Changes:

  • New LimitByGenerator that generates random LIMIT N BY expr1, expr2, ... commands and validates output row counts per group
  • Registers LimitByGenerator in EsqlQueryGenerator's pipe command list
  • Blocks full-text functions after LIMIT BY (same as after LIMIT)
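
To illustrate the semantics summarized above, here is a minimal sketch of what `LIMIT N BY key` does to a list of rows: it keeps at most N rows per distinct grouping key, in encounter order. The class name, method name, and string-encoded rows are hypothetical and chosen only for illustration; this is not the ES|QL operator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration only; not the ES|QL implementation. */
public class LimitBySketch {
    /** Keeps at most {@code limit} rows per grouping key, preserving encounter order. */
    public static <R, K> List<R> limitBy(List<R> rows, int limit, Function<R, K> keyOf) {
        Map<K, Integer> seenPerGroup = new HashMap<>();
        List<R> kept = new ArrayList<>();
        for (R row : rows) {
            // Count this row against its group, then keep it only if the group is not full yet.
            int count = seenPerGroup.merge(keyOf.apply(row), 1, Integer::sum);
            if (count <= limit) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("foo:1", "foo:2", "foo:3", "bar:4");
        // Group by the part before ':' and keep at most 2 rows per group.
        System.out.println(limitBy(rows, 2, (String r) -> r.split(":")[0]));
        // prints [foo:1, foo:2, bar:4]
    }
}
```

With a limit of 0 the sketch returns an empty list, since the first row of every group already exceeds the limit.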

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
-----|------------
LimitByGenerator.java | New generator for LIMIT BY commands with per-group row count validation
EsqlQueryGenerator.java | Registers LimitByGenerator in the pipe commands list
FullTextFunctionGenerator.java | Disallows full-text functions after a LIMIT BY command

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
) {
    int limit = (int) commandDescription.context().get(LIMIT);

    if (output.size() > previousOutput.size()) {

@luigidellaquila luigidellaquila Mar 16, 2026

This check is too strict in general.
Consider a query like the following:

FROM idx
| LIMIT 10
| WHERE foo > 5
| LIMIT BY bar

The result of the first LIMIT is non-deterministic, so the output of the WHERE filter, including the number of returned records, can differ on every run.
That is not a problem with LIMIT BY; ES|QL without a sort is simply not fully deterministic.

@ivancea (Contributor Author)

I believe the checks are correct, as they're very lenient. They check that:

  • LIMIT BY does not produce more rows than the previous command
  • There are no more than N rows per group, where N is the limit

In neither case is it really checking values or imposing strange expectations.
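
As a rough sketch of the second check (no more than N rows per group), the following counts output rows per grouping key and reports the first group that exceeds the limit. The class name, the row representation (`List<List<Object>>`), and the key-column indices are assumptions made for illustration; this is not the validation code in `LimitByGenerator`, and it assumes single-valued grouping keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the actual LimitByGenerator validation code. */
public class PerGroupLimitCheck {
    /**
     * Returns null if every group in {@code rows} has at most {@code limit} rows,
     * otherwise a message describing the first offending group.
     */
    public static String check(List<List<Object>> rows, List<Integer> keyColumns, int limit) {
        Map<List<Object>, Integer> countsPerGroup = new HashMap<>();
        for (List<Object> row : rows) {
            // Build the grouping key from the values in the key columns (nulls allowed).
            List<Object> key = new ArrayList<>();
            for (int column : keyColumns) {
                key.add(row.get(column));
            }
            int count = countsPerGroup.merge(key, 1, Integer::sum);
            if (count > limit) {
                return "group " + key + " has more than [" + limit + "] rows";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<List<Object>> output = List.of(
            List.<Object>of("foo", 1),
            List.<Object>of("foo", 2),
            List.<Object>of("bar", 3)
        );
        System.out.println(check(output, List.of(0), 2)); // null: no group exceeds 2 rows
        System.out.println(check(output, List.of(0), 1)); // reports group [foo]
    }
}
```

This check looks only at the output of the LIMIT BY command itself, so it does not depend on any other execution of the query.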

@luigidellaquila luigidellaquila Mar 17, 2026

The problem is the first check: "LIMIT BY does not produce more rows than the previous one".
I'll give you a practical example, bear with me:


dataset:

a     |  b
------------
foo   |  1
foo   |  2
bar   |  3
bar   |  4

previous:

FROM idx
| LIMIT 2
| WHERE b > 2

current:

FROM idx
| LIMIT 2
| WHERE b > 2
| LIMIT 2 BY a

Execution of previous

FROM idx:

a     |  b
------------
foo   |  1
foo   |  2
bar   |  3
bar   |  4

so

FROM idx | LIMIT 2:

a     |  b
------------
foo   |  1
foo   |  2

so

FROM idx | LIMIT 2 | WHERE b > 2:

a     |  b
------------        
                              <----- no records

👆 this is previousOutput


Execution of current

FROM idx:

a     |  b
------------
bar   |  3
bar   |  4               <----- different execution, different order
foo   |  1
foo   |  2

so

FROM idx | LIMIT 2:

a     |  b
------------
bar   |  3
bar   |  4   

so

FROM idx | LIMIT 2 | WHERE b > 2:

a     |  b
------------ 
bar   |  3
bar   |  4          <---------  we have two records now!

so

FROM idx | LIMIT 2 | WHERE b > 2 | LIMIT 2 BY a:

a     |  b
------------ 
bar   |  3
bar   |  4          <---------  we still have two records!

👆 this is output


Conclusion:

  • previousOutput.size() is 0
  • output.size() is 2

but the results are formally correct.
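
The walkthrough above can be reproduced with a small simulation. This is only a sketch: the two scan orders stand in for two executions of `FROM idx` that return the same rows in a different order, and every class, method, and field name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical simulation of the example; not ES|QL code. */
public class NonDeterministicLimitDemo {
    public static final class Row {
        final String a;
        final int b;

        Row(String a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    /** Scan order seen by the execution that produces previousOutput. */
    public static final List<Row> RUN_1 = List.of(new Row("foo", 1), new Row("foo", 2), new Row("bar", 3), new Row("bar", 4));
    /** Same rows, different order, seen by the execution that produces output. */
    public static final List<Row> RUN_2 = List.of(new Row("bar", 3), new Row("bar", 4), new Row("foo", 1), new Row("foo", 2));

    /** FROM idx | LIMIT 2 | WHERE b > 2 */
    public static List<Row> previous(List<Row> scanOrder) {
        return scanOrder.stream().limit(2).filter(r -> r.b > 2).collect(Collectors.toList());
    }

    /** FROM idx | LIMIT 2 | WHERE b > 2 | LIMIT 2 BY a */
    public static List<Row> current(List<Row> scanOrder) {
        Map<String, Integer> seenPerGroup = new HashMap<>();
        List<Row> kept = new ArrayList<>();
        for (Row r : previous(scanOrder)) {
            if (seenPerGroup.merge(r.a, 1, Integer::sum) <= 2) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int previousSize = previous(RUN_1).size(); // 0: LIMIT 2 picked the foo rows, WHERE dropped both
        int outputSize = current(RUN_2).size();    // 2: LIMIT 2 picked the bar rows, both survive
        System.out.println("previousOutput.size() = " + previousSize + ", output.size() = " + outputSize);
    }
}
```

Both executions are valid, yet a check of the form `output.size() > previousOutput.size()` would flag this pair as a failure.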

@ivancea (Contributor Author)

Oh my. Fixing it, thanks!

@ivancea (Contributor Author)

Fixed! We'll ask for another review when the planning logic gets merged

    QuerySchema schema,
    QueryExecutor executor
) {
    if (previousCommands.stream().anyMatch(cmd -> cmd.commandName().equals(SortGenerator.SORT))) {

@ivancea (Contributor Author)

Remove this after merging TOP-N BY

@ivancea ivancea marked this pull request as ready for review April 6, 2026 12:11
@elasticsearchmachine

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copilot AI left a comment

Pull request overview

Adds generative-query support for the new ESQL LIMIT … BY … command in the QA random query generator, including validation that the output respects per-group limits, and updates a few generator internals to integrate the new command and normalize multi-word command rendering.

Changes:

  • Add a new LimitByGenerator that generates LIMIT N BY … pipelines and validates per-group row counts.
  • Register LimitByGenerator in the pipeline command list and disallow full-text expressions after LIMIT BY.
  • Normalize StatsGenerator command rendering for underscore-based command names and rename inline-stats’ command name constant.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File | Description
-----|------------
x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/function/FullTextFunctionGenerator.java | Disallows full-text expressions after limit_by (and refactors comparisons to use command constants).
x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/EsqlQueryGenerator.java | Adds LimitByGenerator to the set of pipe commands used by the random pipeline generator.
x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/StatsGenerator.java | Renders underscore command names as spaced command text when building the query string.
x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/LimitByGenerator.java | New generator + validator for LIMIT … BY … behavior.
x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/InlineStatsGenerator.java | Changes inline-stats command identifier to inline_stats (with rendering handled by StatsGenerator).
Comments suppressed due to low confidence (1)

x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/generator/command/pipe/InlineStatsGenerator.java:22

  • INLINE_STATS was changed from "inline stats" to "inline_stats" (commandName), but there is existing QA logic that switches on command.commandName() using the spaced form (e.g. GenerativeRestTest.updateIndexMapped has case "stats", "inline stats"). With this change, inline stats commands will no longer hit that branch and indexMapped propagation for inline stats will be wrong. Consider updating those consumers to accept "inline_stats" (or better, reference InlineStatsGenerator.INLINE_STATS / normalize by replacing '_' with ' ' consistently).
public class InlineStatsGenerator extends StatsGenerator {
    public static final String INLINE_STATS = "inline_stats";
    public static final CommandGenerator INSTANCE = new InlineStatsGenerator();

    @Override
    public String commandName() {
        return INLINE_STATS;
    }
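
One way to follow the suggestion of normalizing `'_'` to `' '` consistently is a small shared helper that consumers call before comparing command names. This is a sketch under assumptions: `CommandNames`, `rendered`, and `isStatsLike` are hypothetical names and are not part of the PR or of the existing generator code.

```java
/** Hypothetical helper; not part of the PR. */
public class CommandNames {
    /** Maps an identifier such as "inline_stats" to its rendered form "inline stats". */
    public static String rendered(String commandName) {
        return commandName.replace('_', ' ');
    }

    /** True for STATS and INLINE STATS, whether the name uses '_' or ' ' as separator. */
    public static boolean isStatsLike(String commandName) {
        String name = rendered(commandName);
        return name.equals("stats") || name.equals("inline stats");
    }

    public static void main(String[] args) {
        System.out.println(isStatsLike("inline_stats")); // true
        System.out.println(isStatsLike("inline stats")); // true
        System.out.println(isStatsLike("limit_by"));     // false
    }
}
```

A consumer that currently matches on the literal `"inline stats"` could compare against `rendered(command.commandName())` instead, so both spellings reach the same branch.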

@ncordon ncordon left a comment

LGTM, left minor comments only

    int limit = (int) commandDescription.context().get(LIMIT_CONTEXT);

    if (limit == 0 && output.isEmpty() == false) {
        return new ValidationResult(false, "LIMIT 0 BY should return no rows, got [" + output.size() + "]");

@ncordon (Member)

We are not checking expectSameColumns in this case. Should we?

@ivancea (Contributor Author)

If any of them fail, we'll report a failure. We can check both things and make a composed message, but I think it's not worth it here (?)

@ivancea ivancea enabled auto-merge (squash) April 7, 2026 11:19
@ivancea ivancea merged commit fe9239c into elastic:main Apr 8, 2026
35 checks passed
@ivancea ivancea deleted the esql-limit-by-generative-tests branch April 8, 2026 14:36