
ESQL: Support intra-row field references in ROW command #140217

Merged
elasticsearchmachine merged 38 commits into elastic:main from kanoshiou:esql-row-command-references
Mar 6, 2026

@kanoshiou
Contributor

Summary

This PR fixes an issue where valid ESQL ROW commands failed when a field referenced a previously defined field within the same command (e.g., ROW x = 4, y = 2, z = x + y).

Changes

  • LogicalPlanBuilder: Removed the call to mergeOutputExpressions in visitRowCommand. We now intentionally defer duplicate field handling and reference resolution to the Analyzer.
  • AnalyzerRules: Updated ResolveRefs to specifically not skip Row nodes, even if they flag as resolved() (which happens when they only contain literals).
  • Analyzer: Implemented resolveRow within the ResolveRefs rule. This logic mimics resolveEval but for literals:
    • It iterates through the fields sequentially.
    • It allows later fields to resolve references to earlier fields by substituting them with their defined expressions.
    • It handles field shadowing by removing earlier definitions if a duplicate name is encountered.
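
The sequential behaviour described in the list above can be illustrated with a small toy model. This is only a sketch and not the Elasticsearch Analyzer code: `RowResolutionSketch`, `Field`, `resolve`, and `lookup` are hypothetical names, fields are restricted to integer literals or the sum of two earlier fields, and the error message text is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of left-to-right ROW resolution (hypothetical names, not the real Analyzer).
final class RowResolutionSketch {

    /** A field is either an integer literal or the sum of two previously defined fields. */
    record Field(String name, Integer literal, String left, String right) {
        static Field literal(String name, int value) {
            return new Field(name, value, null, null);
        }

        static Field sum(String name, String left, String right) {
            return new Field(name, null, left, right);
        }
    }

    /** Resolves fields in order; a later duplicate name shadows the earlier definition. */
    static Map<String, Integer> resolve(List<Field> fields) {
        Map<String, Integer> resolved = new LinkedHashMap<>();
        for (Field field : fields) {
            int value = field.literal() != null
                ? field.literal()
                : lookup(resolved, field.left()) + lookup(resolved, field.right());
            resolved.remove(field.name()); // drop the shadowed definition, if any
            resolved.put(field.name(), value);
        }
        return resolved;
    }

    /** Only fields defined earlier in the same ROW are visible. */
    private static int lookup(Map<String, Integer> resolved, String name) {
        Integer value = resolved.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Unknown column [" + name + "]");
        }
        return value;
    }

    public static void main(String[] args) {
        // Models: ROW x = 4, y = 2, z = x + y
        System.out.println(resolve(List.of(
            Field.literal("x", 4), Field.literal("y", 2), Field.sum("z", "x", "y")
        ))); // prints {x=4, y=2, z=6}
    }
}
```

In this model a reference to a field that is defined later in the same command fails in `lookup`, because only earlier fields are in the map when the reference is resolved.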

Related Issue

Closes #140119

This commit fixes an issue where the ROW command failed to resolve references
to fields defined earlier in the same command (e.g., ROW x=1, y=x+1).
Previously, ROW commands containing only literals were considered "resolved"
immediately after parsing, which caused the Analyzer to skip reference
resolution. This change:
1. Modifies LogicalPlanBuilder to avoid pre-merging output expressions,
   deferring resolution and duplicate handling to the Analyzer.
2. Updates AnalyzerRules to ensure ResolveRefs always processes Row plans,
   even if they appear resolved.
3. Implements a dedicated resolveRow method in the Analyzer. This method
   iterates through fields sequentially, maintaining a context of defined
   expressions so that later fields can reference earlier ones, and correctly
   handles field shadowing (last write wins).
@elasticsearchmachine elasticsearchmachine added v9.4.0 external-contributor Pull request authored by a developer outside the Elasticsearch team needs:triage Requires assignment of a team area label labels Jan 6, 2026
Contributor

@ivancea ivancea left a comment

Thanks for jumping into this! Looks good to me, but I'll first ask somebody with more expertise to take a look at this code.


Also, can we have a failing test like ROW a = b + c, b = 1, c = 2?

@ivancea ivancea added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL >feature and removed needs:triage Requires assignment of a team area label labels Jan 7, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@kanoshiou
Contributor Author

Thank you for your review, @ivancea! Failing test has now been added!

In 9c70b68, I factored out the logic duplicated between resolveEval and resolveRow. I also implemented an optimization that expands aliases directly, for example transforming eval a = 1, b = a + 1 into eval a = 1, b = 1 + 1.

However, this change requires updates to a large number of tests. While some of the resulting optimized plans look correct, others are less intuitive. I am unsure if I should restrict this logic's scope to the ROW command only; I would appreciate some guidance on this.

# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
@ivancea ivancea self-assigned this Feb 10, 2026
Contributor

@ivancea ivancea left a comment

Again, sorry for the delay! We had a discussion about this issue and concluded that it's worth doing (there was a parallel initiative regarding multi-ROWs in progress).

I see the extra pieces, like column pruning, cause some test changes and side-effects. Even if they would be good to have, I wouldn't add them in this PR.

Ideally, this PR should affect only ROW. Having other plan changes may lead to the subtle introduction of bugs or unexpected cases, plus a more complex review process. Better to do them in continuation PRs unless they're needed for this functionality

assertTrue(e.getMessage().startsWith("Found "));
assertEquals("1:43: argument of [x::date_period] must be a constant, received [x]", e.getMessage().substring(header.length()));
assertEquals(
"1:23: argument of [x::date_period] must be a constant, received [keyword]",

This change is not correct as 1:23 corresponds to keyword, not x::date_period. The error is in 1:43, as it was before.
Whatever changes we make to the plan, the errors should always correctly point to the user-provided query, and the line/column numbers must be coherent with the "received" text

Comment on lines +33 to +34
// When Row has no fields, we still need 1 row for downstream Eval operations to work on
return new LocalRelation(row.source(), row.output(), LocalSupplier.of(blocks.length == 0 ? new Page(1) : new Page(blocks)));

Is this required for this PR's changes? I fear this could have further implications, or some untested behaviour

@kanoshiou
Contributor Author

Thanks for the feedback, @ivancea!

I’ve narrowed the scope to intra-row references as suggested. I personally feel the extra functionality is a very useful 'nice-to-have' for the ROW command, but I agree it's better to keep this PR clean. I’ve pushed the updates for your review!

@ivancea
Contributor

ivancea commented Feb 12, 2026

@elasticmachine test this

Contributor

@ivancea ivancea left a comment

LGTM! Some minor test nits and one "major" topic before merging

newFields.add(result);

if (result.resolved()) {
fieldExpressions.put(result.name(), result.child());

In the current implementation, for ROW a = FN(1), b = a, the UnresolvedAttribute a is replaced with the actual expression FN(1). An unexpected side effect is that, if FN is non-deterministic, a and b can end up with different values.

I'm not aware of any public non-deterministic functions, but we do have Random, which is for internal use only and would probably make this "problem" visible.

Now, I'm not sure whether this is worth worrying about now, or whether we already have this problem in EVAL or in other planning steps. We could fold the values here before inserting them in the map. I guess that would add quite some complexity though. WDYT @astefan?

If we decide to go with it, we could reuse the Random() function in an AnalyzerTest, probably
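
To make the concern above concrete, here is a minimal sketch of the difference between copying an expression into a later field and referencing the earlier field. It does not use any Elasticsearch classes: `InlineVsReferenceSketch` and its methods are hypothetical, and a call counter stands in for a non-deterministic function so that the output is reproducible.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Toy comparison of two ways to resolve "ROW a = FN(), b = a" (hypothetical names).
final class InlineVsReferenceSketch {

    /** Inlining: b receives its own copy of a's expression, so FN runs twice. */
    static int[] inlineExpression(IntSupplier fn) {
        int a = fn.getAsInt();
        int b = fn.getAsInt(); // second, independent evaluation
        return new int[] { a, b };
    }

    /** Referencing: b points at a's already computed value, so FN runs once. */
    static int[] referenceField(IntSupplier fn) {
        int a = fn.getAsInt();
        int b = a;
        return new int[] { a, b };
    }

    public static void main(String[] args) {
        // A counter is used as a stand-in for a non-deterministic function.
        AtomicInteger calls = new AtomicInteger();
        IntSupplier fn = calls::incrementAndGet;
        System.out.println(Arrays.toString(inlineExpression(fn))); // prints [1, 2]
        System.out.println(Arrays.toString(referenceField(fn)));   // prints [3, 3]
    }
}
```

With a deterministic expression both strategies produce the same row, which is why the difference only shows up with something like the internal Random function.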

@astefan astefan requested review from ivancea and removed request for alex-spies March 4, 2026 08:06
@astefan
Contributor

astefan commented Mar 4, 2026

buildkite test this

@astefan
Contributor

astefan commented Mar 4, 2026

@ivancea that was a good point about references, and the idea of using random() was brilliant. I added a few tests that use that internal function to double check the logic is sound. Thank you!

@kanoshiou apologies for the initial push that discarded your solution. After @ivancea mentioned references I realized that was the better solution. I have brought back most of the initial solution you proposed, cleaned up the code a bit and added more tests (both unit tests and IT tests).

My solution was failing for a new test that I also added: ROW a = [1, 2], b = 3, c = b + a, a = b + a. It doesn't matter that a is multi-valued. What mattered was that the original deduplication logic I had (in resolveRow) removed the alias for the first a ([1,2]), while c and the second a still referenced it via ReferenceAttribute. After deduplication, those references became dangling: PropagateEvalFoldables couldn't find the original alias to resolve them, so ReplaceRowAsLocalRelation failed when trying to fold. This meant that deduplication has to happen somewhere else: in Row.output() and in ReplaceRowAsLocalRelation. In a nutshell, do the deduplication after the Analyzer finishes its job.

For ROW a = [1, 2], b = 3, c = b + a, a = b + a, after PropagateEvalFoldables resolves all refs and ConstantFolding folds everything, all four fields are literals. ReplaceRowAsLocalRelation folds all four, then collects only the three output values [b, c, second a].
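
The ordering described above (evaluate every field first, drop shadowed names only at the end) can be sketched as follows. This is a simplified illustration and not the actual Row.output() or ReplaceRowAsLocalRelation code: `RowDeduplicationSketch` and its members are hypothetical, and the first a is a single value (10) instead of the multi-valued [1, 2] so that plain integer addition is enough.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Toy model of "evaluate all fields, then deduplicate" (hypothetical names).
final class RowDeduplicationSketch {

    /** A named expression evaluated against the fields visible so far. */
    record Field(String name, ToIntFunction<Map<String, Integer>> expression) {}

    static Map<String, Integer> evaluateThenDeduplicate(List<Field> fields) {
        Map<String, Integer> visible = new HashMap<>();   // latest value per name
        Map<String, Integer> lastIndex = new HashMap<>(); // position of the last definition per name
        List<Integer> values = new ArrayList<>();

        // Step 1: evaluate every field, including ones that are shadowed later,
        // so that references to a shadowed definition still resolve.
        for (int i = 0; i < fields.size(); i++) {
            Field field = fields.get(i);
            int value = field.expression().applyAsInt(visible);
            values.add(value);
            visible.put(field.name(), value);
            lastIndex.put(field.name(), i);
        }

        // Step 2: keep only the last definition of each name, in original order.
        Map<String, Integer> output = new LinkedHashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            String name = fields.get(i).name();
            if (lastIndex.get(name) == i) {
                output.put(name, values.get(i));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Simplified version of: ROW a = 10, b = 3, c = b + a, a = b + a
        System.out.println(evaluateThenDeduplicate(List.of(
            new Field("a", env -> 10),
            new Field("b", env -> 3),
            new Field("c", env -> env.get("b") + env.get("a")),
            new Field("a", env -> env.get("b") + env.get("a"))
        ))); // prints {b=3, c=13, a=13}
    }
}
```

If the first a were dropped before c and the second a had been evaluated, both lookups of a would find nothing, which is the toy equivalent of the dangling references described above.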

@astefan
Contributor

astefan commented Mar 4, 2026

buildkite test this

@astefan
Contributor

astefan commented Mar 4, 2026

buildkite test this

Contributor

@ivancea ivancea left a comment

LGTM!


The random() cases added to Analyzer tests check structure, but not results. Can you add a test like the one on AnalyzerTests, here, so it's resolved to a LocalRelation, and we can assert that, for ROW a = random(10000), b = a, both a and b have the same value?

Not sure how complicated this will be, as random isn't foldable afaik. Honestly, given this fact, it may be a bad function to try, as I suppose ROW will fail trying to fold it? If that's the case, ignore this, I think we're good at least until another function like that appears


Added test


Thank you for the review @ivancea

@astefan
Contributor

astefan commented Mar 6, 2026

buildkite test this

@astefan astefan added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Mar 6, 2026
@astefan
Contributor

astefan commented Mar 6, 2026

buildkite test this

@elasticsearchmachine elasticsearchmachine merged commit 1aa1875 into elastic:main Mar 6, 2026
38 checks passed
Labels

:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Development

Successfully merging this pull request may close these issues.

ESQL: ROW not allowing using of previously-declared fields

5 participants