ESQL: Support intra-row field references in ROW command#140217
ESQL: Support intra-row field references in ROW command#140217elasticsearchmachine merged 38 commits intoelastic:mainfrom
Conversation
This commit fixes an issue where the ROW command failed to resolve references to fields defined earlier in the same command (e.g., ROW x=1, y=x+1). Previously, ROW commands containing only literals were considered "resolved" immediately after parsing, which caused the Analyzer to skip reference resolution. This change: 1. Modifies LogicalPlanBuilder to avoid pre-merging output expressions, deferring resolution and duplicate handling to the Analyzer. 2. Updates AnalyzerRules to ensure ResolveRefs always processes Row plans, even if they appear resolved. 3. Implements a dedicated resolveRow method in the Analyzer. This method iterates through fields sequentially, maintaining a context of defined expressions to support forward references and correctly handling field shadowing (last write wins).
ivancea
left a comment
There was a problem hiding this comment.
Thanks for jumping into this! Looks good to me, but I'll ask first somebody else with more expertise to take a look at this code
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Also, can we have a failing test like: ROW a = b + c, b = 1, c = 2
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Thank you for your review, @ivancea! Failing test has now been added! In 9c70b68, I have refactored the redundant logic from However, this change requires updates to a large number of tests. While some of the resulting optimized plans look correct, others are less intuitive. I am unsure if I should restrict this logic's scope to the |
# Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
ivancea
left a comment
There was a problem hiding this comment.
Again, sorry for the delay! We had a discussion about this issue, and concluded that it's worth doing (There was a parallel initiative regarding multi-ROWs in process).
I see the extra pieces, like column pruning, cause some test changes and side-effects. Even if they would be good to have, I wouldn't add them in this PR.
Ideally, this PR should affect only ROW. Having other plan changes may lead to the subtle introduction of bugs or unexpected cases, plus a more complex review process. Better to do them in continuation PRs unless they're needed for this functionality
| assertTrue(e.getMessage().startsWith("Found ")); | ||
| assertEquals("1:43: argument of [x::date_period] must be a constant, received [x]", e.getMessage().substring(header.length())); | ||
| assertEquals( | ||
| "1:23: argument of [x::date_period] must be a constant, received [keyword]", |
There was a problem hiding this comment.
This change is not correct as 1:23 corresponds to keyword, not x::date_period. The error is in 1:43, as it was before.
Whatever changes we make to the plan, the errors should always correctly point to the user-provided query, and the line/column numbers must be coherent with the "received" text
| // When Row has no fields, we still need 1 row for downstream Eval operations to work on | ||
| return new LocalRelation(row.source(), row.output(), LocalSupplier.of(blocks.length == 0 ? new Page(1) : new Page(blocks))); |
There was a problem hiding this comment.
Is this required for this PR's changes? I fear this could have further implications, or some untested behaviour
...c/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PropagateEvalFoldables.java
Show resolved
Hide resolved
|
Thanks for the feedback, @ivancea! I’ve narrowed the scope to intra-row references as suggested. I personally feel the extra functionality is a very useful 'nice-to-have' for the ROW command, but I agree it's better to keep this PR clean. I’ve pushed the updates for your review! |
|
@elasticmachine test this |
ivancea
left a comment
There was a problem hiding this comment.
LGTM! Some minor test nits and one "major" topic before merging
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java
Show resolved
Hide resolved
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
| newFields.add(result); | ||
|
|
||
| if (result.resolved()) { | ||
| fieldExpressions.put(result.name(), result.child()); |
There was a problem hiding this comment.
In the current implementation, for ROW a = FN(1), b = a, the UnresolvedAttribute a is replaced with the actual expression FN(1). The unexpected side-effect of this is non-deterministic functions making a and b now having different values.
I'm not aware of existing public non-deterministic functions, but for example we have the Random, which is for internal use only, and would probably make this "problem" visible.
Now, I'm not sure if this is worth worrying now, or if we already have this problem in EVAL or in other planning steps. We could fold the values here before inserting them in the map. I guess that would add quite some complexity though. WDYT @astefan?
If we decide to go with it, we could reuse the Random() function in an AnalyzerTest, probably
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
…esql-row-command-references
…shiou/elasticsearch into esql-row-command-references
|
buildkite test this |
|
@ivancea that was a good point about references, and the idea of using random() was brilliant. I added few tests that use that internal function to double check the logic is sound. Thank you! @kanoshiou apologies for the initial push that discarded your solution. After @ivancea mentioned references I realized that was the better solution. I have brought back most of the initial solution you proposed, cleaned up the code a bit and added more tests (both unit tests and IT tests). My solution was failing for a new tests that I also added: For |
|
buildkite test this |
|
buildkite test this |
There was a problem hiding this comment.
The random() cases added to Analyzer tests check structure, but not results. Can you add a test like the one on AnalyzerTests, here, so it's resolved to a LocalRelation, and we can assert that, for ROW a = random(10000), b = a, both a and b have the same value?
Not sure how complicated this will be, as random isn't foldable afaik. Honestly, given this fact, it may be a bad function to try, as I suppose ROW will fail trying to fold it? If that's the case, ignore this, I think we're good at least until another function like that appears
reference attributes, as well
…esql-row-command-references
|
buildkite test this |
|
Important Review skippedAuto reviews are limited based on label configuration. 🏷️ Required labels (at least one) (2)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
buildkite test this |
## Summary This PR fixes an issue where valid ESQL `ROW` commands failed when a field referenced a previously defined field within the same command (e.g., `ROW x = 4, y = 2, z = x + y`). ## Changes - **LogicalPlanBuilder**: Removed the call to `mergeOutputExpressions` in `visitRowCommand`. We now intentionally defer duplicate field handling and reference resolution to the Analyzer. - **AnalyzerRules**: Updated `ResolveRefs` to specifically **not** skip `Row` nodes, even if they flag as `resolved()` (which happens when they only contain literals). - **Analyzer**: Implemented `resolveRow` within the `ResolveRefs` rule. This logic mimics `resolveEval` but for literals: - It iterates through the fields sequentially. - It allows later fields to resolve references to earlier fields by substituting them with their defined expressions. - It handles field shadowing by removing earlier definitions if a duplicate name is encountered. ## Related Issue Closes elastic#140119
…locations * upstream/main: (153 commits) ES|QL: Update docs for TOP_SNIPPETS and DECAY (elastic#143739) Correctly include endpoint id in log msg in AuthorizationPoller (elastic#143743) Bar searching or sorting on _seq_no when disabled (elastic#143600) Generalize `testClientCancellation` test (elastic#143586) JSON_EXTRACT: zero-copy byte slicing for object, array, and number extraction (elastic#143702) Track recycler pages in circuit breaker (elastic#143738) [ESQL] Enable distributed pipeline breakers for external sources via FragmentExec (elastic#143696) Adding 'mode' and 'codec' fields to ES monitoring template (elastic#143673) [ESQL] Columnar I/O and vectorized block conversion for external sources (elastic#143703) Fix flaky MMR diversification YAML tests (elastic#143706) ES|QL codegen: check builder arguments for vector support (elastic#143724) Add Views Security Model (elastic#141050) ESQL: Prevent pushdown of unmapped fields in filters and sorts (elastic#143460) Don't run seq_no pruning tests in release CI (elastic#143725) ESQL: Support intra-row field references in ROW command (elastic#140217) ES|QL: Remove implicit limit in FORK branches in CSV tests (elastic#143601) IndexRoutingTests with and without synthetic id (elastic#143566) Synthetic id upgrade test in serverless (elastic#142471) Disable "Review skipped" comments for PRs without specified labels (elastic#143728) Cleanup ES|QL T-Digest code duplication, add memory accounting (elastic#143662) ...
## Summary This PR fixes an issue where valid ESQL `ROW` commands failed when a field referenced a previously defined field within the same command (e.g., `ROW x = 4, y = 2, z = x + y`). ## Changes - **LogicalPlanBuilder**: Removed the call to `mergeOutputExpressions` in `visitRowCommand`. We now intentionally defer duplicate field handling and reference resolution to the Analyzer. - **AnalyzerRules**: Updated `ResolveRefs` to specifically **not** skip `Row` nodes, even if they flag as `resolved()` (which happens when they only contain literals). - **Analyzer**: Implemented `resolveRow` within the `ResolveRefs` rule. This logic mimics `resolveEval` but for literals: - It iterates through the fields sequentially. - It allows later fields to resolve references to earlier fields by substituting them with their defined expressions. - It handles field shadowing by removing earlier definitions if a duplicate name is encountered. ## Related Issue Closes elastic#140119
Summary
This PR fixes an issue where valid ESQL
ROWcommands failed when a field referenced a previously defined field within the same command (e.g.,ROW x = 4, y = 2, z = x + y).Changes
mergeOutputExpressionsinvisitRowCommand. We now intentionally defer duplicate field handling and reference resolution to the Analyzer.ResolveRefsto specifically not skipRownodes, even if they flag asresolved()(which happens when they only contain literals).resolveRowwithin theResolveRefsrule. This logic mimicsresolveEvalbut for literals:Related Issue
Closes #140119