ESQL: Fix variable shadowing when pushing down past Project by alex-spies · Pull Request #108360 · elastic/elasticsearch

alex-spies · 2024-05-07T12:28:33Z

The issue was caused by the following situation:

Eval[[x{r}#2 * 5[INTEGER] AS y]]
\_Project[[x{r}#2, y{r}#3, y{r}#3 AS z]]
  \_Row[[1[INTEGER] AS x, 2[INTEGER] AS y]]

Pushing down the Eval here was wrong and inconsistent, because we broke the rename y{r}#3 AS z.

To push down the Eval, we give a different name to the y produced by the Eval, which is the main change in this PR:

Project[[x{r}#2, y{r}#3 AS z, $$y$temp_name${r}#6 AS y]]
\_Eval[[x{r}#2 * 5[INTEGER] AS $$y$temp_name$]]
  \_Row[[1[INTEGER] AS x, 2[INTEGER] AS y]]

For Eval and Enrich, we can use the aliasing mechanisms that already exist in the logical plans; for Dissect and Grok, this PR enables naming their generated attributes to deviate from the names obtained from the dissect/grok patterns.
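To make the rewrite above concrete, here is a minimal sketch in plain Java. It is not the optimizer rule from this PR: the `Proj` and `Rewritten` records and the `pushEvalBelowProject` method are hypothetical, name ids such as `#2` and `#3` are ignored, and the sketch assumes the Eval's own inputs are not renamed by the Project.

```java
import java.util.ArrayList;
import java.util.List;

final class PushdownSketch {
    /** One projection entry: expose the child column `source` under the name `output`. */
    record Proj(String output, String source) {}

    /** The name the pushed-down Eval should produce, plus the Project that now sits on top of it. */
    record Rewritten(String evalName, List<Proj> project) {}

    static Rewritten pushEvalBelowProject(String evalName, List<Proj> project) {
        // Below the Project, the Eval must not overwrite a column the Project still reads.
        boolean clash = project.stream().anyMatch(p -> p.source().equals(evalName));
        String pushedName = clash ? "$$" + evalName + "$temp_name$" : evalName;

        List<Proj> newProject = new ArrayList<>();
        for (Proj p : project) {
            // A projection with the Eval's output name was shadowed by the Eval anyway, so drop it.
            if (!p.output().equals(evalName)) {
                newProject.add(p);
            }
        }
        // Expose the Eval's result under the name the user asked for.
        newProject.add(new Proj(evalName, pushedName));
        return new Rewritten(pushedName, newProject);
    }

    public static void main(String[] args) {
        // Project[[x, y, y AS z]] with Eval[[x * 5 AS y]] on top of it.
        List<Proj> project = List.of(new Proj("x", "x"), new Proj("y", "y"), new Proj("z", "y"));
        System.out.println(pushEvalBelowProject("y", project));
    }
}
```

For the plan in the description, the sketch keeps `x`, keeps `y AS z` pointing at the original `y`, and appends the temporary name aliased back to `y`.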

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

Changing this and relying on being able to rename the attributes generated by Dissect/Grok will break bwc: old nodes cannot rename the generated attributes.

alex-spies · 2024-07-22T11:47:49Z

...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java

+        // Names in the pattern and layout can differ.
+        String[] patternNames = Expressions.names(dissect.parser().keyAttributes(Source.EMPTY)).toArray(new String[0]);

        Layout layout = layoutBuilder.build();
        source = source.with(
            new StringExtractOperator.StringExtractOperatorFactory(
-                attributeNames,
+                patternNames,
                EvalMapper.toEvaluator(expr, layout),
                () -> (input) -> dissect.parser().parser().parse(input)
            ),


This and the corresponding change to planGrok are among the main points of this PR.

elasticsearchmachine · 2024-07-22T15:26:01Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

luigidellaquila

LGTM, thanks Alex!

I left a couple of comments, but I think the general approach and the implementation are correct.

...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/GeneratingPlan.java

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/OptimizerRules.java

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/LogicalPlanBuilder.java

luigidellaquila · 2024-07-22T16:39:44Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

+
+countSameFieldWithEval
+required_capability: fixed_pushdown_past_project
+from employees | stats  b = count(gender), c = count(gender) by gender | eval b = gender | sort c asc


I'd really love to have a couple more tests here, where you have multiple expressions in the same EVAL (and GROK, DISSECT...), where some are masked and some are not.
Also, some tests where the EVAL uses masked names in following expressions

I increased unit test coverage (multiple expressions in the same EVAL) and will add a couple more csv tests where we sometimes shadow, sometimes not.

I'll avoid adding them to stats.csv-spec, though, and will not include STATS commands in the tests; the fact that STATS triggered this bug is merely accidental, it's really RENAME ... | EVAL ... and similar that lead to this problem.

fang-xing-esql · 2024-07-23T03:42:10Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

+
+countSameFieldWithEval
+required_capability: fixed_pushdown_past_project
+from employees | stats  b = count(gender), c = count(gender) by gender | eval b = gender | sort c asc


I understand the goal is to make the query run to completion, but the first time I looked at this query, the alias b looked ambiguous: should it be gender or count(gender)? It seems we return b as gender, so it overwrites count(gender), but when pushdown happens, the order might be reversed. In SQL, ambiguous column references return an error. Should we return an error here to indicate that b is ambiguous, or let the query succeed (if we have agreement in ES|QL that when the same alias is defined in multiple places, the last one takes effect)?

I don't think we should throw errors. Masking happens all the time, even a simple | eval a = 1 | ... | eval a = 2 could be considered masking.
The intention is exactly to make sure that the final result for b is the value of gender, even if EVAL gets pushed down before STATS (that is what is supposed to happen with current planning rules)

Thanks @fang-xing-esql - ESQL in general allows shadowing attribute names that have been available previously. Take a look at the shadowing... csv tests to see this in action. (The tests exist for most commands, eval.csv-spec may be the most important one, though.) Some PRs ago, I also updated our docs to describe behavior in case of conflicting names.

The main idea is that we want to be able to compose expressions in eval, like EVAL x = to_upper(field), x = concat(x, some_other_field).
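As an illustration of that composition, here is a small sketch of "later definitions see and shadow earlier ones" in plain Java. It is not ES|QL's evaluator: the `Assignment` record and `eval` method are hypothetical, rows are modeled as string maps, and column ordering after shadowing is not modeled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class EvalShadowingSketch {
    /** One `name = expression` entry of an EVAL; the expression reads the row built so far. */
    record Assignment(String name, Function<Map<String, String>, String> expr) {}

    static Map<String, String> eval(Map<String, String> row, List<Assignment> assignments) {
        Map<String, String> out = new LinkedHashMap<>(row);
        for (Assignment a : assignments) {
            // Each assignment sees the results of the previous ones and may overwrite them.
            out.put(a.name(), a.expr().apply(out));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("field", "abc", "some_other_field", "def");
        // EVAL x = to_upper(field), x = concat(x, some_other_field)
        List<Assignment> assignments = List.of(
            new Assignment("x", r -> r.get("field").toUpperCase(Locale.ROOT)),
            new Assignment("x", r -> r.get("x") + r.get("some_other_field"))
        );
        System.out.println(eval(row, assignments).get("x"));
    }
}
```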

astefan · 2024-07-23T10:36:09Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Dissect.java

+        if (newNames.size() != extractedFields.size()) {
+            throw new IllegalArgumentException(
+                "Number of new names is [" + newNames.size() + "] but there are [" + extractedFields.size() + "] existing names."
+            );
+        }


This check is common to all classes that implement GeneratingPlan. Maybe you could extract this in a default method in the interface or define an abstract class that extends UnaryPlan and implements GeneratingPlan instead (the abstract class is more appropriate I think).

Default method is simple enough! I'm afraid of changing the class hierarchy of RegexExtract, Enrich and Eval to put an abstract class in between there: this might mess up some instanceof checks, and we might have to fiddle with EsqlNodeSubclassTests, which I'd like to avoid.
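The default-method option could look roughly like the sketch below. It is a simplified stand-in, not the code merged in this PR: attributes are reduced to plain strings, the nested `GeneratingPlan` and `Eval` types only mirror the names used in this discussion, and the method name `checkNumberOfNewNames` is an assumption.

```java
import java.util.List;

final class GeneratingPlanSketch {
    /** Simplified stand-in for the interface discussed here; names only, no Attribute objects. */
    interface GeneratingPlan<P> {
        List<String> generatedNames();

        P withGeneratedNames(List<String> newNames);

        /** Shared validation, written once instead of once per implementing class. */
        default void checkNumberOfNewNames(List<String> newNames) {
            if (newNames.size() != generatedNames().size()) {
                throw new IllegalArgumentException(
                    "Number of new names is [" + newNames.size() + "] but there are [" + generatedNames().size() + "] existing names."
                );
            }
        }
    }

    /** Toy plan node; the class hierarchy stays untouched because only the interface changed. */
    record Eval(List<String> generatedNames) implements GeneratingPlan<Eval> {
        @Override
        public Eval withGeneratedNames(List<String> newNames) {
            checkNumberOfNewNames(newNames);
            return new Eval(List.copyOf(newNames));
        }
    }

    public static void main(String[] args) {
        Eval eval = new Eval(List.of("a", "b"));
        System.out.println(eval.withGeneratedNames(List.of("$$a$temp_name$", "b")));
    }
}
```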

astefan · 2024-07-23T11:01:37Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/RegexExtract.java

    }

+    @Override
+    public List<Attribute> generatedAttributes() {


What is the difference between generatedAttributes and extractedFields? Why not call extractedFields() directly?

extractedFields() exists on Grok and Dissect, but not on Eval nor Enrich; these have fields() (Aliases, though) and enrichFields() (NamedExpressions).

Having all of them implement a common interface with this generatedAttributes makes it much easier to write the pushdown rule for Grok, Dissect, Eval and Enrich - and also to test them.

Additionally, we're in the process of adding more plan nodes that, with respect to shadowing, should behave the same: Lookup and Inlinestats. Having an interface should make the rules easier to reason about.

astefan · 2024-07-23T11:03:31Z

...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java

        layoutBuilder.append(dissect.extractedFields());
        final Expression expr = dissect.inputExpression();
-        String[] attributeNames = Expressions.names(dissect.extractedFields()).toArray(new String[0]);
+        // Names in the pattern and layout can differ.


When is this happening? Can you give an example?

Expanded the comment in the latest push.

This happens whenever we call GeneratingPlan.withGeneratedNames on Grok and Dissect.

This enables us to have consistent names in our logical plans, without having to rewrite the format strings for grok and dissect.
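A rough sketch of why the operator needs the pattern names rather than the output names: the parser returns values keyed by the names in the pattern, while the output columns may have been renamed, so the two lists are matched by position. Everything here is hypothetical, including the toy `parse` method that stands in for a dissect pattern like `%{a} %{b}`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PatternVsLayoutSketch {
    /** Toy parser for the pattern "%{a} %{b}": values are keyed by the names in the pattern. */
    static Map<String, String> parse(String input) {
        String[] parts = input.split(" ", 2);
        return Map.of("a", parts[0], "b", parts[1]);
    }

    /**
     * Looks values up by pattern name, but writes them under the (possibly renamed) output name
     * at the same position.
     */
    static Map<String, String> extract(String input, List<String> patternNames, List<String> outputNames) {
        Map<String, String> parsed = parse(input);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < patternNames.size(); i++) {
            row.put(outputNames.get(i), parsed.get(patternNames.get(i)));
        }
        return row;
    }

    public static void main(String[] args) {
        // The pattern still says "a", but the plan renamed the generated attribute.
        System.out.println(extract("foo bar", List.of("a", "b"), List.of("$$a$temp_name$", "b")));
    }
}
```

Looking values up by the renamed output names instead would find nothing for the renamed column, which is why the pattern string itself does not need to be rewritten.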

astefan · 2024-07-23T11:05:08Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Dissect.java


    public record Parser(String pattern, String appendSeparator, DissectParser parser) {

+        public List<Attribute> keyAttributes(Source src) {


Moving this one here is a bit forced, especially since it seems to act at the parser level (i.e., the use of ParsingException).

Moving it - just the validation - back into LogicalPlanBuilder.visitDissectCommand.

astefan · 2024-07-23T11:06:34Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Dissect.java

+        public List<Attribute> keyAttributes(Source src) {
+            Set<String> referenceKeys = parser.referenceKeys();
+            if (referenceKeys.size() > 0) {
+                throw new ParsingException(


@luigidellaquila when is this possible, in practical terms? Can you give an example query?

DissectParser can create field names (together with values) at runtime, from data, see https://www.elastic.co/guide/en/elasticsearch/reference/current/dissect-processor.html#dissect-modifier-reference-keys

In ES|QL we don't support it because we need to know the schema at planning time.
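As a sketch of what "knowing the schema at planning time" means here: the output column names have to be derivable from the pattern string alone, which reference keys (`%{*name}` and `%{&name}`) make impossible. The class below is a deliberately simplified, hypothetical check; it is not DissectParser, it ignores the other dissect modifiers, and the error message is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DissectKeysSketch {
    // Matches %{...} and captures what is between the braces.
    private static final Pattern KEY = Pattern.compile("%\\{([^}]*)\\}");

    /** Returns the column names a pattern produces, or fails if they depend on the data. */
    static List<String> keyNames(String pattern) {
        List<String> names = new ArrayList<>();
        Matcher m = KEY.matcher(pattern);
        while (m.find()) {
            String key = m.group(1);
            if (key.startsWith("*") || key.startsWith("&")) {
                // The field name would come from the input at runtime, so no schema can be planned.
                throw new IllegalArgumentException("reference key [%{" + key + "}] is not supported");
            }
            if (!key.isEmpty() && !key.startsWith("?")) {
                // Empty and ?-prefixed keys are skip keys and produce no column.
                names.add(key);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(keyNames("%{a} %{b}"));
    }
}
```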

We have a test for this as well https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/StatementParserTests.java#L756

astefan

LGTM

alex-spies · 2024-07-23T12:23:00Z

Thanks for your reviews, @astefan , @luigidellaquila and @fang-xing-esql !

elasticsearchmachine · 2024-07-23T13:03:51Z

💔 Backport failed

Status	Branch	Result
❌	8.15	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 108360

alex-spies · 2024-07-24T09:45:55Z

💚 All backports created successfully

Status	Branch	Result
✅	8.15

…108360)

Fix bugs caused by pushing down Eval, Grok, Dissect and Enrich past Rename, where after the pushdown, the columns added shadowed the columns to be renamed.

For Dissect and Grok, this enables naming their generated attributes to deviate from the names obtained from the dissect/grok patterns.

(cherry picked from commit e8a01bb)

# Conflicts:
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/OptimizerRules.java
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Dissect.java
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Enrich.java
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Eval.java
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/RegexExtract.java

Add reproducing tests

2554ee1

elasticsearchmachine added the v8.15.0 label May 7, 2024

alex-spies force-pushed the fix-pushdown-past-project-shadowing branch from 4ad33b1 to c5c85d1 Compare May 7, 2024 12:29

alex-spies added 3 commits May 7, 2024 14:30

Turn RegexExtract extracted fields into Aliases

16dc1ff

Fix pushDownPastProject

c6396fc

Fix physical planning/optimization

79a2f2d

alex-spies force-pushed the fix-pushdown-past-project-shadowing branch from c5c85d1 to 79a2f2d Compare May 7, 2024 12:30

alex-spies changed the title ~~ESQL: Fix pushdown past project shadowing~~ ESQL: Fix variable shadowing when pushing down past Project May 7, 2024

alex-spies commented May 7, 2024

alex-spies added 8 commits May 22, 2024 16:33

Merge remote-tracking branch 'upstream/main' into fix-pushdown-past-project-shadowing

2979314

Merge branch 'main' into fix-pushdown-past-project-shadowing

96c55b3

Merge remote-tracking branch 'upstream/main' into fix-pushdown-past-project-shadowing

f21e47b

Make tests deterministic

717f9cd

Update StatementParserTests

7a74e46

Update unit tests

c456b2f

Fix withGeneratedNames for Eval

06cd647

Unit test: pushdown shadowing eval past project

dbe12b6

elasticsearchmachine added v8.16.0 and removed v8.15.0 labels Jul 4, 2024

alex-spies added 10 commits July 4, 2024 14:03

Generalize unit test, add DISSECT

dce915f

Add test cases for grok and enrich

c9b94e8

Improve comment

c061b64

Align push down past order by with new approach

b93f481

Merge remote-tracking branch 'upstream/main' into fix-pushdown-past-project-shadowing

3684d13

Revert to previous push down past order by

dd6abad

Changing this and relying on being able to rename the attributes generated by Dissect/Grok will break bwc: old nodes cannot rename the generated attributes.

Use different names without new transport version

2ac1257

Update tests

ac31ef4

Merge remote-tracking branch 'upstream/main' into fix-pushdown-past-project-shadowing

b87ae72

Move randomConfiguration() back

fc5fec5

alex-spies commented Jul 22, 2024

alex-spies requested review from fang-xing-esql and luigidellaquila July 22, 2024 15:27

luigidellaquila approved these changes Jul 22, 2024

fang-xing-esql reviewed Jul 23, 2024

alex-spies added the test-full-bwc Trigger full BWC version matrix tests label Jul 23, 2024

alex-spies mentioned this pull request Jul 23, 2024

ESQL: a second ENRICH enriching with a RENAMEd field fails #110041

Closed

alex-spies added 4 commits July 23, 2024 10:40

Add simpler tests

4e8f828

Merge remote-tracking branch 'upstream/main' into fix-pushdown-past-project-shadowing

17cfbc8

Apply remarks

f904f54

Make unit tests a bit spicier

33e8ac4

astefan reviewed Jul 23, 2024

alex-spies added 5 commits July 23, 2024 13:15

Moar tests

edd2f7f

Merge remote-tracking branch 'upstream/main' into fix-pushdown-past-project-shadowing

32c5fba

Move dissect pattern validation back into parsing

73ba36c

DRY

db3304d

Fix test

984bd98

D'oh

astefan approved these changes Jul 23, 2024

alex-spies merged commit e8a01bb into elastic:main Jul 23, 2024

alex-spies deleted the fix-pushdown-past-project-shadowing branch July 23, 2024 13:02

elasticsearchmachine added the backport pending label Jul 23, 2024

alex-spies mentioned this pull request Jul 24, 2024

[8.15] ESQL: Fix variable shadowing when pushing down past Project (#108360) #111229

Merged

costin mentioned this pull request Jul 26, 2024

ESQL: remove cycle between LogicalPlanOptimizer and its rules #111317

Closed

3 tasks

lkts mentioned this pull request Aug 13, 2024

Fix references to logsdb index mode in release highlights lkts/elasticsearch#1

Closed

kanoshiou mentioned this pull request Feb 15, 2026

ESQL: Prune unused regex extract nodes in optimizer #140982

Merged

