ESQL: Prune unused regex extract nodes in optimizer by kanoshiou · Pull Request #140982 · elastic/elasticsearch

kanoshiou · 2026-01-20T16:55:14Z

Summary

Enables the optimizer to remove entire RegexExtract operations (Dissect and Grok) when none of their extracted fields are used downstream, eliminating unnecessary pattern matching overhead.

Context

Previously, RegexExtract nodes remained in the logical plan even when all extracted fields were unused, causing unnecessary pattern matching execution. Due to RegexExtract's design constraints requiring field count to match the pattern, the optimizer can only remove the entire node, not prune individual fields.
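For readers skimming the thread, the rule described above can be sketched as follows. This is only an illustration under simplifying assumptions: `ExtractNode` and `FullPruneSketch` are hypothetical stand-ins, not the real `RegexExtract` or `PruneColumns` classes, and the set of fields used downstream is assumed to be already known.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified plan node. NOT the real Elasticsearch RegexExtract class.
record ExtractNode(String input, String pattern, List<String> extractedFields) {}

final class FullPruneSketch {
    /** True when none of the node's extracted fields is referenced downstream. */
    static boolean canPrune(ExtractNode node, Set<String> usedDownstream) {
        return node.extractedFields().stream().noneMatch(usedDownstream::contains);
    }

    /**
     * Drops whole extract nodes whose fields are all unused.
     * A node with at least one used field is kept untouched (no partial pruning).
     */
    static List<ExtractNode> prune(List<ExtractNode> pipeline, Set<String> usedDownstream) {
        return pipeline.stream().filter(n -> canPrune(n, usedDownstream) == false).toList();
    }
}
```

With a node extracting `a` and `b`, a downstream that only reads `emp_no` lets the whole node go, while a downstream that reads `a` keeps the node as is, including the unused `b`.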

Closes #132437

elasticsearchmachine · 2026-01-22T07:55:49Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

astefan · 2026-02-10T15:43:19Z

buildkite test this

astefan

Thank you for providing this fix. It does look ok conceptually, but I think it needs more complex tests. Both those in csv-spec files and the unit tests in LogicalPlanOptimizerTests test simple scenarios; add some tests where you drop the fields generated by grok and dissect, not only keep and stats. Test the functionality with lookup join and inline stats as well. Shadow the fields generated by grok and dissect with renames, evals and redefine those fields as well.

astefan · 2026-02-11T13:39:36Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

        var firstBranch = fork.children().getFirst();
        var firstBranchProject = as(firstBranch, Project.class);
        assertThat(firstBranchProject.projections().size(), equalTo(3));
+        // Dissect has been pruned since x, y, z fields are not used in the final aggregation


Since you added this comment here, it would be more complete to add comments about the other x, y and z fields (from other fork branches) that are dropped since they are not used anymore.

astefan · 2026-02-11T14:56:06Z

...in/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneColumns.java

+    /**
+     * Prunes RegexExtract operations (Dissect and Grok) when none of their extracted fields are used.
+     * <p>
+     * Note: Due to limitations in {@link RegexExtract#withGeneratedNames(List)}, which requires the exact same


I don't understand this comment. Why would the presence of that method be a reason for not partially pruning grok/dissect? Can you please explain?
Also, withGeneratedNames is not in RegexExtract, but in GeneratingPlan. And eval is a GeneratingPlan as well, but it can be partially pruned (see the pruneColumnsInEval method in PruneColumns).

I initially assumed we couldn’t create a dissect or grok with a different number of extractedFields. However, I’ve updated the logic for partially pruning RegexExtract plans and used sealed to ensure no future subclasses of RegexExtract are missed in the switch inside pruneUnusedRegexExtract.
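As an illustration of the sealed-plus-switch idea mentioned in this reply, here is a minimal sketch assuming Java 21 pattern matching for switch. The types `RegexExtractSketch`, `DissectSketch` and `GrokSketch` are hypothetical stand-ins, not the real plan classes, and `describe` is not the actual `pruneUnusedRegexExtract` method.

```java
import java.util.List;

// Hypothetical stand-ins for the plan classes discussed in this thread.
sealed interface RegexExtractSketch permits DissectSketch, GrokSketch {
    List<String> extractedFields();
}

record DissectSketch(String pattern, List<String> extractedFields) implements RegexExtractSketch {}

record GrokSketch(String pattern, List<String> extractedFields) implements RegexExtractSketch {}

final class SealedSwitchSketch {
    /**
     * No default branch is needed: the compiler checks that every permitted
     * subtype is handled, so adding a new subtype later fails compilation here
     * until a matching case is written.
     */
    static String describe(RegexExtractSketch node) {
        return switch (node) {
            case DissectSketch d -> "dissect:" + d.extractedFields().size();
            case GrokSketch g -> "grok:" + g.extractedFields().size();
        };
    }
}
```

The point of the design is that the exhaustiveness check moves from review time to compile time.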

This patch now appears to break some queries. Please take a look at the comment I posted below.

kanoshiou · 2026-02-15T06:34:48Z

Thank you for your review @astefan! At the moment, I’m not confident about the correct architectural direction here. If you have a better approach in mind, I’d appreciate your input.

Context
I am implementing partial pruning in PruneColumns. When only a subset of extractedFields is used, we create a new Dissect or Grok node with only the used fields.

The Problem
In LocalExecutionPlanner, the Dissect and Grok planning logic assumes a 1:1 positional correspondence between the extractedFields list (logical plan) and the parser's pattern keys (parser implementation).

Layout is built from extractedFields (size $N$, pruned). This determines channel indices for all downstream operators.
Operator (StringExtractOperator / ColumnExtractOperator) is initialized using the full pattern definition from the parser (size $M$, unpruned). This means the operator produces $M$ blocks at runtime.

When pruning occurs ($N < M$):

The operator appends $M$ blocks to the page.
The layout expects only $N$ blocks.
This causes a corrupted page structure where downstream operators (like Aggregator) read from the wrong channel indices (off by $M - N$).
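The mismatch described above can be reproduced in a toy model. This is a sketch under stated assumptions, not the real `LocalExecutionPlanner` or operator code: blocks are modelled as plain strings, and `buildLayout` and `runOperator` are hypothetical helpers. The pattern has keys `a` and `b` (M = 2), while the pruned node only declares `b` (N = 1).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LayoutMismatchSketch {
    /** Builds a name -> channel layout: input columns first, then the (pruned) extracted fields. */
    static Map<String, Integer> buildLayout(List<String> inputColumns, List<String> extractedFields) {
        Map<String, Integer> layout = new LinkedHashMap<>();
        for (String name : inputColumns) {
            layout.put(name, layout.size());
        }
        for (String name : extractedFields) {
            layout.put(name, layout.size());
        }
        return layout;
    }

    /** Simulates an operator that appends one block per pattern key, ignoring any pruning. */
    static List<String> runOperator(List<String> inputBlocks, List<String> patternKeys, Map<String, String> parsed) {
        List<String> page = new ArrayList<>(inputBlocks);
        for (String key : patternKeys) {
            page.add(parsed.get(key));
        }
        return page;
    }
}
```

With one input column `x`, the layout places `b` at channel 1, but the page holds `[x, a, b]`, so reading channel 1 returns the value parsed for `a`. The page also carries one more block than the layout accounts for, which shifts every column appended afterwards.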

Why we can't just use extractedFields.name()
We cannot simply verify the operator using extractedFields.name() because of variable shadowing (ref PR #108360). When PushDownRegexExtract pushes logic past a Rename, the attribute names in extractedFields are changed to avoid conflicts, but the underlying parser still returns a map keyed by the original pattern names. The operator must use pattern names to look up values in the parser's result.
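To make the shadowing point concrete, here is a small sketch of positional mapping between pattern keys and attribute names. It is an assumption-laden illustration: `ShadowingSketch.project` is a hypothetical helper and `$$a$temp` is an invented example of a renamed attribute, not a name taken from the real planner.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShadowingSketch {
    /**
     * Looks each value up by its ORIGINAL pattern key and emits it under the
     * (possibly renamed) attribute at the same position. The two lists must
     * therefore stay the same length and in the same order.
     */
    static Map<String, String> project(List<String> patternKeys, List<String> attributeNames, Map<String, String> parsed) {
        if (patternKeys.size() != attributeNames.size()) {
            throw new IllegalArgumentException("pattern keys and attributes must correspond 1:1");
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < patternKeys.size(); i++) {
            out.put(attributeNames.get(i), parsed.get(patternKeys.get(i)));
        }
        return out;
    }
}
```

If the parser result is `{a=M, b=foobar}` and the attributes are `[$$a$temp, b]`, a lookup by attribute name finds nothing for `$$a$temp`, while the positional lookup still resolves it to `M`. This is why dropping entries from only one of the two lists breaks the correspondence.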

Example

# dissect.dissectStats
from employees 
| eval x = concat(gender, " foobar") 
| dissect x "%{a} %{b}" 
| stats n = max(emp_no) by a 
| keep a, n 
| sort a asc

astefan · 2026-02-25T13:19:13Z

@kanoshiou apologies for the delay of my reply.

Please go ahead and keep only the full pruning part of the Regex nodes. We'll consider partial pruning for a future PR. It would be a pity not to move on with this PR; it has some good code and tests that we should definitely have in the language. Thank you very much! Looking forward to reviewing this PR once partial pruning is, for now, removed.

kanoshiou · 2026-02-26T07:35:18Z

@astefan I’ve removed the partial pruning logic. Feel free to review whenever you're free!

astefan · 2026-02-26T13:58:05Z

buildkite test this

kanoshiou · 2026-02-26T15:10:52Z

@astefan the failed test has now been resolved.

astefan · 2026-02-26T15:12:28Z

buildkite test this

kanoshiou · 2026-02-27T08:13:59Z

The failing test is not caused by this PR.

Reference: #143174

astefan · 2026-02-27T08:15:31Z

buildkite test this

astefan · 2026-02-27T16:20:27Z

buildkite test this

astefan · 2026-03-02T15:29:24Z

...ql/src/test/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneColumnsTests.java

     * Limit[1000[INTEGER],false,false]
-     * \_Project[[id{f}#12]]
-     *   \_Dissect[x{r}#5,Parser[pattern=%{foo}, appendSeparator=, parser=org.elasticsearch.dissect.DissectParser@18e5d3b5],[foo{r}#6]]
-     *     \_Project[[id{f}#12, $$languages$converted_to$keyword{f$}#14, $$languages$converted_to$keyword{f$}#14 AS x#5]]


This specific test has a special purpose and should remain as is. I'll update it.

astefan

LGTM. Thank you @kanoshiou

astefan · 2026-03-02T15:52:13Z

buildkite test this

astefan · 2026-03-02T16:11:36Z

buildkite test this

astefan · 2026-03-02T20:43:06Z

buildkite test this

astefan · 2026-03-03T07:50:34Z

buildkite test this

astefan · 2026-03-03T07:51:09Z

buildkite test this

kanoshiou · 2026-03-03T09:57:24Z

Thank you for picking this up and polishing the final changes, @astefan! I appreciate the help in getting this merged.

feat: prune unused regex extract nodes in optimizer

416584d

elasticsearchmachine added labels needs:triage (Requires assignment of a team area label), v9.4.0, external-contributor (Pull request authored by a developer outside the Elasticsearch team) Jan 20, 2026

Update docs/changelog/140982.yaml

3178ee0

gareth-ellis added labels :Analytics/ES|QL (AKA ESQL), >enhancement Jan 22, 2026

elasticsearchmachine added label Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) and removed label needs:triage (Requires assignment of a team area label) Jan 22, 2026

Merge branch 'refs/heads/main' into prune-unused-regex-extract-nodes

a0f9f2b

# Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

astefan self-requested a review February 6, 2026 10:18

astefan self-assigned this Feb 6, 2026

astefan reviewed Feb 11, 2026

kanoshiou and others added 4 commits February 13, 2026 09:46

Merge branch 'main' into prune-unused-regex-extract-nodes

18a2b8e

# Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

Update tests

bd3096c

Merge branch 'refs/heads/main' into prune-unused-regex-extract-nodes

933cdee

Add more tests

9f20104

kanoshiou added 2 commits February 26, 2026 14:37

Merge branch 'main' into prune-unused-regex-extract-nodes

ce474f2

Cannot partially prune regex extract fields

e9d8b1b

Merge branch 'main' into prune-unused-regex-extract-nodes

cfd4fae

fix test

a6957ae

Merge branch 'main' into prune-unused-regex-extract-nodes

c9512b0

Merge branch 'main' into prune-unused-regex-extract-nodes

a40dc42

astefan and others added 2 commits February 27, 2026 18:19

Add another capability

9967fc0

Merge branch 'main' into prune-unused-regex-extract-nodes

96dde47

astefan reviewed Mar 2, 2026

astefan approved these changes Mar 2, 2026

astefan added 3 commits March 2, 2026 17:46

Update one test to keep its original semantics

50eb620

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

3fa6c0b

…prune-unused-regex-extract-nodes

Merge branch 'prune-unused-regex-extract-nodes' of https://github.com…

193efa6

…/kanoshiou/elasticsearch into prune-unused-regex-extract-nodes

[CI] Auto commit changes from spotless

64741d9

Merge branch 'main' into prune-unused-regex-extract-nodes

b74a498

Remove leftovers

b255db1

Merge branch 'main' into prune-unused-regex-extract-nodes

edf114e

astefan merged commit 3be56b9 into elastic:main Mar 3, 2026
37 checks passed

tballison pushed a commit to tballison/elasticsearch that referenced this pull request Mar 3, 2026

ESQL: Prune unused regex extract nodes in optimizer (elastic#140982)

24667d0

shmuelhanoch pushed a commit to shmuelhanoch/elasticsearch that referenced this pull request Mar 4, 2026

ESQL: Prune unused regex extract nodes in optimizer (elastic#140982)

d0eb555

This was referenced Mar 6, 2026

[ML] Wait for cluster state in test #143767

Merged

[Transform] Disable PIT for CPS #143876

Closed

4 participants