Support multi-index LOOKUP JOIN and various bug fixes by craigtaverner · Pull Request #118429 · elastic/elasticsearch

craigtaverner · 2024-12-11T10:12:27Z

While working on supporting multi-index LOOKUP JOIN, various bugs were fixed:

* Previously we just used '*' for lookup-join indices, because the field names were sometimes not being determined correctly. The problem was with KEEP referencing fields from the right that had previously been defined on the left as aliases, including via the ROW command. We normally don't want to ask for aliases, but if they could be shadowed by a lookup join, we need to keep them.
* With both single- and multi-index LOOKUP JOIN, we need to mark each lookup index as potentially needing wildcard fields if the KEEP commands occur before the LOOKUP JOIN.
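
The alias problem in the first fix can be illustrated with a toy model. This is only a sketch and not the code in EsqlSession.fieldNames: the class name, method name and parameters are hypothetical, and the example assumes a query such as `ROW client_ip = "172.21.0.5", env = "unknown" | LOOKUP JOIN clientips_lookup ON client_ip | KEEP env`, where `env` is assumed to also exist in the lookup index.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: NOT the logic in EsqlSession.fieldNames. All names are hypothetical.
final class AliasShadowingSketch {

    /**
     * @param referenced        names referenced by the query (for example by KEEP or ON)
     * @param aliases           names defined on the left by ROW, EVAL or RENAME
     * @param lookupJoinPresent whether a LOOKUP JOIN could shadow those aliases
     * @return the names that would be requested from the indices
     */
    static Set<String> requestedFieldNames(List<String> referenced, Set<String> aliases, boolean lookupJoinPresent) {
        Set<String> requested = new LinkedHashSet<>();
        for (String name : referenced) {
            boolean isAlias = aliases.contains(name);
            // An alias is computed by the query itself, so normally it is not an index field.
            // With a LOOKUP JOIN present, the lookup index may supply a field with the same
            // name, so the name has to stay in the request.
            if (isAlias == false || lookupJoinPresent) {
                requested.add(name);
            }
        }
        return requested;
    }

    public static void main(String[] args) {
        List<String> referenced = List.of("client_ip", "env");
        Set<String> aliases = Set.of("client_ip", "env");
        System.out.println(requestedFieldNames(referenced, aliases, false)); // prints []
        System.out.println(requestedFieldNames(referenced, aliases, true));  // prints [client_ip, env]
    }
}
```

In this model, erasing every alias leaves nothing to request from the lookup index, which is the kind of gap the '*' workaround was covering.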

elasticsearchmachine · 2024-12-11T10:12:51Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

alex-spies

Heya, I think this should improve things, but I think we need to add tests to really see what fieldnames we'll request. There are tests for the field names method and I think we should throw a bunch of LOOKUP JOIN queries in there to ensure that we're asking for the right fields. In particular, there may be bugs in situations where there's a KEEP/DROP before (i.e. upstream from) the LOOKUP JOIN, which would indicate that we only need some fields from the main index - that could end up requesting too few fields from the LOOKUP JOIN.

I tried the following test, and it passes as-is on main:

    public void testLookupJoin() {
        assertFieldNames(
            "FROM employees | KEEP languages | RENAME languages AS language_code | LOOKUP JOIN languages_lookup ON language_code",
            Set.of("languages", "language_code", "language_code.*", "languages.*")
        );
    }

Not sure that's correct - we actually need all the fields from LOOKUP JOIN.

Note: Since we supply a single list of field names that doesn't differentiate between the main index and lookup indices (cf. where the field names are passed to preAnalyzeLookupIndices), we'll ask for all the main index's field names too, and will request too many fields either from the main index or, in the case of multiple lookups that shadow each other, from the lookup indices.

Requesting too many fields is likely okay for now as long as it's not *. We probably want an optimization rule that prunes the lookup to only fetch the relevant fields, and it shouldn't/can't be the job of the pre-analysis stage to determine this.
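
One way to read the KEEP-before-join concern is sketched below. It is a simplified model under stated assumptions, not the pre-analysis code in this PR: the `Command` record and `wildcardPerLookupIndex` method are hypothetical, and the rule it encodes (a lookup index needs all of its fields unless a KEEP downstream of the join narrows the output) is an interpretation of the discussion above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model only; Command and wildcardPerLookupIndex are hypothetical names.
final class LookupWildcardSketch {

    /** A pipeline command; lookupIndex is non-null only for LOOKUP JOIN. */
    record Command(String name, String lookupIndex) {
        static Command of(String name) {
            return new Command(name, null);
        }

        static Command lookupJoin(String index) {
            return new Command("LOOKUP JOIN", index);
        }
    }

    /** Maps each lookup index to whether all of its fields ('*') would be requested. */
    static Map<String, Boolean> wildcardPerLookupIndex(List<Command> pipeline) {
        Map<String, Boolean> wildcard = new LinkedHashMap<>();
        boolean keepSeenDownstream = false;
        // Walk from the end of the pipeline so we know what follows each join.
        for (int i = pipeline.size() - 1; i >= 0; i--) {
            Command command = pipeline.get(i);
            if (command.name().equals("KEEP")) {
                keepSeenDownstream = true;
            } else if (command.lookupIndex() != null) {
                // A KEEP upstream of the join says nothing about the lookup's fields,
                // so without a KEEP downstream every lookup field reaches the output.
                wildcard.put(command.lookupIndex(), keepSeenDownstream == false);
            }
        }
        return wildcard;
    }

    public static void main(String[] args) {
        // FROM employees | KEEP languages | RENAME ... | LOOKUP JOIN languages_lookup ON language_code
        List<Command> pipeline = List.of(
            Command.of("FROM"),
            Command.of("KEEP"),
            Command.of("RENAME"),
            Command.lookupJoin("languages_lookup")
        );
        System.out.println(wildcardPerLookupIndex(pipeline)); // prints {languages_lookup=true}
    }
}
```

Under this reading, the test query above would need every field of languages_lookup, while a join that is followed by a KEEP would only need the fields that the KEEP names.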

astefan

This may not be related to this specific PR's changes, and it may be something that was postponed, but nevertheless I think we need tests in IndexResolverFieldNamesTests for queries that use lookup joins, so that we know the fieldNames output is correct for those queries. This may be the right PR to add the missing tests.

The meta issue also mentions a bug related to fieldNames. I am not sure if what this PR is doing has any relation to that bug, but the lack of any lookup join related tests in IndexResolverFieldNamesTests is concerning.

craigtaverner · 2024-12-11T12:40:50Z

> This may not be related to this specific PR's changes, and it may be something that was postponed, but nevertheless I think we need tests in IndexResolverFieldNamesTests for queries that use lookup joins, so that we know the fieldNames output is correct for those queries. This may be the right PR to add the missing tests.

I could investigate that. The current PR replaces a fix that was already tested in the csv-spec tests, with a new fix. I did verify that removing "*" without also fixing the alias check causes failures.

> The meta issue also mentions a bug related to fieldNames. I am not sure if what this PR is doing has any relation to that bug, but the lack of any lookup join related tests in IndexResolverFieldNamesTests is concerning.

The bug in the meta-issue was added by me, so yes, this is the fix for that bug. The text of the bug is "EsqlSession.fieldNames does not handle lookup references that are also mentioned in aliases (erases them)", but no specific issue was created, since it is mainly a refactoring.

astefan

@craigtaverner I was hoping with this comment from Alex you would catch this one.

I took the exact query that Alex commented on and simply moved the KEEP command between the two lookups (as he suggested), which gives this final query:

    ROW left = "left", client_ip = "172.21.0.5", message = "Connected to 10.1.0.1", right = "right"
    | LOOKUP JOIN clientips_lookup ON client_ip
    | KEEP left, client_ip, message, right
    | LOOKUP JOIN message_types_lookup ON message

It fails with:

    path: /_query, params: {format=txt, error_trace=true}, status: 500
    java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
        at org.elasticsearch.compute.operator.lookup.MergePositionsOperator.<init>(MergePositionsOperator.java:76)
        at org.elasticsearch.xpack.esql.enrich.AbstractLookupService.doLookup(AbstractLookupService.java:329)
        at org.elasticsearch.xpack.esql.enrich.AbstractLookupService$TransportHandler.messageReceived(AbstractLookupService.java:447)
        at org.elasticsearch.xpack.esql.enrich.AbstractLookupService$TransportHandler.messageReceived(AbstractLookupService.java:442)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:90)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:1098)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1575)

If it's something easy to fix, do it here, otherwise merge this PR and address this problem ^ in a separate PR right away.

craigtaverner · 2024-12-16T15:36:37Z

@astefan Your failing query drops all columns from the JOIN, and we do not yet have validation for that case. I would prefer to fix this in another PR, because it is a pre-existing issue, unrelated to multiple joins.
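
The situation described here, a KEEP that drops every column the first join would add, can be sketched as a small check. This is only an illustration and not the validation that a follow-up would add: the class and method names are hypothetical, and the field list for clientips_lookup (`client_ip`, `env`) is an assumption about that test index.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only; names and the lookup index's field list are assumptions.
final class DroppedJoinColumnsSketch {

    /** Returns the lookup columns, other than the join key, that survive the KEEP after the join. */
    static Set<String> survivingLookupColumns(Set<String> lookupFields, String joinKey, List<String> keptAfterJoin) {
        Set<String> surviving = new LinkedHashSet<>(lookupFields);
        // The join key already exists on the left side, so the join does not add it.
        surviving.remove(joinKey);
        // Only columns named by the KEEP remain visible downstream.
        surviving.retainAll(keptAfterJoin);
        return surviving;
    }

    public static void main(String[] args) {
        Set<String> clientIpsLookup = Set.of("client_ip", "env"); // assumed fields
        // KEEP left, client_ip, message, right
        System.out.println(
            survivingLookupColumns(clientIpsLookup, "client_ip", List.of("left", "client_ip", "message", "right"))
        ); // prints []
        // Adding env to the KEEP leaves one column for the join to fetch.
        System.out.println(
            survivingLookupColumns(clientIpsLookup, "client_ip", List.of("left", "client_ip", "env", "message", "right"))
        ); // prints [env]
    }
}
```

An empty result means the first join has no columns left to load, which is the unvalidated case mentioned above.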

astefan · 2024-12-16T15:45:55Z

#118778

astefan

LGTM

alex-spies

Reviewed the changes to LogicalPlanOptimizerTests, these LGTM! Thanks Craig!

elasticsearchmachine · 2024-12-16T20:28:14Z

💚 Backport successful

Status	Branch	Result
✅	8.x

craigtaverner added the >non-issue, Team:Analytics, auto-backport, :Analytics/ES|QL, v9.0.0 and v8.18.0 labels Dec 11, 2024

craigtaverner requested review from alex-spies and astefan December 11, 2024 10:12

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

85d3362

alex-spies reviewed Dec 11, 2024

astefan requested changes Dec 11, 2024

craigtaverner added 7 commits December 11, 2024 13:40

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

e3ae17a

Add support for multiple lookup joins

ce99649

And also improve the fix for wildcard lookups, since the previous fix did not consider KEEP before lookup-join

Ignore two remaining failures

3039387

Revert some formatting changes

0af4dc5

Fix compile errors

c7a09b8

Merge branch 'fix-lookupjoin-wildcard-fieldcaps' of github.com:craigt…

19a3cdb

…averner/elasticsearch into fix-lookupjoin-wildcard-fieldcaps

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

5c188a3

craigtaverner changed the title ~~Fix wildcard field-caps for LOOKUP JOIN~~ Support multi-index LOOKUP JOIN and various fixes Dec 11, 2024

craigtaverner changed the title ~~Support multi-index LOOKUP JOIN and various fixes~~ Support multi-index LOOKUP JOIN and various bug fixes Dec 11, 2024

craigtaverner added 7 commits December 11, 2024 19:08

More consistent test naming

3649261

More consistent test naming

96b97ce

Change EsqlCapability for backwards compatibility tests

e60a216

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

fe5fab0

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

946c998

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

5f9da13

Add unit tests for fieldNames and LOOKUP JOIN

84e61d1

craigtaverner added 5 commits December 13, 2024 16:36

Rename class for clarity of purpose

99dbc09

Added tests for wildcard KEEP

134a152

Reordered expected fields to be more like rest of test class

3d05ab4

Added tests using results from one lookup as join key in next lookup

a1fdf4e

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

52bd37e

alex-spies mentioned this pull request Dec 13, 2024

ESQL: Skip lookup fields when eliminating missing fields #118658

Merged

craigtaverner added 2 commits December 13, 2024 18:24

Added RequestIndexFilteringTest on missing LOOKUP JOIN index

c4589bc

Block running in mixed clusters

dda8b2e

astefan reviewed Dec 13, 2024

craigtaverner added 3 commits December 16, 2024 16:36

Merge remote-tracking branch 'origin/main' into fix-lookupjoin-wildca…

57ba4ec

…rd-fieldcaps

Added more tests for KEEP before, between and after joins when using ROW

3b4776e

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

05408b5

astefan approved these changes Dec 16, 2024

craigtaverner mentioned this pull request Dec 16, 2024

LOOKUP JOIN fails with cryptic error when we DROP the fields looked up #118778

Closed

craigtaverner added 4 commits December 16, 2024 17:03

Fixed capability name after merging main

cef5753

Fix failing tests after merging in main

4b24022

Merge remote-tracking branch 'origin/main' into fix-lookupjoin-wildca…

701d589

…rd-fieldcaps

Just use one index name

d78b655

alex-spies approved these changes Dec 16, 2024

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

8e3f9b2

craigtaverner added the auto-merge-without-approval label Dec 16, 2024

Merge branch 'main' into fix-lookupjoin-wildcard-fieldcaps

4225976

elasticsearchmachine merged commit 8e98868 into elastic:main Dec 16, 2024

craigtaverner deleted the fix-lookupjoin-wildcard-fieldcaps branch December 16, 2024 20:27

craigtaverner mentioned this pull request Dec 16, 2024

[8.x] Support multi-index LOOKUP JOIN and various bug fixes (#118429) #118799

Merged

elasticsearchmachine merged 50 commits into elastic:main from craigtaverner:fix-lookupjoin-wildcard-fieldcaps

4 participants
