Conversation

@astefan
Contributor

@astefan astefan commented Nov 4, 2025

Addresses #127497

@astefan astefan force-pushed the push_filters_past_inlinestats branch from 13d73b6 to cbd91b4 Compare November 11, 2025 16:02
@astefan astefan requested a review from bpintea November 12, 2025 14:50
@astefan astefan marked this pull request as ready for review November 12, 2025 14:50
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 12, 2025
@elasticsearchmachine
Collaborator

Hi @astefan, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

c:long
1
;

Contributor Author

These tests are mostly mirroring the unit tests in PushDownAndCombineFiltersTests

}));
}

public static LogicalPlan newMainPlan(LogicalPlan optimizedPlan, InlineJoin.LogicalPlanTuple subPlans, LocalRelation resultWrapper) {
Contributor Author

Extracted this as a method and made it public so it can be used in the unit tests. I needed a way to reuse the mechanism EsqlSession uses, to simulate part of the flow that inline stats goes through.

);
assertEquals(expectedPushedFilters, actualPushedFilters);
}

Contributor Author

@astefan astefan Nov 13, 2025

These tests are convoluted (apologies for that), but they are a must. The nature of this optimization (query results do not change, so the PR's impact is not easily visible) requires a close analysis of what happens, to confirm that the filters end up in the right positions in the logical plan.

If you have further ideas for what to test, please let me know. I could have come up with more complex queries, but the cognitive load of writing each test is high enough that I wanted to speed up the process a bit :-).

My advice: read the method javadoc with the modified logical plan and then try to match what you see there with the actual Java code; otherwise you'd spend a lot of time (tens of minutes) on the test code alone.

Contributor

Should we add these into a new file?
PushDownAndCombineFiltersInlineJoinTests.java or something.

Contributor

@bpintea bpintea left a comment

First pass. Looks good; I've only left cosmetic remarks so far.

return new ScopedFilter(rest, leftFilters, rightFilters);
}

// split the filter condition in 2 parts:
Contributor

Suggested change
// split the filter condition in 2 parts:
// split the filter condition in 3 parts:

:)

List<Expression> leftFilters = new ArrayList<>(filters);

AttributeSet leftOutput = ij.left().outputSet();
AttributeSet rightOutput = AttributeSet.of(ij.config().rightFields());
Contributor

Suggested change
AttributeSet rightOutput = AttributeSet.of(ij.config().rightFields());
AttributeSet rightJoinSet = AttributeSet.of(ij.config().rightFields());

As it's not really the output of the right (i.e. aggs + groups), but just the groups.

List<Expression> bothSides = new ArrayList<>();
List<Expression> leftFilters = new ArrayList<>(filters);

AttributeSet leftOutput = ij.left().outputSet();
Contributor

Suggested change
AttributeSet leftOutput = ij.left().outputSet();
AttributeSet leftOutputSet = ij.left().outputSet();

Just to match the below.

Comment on lines 136 to 138
// 1. filters that can be applied only to the right
// 2. filters that can be applied to both sides
// 3. filters that can be applied only to the left
Contributor

I'd replace "can be applied" with "reference" and add "attributes" at the end: "1. filters that reference only the RHS attributes" (or "...the right attributes"), since we're not applying those as such.
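For readers following the thread, here is a minimal sketch of the three-way split the code comment describes, classifying each conjunct by the attributes it references. It assumes attributes can be modelled as plain strings; `Pred`, `Scoped` and `scope` are hypothetical stand-ins and not the actual ESQL planner classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the planner types; not the real Elasticsearch classes.
final class ThreeWayScopeSketch {
    // A predicate reduced to its text plus the attribute names it references.
    record Pred(String text, Set<String> references) {}

    // The three buckets: filters needing both sides, left-only, right-only.
    record Scoped(List<Pred> commonFilters, List<Pred> leftFilters, List<Pred> rightFilters) {}

    static Scoped scope(List<Pred> filters, Set<String> leftAttrs, Set<String> rightAttrs) {
        List<Pred> rest = new ArrayList<>();
        List<Pred> left = new ArrayList<>();
        List<Pred> right = new ArrayList<>();
        for (Pred f : filters) {
            if (leftAttrs.containsAll(f.references())) {
                left.add(f);   // references only left-side attributes
            } else if (rightAttrs.containsAll(f.references())) {
                right.add(f);  // references only right-side attributes
            } else {
                rest.add(f);   // needs attributes from both sides
            }
        }
        return new Scoped(rest, left, right);
    }
}
```

With this wording, "reference" describes the classification test itself (`containsAll` on the referenced attributes), which is the point of the rename suggested above.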

join = (Join) join.replaceLeft(left);
// we completely applied the left filters, so we can remove them from the scoped filters
scoped = new ScopedFilter(scoped.commonFilters(), List.of(), scoped.rightFilters);
scoped = new ScopedFilter(commonFilters, List.of(), scoped.rightFilters);
Contributor

Suggested change
scoped = new ScopedFilter(commonFilters, List.of(), scoped.rightFilters);
scoped = new ScopedFilter(commonFilters, List.of(), pushableToRightSide);

Here and below: it's hard to track which filters come from where. The suggestion above is one way to go.

Another would be to rewrite scoped above instead of extracting the three variables (or return them "ready to be applied" from scopeInlineStatsFilter() already).

Contributor Author

I've tried multiple approaches here, and none of them is ideal. If I do it the way you suggest (and I believe I did try this approach), another decision based on the join type must be made somewhere further below. Because the logic is shared in many places between InlineJoin and the other join types, I really wanted to avoid duplicating it. Initially I also used a single method for scoping, but it was too messy. IMHO, the code I came up with is the least intrusive option and reasonably clean.

I likely didn't find the best approach; if you have the time, please try refactoring it in whatever way you think makes it easier to understand. From my point of view (and for ease of grasping the logic), what is in this PR is the best version.

Contributor Author

I have refactored this after looking into pushing down filters on the right-hand side as well. That is not possible yet; see #138257.

Comment on lines 181 to 183
var pushableToLeftSide = join instanceof InlineJoin ? scoped.commonFilters() : scoped.leftFilters();
var pushableToRightSide = scoped.rightFilters();
var commonFilters = join instanceof InlineJoin ? scoped.leftFilters() : scoped.commonFilters();
Contributor

@bpintea bpintea Nov 13, 2025

[Related to the previous comment above]

Suggested change
var pushableToLeftSide = join instanceof InlineJoin ? scoped.commonFilters() : scoped.leftFilters();
var pushableToRightSide = scoped.rightFilters();
var commonFilters = join instanceof InlineJoin ? scoped.leftFilters() : scoped.commonFilters();
scoped = join instanceof InlineJoin ? new ScopedFilter(scoped.leftFilters, scoped.commonFilters, scoped.rightFilters) : scoped;

..and then keep the rest of the code below unchanged.

Alternatively, do this shuffling in scopeInlineStatsFilter() already.

Contributor

I guess the naming here is a bit tricky since, as it stands, the filters pushable to the LHS are the ones taken from the RHS join keys. However, these are the same as the LHS join keys (for InlineJoin).
Basically, we want to push to the LHS of the (inline) join the filters defined on the groupings (which are the join keys). Given that, I'd update scopeInlineStatsFilter() so that the bothSides var is initialised with the filters param (and not leftFilters, as it is now), and then move into leftFilters those predicates/filters that are defined on the join keys.

This would then allow initialising the resulting ScopedFilter instance in line with its usage at the code location where this comment is left.
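A minimal sketch of the scoping proposed above, for illustration only. It assumes attributes are plain strings and that "defined on the join keys" means a filter references nothing but join-key attributes; `Pred`, `Scoped` and `scopeInlineStats` are hypothetical names, not the code that was merged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the planner types; not the real Elasticsearch classes.
final class InlineStatsScopeSketch {
    // A predicate reduced to its text plus the attribute names it references.
    record Pred(String text, Set<String> references) {}

    record Scoped(List<Pred> commonFilters, List<Pred> leftFilters, List<Pred> rightFilters) {}

    static Scoped scopeInlineStats(List<Pred> filters, Set<String> joinKeys) {
        // Everything starts out as "needs both sides"...
        List<Pred> bothSides = new ArrayList<>(filters);
        List<Pred> leftFilters = new ArrayList<>();
        // ...and only filters on the groupings (the join keys) are extracted for the left side.
        for (Pred f : filters) {
            boolean onlyJoinKeys = !f.references().isEmpty() && joinKeys.containsAll(f.references());
            if (onlyJoinKeys) {
                bothSides.remove(f);
                leftFilters.add(f);
            }
        }
        // Assumption: nothing is pushed to the right side in this sketch.
        return new Scoped(bothSides, leftFilters, List.of());
    }
}
```

Because the returned buckets already carry their final meaning, the caller would not need to swap `leftFilters` and `commonFilters` depending on the join type.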

Contributor Author

Refactored

assertEquals(expectedPushedFilters, actualPushedFilters);
}

/**
Contributor

Suggested change
/**
/*

I guess this was done as a Javadoc comment (which, by the book, it isn't) for consistency. The trouble is that the default rendering is difficult to read. I guess it can stay (for consistency), but I'll take on this crusade one day :).

Contributor

Nice set of tests.
Could we also add some with filters on the aggs and maybe on the groups and see how that mixes?

FROM employees
| INLINE STATS avgByL = AVG(salary) BY languages
| INLINE STATS avgByG = AVG(salary) BY gender
| WHERE languages > 2 AND gender IS NOT NULL AND avgByL > .... AND avgByG < ...
| KEEP avg*, languages, gender, emp_no

FROM employees
| INLINE STATS avg = AVG(salary) WHERE ...  BY languages
| WHERE languages > 2
| KEEP avg, languages, salary, emp_no

with possibly some combination of the filter(s) in the INLINE STATS WHERE clause (on the group or other field) and the filter(s) in the WHERE command.

Contributor Author

Added one unit test and two csv tests as the AggregateFunction filters handling is not done on the logical plan side of things from what I could tell.

// - filters scoped to the right
// - filter that requires both sides to be evaluated
ScopedFilter scoped = join instanceof InlineJoin ij
? scopeInlineStatsFilter(Predicates.splitAnd(filter.condition()), ij)
Contributor

Optional: Predicates.splitAnd(filter.condition()) could be extracted into a var, to make the ternary expression "lighter".
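As background for this suggestion, a rough sketch of what splitting a condition on AND amounts to: flattening nested conjunctions into a list of conjuncts. The tiny `Expr` tree below is a hypothetical stand-in, and the sketch assumes this is how `Predicates.splitAnd` behaves; it is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature expression tree; not the real ESQL Expression hierarchy.
final class SplitAndSketch {
    interface Expr {}
    record And(Expr left, Expr right) implements Expr {}
    record Leaf(String text) implements Expr {}

    // Flattens nested ANDs into their conjuncts, left to right.
    static List<Expr> splitAnd(Expr e) {
        List<Expr> out = new ArrayList<>();
        collect(e, out);
        return out;
    }

    private static void collect(Expr e, List<Expr> out) {
        if (e instanceof And and) {
            collect(and.left(), out);
            collect(and.right(), out);
        } else {
            out.add(e); // anything that is not an AND is a single conjunct
        }
    }
}
```

Naming the result first (for example `List<Expr> conjuncts = splitAnd(condition);`) leaves the ternary with only the choice between the two scoping methods.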

Contributor

@bpintea bpintea left a comment

Nice!
I've only left one more comment on making the naming easier to understand.
But I believe the logic is correct. So LGTM.

if (f.references().subsetOf(leftOutput)) {
bothSides.add(f);
} else {
rightFilters.add(f);
Contributor

This case should never happen, no?
There shouldn't be an attribute that is part of the join key, but not part of the left, as otherwise the join couldn't happen. Might be easier to replace this with an assertion and have ScopedFilter instance created with a Set.of()?

Contributor Author

Actually, that's a very good point. There is a use case I didn't cover (the original issue focused on pushing down the groupings): the one where the calculated values are filtered further down in the query. In this case those filters could be pushed down to the right-hand side of the InlineJoin, so the filtering is done before the actual HashJoin, which I think should be more performant (hashing happens on fewer rows/values).

Contributor Author

This is not yet possible. See #138257

// 3. filter that requires both sides to be evaluated
ScopedFilter scoped = scopeFilter(Predicates.splitAnd(filter.condition()), left, right);
// Split the filter condition in 3 parts.
// For InlineJoin we use a scoping that allows pushing down filters either to right side only or to both sides.
Contributor

Suggested change
// For InlineJoin we use a scoping that allows pushing down filters either to right side only or to both sides.
// For InlineJoin we use a scoping that allows pushing down filters either to left side only or to both sides.

@astefan astefan requested a review from bpintea November 18, 2025 19:23
@astefan astefan added the auto-backport Automatically create backport pull requests when merged label Nov 18, 2025
Contributor

@bpintea bpintea left a comment

Thanks Andrei, looks great now!

@astefan astefan merged commit 45ebb58 into elastic:main Nov 19, 2025
34 checks passed
@astefan astefan deleted the push_filters_past_inlinestats branch November 19, 2025 08:27
@elasticsearchmachine
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

There are no branches to backport to. Aborting.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 137572

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

3 participants