Fixes esql class cast bug in STATS at planning level#137511
Fixes esql class cast bug in STATS at planning level#137511ncordon merged 20 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Hi @ncordon, I've created a changelog YAML for you. |
alex-spies
left a comment
There was a problem hiding this comment.
Heya, no review yet, except for a very quick first glance.
This also fixes #136598, nice! Let's note that down in the PR description so the other issue gets auto-closed on merge, as well.
That said, I don't think this addresses problems like
| stats median(foo), percentile(foo, 50), count_distinct(foo)
because the substitution median(foo) -> percentile(foo, 50) happens after ReplaceAggregateAggExpressionWithEval, right?
The PR description says this partially addresses #133992; what else is not yet addressed?
alex-spies
left a comment
There was a problem hiding this comment.
Heya, could you please add some tests to the logical plan optimizer tests that demonstrate what the plans for some relevant STATS queries will look like? Actually, we should create a test class similar to ReplaceStatsFilteredAggWithEvalTests; there are probably some tests in LogicalPlanOptimizerTests that could be moved there, too, but that's optional.
I'm interested in seeing a bunch of cases, esp. ones with a BY clause and with per-agg-function WHERE clauses. We seem to have little coverage of per-agg-function WHERE clauses that are different from their canonicalization (otherwise I'd have expected some test failures).
Other than that, I think the approach in the fix is good! Clearly, when deduplicating aggs in expressions, we need to be consistent between a single agg function and an expression with agg functions within it.
| if (alias == null) { | ||
| // create synthetic alias ove the found agg function | ||
| alias = new Alias(af.source(), syntheticName(canonical, child, counter[0]++), canonical, null, true); | ||
| alias = new Alias(af.source(), syntheticName(canonical, child, counter[0]++), af, null, true); |
There was a problem hiding this comment.
Why do we not want to use the canonicalized agg function here anymore?
There was a problem hiding this comment.
I think we should keep af.canonicalize(). The canonicalization still affects the per-agg filter, as in STATS c = count(field) WHERE other_field*1 > 10
There was a problem hiding this comment.
I don't understand the explanation on why it is important to keep the af.cannonical() here. I'd swear at some point I had to change this because tests were breaking otherwise. But if all tests are passing with it that means that either it is not that important or that we are missing specific tests that would break because of this?
There was a problem hiding this comment.
The canonicalization may not be important, but I have a hunch that this is under tested + it's just a smaller change if we keep emitting the same agg functions as before (with canonicalization).
| Expression aggExpression = child.transformUp(AggregateFunction.class, af -> { | ||
| AggregateFunction canonical = (AggregateFunction) af.canonical(); | ||
| // canonical representation, with resolved aliases | ||
| AggregateFunction canonical = (AggregateFunction) af.canonical().transformUp(e -> aliases.resolve(e, e)); |
There was a problem hiding this comment.
Maybe we should use a helper function for this line to prevent this from being different from how we canonicalize agg functions above (line 91)?
| 891 | 1782 | 8 | ||
| ; | ||
|
|
||
| fixClassCastBug2 |
There was a problem hiding this comment.
Nit: It would be nice to avoid numbers in test cases.
Additional description could also hint what aspects were broken before:
- combining results of two aggregate functions
- nesting functions
- multiplying result by constant
- etc
| PUSHING_DOWN_EVAL_WITH_SCORE, | ||
|
|
||
| /** | ||
| * Fix for ClassCastException in STATS |
There was a problem hiding this comment.
| * Fix for ClassCastException in STATS | |
| * Fix for ClassCastException in STATS | |
| * https://github.com/elastic/elasticsearch/issues/133992 |
I realize it is a bit tricky to describe the change with java doc.
It might be worth linking an issue as it has a bit detailed description.
There are some prior examples with such links.
@alex-spies, I've now included another planning phase after the constant folding that should take care of cases like these that @astefan suggested.
Philosophically I don't think the design of the compute engine part is correct at the moment. We try to make optimizations at runtime to avoid computing duplicated things and that breaks in case the plan is not optimal because we end up accessing wrong positions in our buffers. For many (if not all) of the tests I added the plans were correct (but not optimal), and we are throwing at runtime. I've been in touch with @dnhatn about this part and he's helping me solve it. |
| * becomes | ||
| * stats a = min(x), c = count(*) by g | eval b = a, d = c | keep a, b, c, d, g | ||
| */ | ||
| public final class ReplaceDuplicatedAggs extends OptimizerRules.OptimizerRule<Aggregate> implements OptimizerRules.CoordinatorOnly { |
There was a problem hiding this comment.
Ignore the duplicate code in this file with respect to ReplaceAggregateAggExpressionWithEval.java, I'll try to share as much code as possible once I've checked this passes all tests
BASE=3017e334274a7292997b0fea77f90d2c73b58eba HEAD=f1fd40ef2107a7fcf162a3baa2a9484c03cd5546 Branch=main
BASE=3017e334274a7292997b0fea77f90d2c73b58eba HEAD=f1fd40ef2107a7fcf162a3baa2a9484c03cd5546 Branch=main
BASE=3017e334274a7292997b0fea77f90d2c73b58eba HEAD=71c1691e34614a43230da1905611f4fa9dfeec01 Branch=main
alex-spies
left a comment
There was a problem hiding this comment.
Thanks a lot for iterating on this @ncordon ! This has some subtilities.
| new PropagateEvalFoldables(), | ||
| new ConstantFolding(), | ||
| // then extract nested aggs top-level | ||
| new DeduplicateAggs(), |
There was a problem hiding this comment.
Interesting!
Is the constant folding required to run before? Will this take care of cases that have percentile( foo, 25+25) and percentile(foo,2*25)?
I'd have thought that maybe the aggs substitution needs to happen before we replace agg expressions with evals. But this is placing deduplication pretty late into the optimization.
It's not wrong, though! But maybe a bit unfortunate that we'll have 2 rules that dedupe; and we can't assume that aggs come out of the substitutions batch already deduped. But maybe that was never correct to assume.
There was a problem hiding this comment.
Constant folding in aggregation is an endemic problem and tech debt. Some insights here #112392 (comment).
I don't think we could realistically do the right thing now with constant folding in aggs (due to priorities), and there are already optimization stuff we duplicate because of this combination of surrogate expressions in aggs + constant folding issues - see
new SubstituteSurrogateAggregations(),
new ReplaceAggregateNestedExpressionWithEval()
being called twice in Substitutions batch. Having deduplication called twice for aggs only because agg constant folding doesn't happen correctly and at the right time is, in my mind, the right compromise at this point given how mysterious the errors are, how likely users are to run into them and, also, is not a completely wrong thing to do.
| private static AggregateFunction getCannonical(AggregateFunction af, AttributeMap<Expression> aliases) { | ||
| return (AggregateFunction) af.canonical().transformUp(e -> aliases.resolve(e, e)); | ||
| } | ||
|
|
There was a problem hiding this comment.
I've extracted this as @alex-spies asked me to do
| } else { | ||
| newAggs.add(agg); | ||
| newProjections.add(agg.toAttribute()); |
There was a problem hiding this comment.
This block is doing nothing really because it only gets triggered if replaceNestedExpressions is false and it does not seem to be reached in that case ever, at least for the tests in our codebase at the moment.
But I'd rather have this safeguard (just leaving that aggregation as it is) than risking something important to disappear in the plan.
Now it should be better. I've added a boolean to the original class instead to gate the part of the code that treats nested aggs. |
astefan
left a comment
There was a problem hiding this comment.
LGTM with a small comment. Thank you!
| */ | ||
| public final class ReplaceAggregateAggExpressionWithEval extends OptimizerRules.OptimizerRule<Aggregate> { | ||
| public class ReplaceAggregateAggExpressionWithEval extends OptimizerRules.OptimizerRule<Aggregate> { | ||
| private boolean replaceNestedExpressions = true; |
There was a problem hiding this comment.
final and set it always in the constructor.
b78574e to
fdde58c
Compare
fdde58c to
c483a52
Compare
Addresses #133992 and #136598, partially.
Missing from this pr that we still need to do: at the moment the runtime part tries to avoid double computations, resulting in exceptions if the plan is correct but not optimal. In other words, queries like:
should had never failed at runtime even if the plan was not optimal for repeated aggregations.