-
Notifications
You must be signed in to change notification settings - Fork 258
fix: Prevent mixed Spark/Comet partial/final aggregates #2872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@EmilyMatt fyi |
|
I still believe this is the wrong approach, but I do support having a fix rather than none^^ |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2872 +/- ##
============================================
+ Coverage 56.12% 60.78% +4.65%
- Complexity 976 1505 +529
============================================
Files 119 167 +48
Lines 11743 16150 +4407
Branches 2251 2807 +556
============================================
+ Hits 6591 9817 +3226
- Misses 4012 4993 +981
- Partials 1140 1340 +200 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // Check if this operator has already been tagged with fallback reasons | ||
| if (hasExplainInfo(op)) { | ||
| return op | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is also a performance optimization in the case where the rule is applied to a plan multiple times (which does happen with AQE)
Understood. We at least have tests now that will be helpful if/when we want to implement optimizations to support partial Comet and Spark final. |
|
I am putting this on hold for now because I have a better understanding of some of the points in the original issue, and I have learned more from another issue that I have been working on. I will add notes on the issue. |
|
It might be related to #2870 |
@parthchandra @comphead @mbutrovich This is now ready for review. This PR is step 1 of TBD (probably 2-3). With this PR, Comet consistently falls back if either partial or final agg cannot be converted. We were already doing this in some cases, and this PR makes it more consistent and adds tests. Note that Comet is falling back even when it isn't necessary. This PR doesn't change that, so there will be a future PR to fall back only if there is actually an incompatibility between Comet/Spark for the intermediate aggregation buffer (such as with bloom filters and ANSI mode numeric aggregates currently), rather than always falling back. |
Which issue does this PR close?
Closes #1389
Rationale for this change
Prevent runtime errors. The issue is explained very well in #1389.
What changes are included in this PR?
Add new
tagUnsupportedPartialAggregatesmethod inCometExecRulethat will look for unsupported final aggregates and then tag the corresponding partial aggregate with a fallback reason to prevent it from being converted to Comet.The opposite scenario (supported partial, unsupported final) is already handled in
CometHashAggregateExec. In a future PR I will move that logic intoCometExecRuleas well.How are these changes tested?
New unit tests in
CometExecRuleSuite.There are tests for partial/final pair both in the same query stage and split across query stages.