Possible data corruption in "Skipping partial aggregation" change #11850
Comments
Thanks @andygrove -- if you can get a repro we'll get it fixed ASAP. One bug I know was present and has since been fixed is #11833 / 1c98e6e -- not sure if you tried with that one. |
The query is
The last aggregation with @andygrove is there a chance that Comet may use datafusion UPD: I've updated "user-facing changes" section in #11627, and, I suppose, this PR also needs an |
This query works fine in DataFusion, but not in Comet. Comet does use both Partial and Final aggregates. I am working on debugging this to better understand where this is going wrong. Perhaps there could be an option to disable this functionality? |
Yes, you can set the config setting: https://datafusion.apache.org/user-guide/configs.html
BTW (🎣 for reviews), I have some PRs that make it easier to see what is going on: |
@andygrove if you have additional details about how spark handles partial aggregates, I would love to read about them |
@alamb Sure, here is one of the query stages after we have translated it to a DataFusion plan. Note that we are performing a join on the output of two partial aggregates. Perhaps we'll need to start thinking about having a physical optimizer phase in Comet so that we can leverage the "skip partial aggregates" feature in some cases.
|
Also worth noting that in this case, the partial aggregates in the join input have no aggregate expressions, just the group by. The output of the join goes through a partial/final aggregate pair, and we could potentially benefit from the "skip partial aggregate" feature in this case. |
This is starting to feel like a bug again. I will try to create a repro today based on this join example. |
I cannot repro in DataFusion via SQL because DataFusion creates a very different plan, so I will close this issue and explore implementing optimizer rules in Comet to handle this (but in the short term we will just set the threshold high to disable the feature) |
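For readers following along, here is a rough toy model of the idea being discussed. It is only a sketch and not DataFusion's or Comet's implementation. It runs a partial COUNT(*) per partition that switches to pass-through once the group-to-row ratio looks too high, and then a final aggregate. The function names and the `probe_rows_threshold` / `probe_ratio_threshold` parameters are hypothetical, chosen for illustration. It also shows why setting the row threshold very high effectively disables the skip.

```python
def partial_aggregate(keys, probe_rows_threshold, probe_ratio_threshold):
    """Toy partial COUNT(*) GROUP BY key over one partition (hypothetical model)."""
    counts = {}
    emitted = []
    seen = 0
    skipping = False
    for key in keys:
        if skipping:
            # Skip mode: forward each row as its own partial state.
            emitted.append((key, 1))
            continue
        counts[key] = counts.get(key, 0) + 1
        seen += 1
        # Decide once, after probing a fixed number of input rows.
        if seen == probe_rows_threshold:
            if len(counts) / seen >= probe_ratio_threshold:
                skipping = True
                emitted.extend(counts.items())
                counts = {}
    emitted.extend(counts.items())
    return emitted


def final_aggregate(partials):
    """Merge partial counts from all partitions into final counts."""
    totals = {}
    for part in partials:
        for key, n in part:
            totals[key] = totals.get(key, 0) + n
    return totals


partitions = [[1, 2, 3, 4, 1, 2], [3, 3, 5, 6, 7, 8]]

# Skip enabled: probe after 4 rows, skip if groups/rows >= 0.8.
skip_parts = [partial_aggregate(p, 4, 0.8) for p in partitions]
with_skip = final_aggregate(skip_parts)

# Row threshold set so high that the probe never fires (skip disabled).
plain_parts = [partial_aggregate(p, 10**9, 0.8) for p in partitions]
without_skip = final_aggregate(plain_parts)
```

In this toy the final counts are identical either way, but the first partition's partial output contains repeated keys once skipping kicks in. That is harmless when a final aggregate consumes it directly. Whether it matters for an operator that consumes partial output without a final aggregate in between, such as the join in the plan above, is exactly the open question here, and this sketch does not settle it.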
Looking at the plans in #11850 (comment), I wonder if it could be because the top AggregateExec is trying to take advantage of sorted outputs, but when the aggregate goes into "skip partial agg mode" it no longer produces a sorted input stream or something 🤔 |
Describe the bug
When we upgrade DataFusion Comet to use a version of DataFusion that includes #11627, we see incorrect results for an aggregation in TPC-DS q97.
To Reproduce
Start with apache/datafusion-comet#783 and then upgrade DataFusion to include #11627. I do not have a repro for DataFusion yet.
Expected behavior
No response
Additional context
No response