-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix regression by reverting Materialize dictionaries in group keys #8740
Conversation
I believe we should proceed with this change, for the reasons explained in #7647 (comment) |
Do we have any empirical numbers to support this, recomputing dictionaries is extremely expensive and I would have thought it would outweigh any other overheads? |
To be clear, the core rationale to revert this change is fix the functional regression (a query that used to run no longer does), as described in #8738 Once we have figure out how to avoid that functional regression with this change, we can also have a more reasonable discussion on performance. I will file a ticket to make some performance benchmarks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of this approach, but don't feel strongly
As I mentioned before, regressions in functionality I think take priority over (theorized) performance improvements which is what #8291 is In retrospect, I should have insisted on some benchmarks showing improvements before I approved (and merged) #8291 To partly make up for this oversight, I plan to at least create said benchmarks so we can have a fact driven conversation rather than speculating |
This closes #8738 by reverting #8291
I am not sure if this is the right fix -- I need to write some more tests / understand the problem more fully, but I wanted to get the PR up as I had the code.To be clear, this change fixes a functional regression (a query that used to run no longer does), as descsribed in #8738
What I would like to do is to reopen #7647 and then we can work on reapply that change, ensuring the LogicalPlan and PhysicalPlan schemas match as well as maybe being more dilligent about performance testing