Fix regression by reverting Materialize dictionaries in group keys #8740

alamb · 2024-01-03T15:27:18Z

This closes #8738 by reverting #8291

~~I am not sure if this is the right fix -- I need to write some more tests / understand the problem more fully, but I wanted to get the PR up as I had the code.~~

To be clear, this change fixes a functional regression (a query that used to run no longer does), as described in #8738.

What I would like to do is reopen #7647 so we can work on reapplying that change, ensuring the LogicalPlan and PhysicalPlan schemas match, as well as maybe being more diligent about performance testing.
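To illustrate what "the LogicalPlan and PhysicalPlan schemas match" guards against, here is a minimal sketch. It assumes a simplified, hypothetical `DataType` enum (not the real arrow-rs type) and a hypothetical `check_column` function modeled only on the error text quoted in #8738. It shows one way such a mismatch can arise: the declared group-key type has its dictionary encoding stripped ("materialized" to `Utf8`), while the column that actually arrives is still `Dictionary(Int32, Utf8)`. It is not the actual DataFusion code path.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType` (not the arrow-rs enum).
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Hypothetical: the type advertised for a group key once its dictionary
/// encoding is stripped down to the value type.
fn materialized(dt: &DataType) -> DataType {
    match dt {
        DataType::Dictionary(_, value) => (**value).clone(),
        other => other.clone(),
    }
}

/// Hypothetical check modeled on the error text in #8738: a converter set up
/// for `expected` rejects a column whose actual type differs.
fn check_column(expected: &DataType, actual: &DataType) -> Result<(), String> {
    if expected == actual {
        Ok(())
    } else {
        Err(format!(
            "RowConverter column schema mismatch, expected {:?} got {:?}",
            expected, actual
        ))
    }
}

fn main() {
    let input = DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));
    // The converter is configured from the declared (materialized) type...
    let declared = materialized(&input);
    // ...but is handed the original, still dictionary-encoded column.
    match check_column(&declared, &input) {
        Ok(()) => println!("ok"),
        Err(e) => println!("{e}"),
    }
}
```

Running it prints the same shape of message reported in #8738, which is why the declared and actual schemas must agree before the change can be reapplied.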

alamb · 2024-01-05T19:30:16Z

I believe we should proceed with this change, for the reasons explained in #7647 (comment)


tustvold · 2024-01-05T19:54:54Z

Do we have any empirical numbers to support this, recomputing dictionaries is extremely expensive and I would have thought it would outweigh any other overheads?
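For context on the cost concern, a minimal sketch of the work involved, assuming a hypothetical `DictColumn` type (keys indexing into a values list; not the arrow-rs `DictionaryArray`). Materializing copies the referenced value for every row, and getting a dictionary back afterwards means hashing every row again. No performance numbers are implied; this only shows where the per-row work comes from.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded string column: `keys[i]` indexes into `values`.
struct DictColumn {
    keys: Vec<i32>,
    values: Vec<String>,
}

/// Materialize: clone the referenced value for every row (one allocation per row).
fn materialize(col: &DictColumn) -> Vec<String> {
    col.keys.iter().map(|&k| col.values[k as usize].clone()).collect()
}

/// Re-encode: hash every row's string to rebuild a dictionary from scratch.
fn encode(rows: &[String]) -> DictColumn {
    let mut index: HashMap<&str, i32> = HashMap::new();
    let mut values = Vec::new();
    let mut keys = Vec::with_capacity(rows.len());
    for s in rows {
        // First occurrence of a string gets the next key; repeats reuse it.
        let k = *index.entry(s.as_str()).or_insert_with(|| {
            values.push(s.clone());
            (values.len() - 1) as i32
        });
        keys.push(k);
    }
    DictColumn { keys, values }
}

fn main() {
    let col = DictColumn {
        keys: vec![0, 1, 0, 0, 1],
        values: vec!["a".to_string(), "b".to_string()],
    };
    let rows = materialize(&col);
    let again = encode(&rows);
    println!("{:?} {:?} {:?}", rows, again.keys, again.values);
}
```

The dictionary form touches each distinct value once, whereas the materialize/re-encode round trip does per-row copying and hashing, which is the overhead the question above is asking to weigh.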

alamb · 2024-01-05T20:04:38Z

> Do we have any empirical numbers to support this, recomputing dictionaries is extremely expensive and I would have thought it would outweigh any other overheads?

To be clear, the core rationale for reverting this change is to fix the functional regression (a query that used to run no longer does), as described in #8738.

Once we have figured out how to avoid that functional regression with this change, we can also have a more reasonable discussion about performance. I will file a ticket to make some performance benchmarks.

tustvold

I'm not a fan of this approach, but don't feel strongly

alamb · 2024-01-08T14:29:06Z

As I mentioned before, I think regressions in functionality take priority over (theorized) performance improvements, which is what #8291 is.

In retrospect, I should have insisted on some benchmarks showing improvements before I approved (and merged) #8291

To partly make up for this oversight, I plan to at least create said benchmarks so we can have a fact-driven conversation rather than speculating.

revert eb8aff7 / Materialize dictionaries in group keys

88679ea

github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jan 3, 2024

This was referenced Jan 3, 2024

Regression "RowConverter column schema mismatch, expected Utf8 got Dictionary(Int32, Utf8)" after upgrade #8738

Closed

Materialize Dictionaries in Group Keys #7647

Open

alamb added 2 commits January 5, 2024 14:31

Merge remote-tracking branch 'apache/main' into alamb/revert_schema_expansion

96356ce

Update tests

c180c5d

alamb changed the title ~~revert eb8aff7becaf5d4a44c723b29445deb958fbe3b4 / Materialize dictionaries in group keys~~ revert Materialize dictionaries in group keys Jan 5, 2024

alamb marked this pull request as ready for review January 5, 2024 19:36

Update tests

920d058

alamb changed the title ~~revert Materialize dictionaries in group keys~~ Fix regression by reverting Materialize dictionaries in group keys Jan 5, 2024

alamb requested a review from tustvold January 8, 2024 12:28

tustvold approved these changes Jan 8, 2024


alamb merged commit ff27d90 into apache:main Jan 8, 2024
25 checks passed

alamb deleted the alamb/revert_schema_expansion branch January 8, 2024 14:29

matthewgapp mentioned this pull request Jan 11, 2024

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed
