Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better multi-column aggregation support with StringView #11794

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

XiangpengHao
Copy link
Contributor

Which issue does this PR close?

Related to #7000. Previous pr was automatically closed because the string-view2 branch is merged.

Rationale for this change

I get some time to implement the multi-column aggregation with StringView, the implementation is inspired by @jayzhan211 's #10976.

However, the performance is mixed, I see almost no performance diff with ClickBench query 18:

SELECT "UserID", extract(minute FROM to_timestamp_seconds("EventTime")) AS m, "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", m, "SearchPhrase" ORDER BY COUNT(*) DESC LIMIT 10;

This is becuase the average size of SearchPhrase is only 4, meaning the string is always inlined to view and thus not being reused.

However, I made this query (q14, changed SearchPhrase to URL)

SELECT "SearchEngineID", "URL", COUNT(*) AS c FROM hits WHERE "URL" <> '' GROUP BY "SearchEngineID", "URL" ORDER BY c DESC LIMIT 10;

Without this patch, it takes 7.4s; with this patch, it takes 5.9s, a 20% improvement.

With this patch, the output string will reuse the string values in the buffer, so it is good.
However, when interning the values, we need to hash twice: (1) the string itself to build BytesViewMap (2) the group index - u32, which adds overhead.

What changes are included in this PR?

Things not implemented yet:

  • EmtiTo::First(n): I need to think a bit more about how to efficiently reuse the BytesViewBuilder

Are these changes tested?

Are there any user-facing changes?

No

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant