Quick clickbench with a smaller dataset #12455

Open
Rachelint opened this issue Sep 13, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@Rachelint
Contributor

Rachelint commented Sep 13, 2024

Is your feature request related to a problem or challenge?

To measure the performance improvement of #11827, we need some extended queries that combine more complex UDAFs (like median, approx_median) with high-cardinality GROUP BY (#12438).

However, I found that such queries can't run to completion on my local machine. After debugging, I found this is due to their large intermediate results, which fill memory rapidly and lead to swapping or OOM.

When I run them on a subset containing only 15% of the full ClickBench dataset, they finish successfully and reflect the improvement (see #11827 (comment)).

I think we may need a ClickBench variant with a smaller dataset (like TPC-H scale factors 1, 10, ...) for some situations.

Describe the solution you'd like

Support generating a smaller subset of the full ClickBench dataset so that we can run queries on it.

Describe alternatives you've considered

No response

Additional context

No response

@Rachelint Rachelint added the enhancement New feature or request label Sep 13, 2024
@alamb
Contributor

alamb commented Sep 13, 2024

I would like to troll / 🐟 for improvements: instead of making the benchmark easier, let's spend our time reducing the size of the intermediate state for those aggregates :)

@Rachelint
Contributor Author

> I would like to troll / 🐟 for improvements: instead of making the benchmark easier, let's spend our time reducing the size of the intermediate state for those aggregates :)

🤔 Makes sense; solving the real problem may be more valuable.
