@drusso drusso commented Nov 6, 2020

ARROW-10510

This change adds benchmarks for COUNT(DISTINCT) queries. This is a small follow-up to ARROW-10043 / #8222. In that PR, a number of implementation ideas were discussed for follow-ups, and having benchmarks will help evaluate them.


There are two benchmarks added:

  • wide: all of the values are distinct; this is looking at worst-case performance
  • narrow: only a handful of distinct values; this is closer to best-case performance

The wide benchmark runs roughly 7x slower than the narrow benchmark.
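
For readers who want a feel for the two data shapes, here is a minimal sketch in plain Python. It is not the benchmark code from this PR and does not use any Arrow or DataFusion API. The helper names (`make_wide`, `make_narrow`, `count_distinct`, `time_once`), the row count, and the choice of 4 distinct values for the narrow case are all assumptions made for illustration. It models COUNT(DISTINCT) as a hash-set insertion per row, which may differ from the actual implementation, so the timings it prints should not be expected to reproduce the ratio reported above.

```python
import time


def make_wide(n):
    """Hypothetical 'wide' input: all n values are distinct."""
    return list(range(n))


def make_narrow(n, k=4):
    """Hypothetical 'narrow' input: only k distinct values across n rows."""
    return [i % k for i in range(n)]


def count_distinct(values):
    """Stand-in for COUNT(DISTINCT): insert every row into a hash set."""
    seen = set()
    for v in values:
        seen.add(v)
    return len(seen)


def time_once(fn, values):
    """Run fn(values) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(values)
    return result, time.perf_counter() - start


n = 100_000  # assumed row count, chosen only to keep the sketch fast
wide_count, wide_secs = time_once(count_distinct, make_wide(n))
narrow_count, narrow_secs = time_once(count_distinct, make_narrow(n))

print(f"wide:   {wide_count} distinct values in {wide_secs:.4f}s")
print(f"narrow: {narrow_count} distinct values in {narrow_secs:.4f}s")
```

In the wide case the set grows to one entry per row, while in the narrow case it stays at a handful of entries, which is the intuition behind treating them as worst-case and near-best-case inputs.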

github-actions bot commented Nov 6, 2020

@nevi-me nevi-me left a comment

LGTM
