Prefilter GROUPBY LIMIT queries #19051

Merged
kaikalur merged 1 commit into prestodb:master from kaikalur:prefilter-for-groupby-limit
Feb 17, 2023
@kaikalur
Contributor

@kaikalur kaikalur commented Feb 14, 2023

We have seen users running ad-hoc queries like:

SELECT SUM(x), userid FROM Table GROUP BY userid LIMIT 1000

If the Table has a large number of distinct userid values, these queries often OOM or time out. So we added a new optimization that first scans the table looking for LIMIT distinct keys (within a short, configurable time such as 10s). If it finds them, it builds a map of those keys and uses it to filter the original table. Roughly:

SELECT SUM(x), userid FROM Table
CROSS JOIN (SELECT MAP_AGG(hash(userid), TRUE) m FROM (SELECT DISTINCT userid FROM Table LIMIT 1000))
WHERE IF(CARDINALITY(m) = 1000, m[hash(userid)], TRUE)
GROUP BY userid LIMIT 1000

The idea is that if there are at least 1000 distinct keys and we find them quickly, they are returned as a map for quick lookup (the prefilter). If there are fewer than 1000 distinct keys, or we could not find them within the timeout, the filter becomes a no-op.
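The behavior described above can be modeled in a few lines of Python. This is only a sketch of the semantics, not the planner rule in this PR: the function names are hypothetical, a plain set stands in for the map of key hashes, and a row-scan budget (`max_rows_scanned`) stands in for the time limit so that the example stays deterministic.

```python
from collections import defaultdict
from itertools import islice


def find_limit_keys(rows, key, limit, max_rows_scanned):
    """Collect distinct keys, stopping at `limit` keys or when the scan budget runs out.

    The budget is an assumed stand-in for the configurable timeout.
    """
    seen = set()
    for row in islice(rows, max_rows_scanned):
        seen.add(row[key])
        if len(seen) == limit:
            break
    return seen


def group_by_limit(rows, key, value, limit, max_rows_scanned):
    """Model of SELECT SUM(value), key FROM rows GROUP BY key LIMIT limit."""
    keys = find_limit_keys(rows, key, limit, max_rows_scanned)
    # Mirrors IF(CARDINALITY(m) = limit, <lookup>, TRUE): the filter is only
    # active when exactly `limit` keys were found; otherwise it is a no-op.
    use_prefilter = len(keys) == limit
    sums = defaultdict(int)
    for row in rows:
        if use_prefilter and row[key] not in keys:
            continue  # row belongs to a key outside the prefiltered set
        sums[row[key]] += row[value]
    # Without ORDER BY, any `limit` groups are a valid answer.
    return dict(islice(sums.items(), limit))


rows = [{"userid": i % 5, "x": 1} for i in range(20)]
fast = group_by_limit(rows, "userid", "x", limit=2, max_rows_scanned=10)
noop = group_by_limit(rows, "userid", "x", limit=10, max_rows_scanned=100)
```

In the first call the two keys are found right away, so only rows for those keys are aggregated. In the second call the table has fewer than 10 distinct keys, so the filter is disabled and every row is aggregated as usual.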

Test plan - AbstractTestQueries

== RELEASE NOTES ==

General Changes
* Added a new optimization that prefilters large tables down to LIMIT distinct keys for queries that do a simple GROUP BY with LIMIT and no ORDER BY. The feature is enabled with the boolean session property `prefilter_for_groupby_limit`.

@kaikalur kaikalur requested a review from a team as a code owner February 14, 2023 00:48
@jainxrohit jainxrohit self-requested a review February 14, 2023 00:56
@kaikalur kaikalur force-pushed the prefilter-for-groupby-limit branch 6 times, most recently from 5238561 to c7e4bb4 Compare February 14, 2023 18:07
@kaikalur kaikalur force-pushed the prefilter-for-groupby-limit branch from c7e4bb4 to 18db107 Compare February 14, 2023 19:20
@kaikalur kaikalur force-pushed the prefilter-for-groupby-limit branch from 18db107 to d1bba7e Compare February 14, 2023 23:10
@kaikalur kaikalur force-pushed the prefilter-for-groupby-limit branch 2 times, most recently from b9f40c5 to a5e7120 Compare February 15, 2023 00:37
@kaikalur kaikalur force-pushed the prefilter-for-groupby-limit branch 2 times, most recently from abceb2f to f604ef7 Compare February 15, 2023 19:34
@kaikalur kaikalur force-pushed the prefilter-for-groupby-limit branch from f604ef7 to 6d5aa90 Compare February 15, 2023 19:59
@jainxrohit
Contributor

Can we change commit message to imperative style?

@kaikalur kaikalur changed the title from "Optimization for prefiltering group by limit keys" to "Prefilter GROUPBY LIMIT queries" Feb 15, 2023
@kaikalur
Contributor Author

> Can we change commit message to imperative style?

Done

@kaikalur kaikalur requested a review from pranjalssh February 16, 2023 22:00
@kaikalur kaikalur merged commit 94b161c into prestodb:master Feb 17, 2023
@wanglinsong wanglinsong mentioned this pull request Feb 25, 2023
12 tasks