
Add experimental noisyCardinality function#22087

Merged
pranjalssh merged 1 commit into prestodb:master from mohandhar:approx-cardinality4
Mar 8, 2024


@mohandhar

Description

Add noisyCardinality function for experimentation.

Motivation and Context

Unlike cardinality(HyperLogLog), which returns a deterministic approximate count, noisy_cardinality(HyperLogLog) returns a random, noisy estimate of cardinality from a HyperLogLog sketch, similar to the noisy estimates returned by noisy_approx_distinct_sfm.

Impact

No impact to existing code since we are adding a new function.

Test Plan

  • Manually ran this query within the cli:
WITH digests AS (
    SELECT shipdate, APPROX_SET(partkey) AS hll
    FROM tpch.sf1.lineitem
    GROUP BY 1
)

-- This just returns the approximate COUNT(DISTINCT partkey) for each shipdate
SELECT shipdate, NOISY_CARDINALITY(hll, 10)
FROM digests
ORDER BY 1
LIMIT 10;
  • Added a unit test
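
To see how the noisy estimate differs from the deterministic one, a variant of the query above could select both side by side. This is only a sketch: it assumes a Presto build that includes this PR, the same tpch catalog used in the manual test, and the same epsilon value of 10. The column aliases are made up for illustration.

```sql
WITH digests AS (
    SELECT shipdate, APPROX_SET(partkey) AS hll
    FROM tpch.sf1.lineitem
    GROUP BY 1
)

SELECT
    shipdate,
    -- Deterministic approximate count from the HyperLogLog sketch
    CARDINALITY(hll) AS deterministic_estimate,
    -- Random noisy estimate from the same sketch (second argument is epsilon)
    NOISY_CARDINALITY(hll, 10) AS noisy_estimate
FROM digests
ORDER BY 1
LIMIT 10;
```

Running it more than once should leave deterministic_estimate unchanged, while noisy_estimate may vary between runs.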

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

Return a random noisy estimate of cardinality with new noisyCardinality(serializedHll, epsilon) function
@mohandhar mohandhar requested a review from a team as a code owner March 5, 2024 17:40
@mohandhar mohandhar requested a review from presto-oss March 5, 2024 17:40

@elharo elharo left a comment


A new function should probably have release notes.

@jonhehir

jonhehir commented Mar 6, 2024

A new function should probably have release notes.

Thanks! This PR is related to #21290. We're expecting to have documentation and release notes to provide in a follow-up PR soon.

@tdcmeehan

If it's marked experimental, it does not show up in SHOW FUNCTIONS, so we can wait to add documentation and release note when we move it off experimental. But let's create an issue to remind us to follow up.

@pranjalssh pranjalssh merged commit b2db272 into prestodb:master Mar 8, 2024
5 participants