Documented noisy aggregate functions in new page #22715
jonhehir merged 1 commit into prestodb:master
| Noisy Aggregate Functions
| =========================
|
| Noisy aggregate functions are aggregate functions that provide random, noisy
are aggregate functions --> are functions
| mechanisms, it is important to note that neither the values returned by
| these functions nor the query results that incorporate these functions
| are differentially private in general. See `Limitations`_ below for more
| details. Users who wish to support a strong privacy guarantee should
I'd delete "Users who wish to support a strong privacy guarantee should discuss with a suitable technical expert first."
I'm going to vote to leave this sentence, as we want to be abundantly clear about both the limitations and their practical implications.
| .. note::
|
| Unlike :func:`count`, this function will return ``NULL`` when the (true) count of ``x`` is 0.
|
| .. function:: noisy_count_gaussian(x, noise_scale[, random_seed]) -> bigint
|
| Counts the non-``NULL`` values in ``x`` and then adds random Gaussian noise with 0
What's x? (a column I assume)
Indeed, I've renamed all these from x to col, which is hopefully more clear.
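For readers of this thread, the behavior described in the hunk above (count the non-``NULL`` values, add zero-mean Gaussian noise, return ``NULL`` when the true count is 0) can be modeled with a minimal Python sketch. This is an illustration only, not Presto's implementation. The name `noisy_count_gaussian_sketch` is hypothetical, and two details are assumptions because the quoted line is truncated: that `noise_scale` is the standard deviation of the noise, and that the noisy value is rounded to an integer to match the `bigint` return type.

```python
import random
from typing import Iterable, Optional


def noisy_count_gaussian_sketch(
    col: Iterable[object],
    noise_scale: float,
    random_seed: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical model of the documented semantics; not Presto code."""
    # Count only non-NULL (here: non-None) values, as the docs describe.
    true_count = sum(1 for value in col if value is not None)

    # Documented in the note: NULL is returned when the true count is 0.
    if true_count == 0:
        return None

    # Assumption: noise_scale is the standard deviation of zero-mean noise.
    rng = random.Random(random_seed)
    noisy = true_count + rng.gauss(0.0, noise_scale)

    # Assumption: the result is rounded to fit the bigint return type.
    return round(noisy)
```

Passing the same `random_seed` twice gives the same noisy count in this sketch, which is convenient for tests but means the seed should not be fixed when fresh noise is wanted.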
| Distinct counting can be performed using ``noisy_count_gaussian(DISTINCT x, ...)``, or with
| :func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` will
| return more accurate results but at a larger computational cost.
|
| .. function:: noisy_avg_gaussian(x, noise_scale[, random_seed]) -> double
| :noindex:
|
| Calculates the average (arithmetic mean) of all the input values in ``x`` and then adds
"adds a normally distributed random double value with 0 mean..."
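The wording suggested above can be read as the following minimal Python sketch of the averaging function. It is an illustration under stated assumptions, not Presto's implementation: the name `noisy_avg_gaussian_sketch` is hypothetical, `noise_scale` is assumed to be the standard deviation of the added noise, and returning `None` for empty input is an assumption made by analogy with the count function, since the quoted hunk does not say.

```python
import random
from typing import Iterable, Optional


def noisy_avg_gaussian_sketch(
    col: Iterable[Optional[float]],
    noise_scale: float,
    random_seed: Optional[int] = None,
) -> Optional[float]:
    """Hypothetical model of the documented semantics; not Presto code."""
    # Ignore NULL (None) inputs, as SQL aggregates conventionally do.
    values = [float(v) for v in col if v is not None]

    # Assumption: empty input yields NULL (not stated in the quoted hunk).
    if not values:
        return None

    true_avg = sum(values) / len(values)

    # Assumption: the noise is a normally distributed double with mean 0
    # and standard deviation noise_scale.
    rng = random.Random(random_seed)
    return true_avg + rng.gauss(0.0, noise_scale)
```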
| .. note::
|
| Unlike :func:`approx_set`, this function will return ``NULL`` when ``x`` is empty.
|
| .. note::
|
| Unlike :func:`approx_set`, this function will return ``NULL`` when ``x`` is empty.
| If this behavior is undesirable, :func:`coalesce` with :func:`noisy_empty_approx_set_sfm`.
|
| .. note::
|
| Unlike :func:`approx_distinct`, this function will return ``NULL`` when ``x`` is empty.
|
| Libraries and How to Fix It. In *Proceedings of the 2022 ACM SIGSAC Conference
| on Computer and Communications Security* (pp. 471-484).
|
| .. [Hehir2023] Hehir, J., Ting, D., & Cormode, G. (2023). Sketch-Flip-Merge:
I've gone ahead and linked all the references.
steveburnett left a comment
Nice work, thanks! I had a suggestion I couldn't make inline, so I'll mention it here.
Similar to the table of contents that I suggest adding to noisy.rst, aggregate.rst could benefit from one even more due to its length. Could you add a table of contents to aggregate.rst, as described in my comment on noisy.rst?

If you do so, you could add a heading to aggregate.rst for Noisy Aggregate Functions and, underneath it, link to the new page with "See :doc:`noisy`." to help the reader know that there are aggregate functions that are not on the Aggregate Functions page.
| Unlike :func:`count`, this function returns ``NULL`` when the (true) count of ``col`` is 0.
|
| Distinct counting can be performed using ``noisy_count_gaussian(DISTINCT col, ...)``, or with
| :func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` will
- :func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` will
+ :func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian`
Thank you, @steveburnett! I appreciate the edits and reStructuredText pointers. I think I've hit your full wish list. Let me know if all looks good. 😄
steveburnett left a comment
LGTM! (docs)
Pulled the updated branch, made a new local build of the docs, and verified the changes. Reviewed the documents again as a whole, and everything looks good. Thanks!
Description
Moved documentation for noisy_count_gaussian and related functions from the Aggregate Functions page to a new Noisy Aggregate Functions page. Also added documentation of the SFM sketch type and functions, such as noisy_approx_set_sfm.
Motivation and Context
This is the documentation promised back in #21290.
Impact
Docs only
Test Plan
Docs build and render nicely.