Skip to content

Document TDigest functions and type#16911

Merged
highker merged 1 commit intoprestodb:masterfrom
amellnik:tdigest-docs
Dec 16, 2021
Merged

Document TDigest functions and type#16911
highker merged 1 commit intoprestodb:masterfrom
amellnik:tdigest-docs

Conversation

@amellnik
Copy link
Contributor

@amellnik amellnik commented Oct 26, 2021

This adds documentation for the t-digest type (#15674), similar to what
already exists for the QDigest type. The t-digest library was added in
#12961 and functions to use it were added in #13940.

Test plan - Documentation changes only, manual inspection of output.

Open questions:

== NO RELEASE NOTE ==

@amellnik amellnik force-pushed the tdigest-docs branch 2 times, most recently from a80cfa0 to 54795dc Compare October 27, 2021 21:47
@dborkar dborkar requested a review from aweisberg October 28, 2021 15:45
@dborkar
Copy link

dborkar commented Oct 28, 2021

Thx for the contribution @amellnik
@aweisberg @agrawalreetika can you take a look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rephrase this like this, to keep it consistent with other existing types details -
A t-digest is similar to :ref:qdigest_type, but it uses a different algorithm <https://arxiv.org/abs/1902.04023>_ to represent the approximate distribution of a set of numbers. See :doc:/functions/tdigest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it wouldn't hurt in the t-digest and q-digest to give a one sentence explanation of the pros/cons of each so they don't need to click through, but also it is fine if they have to click through.

Also these types don't link to the functions that work with these types? Seems like a lost opportunity to simplify discovery of important information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated as per suggestions. My takeaway from the linked paper as well as some testing suggests that T-Digest offers better performance in general. The main limitation is that it only supports DOUBLE at the moment.

@aweisberg
Copy link
Contributor

If you rebase it will be possible to build these docs. There was a fix pinning a dependency. The checks passed when you ran them, but they won't pass next time.

Copy link
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made suggestions for some improvements to help people understand the difference between t-digests and q-digests and when to use them as well as making it easier to go from types to the functions that operate on those types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for T-digest doesn't really need to say in addition because these documents aren't ordered per se.

It is a good idea in this introduction of T-Digest and Q-Digest to explain what each one is and why you would pick one over the other. I for one don't know the answer and I think this section should answer it.

The Q-Digest section should be updated to have the mirror so no matter which section you open you understand what your options are in Presto and why you would pick one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it wouldn't hurt in the t-digest and q-digest to give a one sentence explanation of the pros/cons of each so they don't need to click through, but also it is fine if they have to click through.

Also these types don't link to the functions that work with these types? Seems like a lost opportunity to simplify discovery of important information.

@amellnik amellnik force-pushed the tdigest-docs branch 4 times, most recently from 86b12d6 to 5b3e9bf Compare October 31, 2021 20:02
@amellnik
Copy link
Contributor Author

Also rebased on upstream master, some automated tests are failing but they definitely appear unrelated.

Copy link
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent. Please fix the two typos and then I will approve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
implementation of quantile digests supports more numberic types.
implementation of quantile digests supports more numeric types.

@aweisberg aweisberg requested a review from highker November 3, 2021 15:33
@aweisberg
Copy link
Contributor

@highker can you merge this?

@highker highker requested a review from tdcmeehan November 4, 2021 18:55
@highker
Copy link

highker commented Nov 4, 2021

@tdcmeehan wanna take a look? It's tdigest related

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! Just some minor changes requested. 😄

@amellnik
Copy link
Contributor Author

@tdcmeehan Just following up, let me know if there's anything else needed here. Thanks!

@tdcmeehan
Copy link
Contributor

@amellnik commit message looks incomplete?

This adds documentation for the t-digest type (#15674), similar to what
already exists for the QDigest type. The t-digest library was added in

@tdcmeehan tdcmeehan self-assigned this Nov 24, 2021
This adds documentation for the t-digest type (prestodb#15674), similar to
what already exists for the QDigest type. The t-digest library was
added in prestodb#12961 and functions to use it were added in prestodb#13940.
@amellnik amellnik changed the title WIP: Document TDigest functions and type Document TDigest functions and type Nov 30, 2021
@amellnik
Copy link
Contributor Author

@tdcmeehan Not sure what happened there, but it's now fixed!

@amellnik
Copy link
Contributor Author

amellnik commented Dec 8, 2021

@highker Any feedback or is this good to go?

@highker highker merged commit 9c8eee9 into prestodb:master Dec 16, 2021
.. function:: tdigest_agg(x, w, accuracy) -> tdigest<double>

Returns the ``tdigest`` which is composed of all input values of ``x`` using
the per-item weight ``w`` and maximum error of ``accuracy``. ``accuracy``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the implementation, it seems this description of the accuracy parameter is in correct. This appears to be the "compression factor" parameter instead. CC: @tdcmeehan @aweisberg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants