Document TDigest functions and type #16911
Force-pushed from a80cfa0 to 54795dc.
Thx for the contribution @amellnik
I think we can rephrase this as follows, to keep it consistent with the details given for the other existing types:
A t-digest is similar to :ref:`qdigest_type`, but it uses a `different algorithm <https://arxiv.org/abs/1902.04023>`_ to represent the approximate distribution of a set of numbers. See :doc:`/functions/tdigest`.
I think it wouldn't hurt in the t-digest and q-digest to give a one sentence explanation of the pros/cons of each so they don't need to click through, but also it is fine if they have to click through.
Also these types don't link to the functions that work with these types? Seems like a lost opportunity to simplify discovery of important information.
Updated as per suggestions. My takeaway from the linked paper, as well as from some testing, is that T-Digest generally offers better performance. The main limitation is that it only supports DOUBLE at the moment.
If you rebase it will be possible to build these docs. There was a fix pinning a dependency. The checks passed when you ran them, but they won't pass next time.
aweisberg left a comment
Made suggestions for some improvements to help people understand the difference between t-digests and q-digests and when to use them as well as making it easier to go from types to the functions that operate on those types.
The documentation for T-digest doesn't really need to say "in addition", because these documents aren't ordered per se.
It is a good idea in this introduction of T-Digest and Q-Digest to explain what each one is and why you would pick one over the other. I for one don't know the answer and I think this section should answer it.
The Q-Digest section should be updated to have the mirror so no matter which section you open you understand what your options are in Presto and why you would pick one.
Force-pushed from 86b12d6 to 5b3e9bf.
Also rebased on upstream master. Some automated tests are failing, but they definitely appear unrelated.
aweisberg left a comment
This is excellent. Please fix the two typos and then I will approve.
Suggested change:
- implementation of quantile digests supports more numberic types.
+ implementation of quantile digests supports more numeric types.
Force-pushed from 5b3e9bf to e3274b5.
@highker can you merge this?
@tdcmeehan wanna take a look? It's tdigest related
tdcmeehan left a comment
Thanks for adding this! Just some minor changes requested. 😄
Force-pushed from e3274b5 to 4aac4c6.
@tdcmeehan Just following up, let me know if there's anything else needed here. Thanks!
This adds documentation for the t-digest type (prestodb#15674), similar to what already exists for the QDigest type. The t-digest library was added in prestodb#12961 and functions to use it were added in prestodb#13940.
Force-pushed from 4aac4c6 to a0ede3c.
@tdcmeehan Not sure what happened there, but it's now fixed!
@highker Any feedback or is this good to go?
.. function:: tdigest_agg(x, w, accuracy) -> tdigest<double>

    Returns the ``tdigest`` which is composed of all input values of ``x`` using
    the per-item weight ``w`` and maximum error of ``accuracy``. ``accuracy``
Looking at the implementation, it seems this description of the accuracy parameter is incorrect. This appears to be the "compression factor" parameter instead. CC: @tdcmeehan @aweisberg
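To make the distinction concrete, here is a rough, self-contained Python sketch of what a compression factor controls in a t-digest-style summary. It is only an illustration and not Presto's implementation. The names `Centroid`, `build_digest`, and `estimate_quantile` are hypothetical, and the per-centroid size limit `4 * n * q * (1 - q) / compression` is an assumption modeled on the general t-digest design. The point is that the parameter bounds how many values each centroid may absorb (and therefore the size of the digest), rather than acting as a direct maximum-error bound.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Centroid:
    mean: float
    weight: float


def build_digest(values: List[float], compression: float) -> List[Centroid]:
    """Greedy batch clustering of sorted values (toy sketch, not Presto's code).

    A centroid near quantile q may hold at most 4*n*q*(1-q)/compression
    values, so centroids stay small in the tails and grow in the middle.
    """
    data = sorted(values)
    n = len(data)
    centroids: List[Centroid] = []
    seen = 0.0  # total weight placed so far, including the current centroid
    for v in data:
        if centroids:
            c = centroids[-1]
            q = (seen - c.weight / 2) / n  # quantile at the centroid's middle
            limit = 4 * n * q * (1 - q) / compression
            if c.weight + 1 <= limit:
                # Absorb the value and update the running mean.
                c.mean += (v - c.mean) / (c.weight + 1)
                c.weight += 1
                seen += 1
                continue
        centroids.append(Centroid(v, 1.0))
        seen += 1
    return centroids


def estimate_quantile(centroids: List[Centroid], q: float) -> float:
    """Interpolate linearly between centroid means at their mid-weights."""
    total = sum(c.weight for c in centroids)
    target = q * total
    cum = 0.0
    prev_mid = None
    prev_mean = None
    for c in centroids:
        mid = cum + c.weight / 2
        if target <= mid:
            if prev_mid is None:
                return c.mean
            frac = (target - prev_mid) / (mid - prev_mid)
            return prev_mean + frac * (c.mean - prev_mean)
        prev_mid, prev_mean = mid, c.mean
        cum += c.weight
    return centroids[-1].mean


if __name__ == "__main__":
    values = [float(i) for i in range(10000)]
    for compression in (20, 200):
        digest = build_digest(values, compression)
        print(compression, len(digest), estimate_quantile(digest, 0.5))
```

In this sketch a larger compression value yields more, smaller centroids, which is why describing the argument as a "maximum error" would be misleading: the error that results depends on the data and on the quantile being queried.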
This adds documentation for the t-digest type (#15674), similar to what
already exists for the QDigest type. The t-digest library was added in
#12961 and functions to use it were added in #13940.
Test plan - Documentation changes only, manual inspection of output.
Open questions: