Add t-digest #12961

Merged: rongrong merged 4 commits into prestodb:master from gastonar:t_digest on Jul 20, 2019

@gastonar gastonar commented Jun 17, 2019

Add t-digest to Presto. It allows faster and more accurate quantile calculations while using less memory. In reference to issue #12929.

Note

We made an intentional decision to copy the existing implementation as is for now. See #12961 (review) and #12961 (comment).

@gastonar gastonar changed the title Add t-digest [WIP]: Add t-digest Jun 17, 2019
@gastonar gastonar mentioned this pull request Jun 17, 2019
@gastonar gastonar force-pushed the t_digest branch 2 times, most recently from 56a6498 to 2fe3004 Compare June 18, 2019 19:15
@gastonar gastonar changed the title [WIP]: Add t-digest Add t-digest Jun 18, 2019
@gastonar gastonar requested a review from tdcmeehan June 18, 2019 19:18

@tdcmeehan tdcmeehan left a comment

Some initial comments. I would also move these classes into their own package for now, com.facebook.presto.operator.scalar.tdigest


@rongrong rongrong left a comment

Personally, I don't think changing variable names is necessary. (Yes, longer names are more readable, but for algorithms like this, readable names are not that critical to understanding them anyway. I feel safer seeing that they are unchanged from the original. Just a personal opinion.) Also, it's not clear to me why a lot of comments were removed. Reducing these changes would make reviewing easier.

It's a bit strange to see hashCode and equals not based on the same variables. What's the reason behind this?


@gastonar gastonar Jun 19, 2019

Each centroid instance has a unique id, even when two centroids contain the same values. Therefore, if equals compared by id, it would always return false.
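
For context on the question above, Java's contract requires that two objects that are equal under `equals` return the same `hashCode`. The following is a minimal sketch of one way to keep both methods aligned, assuming equality is meant to be value-based. `CentroidSketch` and its fields are hypothetical names for illustration and are not the class from this pull request.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the Centroid class from this pull request.
final class CentroidSketch
{
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    private final double mean;
    private final int count;
    // Unique per instance, so it cannot be used for value-based equality.
    private final int id = NEXT_ID.incrementAndGet();

    CentroidSketch(double mean, int count)
    {
        this.mean = mean;
        this.count = count;
    }

    int id()
    {
        return id;
    }

    // Value-based equality: two centroids with the same mean and count are equal.
    @Override
    public boolean equals(Object other)
    {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CentroidSketch)) {
            return false;
        }
        CentroidSketch that = (CentroidSketch) other;
        return Double.compare(mean, that.mean) == 0 && count == that.count;
    }

    // Derived from the same fields as equals, so equal objects hash alike.
    @Override
    public int hashCode()
    {
        return Objects.hash(mean, count);
    }
}
```

With this shape, two instances built from the same values have different ids but still compare equal and land in the same hash bucket. Hashing by id instead would place equal centroids in different buckets of a `HashMap` or `HashSet`.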

@gastonar gastonar force-pushed the t_digest branch 3 times, most recently from 2cfea70 to 078308b Compare June 19, 2019 23:19
@jessesleeping jessesleeping self-requested a review June 20, 2019 02:04
@gastonar gastonar force-pushed the t_digest branch 5 times, most recently from 5c3169a to 532fb28 Compare June 21, 2019 19:36

@wenleix wenleix left a comment

"Add implementation of T-Digest"

Maybe also add a commit message like "This commit copied the original code as it is intentionally. Refactoring will be done in future commits"

This commit copied the original code as it is intentionally. Refactoring will be done in future commits.
@gastonar gastonar force-pushed the t_digest branch 4 times, most recently from b44c56d to 3aabb13 Compare July 12, 2019 01:39
@tdcmeehan

There's some copy-paste in the graffle metadata. Other than that, I think this is good to merge, @rongrong.

@gastonar gastonar force-pushed the t_digest branch 4 times, most recently from 1f33404 to 1f03ac6 Compare July 16, 2019 08:27
@wenleix wenleix assigned tdcmeehan, wenleix and rongrong and unassigned wenleix Jul 16, 2019
@gastonar gastonar force-pushed the t_digest branch 2 times, most recently from a81e903 to f239e1d Compare July 18, 2019 21:27
Remove unnecessary functions for Presto t-digest needs.
@rongrong rongrong merged commit 446416e into prestodb:master Jul 20, 2019
amellnik added a commit to amellnik/presto that referenced this pull request Nov 30, 2021
This adds documentation for the t-digest type (prestodb#15674), similar to
what already exists for the QDigest type. The t-digest library was
added in prestodb#12961 and functions to use it were added in prestodb#13940.
highker pushed a commit that referenced this pull request Dec 16, 2021
This adds documentation for the t-digest type (#15674), similar to
what already exists for the QDigest type. The t-digest library was
added in #12961 and functions to use it were added in #13940.
7 participants