Upgrade t-digest to 3.2 #28305

liketic · 2018-01-19T03:18:53Z

Upgrade t-digest to 3.2. Some unit tests are failed seems like the percentile algorithm is different as the current version.

Closes #28295

/cc @colings86

elasticmachine · 2018-01-19T03:18:54Z

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

elasticmachine · 2018-01-19T03:18:54Z

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

colings86

It would be good to understand why some of these values are different now, especially as for the one on line 100 of the tests the change in the values is relatively big. I think its ok for the values to change if they are more correct but we should be able to explain why the values needed to change.

liketic · 2018-01-24T08:10:48Z

Thanks @colings86 ! I'm not a expert on t-digest, please correct me if I'm wrong. I made a test for both t-digest 3.2 and 3.1, I found the big change is introduced in 3.2:

    private static void test32() {
        AVLTreeDigest digest = new AVLTreeDigest(100);
        digest.add(8);
        digest.add(5);
        digest.add(3);
        digest.add(2);
        digest.add(1);
        digest.add(1);
        digest.add(0);
        System.out.println(digest.quantile(0.86));
        System.out.println(digest.quantile(0.75));
        System.out.println(digest.quantile(0.25));
        System.out.println(digest.quantile(0.01));
    }

The output is:

6.559999999999999
4.5
1.0
0.0

For 3.1, the output:

5.48
4.0
1.0
0.06

Then I found another implementation https://github.com/CamDavidsonPilon/tdigest and made a test:

from tdigest import TDigest

digest = TDigest()
x = [8, 5, 3, 2, 1, 1, 0]

for t in x:
    digest.update(t)

print(digest.percentile(86))
print(digest.percentile(75))
print(digest.percentile(25))
print(digest.percentile(1))

The output is:

6.56
4.5
0.833333333333
-0.286666666667

So I trend to think it's more accurate than before.

colings86 · 2018-02-02T10:03:52Z

@liketic I had a look into the differences between t-digest 3.0 and t-digest 3.2 specifically around the AVLTreeDigest that we use in Elasticsearch. The key difference that is causing the tests here to need changing seems to be a different approach to interpolating the value when the requested quantile is between two centroids.

As an example, if we are looking for the 81.25th percentile and we have 8 values the percentile is located at the 6.5th value. Let say that we have a centroid that contains the 6th value with a centroid value of 4 and a centroid containing the 7th value with centroid value of 9.

In 3.0 the interpolation was linear between the two centroids. So it will return the value that is half way between 5 and 8, since the value we want is half way between the 6th and 7th value. So the returned value is 8.5

In 3.2 the interpolation is a weighted average between the two value which takes into account the count of values in the 6th and 7th centroids. So if the 7th centroid has more values than the 6th centroid, the interpolation is biased more towards the 7th centroid.

I think the weighted interpolation is a better approach so I am happy that the change is a positive one. Would you mind updating your branch with the latest master and then I can set of another CI build to make sure we are still green before I merge this?

Thanks for working on this and also for your patience waiting for my response (sorry for the delay)

liketic · 2018-02-05T08:56:44Z

Wow! @colings86 Thank you so much for your professional investigation! I rebased the pull request, please trigger jenkins.

colings86 · 2018-02-05T09:05:52Z

@liketic no problem, I'll trigger a build now

colings86 · 2018-02-05T09:05:59Z

jenkins please test this

colings86 · 2018-02-05T13:19:10Z

jenkins test this please

colings86 · 2018-02-06T08:26:17Z

@liketic it looks like there are still test failures in TDigestPercentileRanksIT are you able to take a look into them?

colings86 · 2018-02-13T08:56:24Z

jenkins test this please

colings86 · 2018-02-13T15:35:45Z

Jenkins retest this please

colings86 · 2018-02-14T08:34:51Z

@liketic it looks like there are some test failures left in the documentation tests:

- org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT.test {yaml=reference/aggregations/metrics/percentile-aggregation/line_28}
- org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT.test {yaml=reference/aggregations/metrics/percentile-aggregation/line_103}

I think the expected outputs in the percentiles aggregation page in the docs just need adjusting. Are you able to make those changes?

liketic · 2018-02-14T12:47:02Z

@colings86 That's my mistake. I pushed 6f65266f3d9bd6b67e42201ef8b4f518f2612410 to fix the tests. Please trigger jenkins again.

colings86 · 2018-02-14T12:50:53Z

jenkins please retest this

liketic · 2018-02-14T13:57:12Z

@colings86 The testing is still failed, I'll rebase the PR, WDYT?

colings86 · 2018-02-14T13:58:34Z

@liketic yes I think rebasing the PR would be good to try here as I think the failure is related to some changes that were made overnight. Thanks

colings86 · 2018-02-14T14:29:05Z

jenkins test this please

colings86 · 2018-02-15T15:03:57Z

Merged to master and backported to 6.x. Thanks for working on this @liketic and for sticking with it through all the build failures 😄

* master: Add a note to the docs that _cat api `help` option cannot be used if an optional url param is used (elastic#28686) Lift error finding utility to exceptions helpers Change "tweet" type to "_doc" (elastic#28690) [Docs] Add missing word in nested.asciidoc (elastic#28507) Simplify the Translog constructor by always expecting an existing translog (elastic#28676) Upgrade t-digest to 3.2 (elastic#28295) (elastic#28305) Add comment explaining lazy declared versions

…on on Percentiles aggregations (#1531) I've updated the: - docker-compose to a newer version (3.4) - updated to ElasticSearch 6.4.x (available on [HERE](https://www.docker.elastic.co/)) - Updated Percentiles Aggregation Tests as the T-Digest algorithm has been updated and so all the tests failed. Have a look at [the PR on ES](elastic/elasticsearch#28305)

liketic changed the title ~~Upgrade t digest~~ Upgrade t-digest Jan 19, 2018

liketic changed the title ~~Upgrade t-digest~~ Upgrade t-digest to 3.2 Jan 19, 2018

colings86 reviewed Jan 19, 2018

View reviewed changes

liketic force-pushed the upgrade-t-digest branch from 7309ac5 to d1890b5 Compare February 5, 2018 08:54

colings86 mentioned this pull request Feb 13, 2018

Wrong percentile_ranks calculation #28536

Closed

Upgrade t-digest to 3.2 (elastic#28295)

a307bdd

liketic force-pushed the upgrade-t-digest branch from 6f65266 to a307bdd Compare February 14, 2018 14:05

colings86 added >enhancement :Analytics/Aggregations Aggregations v7.0.0 v6.3.0 labels Feb 15, 2018

colings86 merged commit fc406c9 into elastic:master Feb 15, 2018

colings86 added the backport pending label Feb 15, 2018

liketic deleted the upgrade-t-digest branch February 15, 2018 09:59

colings86 pushed a commit that referenced this pull request Feb 15, 2018

Upgrade t-digest to 3.2 (#28295) (#28305)

d58162c

colings86 removed the backport pending label Feb 15, 2018

This was referenced Mar 22, 2018

Percentile ranks metric bug? #10216

Closed

Wrong Percentile Rank calculation #14851

Closed

mfussenegger mentioned this pull request Oct 24, 2018

Update t-digest to 3.2 crate/crate#7804

Merged

4 tasks

p365labs mentioned this pull request Nov 5, 2018

update docker-compose to 3.4, update ES version & bug on new ES versi… ruflin/Elastica#1531

Merged

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

dependabot-preview bot mentioned this pull request Aug 22, 2019

[Do not merge] Bump ruflin/elastica from 5.3.5 to 6.1.1 owncloud/search_elastic#137

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upgrade t-digest to 3.2 #28305

Upgrade t-digest to 3.2 #28305

liketic commented Jan 19, 2018

elasticmachine commented Jan 19, 2018

elasticmachine commented Jan 19, 2018

colings86 left a comment

liketic commented Jan 24, 2018 •

edited

Loading

colings86 commented Feb 2, 2018

liketic commented Feb 5, 2018

colings86 commented Feb 5, 2018

colings86 commented Feb 5, 2018

colings86 commented Feb 5, 2018

colings86 commented Feb 6, 2018

colings86 commented Feb 13, 2018

colings86 commented Feb 13, 2018

colings86 commented Feb 14, 2018

liketic commented Feb 14, 2018

colings86 commented Feb 14, 2018

liketic commented Feb 14, 2018

colings86 commented Feb 14, 2018

colings86 commented Feb 14, 2018

colings86 commented Feb 15, 2018

Upgrade t-digest to 3.2 #28305

Upgrade t-digest to 3.2 #28305

Conversation

liketic commented Jan 19, 2018

elasticmachine commented Jan 19, 2018

elasticmachine commented Jan 19, 2018

colings86 left a comment

Choose a reason for hiding this comment

liketic commented Jan 24, 2018 • edited Loading

colings86 commented Feb 2, 2018

liketic commented Feb 5, 2018

colings86 commented Feb 5, 2018

colings86 commented Feb 5, 2018

colings86 commented Feb 5, 2018

colings86 commented Feb 6, 2018

colings86 commented Feb 13, 2018

colings86 commented Feb 13, 2018

colings86 commented Feb 14, 2018

liketic commented Feb 14, 2018

colings86 commented Feb 14, 2018

liketic commented Feb 14, 2018

colings86 commented Feb 14, 2018

colings86 commented Feb 14, 2018

colings86 commented Feb 15, 2018

liketic commented Jan 24, 2018 •

edited

Loading