Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Jun 1, 2020

When the terms agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is very common. But I
broke it in #57241. This fixes that optimization and adds debug
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.

We also had a specialiation for when there isn't a filter on the terms
to aggregate. I had removed that specialization in #57241 which resulted
in some slow down as well. This adds it back but in a more clear way.
And, hopefully, a way that is marginally faster when there is a
filter.

Closes #57407

When the `terms` agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is *very* common. But I
broke it in elastic#57241. This fixes that optimization and adds `debug`
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.

We also had a specialiation for when there isn't a filter on the terms
to aggregate. I had removed that specialization in elastic#57241 which resulted
in some slow down as well. This adds it back but in a more clear way.
And, hopefully, a way that is marginally faster when there *is* a
filter.

Closes elastic#57407
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 1, 2020
@nik9000
Copy link
Member Author

nik9000 commented Jun 1, 2020

I've labeled this non-issue because it fixes a unreleased performance regression. Thanks for finding it @dliappis !

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If at some point you'd like to make a flow chart of what choices terms makes to decide on its leaf collector implementation, I'd find that helpful.

@nik9000
Copy link
Member Author

nik9000 commented Jun 2, 2020

LGTM. If at some point you'd like to make a flow chart of what choices terms makes to decide on its leaf collector implementation, I'd find that helpful.

Good call!

@nik9000 nik9000 merged commit b072f5f into elastic:master Jun 2, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 2, 2020
When the `terms` agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is *very* common. But I
broke it in elastic#57241. This fixes that optimization and adds `debug`
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.

We also had a specialiation for when there isn't a filter on the terms
to aggregate. I had removed that specialization in elastic#57241 which resulted
in some slow down as well. This adds it back but in a more clear way.
And, hopefully, a way that is marginally faster when there *is* a
filter.

Closes elastic#57407
nik9000 added a commit that referenced this pull request Jun 2, 2020
When the `terms` agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is *very* common. But I
broke it in #57241. This fixes that optimization and adds `debug`
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.

We also had a specialiation for when there isn't a filter on the terms
to aggregate. I had removed that specialization in #57241 which resulted
in some slow down as well. This adds it back but in a more clear way.
And, hopefully, a way that is marginally faster when there *is* a
filter.

Closes #57407
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 2, 2020
nik9000 added a commit that referenced this pull request Jun 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.9.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression in aggregation latency after #57241

4 participants