Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented May 29, 2020

This merges the global-ordinals-based implementation for
significant_terms into the global-ordinals-based implementation of
terms, removing a bunch of copy and pasted code that is subtly
different across the two implementations and replacing it with an
explicit ResultStrategy with nice stuff like Javadoc.

The actual behavior is mostly unchanged, though I was able to remove a
redundant copy of bytes representing the string from the result
construction phase of significant_terms.

This merges the global-ordinals-based implementation for
`significant_terms` into the global-ordinals-based implementation of
`terms`, removing a bunch of copy and pasted code that is subtly
different across the two implementations and replacing it with an
explicit `ResultStrategy` with nice stuff like Javadoc.

The actual behavior is mostly unchanged, though I was able to remove a
redundant copy of bytes representing the string from the result
construction phase of `significant_terms`.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 29, 2020
Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Nice refactoring. Feels good to see these two coming more into sync.

BytesRef apply(long ord) throws IOException;
}

public GlobalOrdinalsStringTermsAggregator(String name, AggregatorFactories factories,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe run the formatter on this method declaration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

boolean showTermDocCountError,
Map<String, Object> metadata) throws IOException {
super(name, factories, context, parent, order, format, bucketCountThresholds, collectionMode, showTermDocCountError, metadata);
this.resultStrategy = resultStrategy.apply(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Since the syntax backing this is somewhat rare in aggs, maybe toss a quick comment here that we need the reference to this in order to construct the closure? I figured it out, but did a double take when I first saw it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

boolean showTermDocCountError,
Map<String, Object> metadata) throws IOException {
super(name, factories, valuesSource, order, format, bucketCountThresholds, null,
super(name, factories, a -> a.new StandardTermsResults(), valuesSource, order, format, bucketCountThresholds, null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe update the javadoc to say this can only be used for StandardTerms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nik9000 nik9000 merged commit 10dc144 into elastic:master May 29, 2020
@nik9000
Copy link
Member Author

nik9000 commented May 29, 2020

Thanks @not-napoleon ! I've merged and will backport!

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 29, 2020
This merges the global-ordinals-based implementation for
`significant_terms` into the global-ordinals-based implementation of
`terms`, removing a bunch of copy and pasted code that is subtly
different across the two implementations and replacing it with an
explicit `ResultStrategy` with nice stuff like Javadoc.

The actual behavior is mostly unchanged, though I was able to remove a
redundant copy of bytes representing the string from the result
construction phase of `significant_terms`.
nik9000 added a commit that referenced this pull request May 30, 2020
This merges the global-ordinals-based implementation for
`significant_terms` into the global-ordinals-based implementation of
`terms`, removing a bunch of copy and pasted code that is subtly
different across the two implementations and replacing it with an
explicit `ResultStrategy` with nice stuff like Javadoc.

The actual behavior is mostly unchanged, though I was able to remove a
redundant copy of bytes representing the string from the result
construction phase of `significant_terms`.

Co-authored-by: Elastic Machine <[email protected]>
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 5, 2020
When you run a `significant_terms` aggregation on a field and it *is*
mapped but there aren't any values for it then the count of the
documents that match the query on that shard still have to be added to
the overall doc count. I broke that in elastic#57361. This fixes that.

Closes elastic#57402
nik9000 added a commit that referenced this pull request Jun 8, 2020
When you run a `significant_terms` aggregation on a field and it *is*
mapped but there aren't any values for it then the count of the
documents that match the query on that shard still have to be added to
the overall doc count. I broke that in #57361. This fixes that.

Closes #57402
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 8, 2020
When you run a `significant_terms` aggregation on a field and it *is*
mapped but there aren't any values for it then the count of the
documents that match the query on that shard still have to be added to
the overall doc count. I broke that in elastic#57361. This fixes that.

Closes elastic#57402
nik9000 added a commit that referenced this pull request Jun 8, 2020
When you run a `significant_terms` aggregation on a field and it *is*
mapped but there aren't any values for it then the count of the
documents that match the query on that shard still have to be added to
the overall doc count. I broke that in #57361. This fixes that.

Closes #57402
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.9.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants