Skip to content

Conversation

@imotov
Copy link
Contributor

@imotov imotov commented Jul 7, 2020

Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.

Closes #50109

Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.

Closes elastic#50109
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 7, 2020
@imotov imotov requested a review from not-napoleon July 7, 2020 17:32
Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, thanks for fielding it. I know some users will be really happy to see this land :)

I left a couple of nits, but nothing major.

offset = in.readLong();
extendedBounds = in.readOptionalWriteable(ExtendedBounds::new);
extendedBounds = in.readOptionalWriteable(LongBounds::new);
if (in.getVersion().onOrAfter(Version.V_8_0_0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll change this after the backport, I presume?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's just until backport.

/** Set hard bounds on this histogram, specifying boundaries outside which buckets cannot be created. */
public DateHistogramAggregationBuilder hardBounds(LongBounds hardBounds) {
if (hardBounds == null) {
throw new IllegalArgumentException("[hardBounds] must not be null: [" + name + "]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not allow setting the bounds to null? It's not a required parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is consistent with all other optional parameters in this builder like order or externed_bounds.


import static org.elasticsearch.common.xcontent.ConstructingObjectParser.optionalConstructorArg;

public class DoubleBounds implements ToXContentFragment, Writeable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct that there's no common interface between DoubleBound and LongBound because we'd have to autobox in the collection loop? If so, maybe let's just leave a comment to that effect and maybe link the two in Javadoc, so there's at least something indicating there are two implementations here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow. Where do you expect them to intersect?

private double offset = 0;
private double minBound = Double.POSITIVE_INFINITY;
private double maxBound = Double.NEGATIVE_INFINITY;
private DoubleBounds hardBounds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we convert minBound and maxBound into a DoubleBounds instance? if not, should probably just leave a comment that it's done this way for legacy reasons, not out of any specific need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to refactor minBounds and maxBounds in a follow up PR. I just didn't feel like making this one more complicated than necessary. I will add TODO comment.

return max;
}

public boolean contain(long time) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time seems like a confusing name for this parameter. Maybe just value like the double version uses?

);
}

public void testOverlappingBounds() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to have a test for hard bounds exactly equal to extended bounds. That's a bit of an edge case, and we want to make sure it doesn't break down the line.

@imotov imotov merged commit 0af410a into elastic:master Jul 13, 2020
@imotov
Copy link
Contributor Author

imotov commented Jul 14, 2020

This PR requires a bit tricky backport, so we will let it slip into 7.10 in order to not disrupt feature freeze today.

@imotov imotov added the v7.10.0 label Jul 15, 2020
imotov added a commit to imotov/elasticsearch that referenced this pull request Jul 15, 2020
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
imotov added a commit that referenced this pull request Jul 16, 2020
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
@imotov imotov deleted the issue-50109-add-restrictive-bounds branch July 17, 2020 17:52
imotov added a commit that referenced this pull request Aug 4, 2020
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for #59175
imotov added a commit to imotov/elasticsearch that referenced this pull request Aug 4, 2020
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for elastic#59175
imotov added a commit that referenced this pull request Aug 4, 2020
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for #59175
@consulthys
Copy link
Contributor

Thanks for fixing this @imotov 👍

@ClaudioConsolmagno
Copy link

Should the renaming of ExtendedBounds to LongBounds have appeared in the 7.10 release notes or the migration guide?

Was upgrading elastic and dependencies today and came across this and was looking to see if there was any impact in simply renaming ExtendedBounds to LongBounds in my code so looked first in the notes and couldn't see anything. Had to search the repo until I tracked it down to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.10.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

date_histogram of date_range with null end points triggers a circuit breaker

6 participants