-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Pq compression user metrics #18227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pq compression user metrics #18227
Conversation
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
|
This pull request does not have a backport label. Could you fix it @yaauie? 🙏
|
🔍 Preview links for changed docs |
91da26c to
4d9a9dc
Compare
4d9a9dc to
e0f1215
Compare
| return new ImmutableRatio(combinedBytesIn, combinedBytesOut); | ||
| } catch (ArithmeticException e) { | ||
| // don't crash, just start over. | ||
| return new ImmutableRatio(bytesIn, bytesOut); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative that would be less lossy would be to shift the stored number (dividing it by 2 or 4).
But at 100GiB/sec this counter would take ~272 years to overflow so it's not really a concern in practice.
| return new ImmutableRatio(combinedBytesIn, combinedBytesOut); | ||
| } catch (ArithmeticException e) { | ||
| // don't crash, just start over. | ||
| return new ImmutableRatio(bytesIn, bytesOut); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we log here, just in case there is some misplaced calculation that triggers this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've addressed this in 7ae2395 by logging and halving the existing long value, and then logging. Combined with changing the interface of IORatioMetric#incrementBy to take int, int this should make those invalid calculations even harder to come by.
| compression: | ||
| type: object | ||
| properties: | ||
| encode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would including the goal (speed, balanced, etc) in the payload provide value for monitoring, or checking diags without needing to refer back to logstash.yml?
(This is ok to add as a follow-on PR if we think it is worthwhile)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be trivial.
But the queue.compression.encode namespace is only present when encoding operations are possible (e.g., speed, balanced, size) and missing when configured with none. Do you have a preference for where we would put it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think queue.compression.encode.goal would work ok if that namespace is present. If it is absent, then none is implied, which makes sense, I think
|
|
||
| if (bytesIn.signum() == 0) { | ||
| return switch(bytesOut.signum()) { | ||
| case -1 -> Double.NEGATIVE_INFINITY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible? Or should incrementBy only accept positive values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added safeguards to what is acceptable, so this should never get hit, but I would rather leave it in place instead of crashing if the impossible occurs.
| public void testZeroBytesInNegativeBytesOut() { | ||
| final AtomicIORatioMetric ioRatioMetric = new AtomicIORatioMetric("name"); | ||
|
|
||
| ioRatioMetric.incrementBy(0, -768); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a use case that the ratio should support? Or should this raise an IllegalArgumentException?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've addressed in 7ae2395 by making it log and ignore attempts to decrement. This is only ever called with the result of byte[]#length (which is never negative), but I would very much not like this to be a crashing problem that puts in-flight events at risk.
By using `int` type in `IORatioMetric#incrementBy(int,int)`, we simplify the failure scenarios while still allowing the desired behaviour, since this is always called in practice with `byte[]#length`. We ensure that attempts to decrement the value are ignored, and result in a log message, and that overflows reduce precision and are also logged. Together, these ensure that long overflows won't ever result in pipeline crashes.
|
robbavey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes made LGTM
💛 Build succeeded, but was flaky
Failed CI StepsHistory
|
robbavey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* pq-compression: wire through custom user metrics for ratio/spend * add test for queue compression metrics * pq metrics: IORatioMetric edge-case logging, use int at interface By using `int` type in `IORatioMetric#incrementBy(int,int)`, we simplify the failure scenarios while still allowing the desired behaviour, since this is always called in practice with `byte[]#length`. We ensure that attempts to decrement the value are ignored, and result in a log message, and that overflows reduce precision and are also logged. Together, these ensure that long overflows won't ever result in pipeline crashes.





Release notes
[rn: skip]
What does this PR do?
Adds metrics to the pipeline's
queuenamespace to reflect the cost and benefit of compression (see: #18107):Why is it important/What is the impact to the user?
It gives a user insight into the cost and benefit of compression options, so that they can make informed decisions about expending resources.
Checklist
[ ] I have made corresponding change to the default configuration files (and/or docker env variables)Author's Checklist
How to test this PR locally
example-input.ndjson:feedutility to your path