ES|QL: Implement heap attack tests for histograms#140714
ES|QL: Implement heap attack tests for histograms#140714JonasKunz merged 7 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
There is a to_string for histogram, if we want to stress test that. |
not-napoleon
left a comment
There was a problem hiding this comment.
I think the test itself looks good. It's probably worth adding a version that does the same thing with a legacy histogram, just in case something is bonkers in the casting logic.
What about testing a single, pathologically large histogram?
| StringBuilder bulk = new StringBuilder(); | ||
| int flush = 0; | ||
| for (int i = 0; i < numHistograms; i++) { | ||
| StringBuilder histoJson = new StringBuilder("{"); |
There was a problem hiding this comment.
Do we actually need to do this construction here? we have randomized functions that build both t-digests and exponential histograms already, and the storage classes have the ability to render as json. I don't love adding another place we randomly generate t-digests, especially ones that aren't actually built as t-digests
There was a problem hiding this comment.
We want deterministically sized data for these tests, therefore I think this is required.
And for the exponential histogram the code is really just histogram construction plus calling the serializer, so no duplicate logic here.
storage classes have the ability to render as json
This is the case for exponential histograms, but is that also possible for T-Digest? I couldn't find reusable code there. It should definitely be a a future refactor to remove this duplicate serialization logic.
My understanding here is that all heap-attack tests "grow" stuff in memory, e.g. through computations or duplications, in the compute engine. To my knowledge it isn't possible to do this for Therefore I don't see use in adding this kind of test for |
nik9000
left a comment
There was a problem hiding this comment.
I think it's good. We might need to do more cruel things later - like ingest a single giant histogram and to_string it. That's probably more compelling if we have a very dense ingest format for histograms.
| query.append("| STATS "); | ||
| query.append( | ||
| IntStream.range(0, numDuplications) | ||
| .mapToObj(i -> "val_" + i + " = PERCENTILE(histo, 50) WHERE histo_id != -" + i) |
I'd love to get one of these in to be honest. I do think it'd be worth doing the other type, just to be paranoid. It's not about the code being the same now, it's about what we'll do in six months without noticing. What about TO_STRING and TO_TDIGEST? TO_STRING could, for example, not be tracking the memory of each element. That's fine for numbers, but for these biggest types that seems like a thing. TO_TDIGEST looks to be limited to 2mb histograms. I see that it is using which isn't tracked. I might send it a 1.99mb histogram and a 2.01mb histogram. Just out of paranoia. |
|
@nik9000 I'm having trouble of constructing test for to_string / to_tdigest which are constructed so that when scaling up they actually explode in those functions and not somewhere else first. To my understanding this is the general pattern for these tests: Find some way of scaling up the memory usage in a particular function that we want to test for robustness. So for example for The same is not possible for So the only option would be to gradually grow
However, here we still don't guarantee that we actually test For TO_STRING we'd also have to change the testing entirely, because we fail the method call with an That's why I'm unsure on how to proceed here. If you tell me what approach to take for testing
I can add it as it is not much effort, but I think it will never test the thing we want to test. E.g. I would use the same testing I did for |
|
I've added now the tests for the histogram type and keeping it otherwise as-is. If we can resolve my concerns from the previous comment somehow, we can later add tests for TO_STRING. |
Part of #140508.
Implements heap attack tests for
tdigestandexponential_histogram.I also thought about adding heap-attack tests for
histogram, but the only thing you can do withhistogramin ES|QL currently is to cast it totdigest. So the test would end up testingtdigestagain, which seemed pointless to me.