Conversation
796cf6c to
a730466
Compare
lukasz-stec
left a comment
There was a problem hiding this comment.
Nice! This should bring some perf improvement, I have seen 20% perf drop in some queries because of group id code was not compiled optimally by jit. using int[] will likely help with that.
There was a problem hiding this comment.
Would it be beneficial to handle this case better?
We could still have GroupIdBLock that has two states, int[] and single value and count similar to SelectedPositions and then do
if (groupIdsBlock.isArray()) {
// handle array
} else {
// handle rle
}
``` in the `Accumulator`
There was a problem hiding this comment.
Maybe. I'm not sure the complexity will result in better performance in the 99% case where we don't have RLEs. My thought is we switch to int[] and establish a new base line. Then in follow up work we can investigate if adding wrapper with RLE support helps.
There was a problem hiding this comment.
Does it make sense to also make io.trino.spi.function.GroupedAccumulatorState#setGroupId accept int instead of long?
There was a problem hiding this comment.
Yes! That will be in some follow up work.
a730466 to
1fc7432
Compare
Group id block was used to carry the max group id to the grouped accumulator, but the max group id is now set directly with the setGroupCount method. An int[] is used because the GroupByHash implementations only create int group ids.
1fc7432 to
ca61901
Compare
Group id block was used to carry the max group id to the grouped accumulator, but the max group id is now set directly with the setGroupCount method.
An int[] is used because the GroupByHash implementations only create int group ids.
Release notes
(x) This is not user-visible or docs only and no release notes are required.