Skip to content

Improve performance of VariableWidthBlock#copyPositions#10409

Merged
sopel39 merged 2 commits intotrinodb:masterfrom
raunaqmorarka:var-copy
Jan 10, 2022
Merged

Improve performance of VariableWidthBlock#copyPositions#10409
sopel39 merged 2 commits intotrinodb:masterfrom
raunaqmorarka:var-copy

Conversation

@raunaqmorarka
Copy link
Copy Markdown
Member

@raunaqmorarka raunaqmorarka commented Dec 25, 2021

Instead of copying every position one at a time, copy
contiguous ranges of Slice to reduce the number of times
SliceOutput#writeBytes is called when there are sequences
in the positions being copied.

Reduce branches for populating null values

BenchmarkVariableWidthBlock.copyPositions

(nullsAllowed) (selectedPositions) (selectedPositionsCount) us/op Before us/op After
    FALSE          GROUPED             200                      7.108        4.222
    FALSE          GROUPED             1000                     48.71        22.427
    FALSE          GROUPED             8000                     476.058      186.539
    FALSE          SEQUENCE            200                      6.912        4.107
    FALSE          SEQUENCE            1000                     47.931       21.282
    FALSE          SEQUENCE            8000                     473.199      174.198
    FALSE          RANDOM              200                      8.175        7.505
    FALSE          RANDOM              1000                     53.125       51.679
    FALSE          RANDOM              8000                     519.651      478.773
    TRUE           GROUPED             200                      5.897        4.099
    TRUE           GROUPED             1000                     38.14        19.807
    TRUE           GROUPED             8000                     394.617      158.567
    TRUE           SEQUENCE            200                      6.06         3.508
    TRUE           SEQUENCE            1000                     39.77        17.536
    TRUE           SEQUENCE            8000                     413.274      149.79
    TRUE           RANDOM              200                      7.076        7.201
    TRUE           RANDOM              1000                     42.756       43.626
    TRUE           RANDOM              8000                     416.765      385.222

Copy link
Copy Markdown
Member

@skrzypo987 skrzypo987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid there will be a regression for certain position arrays. Is there any benchmark?

IMHO it would be best to apply some heuristics to guess if the "new" way of copying will be faster e.g. some function of fill ratio and average position length. We can also fallback to old code if the first loop uncovers highly fragmented positions array.

@raunaqmorarka raunaqmorarka force-pushed the var-copy branch 4 times, most recently from 8654fb2 to f7acdc9 Compare December 27, 2021 14:15
@raunaqmorarka
Copy link
Copy Markdown
Member Author

I am afraid there will be a regression for certain position arrays. Is there any benchmark?

IMHO it would be best to apply some heuristics to guess if the "new" way of copying will be faster e.g. some function of fill ratio and average position length. We can also fallback to old code if the first loop uncovers highly fragmented positions array.

I've tweaked the code a bit and posted JMH results with different selection patterns. It's doing better in all the benchmarked scenarios.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think offsets are guaranteed to make sense for null positions (add a test?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bytes at a particular offset in slice for a null position can be junk, we don't care about that. However, the offset positions need to be valid in existing code as well, otherwise the result of getSliceLength for the next non-null position could go wrong.

Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 0 is bad?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, I just copied this one from BenchmarkDataGenerator#randomNullChance
@skrzypo987 do you know why ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a < at the end, so if random picks 0 it will return null for null chance of 0.
We basically want to pick random out of [0,1>, but nextBoolean returns <0,1>

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simplify it

Instead of copying every position one at a time, copy
contiguous ranges of Slice to reduce the number of times
SliceOutput#writeBytes is called when there are sequences
in the positions being copied.

Reduce branches for populating null values

BenchmarkVariableWidthBlock.copyPositions
(nullsAllowed) (selectedPositions) (selectedPositionsCount) us/op Before us/op After
FALSE	       GROUPED		   200			    7.108        4.222
FALSE	       GROUPED		   1000			    48.71        22.427
FALSE	       GROUPED		   8000			    476.058      186.539
FALSE	       SEQUENCE		   200			    6.912        4.107
FALSE	       SEQUENCE		   1000			    47.931       21.282
FALSE	       SEQUENCE		   8000			    473.199      174.198
FALSE	       RANDOM		   200			    8.175        7.505
FALSE	       RANDOM		   1000			    53.125       51.679
FALSE	       RANDOM		   8000			    519.651      478.773
TRUE	       GROUPED		   200			    5.897        4.099
TRUE	       GROUPED		   1000			    38.14        19.807
TRUE	       GROUPED		   8000			    394.617      158.567
TRUE	       SEQUENCE		   200			    6.06         3.508
TRUE	       SEQUENCE		   1000			    39.77        17.536
TRUE	       SEQUENCE		   8000			    413.274      149.79
TRUE	       RANDOM		   200			    7.076        7.201
TRUE	       RANDOM		   1000			    42.756       43.626
TRUE	       RANDOM		   8000			    416.765      385.222
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simplify it

@sopel39 sopel39 merged commit 5bae1e5 into trinodb:master Jan 10, 2022
@raunaqmorarka raunaqmorarka deleted the var-copy branch January 10, 2022 15:15
@github-actions github-actions bot added this to the 368 milestone Jan 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants