Skip to content

Conversation

@pettyjamesm
Copy link
Member

@pettyjamesm pettyjamesm commented Aug 18, 2025

Description

Improve block serialization logic to prefer using primitive arrays over virtual calls when checking positions for nulls.

Benchmarks: BenchmarkBlockSerde.serialize*

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Aug 18, 2025
@wendigo
Copy link
Contributor

wendigo commented Aug 18, 2025

It this PR about reducing the network overhead for nulls or does it improve performance due to autovectorization as well?

@pettyjamesm pettyjamesm force-pushed the improve-null-bits-encoding branch 2 times, most recently from d2ace22 to bec67f3 Compare August 18, 2025 20:23
@pettyjamesm
Copy link
Member Author

It this PR about reducing the network overhead for nulls or does it improve performance due to autovectorization as well?

This PR doesn't change the serialized representation (and therefore has no impact on network overhead), the performance comes from better JIT code generation for the serialization loops, although not necessarily autovectorization specifically.

@pettyjamesm pettyjamesm marked this pull request as ready for review August 18, 2025 20:44
@pettyjamesm pettyjamesm requested a review from dain August 18, 2025 20:44
@pettyjamesm pettyjamesm force-pushed the improve-null-bits-encoding branch from bec67f3 to d85b614 Compare August 18, 2025 21:15
@pettyjamesm pettyjamesm requested a review from wendigo August 19, 2025 10:44
@pettyjamesm pettyjamesm force-pushed the improve-null-bits-encoding branch from d85b614 to 3912ab0 Compare August 19, 2025 19:29
@wendigo
Copy link
Contributor

wendigo commented Aug 20, 2025

@pettyjamesm since you've shared benchmark results, can you contribute benchmark itself too?

int currentByte = 0;
for (int position = 0; position < (length & ~0b111); position += 8) {
byte value = 0;
value |= (isNull[position + offset] ? 1 : 0) << 7;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it faster than:

            value |= block.isNull(position) ? 0b1000_0000 : 0;
            value |= block.isNull(position + 1) ? 0b0100_0000 : 0;
            value |= block.isNull(position + 2) ? 0b0010_0000 : 0;
            value |= block.isNull(position + 3) ? 0b0001_0000 : 0;
            value |= block.isNull(position + 4) ? 0b0000_1000 : 0;
            value |= block.isNull(position + 5) ? 0b0000_0100 : 0;
            value |= block.isNull(position + 6) ? 0b0000_0010 : 0;
            value |= block.isNull(position + 7) ? 0b0000_0001 : 0;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the new version performs better (and more consistently) than the previous implementation across all null rates in the benchmark including when branches are (nearly) perfectly predicted (null rate 0.01 and 0.99). The performance of the new version is nearly flat (with some variation in performance coming from the logic performing null-supression of the values array) where as we see a typical "M"-shaped curve with the previous version where performance relies on the branch prediction rate- which suggests the new version is branchless although unfortunately I don't have a JVM build with hdsdis available to show the generated assembly to prove it.

@pettyjamesm
Copy link
Member Author

@pettyjamesm since you've shared benchmark results, can you contribute benchmark itself too?

The benchmark is already checked in as BenchmarkBlockSerDe. The results posted above only include the serialization benchmarks since the deserialization paths are unaffected by this change.

@pettyjamesm pettyjamesm force-pushed the improve-null-bits-encoding branch from 3912ab0 to b6f323c Compare August 20, 2025 13:01
Copy link
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@pettyjamesm pettyjamesm merged commit 624542d into trinodb:master Aug 20, 2025
179 of 182 checks passed
@pettyjamesm pettyjamesm deleted the improve-null-bits-encoding branch August 20, 2025 18:57
@github-actions github-actions bot added this to the 477 milestone Aug 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants