Improve the performance of filter functions#16681
Improve the performance of filter functions#16681raunaqmorarka merged 2 commits intotrinodb:masterfrom
Conversation
core/trino-main/src/test/java/io/trino/operator/scalar/BenchmarkArrayFilterObject.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/test/java/io/trino/operator/scalar/BenchmarkArrayFilterObject.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/test/java/io/trino/operator/scalar/BenchmarkArrayFilterObject.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/scalar/ArrayFilterFunction.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/test/java/io/trino/operator/scalar/BenchmarkArrayFilterObject.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Could you check if writing this part as
positions[length] = position;
length += TRUE.equals(keep) ? 1 : 0;
would give better results ?
There was a problem hiding this comment.
I reverted to the if clause and benchmarked again, the result of if clause was below:
Benchmark (name) Mode Cnt Score Error Units
BenchmarkArrayFilter.benchmark filter avgt 20 13.320 ± 0.610 ns/op
BenchmarkArrayFilter.benchmarkObject filter avgt 20 32.699 ± 0.945 ns/op
The result of ? : clause was below:
Benchmark (name) Mode Cnt Score Error Units
BenchmarkArrayFilter.benchmark filter avgt 20 13.576 ± 0.849 ns/op
BenchmarkArrayFilter.benchmarkObject filter avgt 20 32.051 ± 0.268 ns/op
It seems that the performance didn't have too much improvement contrast to the if clause, but i still changed to this way because the code is cleaner~
There was a problem hiding this comment.
I think whether you see improvement depends on whether the input data has a significant fraction of randomly generated nulls
There was a problem hiding this comment.
I think whether you see improvement depends on whether the input data has a significant fraction of randomly generated nulls
@raunaqmorarka Good question, you reminded me~ So I temporarily changed the nullRate to 0.8F in BlockAssertions#createRandomBlockForType(the former null rate was 20%):

Then i benchmarked again, below was the result(i.e. with 80% null values):
Before our improvement
Benchmark (name) Mode Cnt Score Error Units
BenchmarkArrayFilter.benchmarkObject filter avgt 20 7.694 ± 0.345 ns/op
After our improvement
Benchmark (name) Mode Cnt Score Error Units
BenchmarkArrayFilter.benchmarkObject filter avgt 20 7.301 ± 2.077 ns/op
So our improvement will only have a little speed up, if there are most null values in the column.
But the less null values are, the faster our improvement will be.
core/trino-main/src/test/java/io/trino/operator/scalar/BenchmarkArrayFilterObject.java
Outdated
Show resolved
Hide resolved
Before the change: Benchmark (name) Mode Cnt Score Error Units BenchmarkArrayFilter.benchmark filter avgt 20 22.543 ± 0.979 ns/op BenchmarkArrayFilter.benchmarkObject filter avgt 20 42.045 ± 2.088 ns/op After the change: Benchmark (name) Mode Cnt Score Error Units BenchmarkArrayFilter.benchmark filter avgt 20 13.327 ± 0.359 ns/op BenchmarkArrayFilter.benchmarkObject filter avgt 20 34.443 ± 1.943 ns/op
9cb9d4d to
4745664
Compare
|
@raunaqmorarka Thank you for your advice above very much, i have modified some implementations, please review again when you have time~ |
Description
In the original implementation of
ArrayFilterFunction.filter*, an additional copy of the filtered block is adopted. The speed, especially for some nested column types such asRowTypeis not perfect.So we improved the implementation to record the block index position, then directly call
Block.copyPositions()after the filtering is completed, in order to reduce unnecessary block copies such asType.appendTo()andBlockBuilder.build().Additional context and related issues
Before the change
After the change
Release notes
( ) This is not user-visible or docs only and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: