
Optimize array_join by supporting PROVIDED_BLOCKBUILDER convention #13874

Merged
wenleix merged 1 commit into prestodb:master from wenleix:arrayjoin
Dec 20, 2019

@wenleix wenleix commented Dec 17, 2019

Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines.

Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.

== RELEASE NOTES ==

General Changes
* Optimize performance for array_join

linux-foundation-easycla bot commented Dec 17, 2019

CLA Check
The committers are authorized under a signed CLA.

  • ✅ Wenlei Xie (f6ebeda52f2bec4a7519a847f2eb081ad665c9d1)

@wenleix wenleix force-pushed the arrayjoin branch 3 times, most recently from 4857cfd to 22b044f Compare December 18, 2019 00:43
@wenleix wenleix changed the title from "Support PROVIDED_BLOCKBUILDER return convention for array_join" to "Optimize array_join by supporting PROVIDED_BLOCKBUILDER convention" Dec 18, 2019
wenleix commented Dec 18, 2019

The benchmark shows an improvement of over 10% (roughly 12% less time per operation).

Before

Benchmark                     Mode  Cnt    Score   Error  Units
BenchmarkArrayJoin.benchmark  avgt   60  152.954 ± 1.246  ns/op

After

Benchmark                     Mode  Cnt    Score   Error  Units
BenchmarkArrayJoin.benchmark  avgt   60  134.558 ± 2.078  ns/op
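
For reference, the relative change between the two runs can be computed directly from the reported scores. This is only a small arithmetic sketch; the class and method names are made up for illustration and are not part of the PR.

```java
final class BenchmarkDelta {
    /** Relative reduction in average time per operation, as a percentage. */
    static double percentFaster(double beforeNs, double afterNs) {
        return (beforeNs - afterNs) / beforeNs * 100.0;
    }

    public static void main(String[] args) {
        // Scores copied from the Before/After runs above (ns/op).
        double before = 152.954;
        double after = 134.558;
        System.out.printf("%.1f%% less time per op%n", percentFaster(before, after));
    }
}
```

Note that this ignores the reported error margins (± 1.246 and ± 2.078 ns/op), so the result is a point estimate only.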

@wenleix wenleix requested review from arhimondr and highker December 18, 2019 00:44
wenleix commented Dec 18, 2019

See #9638 and #12166 for context.

cc @oerling, we once talked about allowing scalar functions to write directly to the output buffer (to avoid copying data for struct types). The framework is implemented, but no function is using it yet. @kaikalur also recently observed this inefficiency when optimizing a user's query, so here is an example of how to use it :).

ArrayJoin gets only a moderate benefit because the function logic itself is fairly compute-intensive. This type of optimization should yield larger improvements for functions that do little computation :)
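
To make the idea concrete, here is a minimal, self-contained sketch of the two styles: returning a value that the caller then copies, versus writing into a caller-provided buffer. It is an illustration only and does not use Presto's actual classes; `OutputBuffer`, `joinReturning`, and `joinInto` are hypothetical names, and null elements are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names are Presto APIs.
final class ArrayJoinSketch {
    /** Stand-in for a caller-owned, variable-width output buffer. */
    static final class OutputBuffer {
        private final StringBuilder data = new StringBuilder();
        private final List<Integer> endOffsets = new ArrayList<>();

        void write(CharSequence piece) { data.append(piece); }

        /** Marks everything written since the previous entry as one value. */
        void closeEntry() { endOffsets.add(data.length()); }

        int entryCount() { return endOffsets.size(); }

        String entry(int i) {
            int start = (i == 0) ? 0 : endOffsets.get(i - 1);
            return data.substring(start, endOffsets.get(i));
        }
    }

    /** Return-value style: builds a temporary result the caller must copy. */
    static String joinReturning(List<Long> values, String delimiter) {
        StringBuilder temp = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                temp.append(delimiter);
            }
            temp.append(values.get(i));
        }
        return temp.toString();
    }

    /** Provided-builder style: writes each piece straight into the output. */
    static void joinInto(List<Long> values, String delimiter, OutputBuffer out) {
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.write(delimiter);
            }
            out.write(Long.toString(values.get(i)));
        }
        out.closeEntry();
    }

    public static void main(String[] args) {
        OutputBuffer out = new OutputBuffer();

        // Style 1: the function returns a value, then the caller copies it in.
        out.write(joinReturning(Arrays.asList(1L, 2L, 3L), ","));
        out.closeEntry();

        // Style 2: the function writes into the caller's buffer itself.
        joinInto(Arrays.asList(4L, 5L), "-", out);

        for (int i = 0; i < out.entryCount(); i++) {
            System.out.println(out.entry(i));
        }
    }
}
```

Both styles produce the same entries; the second simply skips the intermediate result and the extra copy, which is where the saving is assumed to come from in this sketch.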

wenleix commented Dec 18, 2019

I realized the benchmark runs over ARRAY(BIGINT), so casting from BIGINT to VARCHAR can take significant time. A benchmark over ARRAY(VARCHAR) would probably show a larger improvement :)

@highker highker left a comment

lgtm

@wenleix wenleix merged commit fdb7611 into prestodb:master Dec 20, 2019
@wenleix wenleix deleted the arrayjoin branch December 20, 2019 08:06
@aweisberg aweisberg mentioned this pull request Jan 17, 2020
7 tasks
@caithagoras caithagoras mentioned this pull request Jan 22, 2020
6 tasks