[Prototype & Benchmark] Support directly writing to output block builder for scalar functions by wenleix · Pull Request #9638 · prestodb/presto

wenleix · 2017-12-28T22:59:51Z

Introduction

Today the return convention for scalar functions is always to return on the stack. The caller then either appends the returned value to the result BlockBuilder (for the outermost function call) or uses it to invoke other functions (for inner/nested calls, such as the g(x) in f(g(x))).

While this return convention works well for primitive types, it's not optimal for structural types since it always has to copy the result block.
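To make the difference between the two conventions concrete, here is a minimal sketch in plain Java. `ToyArrayBlockBuilder`, `arrayIdentityOnStack`, and `arrayIdentityDirect` are hypothetical names invented for illustration; they are not Presto classes or the code in this PR. The `copiedElements` counter only exists to show where the extra copy happens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an array-typed BlockBuilder (not the Presto class).
final class ToyArrayBlockBuilder {
    private final List<Long> values = new ArrayList<>();
    private final List<Integer> offsets = new ArrayList<>();
    long copiedElements; // elements copied in from an already finished array

    ToyArrayBlockBuilder() {
        offsets.add(0);
    }

    // Direct-write path: the function appends elements, then closes the entry.
    void writeLong(long value) {
        values.add(value);
    }

    void closeEntry() {
        offsets.add(values.size());
    }

    // Stack-return path: the caller copies a finished array into the builder.
    void appendArray(long[] array) {
        for (long value : array) {
            values.add(value);
            copiedElements++;
        }
        closeEntry();
    }

    long[] getArray(int position) {
        int start = offsets.get(position);
        int end = offsets.get(position + 1);
        long[] result = new long[end - start];
        for (int i = 0; i < result.length; i++) {
            result[i] = values.get(start + i);
        }
        return result;
    }
}

final class ReturnConventionSketch {
    // Convention 1: return on the stack. An intermediate array is allocated,
    // and the caller has to copy it into the output builder afterwards.
    static long[] arrayIdentityOnStack(long[] input) {
        return Arrays.copyOf(input, input.length);
    }

    // Convention 2: write straight into the output builder supplied by the caller.
    static void arrayIdentityDirect(long[] input, ToyArrayBlockBuilder output) {
        for (long value : input) {
            output.writeLong(value);
        }
        output.closeEntry();
    }

    public static void main(String[] args) {
        long[] input = {1, 2, 3};

        ToyArrayBlockBuilder viaStack = new ToyArrayBlockBuilder();
        viaStack.appendArray(arrayIdentityOnStack(input));

        ToyArrayBlockBuilder direct = new ToyArrayBlockBuilder();
        arrayIdentityDirect(input, direct);

        System.out.println("copied via stack: " + viaStack.copiedElements);
        System.out.println("copied via direct write: " + direct.copiedElements);
    }
}
```

In this toy model both paths produce the same entry, but only the stack path touches each element twice (once to build the intermediate array, once to copy it into the builder).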

Proposed Solution

To address this inefficiency, one idea is to introduce a new return convention that writes directly to the output block builder. This requires two pieces of work:

- We need to support compiling a RowExpression to write to the output block builder directly when one is provided. A prototype is done in [WIP] Compile RowExpression to directly write to output block builder #8747.
  Update: This new return convention is supported via Implement PROVIDED_BLOCKBUILDER return place convention for scalar function #12166.
- Since the caller and callee might expect different return conventions (return on stack vs. direct output write), we need to be able to adapt between them. While adapting from return on stack to direct output write is trivial, the other direction is more involved, as many functions leverage CachedInstanceBinder to maintain function state and to avoid repeatedly allocating memory for the result (see [Prototype & Benchmark] Support directly writing to output block builder for scalar functions #9638 (comment)).

A preliminary proof-of-concept of the InvocationAdapter can be found in commit wenleix@d0fb108.
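As an illustration of why the two adaptation directions differ, the following sketch models both with plain Java functional interfaces. It is not the InvocationAdapter from the commit above; `ToyBuilder`, `toDirectWrite`, and `toOnStack` are hypothetical names, and strings stand in for result values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

final class InvocationAdapterSketch {
    // Hypothetical stand-in for an output block builder.
    static final class ToyBuilder {
        final List<String> entries = new ArrayList<>();

        void append(String value) {
            entries.add(value);
        }

        String last() {
            return entries.get(entries.size() - 1);
        }
    }

    // Stack-returning callee, caller wants direct write:
    // call the function and append its result. This is the trivial direction.
    static <T> BiConsumer<T, ToyBuilder> toDirectWrite(Function<T, String> onStack) {
        return (input, output) -> output.append(onStack.apply(input));
    }

    // Direct-writing callee, caller wants a value on the stack:
    // the adapter has to supply a scratch builder and read the result back.
    // Allocating a fresh scratch builder per call (as done here) is the naive option.
    static <T> Function<T, String> toOnStack(BiConsumer<T, ToyBuilder> direct) {
        return input -> {
            ToyBuilder scratch = new ToyBuilder();
            direct.accept(input, scratch);
            return scratch.last();
        };
    }

    public static void main(String[] args) {
        Function<String, String> upper = String::toUpperCase;
        ToyBuilder output = new ToyBuilder();
        toDirectWrite(upper).accept("ab", output);
        System.out.println(output.last());
        System.out.println(toOnStack(toDirectWrite(upper)).apply("cd"));
    }
}
```

The per-call scratch allocation in `toOnStack` is exactly the kind of cost that cached function state is meant to avoid, which is one reason this direction needs more design work than the other.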

Benchmark Result

In this PR we prototyped preliminary support for writing directly to the output block builder and benchmarked the potential performance gain. This implementation is not full support, since adapting a caller that expects return-on-stack behavior to a callee that writes directly to a block requires more work.

To see the potential performance gain, we add a fake array_identity function which copies the array (see the second commit), and benchmark its performance:

Benchmark                                        (name)  Mode  Cnt   Score   Error  Units
BenchmarkArrayIdentity.benchmark         array_identity  avgt   20  22.721 ± 1.298  ns/op
BenchmarkArrayIdentity.benchmark  array_identity_direct  avgt   20  10.557 ± 0.466  ns/op

We see about a 2x performance gain (22.721 ns/op vs. 10.557 ns/op) from writing directly to the output block builder, instead of putting the result block on the stack and copying it to the final block.

This is based on #8747.

sopel39 · 2017-12-29T09:33:24Z

Nice!
Have you also thought about reversing the ownership of objects returned by a scalar? Currently the object returned by a scalar is owned by the caller. If the object were still owned by the scalar, then we could use mutable accumulators to store the scalar result. We would save on object allocation (and memory zeroing for Slice), and the JIT could probably optimize such code better.

Alternatively, for scalars the returned object could be passed as an out parameter.

wenleix · 2018-01-02T18:27:03Z

Thank you @sopel39 ! :)

Speaking of making the scalar function own the returned objects, the main benefit is to save object allocation, right? I think CachedInstanceBinder already helps with this. In typical usage, the cached instance (the function state) is a PageBuilder with the output type. The output is always appended to this PageBuilder, and the last block is sliced off as the return value.

In summary, CachedInstanceBinder helps performance in the following two ways:
- Instead of allocating many small chunks of memory (BlockBuilder), it does one big allocation (a PageBuilder) and slices blocks from it.
- When a page is full and a new page is allocated (through PageBuilder.reset()), it can use the stats collected from PageBuilderStatus to help pre-allocate the underlying block size for future batches. This should reduce the chance that the underlying BlockBuilder gets resized (which requires copying the current data, etc.).

It doesn't help with:
- Avoiding the data copy to the output BlockBuilder.
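The pattern described above can be sketched roughly as follows. This is a simplified model, not the PageBuilder or CachedInstanceBinder API: `ReusableArrayState` and `ArrayView` are hypothetical names, and the sizing rule for a new buffer is a crude stand-in for the statistics-based pre-allocation mentioned above.

```java
final class CachedStateSketch {
    // Hypothetical read-only view of a region inside a shared buffer.
    static final class ArrayView {
        final long[] buffer;
        final int offset;
        final int length;

        ArrayView(long[] buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        long get(int index) {
            return buffer[offset + index];
        }
    }

    // Function state kept across invocations: one large buffer, many results.
    static final class ReusableArrayState {
        private long[] buffer;
        private int used;
        int allocations; // number of backing buffers allocated so far

        ReusableArrayState(int capacity) {
            buffer = new long[capacity];
            allocations = 1;
        }

        // Appends a copy of input to the shared buffer and returns a view of it.
        ArrayView copyOf(long[] input) {
            if (used + input.length > buffer.length) {
                // "Page full": start a new buffer. Views handed out earlier keep
                // referencing the old one, so they stay valid.
                buffer = new long[Math.max(buffer.length, input.length)];
                used = 0;
                allocations++;
            }
            System.arraycopy(input, 0, buffer, used, input.length);
            ArrayView view = new ArrayView(buffer, used, input.length);
            used += input.length;
            return view;
        }
    }

    public static void main(String[] args) {
        ReusableArrayState state = new ReusableArrayState(8);
        ArrayView first = state.copyOf(new long[] {1, 2, 3});
        ArrayView second = state.copyOf(new long[] {4, 5});
        System.out.println(first.get(0) + " " + second.get(0) + " allocations=" + state.allocations);
    }
}
```

Note that the caller still has to copy each returned view into its own output builder, which is the remaining cost listed under "It doesn't help with".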

sopel39 · 2018-01-03T10:27:02Z

> Speaking of making scalar function owning the returning objects, the main benefit is to save object allocation right?

That's correct. Specifically, I was thinking about fast decimal, which is represented by a Slice for larger scale. Currently we return a new Slice for every scalar execution that involves long decimals. I remember doing a POC, and it improved decimal performance by a double-digit percentage.

wenleix · 2018-01-05T23:05:28Z

@sopel39: With this new return convention, we can also support writing a Slice into the output block :). So an outermost function that returns a long decimal can avoid allocating a new Slice and get all the savings (and even avoid the copy).

For a function in the middle of chained calls (e.g. the g(x) in f(g(x))), it will write to a temporary block, and the engine will get the Slice from the block, which might still have allocation overhead. One way to avoid this is to introduce a calling convention that passes in BLOCK_INPUT and BLOCK_INDEX, similar to aggregation functions.
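A rough sketch of that idea for a chained call f(g(x)): g writes its result into a temporary block, and f receives the block plus a position instead of a materialized value. Everything here is hypothetical and simplified (`ToyVarWidthBlock`, `upperAscii` as g, `countByte` as f); it only assumes ASCII input and is not the Presto calling-convention API.

```java
import java.util.Arrays;

final class BlockPositionSketch {
    // Hypothetical variable-width block: all entries share one byte array.
    static final class ToyVarWidthBlock {
        private byte[] bytes = new byte[16];
        private int[] offsets = new int[8]; // entry p spans offsets[p]..offsets[p + 1]
        private int size;
        private int positionCount;

        void writeByte(byte value) {
            if (size == bytes.length) {
                bytes = Arrays.copyOf(bytes, bytes.length * 2);
            }
            bytes[size++] = value;
        }

        void closeEntry() {
            if (positionCount + 1 == offsets.length) {
                offsets = Arrays.copyOf(offsets, offsets.length * 2);
            }
            offsets[++positionCount] = size;
        }

        int getPositionCount() {
            return positionCount;
        }

        int getLength(int position) {
            return offsets[position + 1] - offsets[position];
        }

        byte getByte(int position, int index) {
            return bytes[offsets[position] + index];
        }
    }

    // g(x): writes its result directly into the supplied block (ASCII input assumed).
    static void upperAscii(String input, ToyVarWidthBlock output) {
        for (int i = 0; i < input.length(); i++) {
            output.writeByte((byte) Character.toUpperCase(input.charAt(i)));
        }
        output.closeEntry();
    }

    // f(...): takes (block, position) and reads bytes in place,
    // so no per-value object has to be created for the intermediate result.
    static long countByte(ToyVarWidthBlock block, int position, byte target) {
        long count = 0;
        for (int i = 0; i < block.getLength(position); i++) {
            if (block.getByte(position, i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ToyVarWidthBlock temp = new ToyVarWidthBlock();
        upperAscii("banana", temp);
        System.out.println(countByte(temp, temp.getPositionCount() - 1, (byte) 'A'));
    }
}
```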

stale · 2019-10-04T18:52:55Z

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

wenleix added 2 commits December 28, 2017 14:45

[Prototype] Support compiling RowExpression to directly write to outp…

3aad015

…ut block builder Also see prestodb#8747

[Benchmark Only] Add array_identity function and benchmarks

565ade4

facebook-github-bot added the CLA Signed label Dec 28, 2017

stale bot added the stale label Oct 4, 2019

stale bot closed this Oct 11, 2019

wenleix mentioned this pull request Dec 18, 2019

Optimize array_join by supporting PROVIDED_BLOCKBUILDER convention #13874

Merged

wenleix changed the title ~~[Benchmark] Benchmark support directly writing to output block builder for scalar functions~~ [Design] Support directly writing to output block builder for scalar functions Jun 16, 2020

wenleix changed the title ~~[Design] Support directly writing to output block builder for scalar functions~~ [Prototype & Benchmark] Support directly writing to output block builder for scalar functions Jun 16, 2020

wenleix mentioned this pull request Jun 16, 2020

Implement PROVIDED_BLOCKBUILDER return place convention for scalar function #12166

Merged

wenleix wants to merge 2 commits into prestodb:master from wenleix:write2bb_bench
