Skip to content

Improvements to Unnest Operator#10506

Merged
phd3 merged 8 commits intotrinodb:masterfrom
phd3:unnest-reviewready
Jul 12, 2022
Merged

Improvements to Unnest Operator#10506
phd3 merged 8 commits intotrinodb:masterfrom
phd3:unnest-reviewready

Conversation

@phd3
Copy link
Copy Markdown
Member

@phd3 phd3 commented Jan 7, 2022

Description

Improve performance of Unnest operator. Comparable changes from prestodb/presto#13751 with some bugfixes.

  • build output blocks in a column-oriented fashion to have tighter loops
  • Simplifying the common codepath when no nulls need to be added to the output block.
  • For cases when nulls need to be appended to output block, add more efficient implementation for copying block content with an appended null.

Benchmark on jmh.morethan.io

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# General
* Improve performance of queries with `UNNEST` clause. ({issue}`10506`)

@cla-bot cla-bot bot added the cla-signed label Jan 7, 2022
@phd3 phd3 force-pushed the unnest-reviewready branch from bae88bd to cca916a Compare January 8, 2022 18:34
@phd3
Copy link
Copy Markdown
Member Author

phd3 commented Jan 8, 2022

one of the failures was unrelated #10038

@phd3
Copy link
Copy Markdown
Member Author

phd3 commented Jan 10, 2022

all failures on latest run are unrelated: #9199, #9423

@sopel39 sopel39 requested a review from erichwang February 1, 2022 11:07
@phd3 phd3 requested review from arhimondr and dain February 1, 2022 21:36
@arhimondr
Copy link
Copy Markdown
Contributor

@phd3 I'm a little overbooked right now and likely wont be able to take a look this week. I can try to find some time next week if not reviewed / merged by then.

Copy link
Copy Markdown
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Naming mostly and some questions.
Is the implementation already proofed on prestodb?

@phd3 phd3 force-pushed the unnest-reviewready branch from cca916a to b241e5c Compare May 24, 2022 16:10
@phd3 phd3 force-pushed the unnest-reviewready branch from 3e8a5b3 to c7e2210 Compare June 27, 2022 20:05
@phd3
Copy link
Copy Markdown
Member Author

phd3 commented Jun 27, 2022

@phd3
Copy link
Copy Markdown
Member Author

phd3 commented Jun 28, 2022

flaky #12875

@phd3 phd3 requested a review from losipiuk June 28, 2022 02:45
@phd3 phd3 force-pushed the unnest-reviewready branch from c7e2210 to 98d9501 Compare June 28, 2022 02:50
Copy link
Copy Markdown
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good.
On the second look it feels like naming could still be a bit nicer. But if it is pain to address I am fine with names from the PR too.

As for the benchmark, are we sure that the refactor in the last commit does not introduce behaviour change (i.e. are we sure improved performance is strictly due to production changes, not benchmark code changes)? Did you try to run benchmark with new production code, but old benchmark class without refactorings?

And please address the conflict.

phd3 and others added 6 commits July 6, 2022 12:33
Benchmark	(nullsRatio)	Mode	Cnt	Score		Error	Units
BenchmarkCopyBlock.copyBlockByAppend	0	avgt	30	4941.237	±	1221.599 us/op
BenchmarkCopyBlock.copyBlockByAppend	0.2	avgt	30	4583.895	±	268.919	us/op
BenchmarkCopyBlock.copyBlockByLoop	0	avgt	30	1421.283	±	164.73	us/op
BenchmarkCopyBlock.copyBlockByLoop	0.2	avgt	30	1850.813	±	263.724	us/op

Comparable changes from prestodb/presto#13746

Co-authored-by: Ying Su <yingsu@fb.com>
Comparable changes from prestodb/presto#13746

Co-authored-by: Ying Su <yingsu@fb.com>
Comparable changes from prestodb/presto#13746

replicate,unnest				unnest null %	Implementation	us/op
varchar,array(varchar),	            		0	old	12112.09	±	925.452	us/op
varchar,array(varchar),				0	new	2584.86	±	242.155	us/op
varchar,array(varchar),				0.05	old	11609.358	±	869.656	us/op
varchar,array(varchar),				0.05	new	2278.788	±	109.432	us/op
varchar,map(varchar,varchar),			0	old	14539.333	±	730.733	us/op
varchar,map(varchar,varchar),			0	new	2368.276	±	87.712	us/op
varchar,map(varchar,varchar),			0.05	old	13898.414	±	118.539	us/op
varchar,map(varchar,varchar),			0.05	new	2289.834	±	41.853	us/op
varchar,array(row(varchar,varchar,varchar)),	0	old	27944.668	±	1292.521	us/op
varchar,array(row(varchar,varchar,varchar)),	0	new	2643.749	±	214.019	us/op
varchar,array(row(varchar,varchar,varchar)),	0.05	old	28443.996	±	1846.325	us/op
varchar,array(row(varchar,varchar,varchar)),	0.05	new	14353.898	±	1038.962	us/op
varchar,array(array(varchar)),			0	old	12334.193	±	916.8	us/op
varchar,array(array(varchar)),			0	new	2452.163	±	138.614	us/op
varchar,array(array(varchar)),			0.05	old	11037.719	±	1353.202	us/op
varchar,array(array(varchar)),			0.05	new	3037.677	±	1288.91	us/op
varchar,array(varchar)|array(varchar),		0	old	223204.589	±	53170.945	us/op
varchar,array(varchar)|array(varchar),		0	new	11602.298	±	1359.086	us/op
varchar,array(varchar)|array(varchar),		0.05	old	21610.119	±	1298.055	us/op
varchar,array(varchar)|array(varchar),		0.05	new	7475.767	±	507.788	us/op

Co-authored-by: Ying Su <yingsu@fb.com>
phd3 and others added 2 commits July 6, 2022 12:35
* Reuse block generation code from BlockAssertions
* Use one null ratio value as row nulls ratio isn't super useful
  in the benchmark matrix.
* Make unnest type inputs more generic.

Comparable changes from prestodb/presto#13746

Co-authored-by: Ying Su <yingsu@fb.com>
@phd3 phd3 force-pushed the unnest-reviewready branch from 98d9501 to 10725af Compare July 6, 2022 16:43
@phd3
Copy link
Copy Markdown
Member Author

phd3 commented Jul 8, 2022

As for the benchmark, are we sure that the refactor in the last commit does not introduce behaviour change (i.e. are we sure improved performance is strictly due to production changes, not benchmark code changes)? Did you try to run benchmark with new production code, but old benchmark class without refactorings?

I had tried that early on. Ran them again to make sure. The results are pretty similar. https://gist.github.com/phd3/b5c877aa28cbccbbcb232cbd69cc4b57. There's some issue with morethan displaying them side by side using gists. But here's the screenshot.

AC.

@losipiuk
Copy link
Copy Markdown
Member

losipiuk commented Jul 9, 2022

As for the benchmark, are we sure that the refactor in the last commit does not introduce behaviour change (i.e. are we sure improved performance is strictly due to production changes, not benchmark code changes)? Did you try to run benchmark with new production code, but old benchmark class without refactorings?

I had tried that early on. Ran them again to make sure. The results are pretty similar. https://gist.github.com/phd3/b5c877aa28cbccbbcb232cbd69cc4b57. There's some issue with morethan displaying them side by side using gists. But here's the screenshot.

AC.

Thank @phd3 . good to be merged IMO.

@phd3 phd3 merged commit 4aedf4d into trinodb:master Jul 12, 2022
@github-actions github-actions bot added this to the 390 milestone Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants