Skip to content

[native-pos] Disable AsyncDataCache (thus prefetching)#20969

Merged
mbasmanova merged 1 commit intoprestodb:masterfrom
Yuhta:spark-on-presto-split-preload
Oct 19, 2023
Merged

[native-pos] Disable AsyncDataCache (thus prefetching)#20969
mbasmanova merged 1 commit intoprestodb:masterfrom
Yuhta:spark-on-presto-split-preload

Conversation

@Yuhta
Copy link
Contributor

@Yuhta Yuhta commented Sep 26, 2023

We see negative performance impact (5% ~ 30%) when split prefetch is enabled,
hence adding these config options to disable it.

@Yuhta Yuhta marked this pull request as ready for review September 26, 2023 23:46
@Yuhta Yuhta requested review from a team and shrinidhijoshi as code owners September 26, 2023 23:46
@Yuhta Yuhta requested a review from presto-oss September 26, 2023 23:46
@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch from 8046570 to ab5c4fd Compare September 26, 2023 23:53
@shrinidhijoshi
Copy link
Collaborator

@Yuhta Please update the commit message (and PR description) with brief explanation of why this change is needed.

@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch 2 times, most recently from c4d92e4 to a37e18e Compare September 27, 2023 01:27
Copy link
Collaborator

@shrinidhijoshi shrinidhijoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tried to do a test offline but the internal tooling was slowing us down. As this is non-breaking and very specific to cpp process, I am approving this. We can come back and fix if something is missing

@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch 8 times, most recently from b6edb01 to bc4c894 Compare October 4, 2023 00:54
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta Use [native-pos] prefix for Presto-on-Spark specific changes.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta What is the definition of num-connector-io-threads property? Is it documented already? If not, would you document it?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We see negative performance impact (5% ~ 30%) when split prefetch is enabled,

What is causing that?

@Yuhta Yuhta changed the title [native] Add num-connector-io-threads to Presto Spark [native] Disable AsyncDataCache (thus prefetching) for Presto on Spark Oct 4, 2023
@Yuhta Yuhta changed the title [native] Disable AsyncDataCache (thus prefetching) for Presto on Spark [native-pos] Disable AsyncDataCache (thus prefetching) for Presto on Spark Oct 4, 2023
@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch from bc4c894 to 9763999 Compare October 4, 2023 15:21
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta Since you have [native-pos] prefix, adding "for Presto on Spark" is redundant. Let's remove from both PR title and commit message.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We see negative performance impact (5% ~ 30%) when split prefetch is enabled,

@Yuhta What's the cause of performance impact?

@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch 2 times, most recently from 1c416c2 to 896ec91 Compare October 4, 2023 15:36
@Yuhta Yuhta requested a review from a team as a code owner October 4, 2023 15:36
@Yuhta Yuhta changed the title [native-pos] Disable AsyncDataCache (thus prefetching) for Presto on Spark [native-pos] Disable AsyncDataCache (thus prefetching) Oct 4, 2023
@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch 2 times, most recently from 5e9be11 to dbba536 Compare October 4, 2023 16:18
@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch from dbba536 to b246328 Compare October 17, 2023 16:47
@github-actions
Copy link

github-actions bot commented Oct 17, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff 3d3ee49...ca341eb.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/develop/presto-native.rst

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation! I added a few suggestions to make some statements more explicit and easier to understand. If I have made a suggestion that changes the meaning in a way you disagree with, please help me understand the meaning.

@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch from 08a3a56 to b2c3ec6 Compare October 17, 2023 20:48
@Yuhta Yuhta requested a review from steveburnett October 17, 2023 20:48
@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch 2 times, most recently from aaf03f6 to 27c84d6 Compare October 18, 2023 14:18
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing some other configs re: SSD. I assume one needs to specify rott directory or something, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use async data cache without SSD

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, but if I want to use SSD I assume I need more properties than just size, no?

Copy link
Contributor Author

@Yuhta Yuhta Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you will need to set SSD size in another property to indicate how much space is available

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistency: async-data-cache-enabled vs. async-cache-ssd-gb - one says data-cache, the other just cache. Would be nice to fix.

Copy link
Contributor Author

@Yuhta Yuhta Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will be a big risky change, rather not changing it here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. Perhaps, file a GitHub issue about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these have default values? true, 0 and 30 (according to documentation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set it to false here to disable it. The default values are for when it is not overwritten here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you explicitly set it or make a comment that default is 'false'?

private boolean asyncDataCacheEnabled = false;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default values are for when it is not overwritten here.

I'm not sure I understand.... The user reading the documentation would assume that default is what they get without changing anything. It looks like actual default is false, no?

Copy link
Contributor Author

@Yuhta Yuhta Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting it explicitly violates a lint rule. I will put a comment.

The default is you don't overwrite it in code (i.e., instantiate a plain vanilla PrestoServer). Here POS is overwriting it explicitly so the defaults no longer applies. Do you have a place I can write POS defaults?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova I added the comment and add the default value for POS in the documentation in addition to the vanilla Presto

@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch from 27c84d6 to 392a8dc Compare October 18, 2023 19:40
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting to unblock, but I'm very confused by how these properties are set and what the defaults are. It would be nice to clear this up in a follow up.

@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch 2 times, most recently from 85fb23c to 3fb5382 Compare October 18, 2023 20:29
@aditi-pandit
Copy link
Contributor

@Yuhta , @mbasmanova : We noticed this negative performance impact internally for Prestissimo as well. Did you'll only see it for POS, but not Prestissimo ? (Just curious as you are only turning off POS configs)

We should turn it off in https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/NativeQueryRunnerUtils.java#L33 config as well. wdyt ?

@Yuhta
Copy link
Contributor Author

Yuhta commented Oct 18, 2023

@aditi-pandit We have different deployments of Prestissimo and some do turn the option off. You can use the worker config file to overwrite the system config. POS needs special treatment because the configuration is a little bit weird and worker config does not work out of box there.

We see negative performance impact (5% ~ 30%) when split prefetch is enabled,
hence adding these config options to disable it.
@Yuhta Yuhta force-pushed the spark-on-presto-split-preload branch from 3fb5382 to ca341eb Compare October 18, 2023 21:56
@mbasmanova mbasmanova merged commit abc7491 into prestodb:master Oct 19, 2023
@wanglinsong wanglinsong mentioned this pull request Dec 8, 2023
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants