
[native pos] Remove busy-waiting-loop to reduce the CPU usage #19926

Merged
miaoever merged 1 commit into prestodb:master from miaoever:remove_busy_wait
Jun 23, 2023

Conversation

Contributor

@miaoever miaoever commented Jun 21, 2023

The busy-waiting loop in PrestoSparkNativeTaskExecutorFactory runs on the Spark executor main thread during task execution and consumes up to 40% of the Spark executor JVM CPU in our local profiling. To reduce CPU usage, we make the task main thread wait on two signals (task finished and output ready) instead of busy-waiting. We also removed an unused field in the same class (functionAndTypeManager).

As a result, in our cluster test run, the same query used about 30% less total CPU time.
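
For illustration, here is a minimal sketch of the two-signal wait described above; the class and method names are hypothetical and do not reflect the actual PrestoSparkNativeTaskExecutorFactory code:

```java
// Hypothetical sketch of waiting on two signals instead of busy-waiting.
public class TaskCompletionSignal
{
    private final Object lock = new Object();
    private boolean taskFinished;
    private boolean outputReady;

    // Called from the task-status callback when the native task completes.
    public void markTaskFinished()
    {
        synchronized (lock) {
            taskFinished = true;
            lock.notifyAll();
        }
    }

    // Called when a batch of output is available to be consumed.
    public void markOutputReady()
    {
        synchronized (lock) {
            outputReady = true;
            lock.notifyAll();
        }
    }

    // Called from the Spark executor main thread: blocks instead of spinning,
    // so the thread uses no CPU while it waits for either signal.
    public void awaitTaskFinishedOrOutputReady()
            throws InterruptedException
    {
        synchronized (lock) {
            while (!taskFinished && !outputReady) {
                lock.wait();
            }
        }
    }
}
```

The main thread blocks in wait() until one of the callbacks fires, rather than spinning in a loop and burning CPU.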

== NO RELEASE NOTE ==

@miaoever miaoever requested a review from a team as a code owner June 21, 2023 19:22
@miaoever miaoever requested a review from presto-oss June 21, 2023 19:22
@miaoever miaoever force-pushed the remove_busy_wait branch 2 times, most recently from 0df17ed to 4202de7 on June 21, 2023 19:33
Contributor

@mbasmanova mbasmanova left a comment


@miaoever Thanks.

@shrinidhijoshi Would you take a look as well?

@miaoever miaoever requested review from a team, mbasmanova and shrinidhijoshi June 21, 2023 19:40
Collaborator

@shrinidhijoshi shrinidhijoshi left a comment


LGTM! Thanks for investigating this, @miaoever.

Collaborator

@shrinidhijoshi shrinidhijoshi Jun 22, 2023


I wonder if we can simplify this code by writing two different iterators: one for the shuffle-map task and one for the result task. Reasons being:

  • The behavior of a shuffle-map task is fundamentally different and more straightforward (wait for the CompletableFuture returned by NativeExecutionTask; see the sketch below).
  • Most stages in all of our prod workloads would use the shuffle-map iterator, as that is the 99% use case.

I am concerned the multiple wait()/notifyAll() pattern here might be hiding behaviors that are even harder to spot than the current busy-wait bug.
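
For reference, a rough sketch of the shuffle-map path described above, where the iterator simply blocks on a completion future; the ShuffleMapTaskWaiter type and the Void-typed future are illustrative assumptions, not the actual NativeExecutionTask API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical sketch of a dedicated shuffle-map wait path.
public class ShuffleMapTaskWaiter
{
    private final CompletableFuture<Void> taskCompletion;

    public ShuffleMapTaskWaiter(CompletableFuture<Void> taskCompletion)
    {
        this.taskCompletion = taskCompletion;
    }

    // A shuffle-map task produces no rows for the driver, so the iterator only
    // needs to block here until the native task reports completion.
    public void awaitCompletion()
            throws InterruptedException, ExecutionException
    {
        taskCompletion.get();
    }
}
```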

Contributor Author


Thanks for the review, @shrinidhijoshi.

Technically, waiting on a CompletableFuture (CompletableFuture::get()) is also a form of busy-waiting, which is not a good idea to have on the main thread.

On the other hand, for our scenario of asynchronous communication between different threads/processes, I believe wait()/notifyAll() are the right Java primitives to use if we can't use higher-level built-in blocks (e.g. BlockingQueue; a sketch of that alternative follows below).
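
As a point of comparison, here is a minimal sketch of the higher-level alternative mentioned above, handing output across threads with a BlockingQueue; the element type and queue capacity are illustrative assumptions only:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of a BlockingQueue-based hand-off between threads.
public class OutputHandoff
{
    // Capacity is illustrative; it bounds how far the producer can run ahead.
    private final BlockingQueue<byte[]> outputPages = new ArrayBlockingQueue<>(16);

    // Producer side (output-fetching thread): blocks if the queue is full.
    public void publish(byte[] page)
            throws InterruptedException
    {
        outputPages.put(page);
    }

    // Consumer side (executor main thread): blocks in take() until a page is
    // available, consuming no CPU while idle.
    public byte[] nextPage()
            throws InterruptedException
    {
        return outputPages.take();
    }
}
```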

@facebook-github-bot
Collaborator

@miaoever has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


