
Conversation

highker commented Dec 24, 2019

HTTP is too unreliable to use for internal communication. Add Thrift support.

== RELEASE NOTES ==

General Changes
* Allow Presto nodes to shuffle data with the Thrift protocol. Use the config property `internal-communication.task-communication-protocol` to choose between HTTP and Thrift.
* Allow Presto nodes to announce state with the Thrift protocol. Use the config property `internal-communication.server-info-communication-protocol` to choose between HTTP and Thrift.
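
For illustration, a minimal config sketch using the two properties above; the exact accepted values (`THRIFT`/`HTTP` below) are an assumption based on this description, not verified against the configuration reference:

```properties
# Hypothetical values -- verify the accepted names against the actual configuration docs
internal-communication.task-communication-protocol=THRIFT
internal-communication.server-info-communication-protocol=THRIFT
```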

@highker force-pushed the thrift2 branch 2 times, most recently from 62b3ce0 to ec44775 on December 24, 2019 at 22:48
highker (Author) commented Dec 24, 2019

Early benchmark result. Improved performance and reliability. No query failure in 5 hours with prod workload.

[Screenshot: Screen Shot 2019-12-24 at 3 08 46 PM]

[Screenshot: Screen Shot 2019-12-24 at 3 07 55 PM]

wenleix (Contributor) commented Dec 24, 2019

Early benchmark result. Improved performance and reliability. No query failure in 5 hours with prod workload.

@highker Nice win for latency! -- So I assume this improves wall time for light queries with little data to exchange?

Not sure how to interpret the first image though. Why do we still see "Worker Jetty Threads" even with Thrift RPC?

highker (Author) commented Dec 24, 2019

@wenleix

So I assume this improves wall time for light queries with little data to exchange?

Ya, I think so.

Not sure how to interpret the first image though. Why do we still see "Worker Jetty Threads" even with Thrift RPC?

It just shows that we used very few Jetty threads, given that the Jetty thread pool is always full in prod. I didn't touch the task update/status part, so that tunnel still goes through HTTP. That should be easy to change as well on top of the framework.

wenleix (Contributor) commented Dec 24, 2019

It just shows that we used very few Jetty threads, given that the Jetty thread pool is always full in prod. I didn't touch the task update/status part, so that tunnel still goes through HTTP. That should be easy to change as well on top of the framework.

Yeah. Especially if we only want to migrate the RPC part (independent of migrating the encoding part, as there are recursive fields that Drift doesn't support well). Here is a POC of migrating HttpRemoteTask#sendUpdate (you probably have a similar POC already~): wenleix@8a562ff. Similar to what you changed, we need a more abstract RemoteTask that can switch between an HttpXClient and a ThriftXClient.
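
For readers skimming the thread, a rough sketch of the kind of abstraction being discussed; every type and method name below is hypothetical and only illustrates the shape, it is not the actual Presto or Drift code:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: a task-update client abstraction that can switch between HTTP and Thrift.
interface TaskUpdateClient
{
    CompletableFuture<Void> sendUpdate(byte[] serializedTaskUpdate);
}

class HttpTaskUpdateClient implements TaskUpdateClient
{
    @Override
    public CompletableFuture<Void> sendUpdate(byte[] serializedTaskUpdate)
    {
        // A real implementation would issue an async HTTP POST to the task URI.
        return CompletableFuture.completedFuture(null);
    }
}

class ThriftTaskUpdateClient implements TaskUpdateClient
{
    @Override
    public CompletableFuture<Void> sendUpdate(byte[] serializedTaskUpdate)
    {
        // A real implementation would call a Drift-generated async Thrift client.
        return CompletableFuture.completedFuture(null);
    }
}

class TaskUpdateClientFactory
{
    // Protocol selection mirroring the config switch introduced in this PR.
    static TaskUpdateClient create(String protocol)
    {
        return "THRIFT".equalsIgnoreCase(protocol)
                ? new ThriftTaskUpdateClient()
                : new HttpTaskUpdateClient();
    }
}
```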

arhimondr (Member) commented

Nice!

One note:

Drift uses native memory to buffer the entire request / response. This is needed to run blocking Thrift encoding / decoding. We should be tracking native memory utilization very closely when switching to Thrift.
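
A minimal sketch of one way to watch that utilization from the JVM side; it reads the generic direct buffer pool MXBean and is not a Presto- or Drift-specific metric (pooled native memory allocated outside NIO direct buffers may not show up here):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryProbe
{
    public static void main(String[] args)
    {
        // Report the JVM's "direct" buffer pool, where NIO direct buffer usage is accounted.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                System.out.printf("direct buffers: count=%d, used=%d bytes, capacity=%d bytes%n",
                        pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
            }
        }
    }
}
```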

mayankgarg1990 commented

I didn't completely understand the graphs uploaded above. Can you add more context around those graphs?

highker (Author) commented Dec 26, 2019

@mayankgarg1990, the figures show that the cluster runs healthily and fast with a fanout of 600. The cluster is no longer blocked by Jetty threads.

@arhimondr, that is a very good point. We will monitor that for sure!

@highker force-pushed the thrift2 branch 4 times, most recently from 3e59787 to acb46e5 on December 27, 2019 at 00:07
@highker changed the title from "Support exchange with thrift" to "Support internal communication with thrift" on Dec 27, 2019
@highker highker requested a review from wenleix December 27, 2019 19:13
wenleix (Contributor) commented Jan 8, 2020

I will start taking a look at the PR. @tdcmeehan, I am wondering if you are also interested in taking a look at this? 😃

wenleix (Contributor) left a review comment

The first 3 commits LGTM. One question: do we want to use "*" for the artifact ID? It looks like we only use it when it's too cumbersome to list all artifacts. How many artifacts does Netty have? :)
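
Assuming the "*" in question refers to Maven's wildcard dependency exclusion (supported since Maven 3.2.1), here is a sketch of what it would look like; the enclosing dependency coordinates are placeholders:

```xml
<dependency>
    <groupId>some.group</groupId>
    <artifactId>some-artifact</artifactId>
    <exclusions>
        <!-- Excludes every artifact under io.netty instead of listing each one -->
        <exclusion>
            <groupId>io.netty</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```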

wenleix (Contributor) left a review comment

"Initial support with Thrift RPC" LGTM. but definitely need someone else to review :P .

IIRC the announcer-based broadcast mechanism works well for worker nodes, but somehow doesn't work for the coordinator (think about "TableFinishOperator", which has COORDINATOR_ONLY partitioning). Maybe we want to add a comment in TestingPrestoServer.java where the Thrift server port announcement is done? :)

wenleix (Contributor) left a review comment

"Add thrift support for exchange server". LGTM. minor comments.

@highker force-pushed the thrift2 branch 2 times, most recently from 8a86533 to bdaf87e on January 16, 2020 at 22:07
James Sun and others added 18 commits on January 16, 2020 at 16:19. Among the commit messages:
* SyncMemoryBackend::handleCommand does not override a parent method. Remove the implementation to avoid the Netty dependency.
* Introduce RpcShuffleClient, which allows using a different RPC for shuffle.
* Refactor HttpPageBufferClient into PageBufferClient and HttpRpcShuffleClient, which implements RpcShuffleClient. PageBufferClient now only handles page buffering and scheduling logic; the actual RPC detail is handled in HttpRpcShuffleClient.
* The return result of acknowledgeResults does not need handling. Use a future to make the call non-blocking instead of feeding it to a thread pool.
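
A rough sketch of the split those commits describe; the method names and signatures below are illustrative guesses, not the real Presto interfaces:

```java
import java.net.URI;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative only: PageBufferClient would own buffering/scheduling and delegate the
// wire-level calls to an implementation of this interface (HTTP today, Thrift with this PR).
interface RpcShuffleClient
{
    // Fetch the next batch of serialized pages starting at the given token.
    CompletableFuture<List<byte[]>> getResults(URI location, long token);

    // Fire-and-forget acknowledgement: callers keep the future only to log failures,
    // instead of blocking on it or handing it to a thread pool.
    CompletableFuture<?> acknowledgeResultsAsync(URI location, long nextToken);

    // Tear down the remote output buffer once the client is done.
    CompletableFuture<?> abortResults(URI location);
}
```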
@highker highker merged commit 20c7af7 into prestodb:master Jan 17, 2020
caithagoras mentioned this pull request on Feb 20, 2020
guhanjie (Member) commented Apr 27, 2022

Early benchmark result. Improved performance and reliability. No query failure in 5 hours with prod workload.

[Screenshots: Screen Shot 2019-12-24 at 3 08 46 PM, Screen Shot 2019-12-24 at 3 07 55 PM]

@highker Could you explain the second pic?
I see "Percent" beside the Y-axis; what does it mean?
Is it latency percentiles? But it seems that sum(Y) is not 100 :p. And what about HTTP with the same workload?
And what are the direct differences (I mean the pros and cons) between HTTP and Thrift?

tdcmeehan (Contributor) commented

@guhanjie my guess is the Y axis is percent of the workload, and we're seeing a subsection of the X axis which might explain why it doesn't sum to 100.

FWIW, we later discovered that while this change did improve performance, it also increased native memory usage, which meant clusters needed to be restarted sooner than before. The Thrift communication library under the hood kept allocating ever-increasing native memory to buffer the shuffle output, because shipping the shuffle data over Thrift required copying it one more time than before.
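
As a made-up illustration of where the extra copy comes from (this is not the actual Presto/Drift code): with HTTP the already-serialized page bytes can be handed to the response as-is, while wrapping them in a Thrift message means copying them into the message's binary field before the whole message is encoded into the client's native buffer.

```java
import java.nio.ByteBuffer;

// Conceptual illustration only -- not actual Presto/Drift code.
class ShuffleCopyExample
{
    // HTTP path: the serialized page buffer goes to the response layer directly,
    // with no extra application-level copy.
    static ByteBuffer sendOverHttp(ByteBuffer serializedPages)
    {
        return serializedPages;
    }

    // Thrift path: the bytes are first copied into the binary field of a Thrift struct,
    // and the whole struct is then encoded again into native memory -- one more copy
    // of the page data than before.
    static byte[] wrapInThriftMessage(ByteBuffer serializedPages)
    {
        byte[] copy = new byte[serializedPages.remaining()];
        serializedPages.duplicate().get(copy);
        return copy;
    }
}
```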

After a certain point, we decided upon the following:

  1. Keep shuffle output as binary, but support async shuffles. This was an under-the-hood change to how we do shuffle over HTTP, making it more dynamic and streaming, and it avoids the need to copy the data one additional time.
  2. For coordinator-to-worker communication (all the other Task endpoints), we can use Thrift over HTTP. This is still a work in progress, but progress is slowly being made, and eventually all such communication can happen over Thrift.
