[SPARK-20529][Core] Allow worker and master work with a proxy server #17821
zsxwing wants to merge 2 commits into apache:master
Test build #76351 has finished for PR 17821 at commit
sameeragarwal left a comment
Looks solid, just some minor comments. Thanks!
case class RegisteredWorker(
    master: RpcEndpointRef,
    masterWebUiUrl: String,
    masterAddress: RpcAddress) extends DeployMessage with RegisterWorkerResponse
Can we avoid adding an extra field here? Perhaps just put the masterAddress in the master field.
I checked the current code. Unfortunately, we cannot remove this extra field: master.address and masterAddress are different.
Alright, that sounds good.
    registerMasterFutures.foreach(_.cancel(true))
  }
- val masterAddress = masterRef.address
+ val masterAddress = masterAddressToConnect.get
How about we protect this change with a conf (with a default that still uses masterRef)? If we can merge master and masterAddress as I suggested above, we can just add a conf on the master, and the worker code can remain largely unaffected.
sameeragarwal left a comment
LGTM, just a small question. Thanks!
  }
- val masterAddress = masterRef.address
+ val masterAddress =
+   if (preferConfiguredMasterAddress) masterAddressToConnect.get else masterRef.address
Perhaps it isn't an issue, but do you think we should fall back to masterRef.address in case masterAddressToConnect isn't set (instead of throwing a generic Scala exception)? Something along the lines of:
val masterAddress = masterAddressToConnect match {
  case Some(master) if preferConfiguredMasterAddress => master
  case _ => masterRef.address
}
Right now masterRef and masterAddressToConnect are set at the same time, so masterAddressToConnect cannot be unset here unless we break something in the future. It's better to fail than to hide such a breaking change.
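The trade-off in this thread can be illustrated with a small standalone sketch. It does not use Spark's classes: `Address`, `failFast`, and `withFallback` are hypothetical names introduced here for illustration, and only the two selection expressions mirror the snippets quoted in the thread. Calling `.get` on an empty `Option` throws `NoSuchElementException`, which surfaces a broken invariant, whereas the pattern match quietly falls back to the master-supplied address.

```scala
object MasterAddressSelection {
  // Hypothetical stand-in for Spark's RpcAddress, for illustration only.
  final case class Address(host: String, port: Int)

  // Fail-fast variant (the shape shown in the diff): an unset configured
  // address is treated as a bug and throws NoSuchElementException.
  def failFast(prefer: Boolean, configured: Option[Address], fromMaster: Address): Address =
    if (prefer) configured.get else fromMaster

  // Fallback variant (the shape suggested in review): an unset configured
  // address is tolerated and the master-supplied address is used instead.
  def withFallback(prefer: Boolean, configured: Option[Address], fromMaster: Address): Address =
    configured match {
      case Some(address) if prefer => address
      case _ => fromMaster
    }
}
```

Both variants agree whenever the configured address is present; they differ only when it is `None` and the preference flag is on, which is exactly the case the author wants to fail loudly.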
Test build #76392 has finished for PR 17821 at commit
Test build #3684 has finished for PR 17821 at commit
Thanks! Merging to master and 2.2.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #17821 from zsxwing/SPARK-20529.
(cherry picked from commit 9150bca)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
What changes were proposed in this pull request?
In the current code, when a worker connects to the master, the master sends its address to the worker. The worker saves this address and uses it to reconnect after a failure. However, this address is sometimes incorrect: if there is a proxy between the master and the worker, the address the master sends is not the address of the proxy.
In this PR, the worker sends the master address it used to the master, and the master simply replies with that same address; the worker then uses it to reconnect after a failure. In other words, the worker uses the master address configured on the worker side where possible, rather than the master address set on the master side.
There is still one potential issue. When a master is restarted or takes over leadership, the worker will use the address sent from the master to connect. If there is still a proxy between the master and the worker, that address may be wrong. However, the worker alone has no way to detect this.
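The echo described in this paragraph can be sketched roughly as follows. This is a simplified model, not Spark's actual code: `Address`, `RegisterRequest`, `RegisterReply`, `handleRegister`, and `reconnectAddress` are hypothetical names, and the real deploy messages carry more fields.

```scala
object ProxyAwareRegistration {
  // Hypothetical stand-in for Spark's RpcAddress.
  final case class Address(host: String, port: Int)

  // Worker -> master: includes the address the worker actually dialed,
  // which may be a proxy rather than the master itself.
  final case class RegisterRequest(workerId: String, masterAddressUsed: Address)

  // Master -> worker: the master's own address plus the echoed address.
  final case class RegisterReply(masterOwnAddress: Address, masterAddressEchoed: Address)

  // The master does not interpret the worker-supplied address; it only echoes it.
  def handleRegister(masterOwnAddress: Address, request: RegisterRequest): RegisterReply =
    RegisterReply(masterOwnAddress, request.masterAddressUsed)

  // The worker keeps the echoed address for later reconnect attempts.
  def reconnectAddress(reply: RegisterReply): Address = reply.masterAddressEchoed
}
```

In this model a worker that registered through a proxy ends up storing the proxy address, even though the master only knows its own internal address.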
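The remaining limitation can be made concrete with another small model. Again, every name here (`Address`, `MasterEvent`, `Registered`, `MasterChanged`, `addressToUse`) is a hypothetical illustration under the assumption that a master-initiated notification carries only the master's own address; it is not taken from Spark's source.

```scala
object FailoverLimitation {
  // Hypothetical stand-in for Spark's RpcAddress.
  final case class Address(host: String, port: Int)

  sealed trait MasterEvent
  // Reply to a registration the worker initiated: carries the echoed address.
  final case class Registered(masterOwn: Address, echoed: Address) extends MasterEvent
  // Unsolicited notice from a restarted or newly elected master: the worker
  // never dialed this master, so there is no worker-side address to echo.
  final case class MasterChanged(masterOwn: Address) extends MasterEvent

  def addressToUse(event: MasterEvent): Address = event match {
    case Registered(_, echoed) => echoed
    // Only the master's view is available; it may be wrong behind a proxy.
    case MasterChanged(masterOwn) => masterOwn
  }
}
```

The second case is the one the description flags: the worker has nothing to compare the master-supplied address against, so it cannot tell whether a proxy is still in the path.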
How was this patch tested?
The newly added unit test.