[SPARK-20529][Core] Allow worker and master work with a proxy server #17821

Closed
zsxwing wants to merge 2 commits into apache:master from zsxwing:SPARK-20529

Member

@zsxwing zsxwing commented May 1, 2017

What changes were proposed in this pull request?

In the current code, when a worker connects to the master, the master sends its address to the worker. The worker saves this address and uses it to reconnect after a failure. However, this address is sometimes wrong: if there is a proxy between the master and the worker, the address the master sends is not the address of the proxy.

In this PR, the worker sends the master address it used to the master, and the master simply replies with this address; the worker then uses it to reconnect after a failure. In other words, the worker uses the master address configured on the worker side, if possible, rather than the master address configured on the master side.
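
The exchange described above can be sketched roughly as follows. This is a simplified, self-contained model and not Spark's actual code: `RpcAddress`, `RegisterWorker`, and `RegisteredWorker` are stand-in case classes with only the fields needed for illustration, and `ProxyHandshakeSketch`, `masterHandle`, `reconnectAddress`, and the host names are hypothetical.

```scala
// Simplified stand-ins for illustration only; not Spark's real classes.
final case class RpcAddress(host: String, port: Int)

// Worker -> master: carries the address the worker used to reach the master.
final case class RegisterWorker(workerId: String, masterAddressUsed: RpcAddress)

// Master -> worker: echoes that address back next to the master's own view.
final case class RegisteredWorker(masterOwnAddress: RpcAddress, masterAddress: RpcAddress)

object ProxyHandshakeSketch {
  // The address the master believes it is listening on (behind the proxy).
  val masterOwnAddress: RpcAddress = RpcAddress("master-internal", 7077)

  // Hypothetical master-side handler: reply with the address the worker sent.
  def masterHandle(msg: RegisterWorker): RegisteredWorker =
    RegisteredWorker(masterOwnAddress, msg.masterAddressUsed)

  // Worker side: register through the configured address and keep the
  // echoed address for later reconnects.
  def reconnectAddress(configured: RpcAddress): RpcAddress =
    masterHandle(RegisterWorker("worker-1", configured)).masterAddress

  def main(args: Array[String]): Unit = {
    val proxy = RpcAddress("proxy.example.com", 7077)
    println(reconnectAddress(proxy))
  }
}
```

In this model the worker ends up reconnecting to the proxy address it was configured with, not to the address the master reports for itself.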

There is still one potential issue, though. When a master is restarted or takes over leadership, the worker will use the address sent from the master to connect. If there is still a proxy between the master and the worker, that address may be wrong. However, the worker has no way to detect this on its own.

How was this patch tested?

The newly added unit test.

Member Author

zsxwing commented May 1, 2017

cc @sameeragarwal


SparkQA commented May 1, 2017

Test build #76351 has finished for PR 17821 at commit 8ded9b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class RegisteredWorker(

Member

@sameeragarwal sameeragarwal left a comment

Looks solid, just some minor comments. Thanks!

case class RegisteredWorker(
master: RpcEndpointRef,
masterWebUiUrl: String,
masterAddress: RpcAddress) extends DeployMessage with RegisterWorkerResponse
Member

Can we avoid adding an extra field here? Perhaps just put the masterAddress in the master field.

Member Author

I checked the current code. Unfortunately, we cannot remove this extra field: master.address and masterAddress are different.

Member

Alright, that sounds good.

  registerMasterFutures.foreach(_.cancel(true))
  }
- val masterAddress = masterRef.address
+ val masterAddress = masterAddressToConnect.get
Member

How about we protect this change with a conf (with a default that still uses masterRef)? If we can merge master and masterAddress as I suggested above, we can just add a conf on the master, and the worker code can be largely unaffected.

Member Author

Added a new conf
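
A minimal sketch of what conf-protecting the change could look like. The key name `spark.worker.preferConfiguredMasterAddress`, the `Map`-based conf, and the helper names are assumptions made for illustration (the thread only says that a new conf was added); the default keeps the old behavior of using the address reported by the master.

```scala
object ConfGateSketch {
  // Assumed key name, for illustration; the thread does not spell it out.
  val PreferConfiguredKey = "spark.worker.preferConfiguredMasterAddress"

  // Read a boolean flag from a plain Map standing in for a real conf object.
  // A missing key means false, so existing deployments are unaffected.
  def preferConfigured(conf: Map[String, String]): Boolean =
    conf.get(PreferConfiguredKey).exists(_.toBoolean)

  // Pick the reconnect address: the worker-side one only when the flag is on.
  def chooseAddress(conf: Map[String, String], configured: String, reported: String): String =
    if (preferConfigured(conf)) configured else reported
}
```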

Member

@sameeragarwal sameeragarwal left a comment

LGTM, just a small question. Thanks!

  }
- val masterAddress = masterRef.address
+ val masterAddress =
+   if (preferConfiguredMasterAddress) masterAddressToConnect.get else masterRef.address
Member

Perhaps it isn't an issue, but do you think we should fall back to masterRef.address in case masterAddressToConnect isn't set (instead of throwing a generic Scala exception)? Something along the lines of:

val masterAddress = masterAddressToConnect match {
  case Some(master) if preferConfiguredMasterAddress => master
  case _ => masterRef.address
}

Member Author

Right now masterRef and masterAddressToConnect are set at the same time, so masterAddressToConnect cannot be unset unless we break something in the future. It's better to fail than to hide the broken change.
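
The trade-off discussed in this thread can be illustrated with a small sketch. Both functions are hypothetical stand-ins that use plain strings for addresses: `failFast` mirrors the `.get` form and throws `NoSuchElementException` when the option is empty, while `withFallback` mirrors the suggested pattern match and quietly uses the master-reported address instead.

```scala
object AddressChoiceSketch {
  // Mirrors the `.get` form: an unset option surfaces as an exception,
  // so a broken invariant is noticed immediately.
  def failFast(prefer: Boolean, toConnect: Option[String], reported: String): String =
    if (prefer) toConnect.get else reported

  // Mirrors the suggested match: an unset option falls back to the
  // reported address, which hides the broken invariant.
  def withFallback(prefer: Boolean, toConnect: Option[String], reported: String): String =
    toConnect match {
      case Some(addr) if prefer => addr
      case _ => reported
    }
}
```

When the option is set, the two forms agree; they differ only in the case that is assumed to be impossible today.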


SparkQA commented May 3, 2017

Test build #76392 has finished for PR 17821 at commit f4699ad.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented May 3, 2017

Test build #3684 has finished for PR 17821 at commit f4699ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

zsxwing commented May 16, 2017

Thanks! Merging to master and 2.2.

@asfgit asfgit closed this in 9150bca May 16, 2017
asfgit pushed a commit that referenced this pull request May 16, 2017
Author: Shixiong Zhu <shixiong@databricks.com>

Closes #17821 from zsxwing/SPARK-20529.

(cherry picked from commit 9150bca)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
@zsxwing zsxwing deleted the SPARK-20529 branch May 16, 2017 17:42
robert3005 pushed a commit to palantir/spark that referenced this pull request May 19, 2017
lycplus pushed a commit to lycplus/spark that referenced this pull request May 24, 2017