Conversation

@agrawaldevesh commented Jul 6, 2020

What changes were proposed in this pull request?

This PR allows an external agent to inform the Master that certain hosts
are being decommissioned.

Why are the changes needed?

The current decommissioning is triggered by the Worker getting a SIGPWR
(out of band, possibly from some cleanup hook), which then informs the Master
about it. This approach may not be feasible in some environments that cannot
trigger a cleanup hook on the Worker. In addition, when a large number of
worker nodes are being decommissioned, the Master will get a flood of
messages.

So we add a new POST endpoint /workers/kill on the MasterWebUI that allows an
external agent to inform the Master about all the nodes being decommissioned in
bulk. The list of nodes is specified by providing a list of hostnames; all
workers on those hosts will be decommissioned.

This API is merely a new entry point into the existing decommissioning
logic. It does not change how the decommissioning request is handled in
its core.

Does this PR introduce any user-facing change?

Yes, a new endpoint /workers/kill is added to the MasterWebUI. By default only
requests originating from an IP address local to the MasterWebUI are allowed.
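
For illustration, an external agent could invoke the endpoint roughly as in the sketch below (Scala, using only JDK HTTP classes). The host form-parameter name, the master address, and the response format are assumptions made for this example, not taken from this PR.

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object DecommissionHostsClient {
  def main(args: Array[String]): Unit = {
    // Assumed request shape: POST the hostnames as repeated "host" form
    // parameters to the MasterWebUI (8080 is the default UI port).
    // Note: by default only requests from an IP local to the MasterWebUI
    // are accepted.
    val form = Seq("host1.example.com", "host2.example.com")
      .map(h => s"host=$h")
      .mkString("&")

    val conn = new URL("http://spark-master:8080/workers/kill/")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")
    val out = conn.getOutputStream
    out.write(form.getBytes(StandardCharsets.UTF_8))
    out.close()

    // The handler is synchronous, so the response can report the outcome
    // (for example, how many workers were decommissioned).
    println(s"HTTP ${conn.getResponseCode}")
    println(Source.fromInputStream(conn.getInputStream).mkString)
    conn.disconnect()
  }
}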

How was this patch tested?

Added unit tests

@agrawaldevesh force-pushed the master_decom_endpoint branch 2 times, most recently from 24b32f8 to 112ac42, on July 6, 2020 at 19:29
@agrawaldevesh changed the title from "[WIP] Expose a (protected) /workers/kill endpoint on the MasterWebUI" to "[WIP][SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI" on Jul 7, 2020
@agrawaldevesh changed the title from "[WIP][SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI" to "[SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI" on Jul 8, 2020
@agrawaldevesh (Author)

@holdenk, @jiangxb1987, @cloud-fan, @Ngone51 -- This PR is ready for your review. Thanks!

@agrawaldevesh force-pushed the master_decom_endpoint branch 2 times, most recently from cc70cf2 to 93f2d52, on July 13, 2020 at 23:10
@jiangxb1987 (Contributor) left a comment:

Looks good, only nits.

Contributor:

will multiple requests block each other, since we use askSync here?

Author:

Yeah: we would wait for each one to be processed iteratively by the Master's message-handling thread. Having said that, decommissioning does not block on actually sending/acking the messages to the executors. It's merely updating some (potentially persistent) state in the Master, so it shouldn't be that slow.

That said, would this be a problem? I am assuming that the Jetty handler that the MasterWebUI is built atop can indeed handle multiple requests in flight, where some of them are blocking.

The use case for making this handler synchronous is so that the external agent doing the decommissioning of the hosts can know whether the cleanup succeeded or not. While this information is scrapeable from the MasterPage (which returns the status of the Workers), it would require some brittle scraping by the external agent. So I figured it would be better for this call to return the number of workers it was actually able to decommission.

I am happy to switch this logic to async if you see any red flags.
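
As a self-contained illustration of that assumption (plain Scala, not Spark's RPC code; every name below is made up): synchronous asks against a single message-handling loop are answered one at a time, so concurrent callers are serialized but do not block each other indefinitely.

import java.util.concurrent.{Executors, LinkedBlockingQueue}
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Toy model: one dispatcher thread answers synchronous "asks" one at a
// time, the way a single message-handling loop would.
object AskSyncSerializationSketch {
  final case class Ask(payload: String, reply: Promise[String])

  def main(args: Array[String]): Unit = {
    val inbox = new LinkedBlockingQueue[Ask]()

    val dispatcher = new Thread(() => {
      while (true) {
        val ask = inbox.take()
        Thread.sleep(100) // simulate per-message handling cost
        ask.reply.success(s"handled ${ask.payload}")
      }
    })
    dispatcher.setDaemon(true)
    dispatcher.start()

    // Several "servlet" threads each block on their own reply, like a
    // synchronous ask. They are processed one at a time, but all complete.
    val pool = Executors.newFixedThreadPool(4)
    (1 to 4).foreach { i =>
      pool.submit(new Runnable {
        override def run(): Unit = {
          val p = Promise[String]()
          inbox.put(Ask(s"request-$i", p))
          println(Await.result(p.future, 10.seconds))
        }
      })
    }
    pool.shutdown()
  }
}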

Author:

I changed it such that the actual decommissioning is done asynchronously in the Master: the DecommissionHostPorts call should now be very quick, so it is okay for it to be synchronous. That is, DecommissionHostPorts simply enqueues multiple WorkerDecommission messages, one per worker.
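
Roughly the shape described above, as a minimal self-contained sketch (hypothetical names, not the actual Master code): the synchronous entry point only matches hosts against known workers, enqueues one decommission message per worker, and returns the count right away.

import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical stand-ins for the Master's worker registry and its inbox.
final case class WorkerRef(id: String, host: String)
final case class WorkerDecommission(workerId: String)

class MasterLikeHandler(workers: Seq[WorkerRef]) {
  val inbox = new ConcurrentLinkedQueue[WorkerDecommission]()

  // Synchronous entry point: cheap bookkeeping only. The per-worker
  // decommission work is merely enqueued for later, asynchronous handling.
  def decommissionHosts(hostnames: Seq[String]): Int = {
    val targets = workers.filter(w => hostnames.contains(w.host))
    targets.foreach(w => inbox.add(WorkerDecommission(w.id)))
    targets.size
  }
}

object DecommissionSketch {
  def main(args: Array[String]): Unit = {
    val handler = new MasterLikeHandler(Seq(
      WorkerRef("worker-1", "host1"),
      WorkerRef("worker-2", "host1"),
      WorkerRef("worker-3", "host2")))
    val count = handler.decommissionHosts(Seq("host1"))
    println(s"decommissioning $count workers; ${handler.inbox.size} messages enqueued")
  }
}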

@agrawaldevesh force-pushed the master_decom_endpoint branch 2 times, most recently from fb662f9 to d8e241f, on July 14, 2020 at 22:41
@cloud-fan (Contributor)

ok to test

case class Heartbeat(workerId: String, worker: RpcEndpointRef) extends DeployMessage

// Out of band commands to Master
case class DecommissionHostPorts(hostPorts: Seq[String])
Contributor:

nit: maybe DecommissionWorkers? In the comment, we can say that the worker is identified by host and port (optional).

Contributor:

It would be confusing to say DecommissionWorkers but pass in a sequence of hostPorts...

Contributor:

But DecommissionHostPorts is more confusing, as you don't even know what it does unless you look at the comment.

Contributor:

Let's change it to DecommissionWorkers then, since WorkerStateResponse also passes in host and port.

Author:

I am narrowing the scope to simply decommissioning a set of hostnames; all workers on a given host will go away. This is the only production use case I have in mind, and there is no need to design for the flexibility of decommissioning an individual worker on a node.

As such, I have renamed the API to DecommissionHosts and it takes a list of host names.

@agrawaldevesh force-pushed the master_decom_endpoint branch 2 times, most recently from 9ea178b to 08a4c9b, on July 15, 2020 at 23:17

SparkQA commented Jul 16, 2020

Test build #125911 has finished for PR 29015 at commit d8e241f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class DecommissionHostPorts(hostPorts: Seq[String])

@agrawaldevesh force-pushed the master_decom_endpoint branch from 08a4c9b to 3ee87f3, on July 16, 2020 at 01:31

SparkQA commented Jul 16, 2020

Test build #125922 has finished for PR 29015 at commit 9ea178b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@agrawaldevesh (Author)

jenkins retest this please

@agrawaldevesh (Author)

Retest this please.

@jiangxb1987 (Contributor) left a comment:

LGTM


SparkQA commented Jul 16, 2020

Test build #125927 has finished for PR 29015 at commit 08a4c9b.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 16, 2020

Test build #125935 has finished for PR 29015 at commit 3ee87f3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class DecommissionHosts(hostnames: Seq[String])

@agrawaldevesh force-pushed the master_decom_endpoint branch from 3ee87f3 to c6a6a90, on July 16, 2020 at 19:19

SparkQA commented Jul 16, 2020

Test build #125998 has finished for PR 29015 at commit c6a6a90.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class DecommissionWorkersOnHosts(hostnames: Seq[String])

This allows an external agent to inform the Master that certain
hosts are being decommissioned.

This alternative is suitable for environments that cannot trigger the
cleanup hook on the Worker that is needed today to inform the Master.

This new API also allows the Master to be informed of all hosts being
decommissioned in bulk by specifying a list of hostnames.

This API is merely a new entry point into the existing decommissioning
logic. It does not change how the decommissioning request is handled in
its core.

Added unit tests

@agrawaldevesh force-pushed the master_decom_endpoint branch from c6a6a90 to 31b231e, on July 17, 2020 at 01:20

SparkQA commented Jul 17, 2020

Test build #126012 has finished for PR 29015 at commit 31b231e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class DecommissionWorkersOnHosts(hostnames: Seq[String])

@cloud-fan closed this in ffdbbae on Jul 17, 2020
@cloud-fan (Contributor)

thanks, merging to master!
