Skip to content

Conversation

@IngoSchuster
Copy link
Contributor

What changes were proposed in this pull request?

Please see also https://issues.apache.org/jira/browse/SPARK-21176

This change limits the number of selector threads that jetty creates to maximum 8 per proxy servlet (Jetty default is number of processors / 2).
The newHttpClient for Jettys ProxyServlet class is overwritten to avoid the Jetty defaults (which are designed for high-performance http servers).
Once jetty/jetty.project#1643 is available, the code could be cleaned up to avoid the method override.

I really need this on v2.1.1 - what is the best way for a backport automatic merge works fine)? Shall I create another PR?

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
The patch was tested manually on a Spark cluster with a head node that has 88 processors using JMX to verify that the number of selector threads is now limited to 8 per proxy.

@gurvindersingh @zsxwing can you please review the change?

@SparkQA
Copy link

SparkQA commented Jun 28, 2017

Test build #3815 has finished for PR 18437 at commit 2d7d4f1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

override def newHttpClient(): HttpClient = {
// Use the Jetty logic to calculate the number of selector threads (#CPUs/2),
// but limit it to 8 max.
val numSelectors = math.max(1,math.min(8,Runtime.getRuntime().availableProcessors()/2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail style tests @IngoSchuster -- we need spaces around operators, after commas, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aargh - what an unnecessary failure, sorry about that :(.
I've corrected the line in question, the Scala style tests do pass now.

override def newHttpClient(): HttpClient = {
// Use the Jetty logic to calculate the number of selector threads (#CPUs/2),
// but limit it to 8 max.
val numSelectors = math.max(1, math.min(8, Runtime.getRuntime().availableProcessors()/2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you still need spaces around /, and omit return on the next line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I changed that as well. The run-tests scala style checks pass with both the latest and the previous version.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, cc @cloud-fan


override def newHttpClient(): HttpClient = {
// Use the Jetty logic to calculate the number of selector threads (#CPUs/2),
// but limit it to 8 max.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also explain why we are limiting the number of selectors in the comment above?

@jiangxb1987
Copy link
Contributor

retest this please

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jun 30, 2017

Test build #78942 has finished for PR 18437 at commit c6198d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jun 30, 2017
…roxy servlets to 8

## What changes were proposed in this pull request?
Please see also https://issues.apache.org/jira/browse/SPARK-21176

This change limits the number of selector threads that jetty creates to maximum 8 per proxy servlet (Jetty default is number of processors / 2).
The newHttpClient for Jettys ProxyServlet class is overwritten to avoid the Jetty defaults (which are designed for high-performance http servers).
Once jetty/jetty.project#1643 is available, the code could be cleaned up to avoid the method override.

I really need this on v2.1.1 - what is the best way for a backport automatic merge works fine)? Shall I create another PR?

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
The patch was tested manually on a Spark cluster with a head node that has 88 processors using JMX to verify that the number of selector threads is now limited to 8 per proxy.

gurvindersingh zsxwing can you please review the change?

Author: IngoSchuster <[email protected]>
Author: Ingo Schuster <[email protected]>

Closes #18437 from IngoSchuster/master.

(cherry picked from commit 88a536b)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor

LGTM, merging to maser/2.2/2.1!

asfgit pushed a commit that referenced this pull request Jun 30, 2017
…roxy servlets to 8

## What changes were proposed in this pull request?
Please see also https://issues.apache.org/jira/browse/SPARK-21176

This change limits the number of selector threads that jetty creates to maximum 8 per proxy servlet (Jetty default is number of processors / 2).
The newHttpClient for Jettys ProxyServlet class is overwritten to avoid the Jetty defaults (which are designed for high-performance http servers).
Once jetty/jetty.project#1643 is available, the code could be cleaned up to avoid the method override.

I really need this on v2.1.1 - what is the best way for a backport automatic merge works fine)? Shall I create another PR?

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
The patch was tested manually on a Spark cluster with a head node that has 88 processors using JMX to verify that the number of selector threads is now limited to 8 per proxy.

gurvindersingh zsxwing can you please review the change?

Author: IngoSchuster <[email protected]>
Author: Ingo Schuster <[email protected]>

Closes #18437 from IngoSchuster/master.

(cherry picked from commit 88a536b)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 88a536b Jun 30, 2017
@IngoSchuster
Copy link
Contributor Author

Thank you all for your support!

ghost pushed a commit to dbtsai/spark that referenced this pull request Aug 9, 2017
…and applications

## What changes were proposed in this pull request?

Currently, each application and each worker creates their own proxy servlet. Each proxy servlet is backed by its own HTTP client and a relatively large number of selector threads. This is excessive but was fixed (to an extent) by apache#18437.

However, a single HTTP client (backed by a single selector thread) should be enough to handle all proxy requests. This PR creates a single proxy servlet no matter how many applications and workers there are.

## How was this patch tested?
.
The unit tests for rewriting proxied locations and headers were updated. I then spun up a 100 node cluster to ensure that proxy'ing worked correctly

jiangxb1987 Please let me know if there's anything else I can do to help push this thru. Thanks!

Author: Anderson Osagie <[email protected]>

Closes apache#18499 from aosagie/fix/minimize-proxy-threads.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants