Skip to content

Conversation

@aosagie
Copy link
Contributor

@aosagie aosagie commented Jul 1, 2017

What changes were proposed in this pull request?

Currently, each application and each worker creates their own proxy servlet. Each proxy servlet is backed by its own HTTP client and a relatively large number of selector threads. This is excessive but was fixed (to an extent) by #18437.

However, a single HTTP client (backed by a single selector thread) should be enough to handle all proxy requests. This PR creates a single proxy servlet no matter how many applications and workers there are.

How was this patch tested?

.
The unit tests for rewriting proxied locations and headers were updated. I then spun up a 100 node cluster to ensure that proxy'ing worked correctly

@jiangxb1987 Please let me know if there's anything else I can do to help push this thru. Thanks!

@cloud-fan
Copy link
Contributor

OK to test

@aosagie
Copy link
Contributor Author

aosagie commented Jul 2, 2017

Hi @cloud-fan,
Any reason the test results didn't get posted? Is there some step I need to take?

@SparkQA
Copy link

SparkQA commented Jul 2, 2017

Test build #3831 has finished for PR 18499 at commit 2b907df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cc @jiangxb1987

@jiangxb1987
Copy link
Contributor

cc @gurvindersingh Could you look at this please?

@jiangxb1987
Copy link
Contributor

cc @sarutak @gengliangwang Could you look at this when you have time? Thanks!

@srowen
Copy link
Member

srowen commented Jul 4, 2017 via email

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, it's a bit extreme to just have a single selector thread. Browsers fetch elements of a page concurrently and it will help for a more responsive user experience if we serve them in parallel.
When I reload the admin ui page, it seems as if 10+ requests are sent concurrently. For the sake of web ui latency I think we could be a bit more generous with selector threads - in particular with this great improvement that reduces the number of proxy servlets to just 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @IngoSchuster
I can go with whichever number people find best. My main goal with this patch was to not have that number grow with the number of workers and applications (since my org runs a lot of applications and ran into issues with Master not being responsive).

From my research, I found the following quote from a Jetty committer: "For 5k clients for normal HTML pages you can easily go by with 1 selector." (See: https://dev.eclipse.org/mhonarc/lists/jetty-users/msg04751.html). But, this was in the context of a server and not necessarily a client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully agree @aosagie - it's much better if we don't scale the number of selector threads up with the number of executors.
But since we don't serve simple html pages but also page(elements) that might have a higher latency than simply serving static files, I think we should not risk that the page is build up from serial requests where latency adds up - my server for example has 170ms network latency.
I suggest that we set the number of selector threads to 8. This is certainly not excessive but still allows to serve a page with several elements in parallel.

@IngoSchuster
Copy link
Contributor

IngoSchuster commented Jul 5, 2017 via email

@aosagie aosagie force-pushed the fix/minimize-proxy-threads branch from 2b907df to ef3048c Compare July 5, 2017 23:29
@aosagie
Copy link
Contributor Author

aosagie commented Jul 11, 2017

@cloud-fan Hi, I just pushed a change to up the selector threads from 1 to 8. Can I get a retest please?

@jiangxb1987 Sorry to bother, but is there anyone available to give guidance on what I need to do to push this PR forward? This bug renders the master completely unresponsive for us in production.

@aosagie
Copy link
Contributor Author

aosagie commented Aug 2, 2017

Hey @ajbozarth. Any chance you could provide a review or some guidance on anything I can do to make this PR more amenable?

@gengliangwang
Copy link
Member

retest this please

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add comments for the above code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just keep the previous code here. If the number of available processors is smaller than 8, e.g. only one processor, we don't need to make it 8 threads.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can save use a variable to save the second parameter.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Aug 3, 2017

Test build #80216 has finished for PR 18499 at commit 45dd13c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@aosagie aosagie force-pushed the fix/minimize-proxy-threads branch 2 times, most recently from f998727 to 104c54a Compare August 4, 2017 04:15
@aosagie aosagie force-pushed the fix/minimize-proxy-threads branch from 104c54a to 70c93b0 Compare August 6, 2017 10:17
@cloud-fan
Copy link
Contributor

retest this please

@ajbozarth
Copy link
Member

I don't see any code issues, but I'm confident enough in my proxy knowledge to give this a "lgtm"

@SparkQA
Copy link

SparkQA commented Aug 7, 2017

Test build #80355 has finished for PR 18499 at commit 70c93b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in ae8a2b1 Aug 9, 2017
@cloud-fan
Copy link
Contributor

@aosagie do we still need the previous fix for SPARK-21176?

@IngoSchuster
Copy link
Contributor

Yes I believe it still makes sense to limit the number of selector threads. Jetty's default is tuned for a real web server and by default it creates (number of cpus / 2) selector threads. Right now we have limited the number of threads to 8 which I believe is more than enough to serve the spark admin UI.

@aosagie
Copy link
Contributor Author

aosagie commented Aug 11, 2017

Yes, I agree with @IngoSchuster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants