Jetty and Loom #5078
I'm working on Project Loom. If you run into any questions or issues with the early access builds, you are welcome to bring them to the OpenJDK loom-dev mailing list. As it happens, I did create a demo that embeds Jetty and it was very easy to get started. I created a
For our experiments with Loom, these are the questions that I'd really like answered:
State of Loom provides a good overview of the status of Project Loom. There is a section on pinning that covers the short-term limitations with respect to parking while holding a monitor. Fairness is not changed in the current prototype. Loom doesn't use "cooperative multithreading" (there are no explicit scheduling points).
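To make the pinning limitation concrete, here is a minimal sketch. It assumes the virtual thread API as it later stabilized (Thread.startVirtualThread) and the behaviour of the early builds under discussion: parking inside a synchronized block pins the virtual thread to its carrier, while parking under a java.util.concurrent lock releases the carrier.

import java.util.concurrent.locks.ReentrantLock;

public class PinningDemo {
    static final Object monitor = new Object();
    static final ReentrantLock lock = new ReentrantLock();

    public static void main(String[] args) throws InterruptedException {
        // Parking while holding a monitor pins the virtual thread:
        // the carrier thread is NOT released while we sleep.
        Thread pinned = Thread.startVirtualThread(() -> {
            synchronized (monitor) {
                sleep();
            }
        });

        // Parking while holding a j.u.c. lock does not pin:
        // the carrier is freed to run other virtual threads.
        Thread notPinned = Thread.startVirtualThread(() -> {
            lock.lock();
            try {
                sleep();
            } finally {
                lock.unlock();
            }
        });

        pinned.join();
        notPinned.join();
    }

    static void sleep() {
        try {
            Thread.sleep(1_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}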
@AlanBateman Interesting read... but I'll have to go over it a few more times to fully digest. I am definitely concerned with pinning, as we are just not in control of what applications will do. Consider an HTTP/2 server, where the flow control is done in user space. If an application writes to a response from within a synchronised block, then that thread could become pinned if the write blocks because the flow control window is entirely consumed. Then, if the frame that would open that flow control window is handled by a virtual thread, it may never get executed because all the real cores are attached to pinned virtual threads.

We currently optimise the scheduling of these situations by using reserved threads: if we know a thread is available to continue handling flow control, then the current thread can continue from parsing a frame to handling that frame... with a hot cache. Not sure how to handle this with Loom? Perhaps we would need to have a couple of real threads always doing the IO selection and handling of control frames, and then passing off to virtual threads for application handling??? But then we would always run the applications with cold CPU caches. Hmmmm. So it will be interesting to get Jetty running on Loom and test it under load in a way that shows whether we hit such problems.

However, ultimately I doubt that a server that has been so specifically optimised for running async IO on OS threads is going to be the best usage of Loom. A more interesting approach would be to use the core infrastructure of Jetty to assemble a non-async server that uses/assumes Loom. I.e. if we have 10,000 HTTP connections, each with 100 streams, then we just allocate 1,000,000 virtual threads and don't bother with all the async complexities that we go on with. Hmmm, or would we allocate 10,000 threads, one for each connection, that would just run the HTTP/2 protocol, and then 1,000,000 threads that each ran the application/session? Each connection-processing thread would then hand off work to one of 100 application/session threads... and we'd have to be clever to try to get that executing on the same real thread so the cache would be hot... and we'd still need to solve the pinning issue... but maybe 1 work-stealing real thread could cover that.

So yep, I think it will be interesting for us to replace our synchronized with Locks, add a different Thread "pool" and see how it goes. However, ultimately I think we'd only really be fair on Loom if we wrote a new connector type that wasn't intrinsically async... This would not be too hard to do, but we still have the issue that the input/output streams we give to the applications are implemented as async under the hood, so applications wouldn't really be using Loom preemption on IO. So removing the async assumption from HttpChannel/HttpInput/HttpOutput is a fair bit of work... but ultimately, if we really want to know whether the Loom approach really is scalable, then somebody needs to write a server that fully embraces the approach.
@gregw My understanding is that while locks and platform IO are considered "logical scheduling points", they aren't required for another virtual thread to preempt, and they can be interrupted just like normal threads. I'm not entirely sure of that, though, and would have trouble proving it, since I can't come up with a case that would show it. I think that is the purpose of the
My instinct with the concerns about reserved threads and how Jetty currently does scheduling is that if those concerns do end up being valid, a new scheduler roughly matching Jetty's current semantics could be written and used in place of ForkJoinPool.
Running with the system property

@bowbahdoe Ignore the Continuation and tryPreempt for now. Yes, there is support at the lower level for forced preemption, but this is not exposed to custom schedulers at this time.
More pondering on what we'd need to change to make best usage of Loom. I no longer think we need to change However, we probably could experiment with writing a loom-specific

For HTTP/2, it would probably still be a Loom virtual thread per connection, but as there are multiple streams, we would have to examine how that virtual thread executed tasks for each frame so that it efficiently handed them over to another Loom virtual thread. Ideally we would probably need to specialise the Loom schedulers and our executor so that, if possible, the same real thread with a hot cache would go on to run the frame task and call the servlet... but we'd need to come up with a mechanism to avoid letting the last real thread be dispatched into the servlet container... where it could be pinned and we'd be screwed. But I think we already have all the info we need on our tasks regarding whether they can or will block, so we probably have the ability to write a Loom scheduler that actually implements Eat-What-You-Kill as its core strategy.

So replacing our synchronized blocks and thread pool should allow Loom to run OK, but I think we really need to consider next steps to really give it a fair go.
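For illustration, here is a rough sketch of that per-connection structure. All the names (Connection, Frame, etc.) are hypothetical placeholders, not Jetty APIs, and Thread.startVirtualThread is the virtual thread API as it eventually stabilized.

interface Frame { boolean startsStream(); Runnable applicationTask(); }
interface Connection { boolean isOpen(); Frame readFrame(); void handleControlFrame(Frame frame); }

class PerConnectionSketch {
    // One virtual thread runs the protocol loop for a connection; each
    // new stream's application work gets its own virtual thread, so a
    // blocked application never stalls frame parsing or flow control.
    void serve(Connection connection) {
        Thread.startVirtualThread(() -> {
            while (connection.isOpen()) {
                Frame frame = connection.readFrame();     // blocking read just parks
                if (frame.startsStream()) {
                    Thread.startVirtualThread(frame.applicationTask());
                } else {
                    connection.handleControlFrame(frame); // flow control stays prompt
                }
            }
        });
    }
}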
If it helps, here's the stack trace of a simple service that fetches a resource from another endpoint. It's running on a virtual thread, so the blocking operation, to establish the TCP connection to the remote service, just parks the virtual thread (releasing the underlying carrier thread to do other work).
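A minimal sketch of the scenario that stack trace shows, assuming the stabilized Thread.startVirtualThread API: the blocking connect parks the virtual thread, and the carrier is freed.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

public class BlockingConnect {
    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.startVirtualThread(() -> {
            try (SocketChannel channel = SocketChannel.open()) {
                // A blocking connect on a virtual thread parks the virtual
                // thread; the carrier thread is released to do other work.
                channel.connect(new InetSocketAddress("example.com", 80));
                System.out.println("connected: " + channel.getRemoteAddress());
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        vt.join();
    }
}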
Disclaimer: These are not benchmarks (but...). I was curious to see what a simple ThreadPool change would do. Code at https://github.com/jetty-project/jetty-loom. The results:

With Loom

Without Loom (using QTP)
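For reference, that kind of "simple ThreadPool change" can be sketched as below. This assumes Jetty 10's org.eclipse.jetty.util.thread.ThreadPool interface and the stabilized virtual thread API; it is an illustration, not the jetty-loom branch's actual code.

import java.util.concurrent.CountDownLatch;
import org.eclipse.jetty.util.thread.ThreadPool;

public class LoomThreadPool implements ThreadPool {
    private final CountDownLatch stopped = new CountDownLatch(1);

    @Override
    public void execute(Runnable task) {
        // Every task gets a fresh virtual thread; no pooling at all.
        Thread.startVirtualThread(task);
    }

    @Override
    public void join() throws InterruptedException {
        stopped.await(); // nothing to join; block until externally "stopped"
    }

    @Override
    public int getThreads() { return 0; }      // virtual threads are not tracked

    @Override
    public int getIdleThreads() { return 0; }  // there is no idle pool

    @Override
    public boolean isLowOnThreads() { return false; } // effectively unbounded
}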
@joakime Can you run that test but with the 0s replaced by Integer.MAX_VALUE, just to see how/if that affects things?
@bowbahdoe Results with Loom (edit: now using same ab command line as before)
Added an AbstractConnection option to use Loom virtual threads for invoking the application.
manually shaded jetty-util classes into jetty-start
Don't build the Maven Plugins because they're not ready for Java 16. Signed-off-by: Simone Bordet <[email protected]>
Using Loom to execute the HttpChannel. Signed-off-by: Simone Bordet <[email protected]>
Fixed logic to use Loom in ReadCallback. Signed-off-by: Simone Bordet <[email protected]>
@lukago The c10k problem has been solved for a long time; see https://webtide.com/do-looms-claims-stack-up-part-1/ where an untuned laptop can do 32k threads. We would love to hear what your use case is!
@sbordet What I mean is whether we can achieve with Jetty+Loom something like this example based on Netty: https://github.com/Jotschi/vertx-c10k-example. I ran similar tests (-c > 1000) for Jetty with a fiber-based thread pool, based on this example: https://github.com/tipsy/loomylin (Javalin is based on Jetty).
Despite using a fiber thread pool, there are 759 socket errors. I guess this happens because in this example the Jetty connector does not utilize fiber threads and declines new connections after reaching some critical point of ~250 connections. I see you have done some tweaks in the connectors here: https://github.com/eclipse/jetty.project/compare/jetty-10.0.x-loom
Also, I wonder if tuning the Jetty connectors for fibers will get its performance closer to what we see in the Netty example, where for 10k concurrent connections it handles 7k req/s.
@lukago I pushed Jetty+CometD to 400k connections a few years ago (I think it was Jetty 8), so c10k has not been a problem for many years. We have clients in production that have >100k connections on a single server, running easily. I have not run

I just ran the CometD benchmark with 10k connections at ~45k requests/s, easily, on my laptop:
So, c10k is not a problem, provided you are async on the server.
@lukago I doubt we will ever use Loom within Jetty for connectors. Jetty is already fully async internally, and we have to do a lot of clever things to prevent head-of-line blocking, and even deadlocks, if important tasks like HTTP/2 flow control get deferred. Virtual threads can easily be deferred, so we just don't think they are suitable (I'll say yet... but I'm dubious they ever will be).

However, using Loom virtual threads to dispatch to an application that is written in blocking mode is something that we have already implemented in our test branch, and is something that is very likely to reach a main branch if Loom ever makes it to a released JVM. That will allow many thousands of virtual threads to block in the application: that could be 10s or 100s or 1000s of thousands, depending on how many other resources the application uses.

I'm not yet convinced this will give as good results as writing async applications, but it should be in the ballpark, and it will definitely be much easier to write and maintain.
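A rough sketch of that hybrid, with hypothetical names (Request, Application) rather than Jetty internals: the connector side stays async, and only the application invocation moves to a virtual thread.

interface Request {}
interface Application { void handle(Request request); }

class BlockingDispatchSketch {
    // The selector/parser thread never blocks; the application may block
    // freely, parking only its own virtual thread.
    void onRequestParsed(Request request, Application app) {
        Thread.startVirtualThread(() -> app.handle(request));
    }
}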
@sbordet What do you mean by async APIs, do you have an example? I don't have any knowledge about CometD, but using an async Servlet seems to be not enough, as I still get only ~250 max concurrent connections. I am mostly interested in sync applications, but it would be good to know how to configure it for async apps too.

@gregw So if I run Jetty from your test branch with config

Edit: OK, now I get it; the problem with connections was indeed on the client side, I fixed it with

Thanks! :)
@lukago Just a general note that might be useful for spectators to the discussion, even if it's not what you were asking exactly. An "async API" is any API that will not deliver its result immediately, for example by later calling a callback or by returning an object that callbacks can be attached to.

Async APIs:

// Async API 1
// Will call the callback "later", possibly on a different thread
void getInteger(Consumer<Integer> whenDone) { ... }

// Usage
getInteger(x -> System.out.println(x + 1));

// Async API 2
// Will return an object that results can be chained on
// (written here with CompletableFuture so the chaining compiles;
// plain Future has no chaining methods)
CompletableFuture<Integer> getInteger() { ... }

// Usage
getInteger()
    .thenApply(x -> x + 1)
    .thenAccept(x -> System.out.println(x));

Synchronous APIs:

// Returns when the result is available
Integer getInteger() { ... }

// Usage
int x = getInteger();
x = x + 1;
System.out.println(x);

If most of your code is written using synchronous APIs, there won't be much (or any) performance benefit to using async servlets, simply because there won't be "explicit yield points" that can be taken advantage of. The "seams" added by the callbacks or the Futures are what is used to "juggle" tasks between OS threads.
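For contrast, the synchronous style scales on virtual threads without those seams, because a blocking call parks only the virtual thread. A minimal sketch, assuming the executor factory name that eventually shipped (Executors.newVirtualThreadPerTaskExecutor(); the EA builds discussed in this issue used Executors.newUnboundedVirtualThreadExecutor()) and a stand-in getInteger:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SyncOnVirtualThreads {
    public static void main(String[] args) {
        // One virtual thread per task; close() waits for submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    int x = getInteger(); // blocks this virtual thread only
                    System.out.println(x + 1);
                });
            }
        }
    }

    // Stand-in for a blocking call (JDBC, an HTTP client, etc.)
    static Integer getInteger() {
        return 41;
    }
}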
@gregw I'd like to ask one more thing about the Jetty I/O model. As I understand it, the current version of Jetty uses a similar model to Netty for handling I/O. So there is a separate thread pool where each thread (aka event loop) is asking the kernel for new I/O events and then dispatching them to another thread pool for blocking servlets, or doing everything on the I/O pool for async servlets. What if we change the I/O thread pool to be fiber-based as well? Will it be beneficial for overall performance?
Lukasz,

I'm not familiar with Netty's internals, but our scheduler is not like you describe. Having a selector thread that always dispatches to a thread pool means that there is always extra latency, and often you will end up with a cold CPU cache and get parallel slowdown (see https://webtide.com/avoiding-parallel-slowdown-in-jetty-9/). If you never dispatch, then you end up with deadlock and/or head-of-line blocking.

So we have a more adaptive strategy called "Eat What You Kill". Details here: https://webtide.com/eat-what-you-kill/

Currently I cannot see how that can be implemented with virtual threads in a beneficial way; however, the strategy is adaptive enough to use virtual threads when appropriate, as we have done with our Loom branch.

regards
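For readers unfamiliar with the strategy, here is a deliberately simplified sketch of the idea, not Jetty's actual ExecutionStrategy code; the reserved-thread pool is modeled with a plain Semaphore.

import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;

class EatWhatYouKillSketch {
    private final Executor executor;
    private final Semaphore reserved; // models the reserved-thread pool

    EatWhatYouKillSketch(Executor executor, int reservedThreads) {
        this.executor = executor;
        this.reserved = new Semaphore(reservedThreads);
    }

    // Called by the producing (selector) thread when it has produced a task.
    void produced(Runnable task, Runnable continueProducing) {
        if (reserved.tryAcquire()) {
            // A spare thread takes over production, so this thread can
            // consume the task it just produced -- with a hot CPU cache.
            executor.execute(() -> {
                try {
                    continueProducing.run();
                } finally {
                    reserved.release();
                }
            });
            task.run();
        } else {
            // No spare thread: dispatch the task (cold cache) and let the
            // caller keep producing, avoiding head-of-line blocking.
            executor.execute(task);
        }
    }
}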
Hi @gregw, hope you are well. The "Eat what you kill" page (https://webtide.com/eat-what-you-kill/) is chopped off at the bottom. Could your team please reinstate the missing content? It currently reads, "If the request is consumed by a different thread, then all the request data must be loaded into the new CPU c" <- there it ends. Thanks!

Try now as it looks complete to me.
Thanks @gregw, it started working about twenty minutes after I posted, so I deleted the post.

@wendal yes it works... unless you are saying otherwise?
Jetty version
10.0.0-SNAPSHOT
Java version
Project Loom Pre-release JDK Build
Question
I am experimenting with the Project Loom pre-release builds and I am trying to figure out how to properly configure Jetty to make use of virtual threads.
Quite a bit of the code seems centered around thread pooling and managing capacity, but that isn't quite as applicable to virtual threads. I figure I could change "max threads" up to a really high number, but there is still logic for checking the capacity of a thread pool - even if backed by an Executors.newUnboundedVirtualThreadExecutor() - which I am thinking would be wasteful in that context.

I guess this is partly a "Jetty Architecture" question more than anything else - I'm just looking for some pointers on where to start with the codebase to make an eventual upgrade work.
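As a starting point, one hypothetical wiring, assuming a virtual-thread ThreadPool like the LoomThreadPool sketch earlier in this thread, would be:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class LoomServerMain {
    public static void main(String[] args) throws Exception {
        // Hand Jetty a virtual-thread "pool" instead of QueuedThreadPool.
        Server server = new Server(new LoomThreadPool());
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        server.addConnector(connector);
        server.start();
        server.join();
    }
}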