-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Is poll_ready worth it? #626
Comments
One significant benefit of Also, separating waiting for readiness from handling requests allows backpressure to be pushed to the network layer. For example, because all I think this may be a better way to explain the discussion around "expensive requests" --- by not accepting incoming connections until the Does that help explain some of the motivation behind having |
As @hawkw mentions, there are some places where we really, truly need However, I'm definitely sympathetic to the issues you describe about the contract--that cloning invalidates any prior readiness, etc. I've found myself wanting to decompose trait PollReady {
type Call;
type Error;
fn poll_ready(&mut self, cx: Context<'_>) -> Poll<Result<Self::Call, Result>>;
}
trait Call<Req> {
type Response;
type Error;
type Future: Future<Output = Result<Self::Response, Self::Error>>;
fn call(self, req: Req) -> Self::Future;
} This would use the type system to more elegantly solve the problem that |
At a very high level, I might summarize this by saying that if Potentially, we could also provide a mechanism for making decisions based on a service's readiness by simply exposing a |
Such a trait would be really nice for axum, which faces this exact problem. The workaround I ended on was to use I agree adding one more trait does add complexity though... |
@davidpdrsn with the split traits, Axum could simply depend on I think this is really worth pursuing a bit more deeply, but we'd want to understand more of the ergonomics tradeoffs before committing a substantial amount of time to this. But I really do think that it's vital to model readiness explicitly. If we just consider HTTP/2, we have stream concurrency limits that we need to honor. A system like tower shouldn't force us to make the innermost part of the stack handle this situation: we explicitly want to surface this state higher up the stack so we can make intentional decisions about whether to eagerly fail requests, route them elsewhere, etc. |
Thanks! I'm reasonably convinced that some form of readiness needs to be built in now, but I going to work through things just to check (and hopefully it will help to coalesce some thoughts). One thing I've realised from reading this is that readiness can only say "I definitely can't handle this" or "I might be able to handle this" (in the general case at least). While I don't think that drastically changes anything, it's useful context I hadn't quite internalised before.
I'm not convinced that these require poll_ready (although it would make them simpler), a MakeService like pattern would allow a router or network layer to wait for the actual service to be ready. However it does highlight that the MakeService option wouldn't be quite as transparent as I thought, the outermost service (be it network layer or a client) would have to be aware of it, and you'd probably want an "always ready" wrapper at the end so that everything looked the same. I have just realised that the MakeService/ReadyService pattern is pretty hard to compose, as you can't just nest them, although that's similar to the awkwardness of composing MakeService's currently.
This is almost exactly composing the MakeService/ReadyService into the type system right? That's probably a plus but I have a suspicion that it's more complicated than poll_ready (with tokens).
I'm not sure I agree with that, it's perfectly possible to encode readiness in the types as they exist today without poll_ready. However, that's pretty complicated and leads to a bunch of awkward code and it seems that there's many use-cases for readiness based decision making. So I'm leaning towards the answer to "Is poll_ready worth it?" being "Yes", that is, it's not necessary but the complexity of not having it is greater than the complexity of having it (or equivalent).
I'm not convinced that it would be nice, do you really not want to be able to route to services that can sometimes not be ready? I think this is just a case where you need to lean into readiness meaning "I definitely can't handle this" or "I might be able to handle this". You can return not ready if none of the routes are ready, and otherwise say ready. Otherwise, as that link mentions you'd need a horribly complicated system to tell the caller what's ready (essentially you'd need to see the request, which is what call does). |
A (maybe?) simpler alternative to two traits would be to add a generic parameter to |
@stuhood that approach was proposed in #412. At the time, some people looked into what an implementation of that design would look like, and you're correct that, unfortunately; it probably can't be implemented without GATs. Which is a shame, because it would be quite nice otherwise. Perhaps that proposal can be re-visited if stable Rust ever gets GATs. |
pub trait Service<Request> {
type Response;
type Error;
type ReadyToken;
type Future<'a>: Future<Output = Result<Self::Response, Self::Error>>
where
Self: 'a;
fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<Self::ReadyToken, Self::Error>>;
fn call(&mut self, token: Self::ReadyToken, req: Request) -> Self::Future<'_>;
}
Talking to @hawkw on Discord suggests I misunderstood the GAT requirement, it seems that the idea was that ReadyToken needs to be a GAT but as above that doesn't work. Although I did dig out this suggestion from @jonhoo from 2021-04-01 that's incredibly similar to what's been mentioned already: trait Service {
type Token<'s>: ServiceToken;
fn poll_ready<'s>(&'s mut self) -> Self::Token<'s>;
}
trait ServiceToken<Request> {
fn call(self, req: Request) -> Self::Future;
} |
I've been mulling over this for a while now and Amos' recent(-ish) article (https://fasterthanli.me/articles/futures-nostalgia) has pushed me to come back to it. I've come to the conclusion that my main issue with the current design is that it doesn't work, specifically concurrency limit has a massive footgun: This is kinda the scenario that disarm in #408 is designed for, but in this situation there isn't a very obvious point for hyper to disarm the service. It clearly should call poll_ready before it gets the request, otherwise poll_ready is mostly pointless, but it must disarm after a while or stalled connections withhold resources from active ones. I realise that so far I've mostly just been pointing out issues and not really suggested anything constructive so here's a proposal for what the Service trait could look like, and some reasoning as to why I think it solves most of the problems. pub trait Service<Request> {
type Response;
type Error;
type Future: Future<Output = Result<Self::Response, Self::Error>>;
fn is_ready(&self) -> Result<bool, Self::Error>;
fn call(&mut self, req: Request) -> Self::Future;
}
On driving a service to readiness: this removes the ability for trivially "driving" a service while you don't have a request. In general you only call On backpressure: |
Sorry to hijack this issue with a question, but I wanted to come back to a comment in the original description:
I'm new to the ecosystem but I want to make sure I fully understand this. Is this possible because the executor isn't required to immediately poll the future returned by call()?. Does this happen in practice or do some executors (tokio specifically) always do an immediate poll to avoid this case? The reason I'm asking this is I have written code like this (a shared upstream/egress service that is being called in response to downstream/ingress requests). There is definitely the chance that the downstream/ingress request may drop at any time, so now I'm concerned that there is an implicit leak in my code and I'm not quite sure what to do about it. Is there a pattern I am supposed to use instead of (I originally found this issue as I too was confused about the existing trait design.) |
I think it's a good pattern to follow. I haven't run into any leaks myself because of it. Rust code tends to be pretty good at freeing resources when they're dropped. |
But dropping is not the issue here, right? The issue is if some shared state is acquired without calling call() it will leak. I don't think dropping fixes that unless each service is carefully written that when it is dropped it will release shared state. And maybe this is the case for officially supported layers? For example: tower/tower/src/limit/concurrency/service.rs Lines 18 to 23 in 5064987
When dropped will release the permit, but if someone wrote a layer that say used a counter that expected to be balanced via call(), that call() might never happen even in the above call sequence? I'm mostly just trying to make sure I 100% understand the implicit assumptions that are being made here and it sounds like it is a requirement that when a service drops it must release any resources from a missing call()? (Again assuming the pattern of fast clones of services that might be using shared underlying state like connection pools or whatever.) |
Tokio'a semaphore releases the permit when it's dropped so that's not an issue with I'm sure it's theoretically possible if you write your service poorly but I'm just saying I've never heard about it happening. |
Typically, any resource acquired in tower/tower/src/limit/concurrency/service.rs Lines 83 to 92 in 5064987
If that future is canceled before it completes (say, by a timeout or something), the permit is still dropped when the future is dropped.
If shared state is acquired in I think the point @davidpdrsn is trying to make is that msot shared resource types will provide some form of RAII guard, such as a If you're implementing your own shared resource (e.g. the balanced counter you described in your comment), yes, you would be expected to use a similar RAII construct to ensure that shared resource is released in drops. But, this is the general Rust idiom for such code, which is what I think David is getting at. You would need to do this regardless of the design of the Footnotes
|
@hawkw thanks for the lengthy explanation. Yes, this all makes sense to me and now that I understand the contract it's not difficult to adhere to. I guess my main comment as someone new to the ecosystem is that it was not easy for me to get to this understanding of how poll_ready() interacts with call() and per the original issue description there are some implicit foot guns (especially as it was definitely not clear to me that I can propose a documentation improvement PR for poll_ready() if that would be accepted? |
Sure, we'd love any docs improvements! |
I took a stab here: #662 |
It appears like tower has inspired an imitator. https://github.com/cloudwego/motore I'm sharing here because they might have an interesting design option that I haven't seen considered here. pub trait Service<Cx, Request> {
/// Responses given by the service.
type Response;
/// Errors produced by the service.
type Error;
/// The future response value.
type Future<'cx>: Future<Output = Result<Self::Response, Self::Error>> + Send + 'cx
where
Cx: 'cx,
Self: 'cx;
/// Process the request and return the response asynchronously.
fn call<'cx, 's>(&'s mut self, cx: &'cx mut Cx, req: Request) -> Self::Future<'cx>
where
's: 'cx;
} |
I got to a similar spot #![feature(type_alias_impl_trait)]
use std::future::Future;
pub trait Service<Req> {
type Res;
type Err;
type Future<'a>: Future<Output = Result<Self::Res, Self::Err>> + 'a
where
Self: 'a;
fn call<'a>(&'a self, req: Req) -> Self::Future<'a>;
} |
Summary
I don't think that
poll_ready()
provides enough value for it's extra complexity, and can be removed entirely.Value of poll_ready
As I understand it (from Inventing the Service trait) poll_ready is a mechanism to provide back-pressure in the case that a service is not immediately available to handle a request. Each
Poll::Ready(Ok(()))
you receive gives you permission to send exactly one request in viacall
.Poll::Ready(Err(err))
indicates that this service will never be able to handle requests again.This allows you to check if a service is able to handle a request before constructing it, which could be a more efficient if holding the request around is expensive. Note that it doesn't help if constructing the request is expensive because once you've called poll_ready you are committed to calling the service (unless the service has stopped but that's incredibly rare in my experience).
Issues with poll_ready
Difficult to adhere to the contract of poll_ready
Imagine a middleware service like Load Shed but with an additional queue to handle bursts of traffic. That service wants to: return ready from poll_ready when the queue is empty, return pending from poll_ready if the queue isn't full, and return ready again if the queue is full so that it can return an overload error from call. Similar surprisingly complex logic appears in services like Steer where it must keep track of which inner service it hasn't received a ready token from.
Not calling call after poll_ready
It's very easy to leak resources if you fail to call the Service and run that future to completion after receiving the go-ahead from poll_ready, see #408 for some existing discussion. Note that this is basically impossible to avoid at the moment, even if you write
service.ready().await?.call(request).await?
, the future might be dropped after ready has returned but before call has been polled. Additionally, if constructing the request is expensive, it probably can also fail (or is itself asynchronous), which would mean you have to build it before calling poll_ready!I believe #412 would solve this issue, but at the same time adds more complexity to the API.
Implementing back-pressure without poll_ready
Fundamentally, I think that everything achieved by poll_ready can be achieved without it., and removing it would align much better with Rust's design philosophy of "make easy things easy and hard things possible"1.
Back-pressure
I am confused by the arguments in #3 and #112, so it's possible I'm missing something here, but moving the code from poll_ready to the start of (the future returned by) call in every Service would not change their behaviour. The only downside to doing this that I can see is that it would prevent you from delaying the construction of the request until the service is ready.
Delayed construction of requests
It's still possible to delay the construction of requests without poll_ready, using the
MakeService
pattern. You can implement aReadyService: (()) -> Result<OneshotService, Error>
, which gives you the ability to wait for something to be ready before constructing the request. It's easily composable with middleware on both the ReadyService and OneshotService side, and even gives you #412 for free because the OneshotService is, in essence, a token.Conclusion
One slightly odd result of all this is that Service would become ~identical to
FnMut(Request) -> F where F: Future<Output=Result<Response, Error>>
, which makes me wonder slightly on what value it's providing. I'm interested to here what I've missed, why poll_ready was implemented in the first place, and whether this is a possible future (pun intended). Sorry this is so long but it's been sitting in my head for a while now and I needed to get it out.Footnotes
There's probably a better source for this but: Rust API Guidelines - Type Safety ↩
The text was updated successfully, but these errors were encountered: