Server-side timeout mechanism #10360
Conversation
Previously, a task was cancelled using:

```java
/** null thread means the task is invalid and will do nothing */
private final AtomicReference<Thread> threadReference = new AtomicReference<>();
......
threadReference.set(null);
```

But error-prone asks me not to ignore the
So I now use Future for cancellation. The difference is:
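For illustration, a minimal sketch of Future-based cancellation (the class and method names here are assumptions, not the PR's actual code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Sketch only: cancelling a running task via its Future. */
class TimeoutTask {
  private volatile Future<?> future;

  void start(ExecutorService executor, Runnable invocation) {
    // Keep the Future so the task can be cancelled later.
    future = executor.submit(invocation);
  }

  void cancel() {
    Future<?> f = future;
    if (f != null) {
      f.cancel(true); // true => interrupt the thread running the task
    }
  }
}
```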
@sorra the approach in this PR is quite different from @ejona86 #9684 (comment): specifically "creating a Context with your deadline and then adding a cancellationListener for that Context on when it closes and call ServerCall.close()" etc. Also note the second part of the comment: "However, ServerCall will need its methods to become synchronized and you'll want to ignore calls to sendMessage(), close(), and the like after the deadline expires." This either needs a much bigger change in gRPC or we can say this is really something for the application to do.
@sanjaypujare thank you.
I think the

Sorry, too busy these days.
```java
void deliver() {
  try {
    executor.execute(this);
  } catch (Throwable t) {
    log.log(Level.INFO, "Exception notifying context listener", t);
  }
}
```

```java
/**
 * Close the {@link ServerStream} because an internal error occurred. Allow the application to
 * run until completion, but silently ignore interactions with the {@link ServerStream} from now
 * on.
 */
private void internalClose(Status internalError) {
  log.log(Level.WARNING, "Cancelling the stream with status {0}", new Object[] {internalError});
  stream.cancel(internalError);
  serverCallTracer.reportCallEnded(internalError.isOk()); // error so always false
}
```
It is difficult to test threading behavior in unit tests.
Cancellation listener is exactly what you want for stopping the application. The RPC can be cancelled for many reasons, like a Deadline, the client explicitly cancelling, or I/O failures. It'd look like:

```java
@Override
public void onHalfClose() {
  Context context = Context.current();
  Thread currentThread = Thread.currentThread();
  Context.CancellationListener cancelled = c -> currentThread.interrupt();
  context.addListener(cancelled, MoreExecutors.directExecutor());
  try {
    super.onHalfClose();
  } finally {
    // You must remove; you don't want to interrupt the wrong thread
    context.removeListener(cancelled);
  }
}
```
Does this depend on receiving the HalfClose from the client? If the server wants to time out without having to depend on a halfClose from the client (say the client has died or connectivity is lost), do we need to do something more?
Thank you. This is my updated approach using CancellableContext and CancellationListener:

```java
try (Context.CancellableContext context = Context.current()
    .withDeadline(Deadline.after(timeout, unit), scheduler)) {
  Thread thread = Thread.currentThread();
  Context.CancellationListener cancelled = c -> {
    if (c.cancellationCause() == null) {
      return;
    }
    thread.interrupt();
    // logging ......
  };
  context.addListener(cancelled, MoreExecutors.directExecutor());
  context.run(invocation);
  return true;
}
```

Ordinary
Yesterday it worked in production 👏🏻
It assumes that the RPC is unary or server-streaming. To extend it to more cases, you'd add the listener to more callbacks.

Note that your approach does not close the RPC. So the RPC is still consuming memory in gRPC when your application returns, and the client is left hanging. If your application handles the interruption by cancelling the RPC, then it'd work fine, although it would be fragile. This is in a state where it is useful to you, but it needs serious changes to be accepted into gRPC. It is too specialized and error-prone at the moment. For gRPC, we'd want the interruption handling to be a separate interceptor from the Context/Deadline handling, since they are separate features. The Deadline handling should also create the new Context within interceptCall and use Contexts.interceptCall() to set it on the thread for each callback, as interceptors expect a consistent Context. And when the deadline expires, the RPC would need to be closed. Closing the RPC is pretty annoying to make thread-safe, but we could share code with
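For concreteness, a hedged sketch of that separation, with interruption in its own interceptor; the interceptor name is hypothetical, and a separate deadline interceptor (which attaches the Context via Contexts.interceptCall()) is assumed to sit underneath:

```java
import com.google.common.util.concurrent.MoreExecutors;
import io.grpc.Context;
import io.grpc.ForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;

/** Sketch: interruption handling as its own interceptor, separate from deadline handling. */
final class InterruptingServerInterceptor implements ServerInterceptor {
  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    return new ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(
        next.startCall(call, headers)) {
      @Override
      public void onHalfClose() {
        // Context.current() is whatever the deadline interceptor attached via
        // Contexts.interceptCall(), so its cancellation reflects the deadline.
        Context context = Context.current();
        Thread thread = Thread.currentThread();
        Context.CancellationListener interrupter = c -> thread.interrupt();
        context.addListener(interrupter, MoreExecutors.directExecutor());
        try {
          super.onHalfClose();
        } finally {
          context.removeListener(interrupter); // never interrupt the wrong thread
          Thread.interrupted(); // clear any pending interrupt before thread reuse
        }
      }
    };
  }
}
```

Layering it this way keeps interruption opt-in while the deadline/close behavior stays generic.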
@ejona86 Thank you for the review comment.

My approach only tries to stop the application RPC method invocation (not other stages, since there should be other mechanisms to handle those properly) and is based on the assumption that

What about this: let

In both conditions, the RPC can complete in a determined state with memory freed (except when the application is not interrupted and runs infinitely, which is the original problem that I want to solve with this PR). If the application does not know how to deal with interruption, it can simply do nothing about it.

Sorry, what does this mean?

Is this a statement about the status quo, or an ask for improvement?
@ejona86 I think I get most of your point after more learning.

Such a context is shared by each callback, so a holistic timeout is applied to the whole lifecycle, which is better than a halfClose-only timeout, right?

It ensures the client will not hang even if the application forgets to send a response on timeout (though I think that should not happen, because the application should either do it correctly or just not do it, so a Status.UNKNOWN will be returned).

But I still do not know what this means. How can the correct thread be interrupted if interruption handling is a separate interceptor? Could you please explain more about the design you expect?
My recent changes on September 3:
```java
 * access by serializing everything on an executor.
 */
@ExperimentalApi("https://github.com/grpc/grpc-java/issues/2189")
class SerializingServerCall<ReqT, RespT> extends
```
This class is extracted from TransmitStatusRuntimeExceptionInterceptor.
September 16 update:
Hi @ejona86 @sanjaypujare, would you like to review this PR?

I know you are busy reviewing so many PRs.

Hello!
gRPC Java Server-side Timeout Design
Author: Dongqing Hu
Date: 2023-09-16
References:
Feature request #10361
Pull request #10360
Intention
Regarding #9684, there have been multiple asks in the community for a server-side timeout. Servlets and database connections support timeouts, so why does gRPC not provide one?

Our application has this problem. The grpc-java server usually runs workers in a ThreadPoolExecutor with a maximum size (an unbounded pool does not fare better against this problem). If some server calls run infinitely (e.g. in an infinite loop or waiting indefinitely), they occupy threads. If this situation persists, eventually all threads in the pool are occupied and no work can be done anymore, which results in service downtime.
The client-side timeout only helps the client stop waiting infinitely; it does not help the server stop infinite processing.

So the server needs a forced timeout. Per the comment by @ejona86 in #9684 (comment), application developers can implement one via a server interceptor. But such a thing is not straightforward for application developers to implement on their own, so why not provide a built-in API in the grpc-java framework?
Alternatives
Alternative 1:
Apply AOP (dynamic proxy or bytecode weaving) to each application RPC service class; the AOP layer intercepts each application RPC method with timeout control.
Alternative 2:
Each RPC method explicitly delegates the request handling to another executor that supports timeout control. These methods must remember to pass the gRPC Context along to that executor.
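For illustration, a sketch of what Alternative 2 imposes on every handler (`TimeoutDelegate` and its fields are hypothetical, not part of this PR):

```java
import io.grpc.Context;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch: every RPC method must hand its work to a timeout-capable executor. */
class TimeoutDelegate {
  private final ExecutorService workerPool = Executors.newFixedThreadPool(8);

  void handle(Runnable rpcBody, long timeout, TimeUnit unit) throws Exception {
    // Context.current().wrap(...) carries the gRPC Context to the worker thread.
    Future<?> future = workerPool.submit(Context.current().wrap(rpcBody));
    try {
      future.get(timeout, unit); // block the calling thread up to the timeout
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the worker
      throw e;
    }
  }
}
```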
These alternatives are too invasive to the application.
Design
Overall Flow
As we know, after `startCall`, a server call goes through listenable stages like `onReady`, `onMessage`, `onHalfClose`, and `onComplete`/`onCancel`.

A new interceptor is introduced in the util module. It intercepts `startCall` and creates a `CancellableContext timeoutContext` using the configured timeout, and the timeout context has a cancellation listener that closes the server call with `Status.CANCELLED`. The `timeoutContext` is attached to each stage, so each stage is able to know whether the timeout is reached by calling `context.isCancelled()`. Whether or not each stage checks `context.isCancelled()`, the server call is eventually closed (is this enough?).

The core code is like:
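(A sketch reconstructed from the description above; `timeout`, `unit`, and `scheduler` are assumed interceptor fields, and the PR's actual code may differ.)

```java
@Override
public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
    ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
  // Wrap the call so close() is thread-safe (see SerializingServerCall below).
  ServerCall<ReqT, RespT> serializingCall = new SerializingServerCall<>(call);
  Context.CancellableContext timeoutContext = Context.current()
      .withDeadline(Deadline.after(timeout, unit), scheduler);
  // When the deadline fires, close the server call with CANCELLED.
  timeoutContext.addListener(
      context -> serializingCall.close(
          Status.CANCELLED.withDescription("server call timeout"), new Metadata()),
      MoreExecutors.directExecutor());
  // Contexts.interceptCall attaches timeoutContext to every stage, so each
  // stage can check context.isCancelled().
  return Contexts.interceptCall(timeoutContext, serializingCall, headers, next);
}
```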
Especially, if the option `shouldInterrupt` == true, the unary server call's `onHalfClose` will have an additional cancellation listener that interrupts the current thread (the thread executing the application RPC method). Eventually, if the timeout is not reached, `onComplete`/`onCancel` will normally cancel the timeout context to allow it to be garbage collected.

Notable Details
Serializing: `SerializingServerCall` is used to close the server call thread-safely.

Status: if the timeout is reached, it always results in `Status.CANCELLED` with description "server call timeout".

Interruption: performed only when `shouldInterrupt` == true and the stage is `onHalfClose` of a unary server call (where the application RPC method is invoked). If interruption has been performed, the interrupt state is always reset when leaving `onHalfClose`. This is to allow the worker thread to be safely reused for the next task in a ForkJoinPool. For more information, refer to https://bugs.openjdk.org/browse/JDK-8223430.

Pending Questions