-
Notifications
You must be signed in to change notification settings - Fork 687
Send halfClose immediately after messages to prevent late halfClose issues with Envoy #3031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
murgatroid99
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be significantly simplified.
| /** | ||
| * Tracks whether halfClose has been sent to this child call. | ||
| */ | ||
| halfCloseSent: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think either of these fields is necessary. nextMessageToSend should be enough to track all of the relevant state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we half close + increment nextMessageTosend, this line would't not work so callback of the message would not be executed, this is why it's added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding a new field, just make the message index a second argument to handleChildWriteCompleted. In sendNextChildMessage, be careful to capture childCall.nextMessageToSend in a message index variable before potentially incrementing it when sending the immediate half close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sent a commit for this feedback.
| // - If halfCloseIndex is 0, there are no messages, so send immediately | ||
| // - If halfCloseIndex is N, the last message is at index N-1 | ||
| // - If highestSentMessageIndex >= N-1, all messages have been sent | ||
| if (halfCloseIndex === 0 || call.highestSentMessageIndex >= halfCloseIndex - 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If halfCloseIndex is 0, then call.nextMessageToSend had better be >= -1, so there's no need to check halfCloseIndex === 0 separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll leave my comments here regarding the need for highestSentMessageIndex.
The condition
call.nextMessageToSend === halfCloseIndex || call.nextMessageToSend === halfCloseIndex - 1
was added as a quick way to confirm that the problem is related to half-close timing.
When testing this with client-streaming calls, keeping call.nextMessageToSend += 1; inside that if block results in
gRPC Error: 13 INTERNAL: Write error: write after end.
If that line is removed, no messages are sent from the client at all.
This behavior can be reproduced when the client sends only one message and then closes the stream. If multiple messages are sent before closing, the if condition does not match and the issue does not occur.
Additionally, the “Cardinality violations” tests in test-server-errors.ts also fail.
The root cause is that, for clientUnaryStreaming RPCs, a callback (Stream.onwrite) is attached to the call context and sent along with the message. When this callback is invoked, execution inside sendMessageWithContext is paused, and the next line in user code (call.end()) is executed.
At that point, underlyingCall.nextMessageToSend is 0 and halfCloseIndex is 1, making the following condition true:
call.nextMessageToSend === halfCloseIndex - 1.
As a result, halfClose is called on the stream before any message is actually sent. When execution of sendMessageWithContext later resumes, it attempts to write to an already ended stream, which leads to the write error.
Example reproducer, if you uncomment second write you'll see that issue is not reproducable:
app.post('/api/policies', (req, res) => {
// call client.UploadPolicies which is client-side streaming
const call = client.UploadPolicies((error, response) => {
if (error) {
console.error('gRPC Error:', error);
return res.status(500).json({
error: 'Failed to upload policies',
details: error.message
});
}
console.log('UploadPolicies response:', response);
});
let i = 1;
let policy = { id: `policy-${i}`, name: `Policy ${i}` };
const resp = call.write({ policy });
console.log('call.write called!');
// i++;
// policy = { id: `policy-${i}`, name: `Policy ${i}` };
// call.write({ policy });
call.end();
res.json({ message: 'Policies upload initiated' });
});There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it solves the problem to move that context.callback?.(); line into a process.nextTick call, and I think that makes sense anyway. That callback is called asynchronously in other cases, so it would be better to be consistent and make it asynchronous here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try that out, another solution came to my mind which might simplify the changes:
Since this issue happens with unary rpc calls, we could use sendMessageWithContext in here, and passing a new flag like WriteFlags.Endwith context so that in retrying-call the halfClose will be called right after sending message when the flag is set? Unrelated to current solution/changes, different changes will be introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sent a commit to use nextMessageToSend and defer callback execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that will make it more complicated, because it adds an extra case to consider without removing the need to handle any other case.
murgatroid99
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now. Thank you for your contribution.
|
I have published this in version 1.14.3 |
This pr is created for issue #2569.
When sending unary RPC requests under high concurrency to servers behind Envoy/Istio sidecars, clients were receiving RST_STREAM errors. This occurred because the retrying call implementation delayed sending halfClose until message send callbacks completed, causing the client to half-close after the server in some cases. Envoy resets the stream when this happens, as it expects clients to half-close before servers.
Changes:
This ensures halfClose is sent as soon as messages are passed to the underlying transport, without waiting for flow control callbacks, eliminating the race condition with Envoy.
Fixes late client half-close issue described in the gRPC protocol where Envoy assumes client half-closes before server.