Send halfClose immediately after messages to prevent late halfClose issues with Envoy #3031

serkanerip · 2025-12-08T14:03:58Z

This PR addresses issue #2569.

When sending unary RPC requests under high concurrency to servers behind Envoy/Istio sidecars, clients were receiving RST_STREAM errors. This occurred because the retrying call implementation delayed sending halfClose until message send callbacks completed, causing the client to half-close after the server in some cases. Envoy resets the stream when this happens, as it expects clients to half-close before servers.

Changes:

Track highest sent message index separately from callback completion
Send halfClose immediately when all messages have been sent to transport
Optimize unary/final message case by sending halfClose right after message
Use halfCloseSent flag to prevent duplicate halfClose calls

This ensures halfClose is sent as soon as messages are passed to the underlying transport, without waiting for flow control callbacks, eliminating the race condition with Envoy.

Fixes late client half-close issue described in the gRPC protocol where Envoy assumes client half-closes before server.

linux-foundation-easycla · 2025-12-08T14:04:06Z

The committers listed above are authorized under a signed CLA.

✅ login: serkanerip / name: Serkan (1d546ee, 298e39a, 68a9a6d, 699ca49, b62f609)


murgatroid99

This can be significantly simplified.

murgatroid99 · 2025-12-09T20:10:53Z

packages/grpc-js/src/retrying-call.ts

+  /**
+   * Tracks whether halfClose has been sent to this child call.
+   */
+  halfCloseSent: boolean;


I don't think either of these fields is necessary. nextMessageToSend should be enough to track all of the relevant state.

If we half-close and increment nextMessageToSend, this line would not work, so the message's callback would not be executed. That is why the field was added.

Instead of adding a new field, just make the message index a second argument to handleChildWriteCompleted. In sendNextChildMessage, be careful to capture childCall.nextMessageToSend in a message index variable before potentially incrementing it when sending the immediate half close.
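A minimal sketch of that suggestion, using a simplified stand-in rather than the actual grpc-js classes. Only `nextMessageToSend`, `handleChildWriteCompleted`, and `sendNextChildMessage` are names from this discussion; `BufferEntry`, `ChildCall`, `FakeTransport`, and `RetryingCallSketch` are hypothetical and exist only to make the ordering visible.

```typescript
// Hypothetical, simplified model of a retrying call; not the real grpc-js types.
type BufferEntry =
  | { kind: "MESSAGE"; payload: string }
  | { kind: "HALF_CLOSE" };

interface ChildCall {
  nextMessageToSend: number;
}

// Fake transport that records operations and holds write callbacks until
// flush() is called, imitating callbacks delayed by flow control.
class FakeTransport {
  events: string[] = [];
  private pending: Array<() => void> = [];

  send(payload: string, onWritten: () => void): void {
    this.events.push(`send:${payload}`);
    this.pending.push(onWritten);
  }

  halfClose(): void {
    this.events.push("halfClose");
  }

  flush(): void {
    const callbacks = this.pending;
    this.pending = [];
    for (const cb of callbacks) cb();
  }
}

class RetryingCallSketch {
  completedWrites: number[] = [];
  private buffer: BufferEntry[];
  private transport: FakeTransport;

  constructor(buffer: BufferEntry[], transport: FakeTransport) {
    this.buffer = buffer;
    this.transport = transport;
  }

  // The message index arrives as an argument, so it is still correct after
  // nextMessageToSend has moved past the half-close entry.
  handleChildWriteCompleted(child: ChildCall, messageIndex: number): void {
    this.completedWrites.push(messageIndex);
    if (child.nextMessageToSend === messageIndex) {
      child.nextMessageToSend += 1;
      this.sendNextChildMessage(child);
    }
  }

  sendNextChildMessage(child: ChildCall): void {
    // Capture the index before anything below increments it.
    const messageIndex = child.nextMessageToSend;
    const entry = this.buffer[messageIndex];
    if (entry === undefined) return;
    if (entry.kind === "HALF_CLOSE") {
      child.nextMessageToSend += 1;
      this.transport.halfClose();
      return;
    }
    this.transport.send(entry.payload, () =>
      this.handleChildWriteCompleted(child, messageIndex)
    );
    const next = this.buffer[messageIndex + 1];
    if (
      next !== undefined &&
      next.kind === "HALF_CLOSE" &&
      child.nextMessageToSend === messageIndex
    ) {
      // Immediate half close: skip past the message and the half-close entry.
      child.nextMessageToSend = messageIndex + 2;
      this.transport.halfClose();
    }
  }
}

const transport = new FakeTransport();
const call = new RetryingCallSketch(
  [{ kind: "MESSAGE", payload: "a" }, { kind: "HALF_CLOSE" }],
  transport
);
const child: ChildCall = { nextMessageToSend: 0 };
call.sendNextChildMessage(child);
// Snapshot taken before any write callback has run.
const eventsBeforeCallback = transport.events.slice();
// The write callback runs late and still reports index 0.
transport.flush();
```

In this model the half close is recorded right after the send, before the write callback fires, and the late callback still receives the captured index even though `nextMessageToSend` has already advanced.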

Sent a commit for this feedback.

murgatroid99 · 2025-12-09T20:14:28Z

packages/grpc-js/src/retrying-call.ts

+        // - If halfCloseIndex is 0, there are no messages, so send immediately
+        // - If halfCloseIndex is N, the last message is at index N-1
+        // - If highestSentMessageIndex >= N-1, all messages have been sent
+        if (halfCloseIndex === 0 || call.highestSentMessageIndex >= halfCloseIndex - 1) {


If halfCloseIndex is 0, then call.nextMessageToSend had better be >= -1, so there's no need to check halfCloseIndex === 0 separately.
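A quick way to check that claim is to compare the two forms of the condition over a small range of values. This is only an illustrative sketch: `sentIndex` is a hypothetical stand-in for whichever index field is used, and the range starts at -1 on the assumption that the index never drops below that.

```typescript
// Condition as written in the diff under review.
function originalCheck(sentIndex: number, halfCloseIndex: number): boolean {
  return halfCloseIndex === 0 || sentIndex >= halfCloseIndex - 1;
}

// Simplified condition without the special case for zero.
function simplifiedCheck(sentIndex: number, halfCloseIndex: number): boolean {
  return sentIndex >= halfCloseIndex - 1;
}

// Count disagreements over a small range. The index starts at -1 on the
// assumption that it is never smaller than that.
let mismatches = 0;
for (let halfCloseIndex = 0; halfCloseIndex <= 5; halfCloseIndex++) {
  for (let sentIndex = -1; sentIndex <= 5; sentIndex++) {
    if (
      originalCheck(sentIndex, halfCloseIndex) !==
      simplifiedCheck(sentIndex, halfCloseIndex)
    ) {
      mismatches++;
    }
  }
}
```

Under that assumption `mismatches` stays at 0, because `halfCloseIndex - 1` is -1 when `halfCloseIndex` is 0 and every allowed index already satisfies `>= -1`.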

I’ll leave my comments here regarding the need for highestSentMessageIndex.

The condition
call.nextMessageToSend === halfCloseIndex || call.nextMessageToSend === halfCloseIndex - 1
was added as a quick way to confirm that the problem is related to half-close timing.

When testing this with client-streaming calls, keeping call.nextMessageToSend += 1; inside that if block results in
gRPC Error: 13 INTERNAL: Write error: write after end.
If that line is removed, no messages are sent from the client at all.

This behavior can be reproduced when the client sends only one message and then closes the stream. If multiple messages are sent before closing, the if condition does not match and the issue does not occur.

Additionally, the “Cardinality violations” tests in test-server-errors.ts also fail.

The root cause is that, for clientUnaryStreaming RPCs, a callback (Stream.onwrite) is attached to the call context and sent along with the message. When this callback is invoked, execution inside sendMessageWithContext is paused, and the next line in user code (call.end()) is executed.

At that point, underlyingCall.nextMessageToSend is 0 and halfCloseIndex is 1, making the following condition true:
call.nextMessageToSend === halfCloseIndex - 1.

As a result, halfClose is called on the stream before any message is actually sent. When execution of sendMessageWithContext later resumes, it attempts to write to an already ended stream, which leads to the write error.

Example reproducer. If you uncomment the second write, you'll see that the issue is not reproducible:

app.post('/api/policies', (req, res) => {
  // call client.UploadPolicies which is client-side streaming
  const call = client.UploadPolicies((error, response) => {
    if (error) {
      console.error('gRPC Error:', error);
      return res.status(500).json({ error: 'Failed to upload policies', details: error.message });
    }
    console.log('UploadPolicies response:', response);
  });
  let i = 1;
  let policy = { id: `policy-${i}`, name: `Policy ${i}` };
  const resp = call.write({ policy });
  console.log('call.write called!');
  // i++;
  // policy = { id: `policy-${i}`, name: `Policy ${i}` };
  // call.write({ policy });
  call.end();
  res.json({ message: 'Policies upload initiated' });
});

I think it solves the problem to move that context.callback?.(); line into a process.nextTick call, and I think that makes sense anyway. That callback is called asynchronously in other cases, so it would be better to be consistent and make it asynchronous here too.
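The effect of deferring the callback can be illustrated with a toy model. Everything below is hypothetical (`ToyStream`, `sendMessage`, `Scheduler`) and is not grpc-js code. A manual queue stands in for `process.nextTick` so that the example stays deterministic. The model assumes, as described above, that the write callback can trigger the user's `end()` before the message has reached the stream.

```typescript
// Hypothetical toy model; the names below are illustrative, not grpc-js APIs.
type Scheduler = (task: () => void) => void;

class ToyStream {
  log: string[] = [];
  private ended = false;

  write(message: string): void {
    if (this.ended) {
      this.log.push("error: write after end");
      return;
    }
    this.log.push(`write:${message}`);
  }

  end(): void {
    this.ended = true;
    this.log.push("end");
  }
}

// Sends one message. The callback is handed to `schedule` before the message
// reaches the stream, which is where reentrancy can cause trouble.
function sendMessage(
  stream: ToyStream,
  message: string,
  callback: () => void,
  schedule: Scheduler
): void {
  schedule(callback);
  stream.write(message);
}

// Synchronous scheduler: runs the callback in the middle of sendMessage.
const runNow: Scheduler = task => task();

// Deferred scheduler: a manual queue standing in for process.nextTick.
const queue: Array<() => void> = [];
const runLater: Scheduler = task => {
  queue.push(task);
};

// Synchronous callback: end() runs before the write.
const syncStream = new ToyStream();
sendMessage(syncStream, "m1", () => syncStream.end(), runNow);

// Deferred callback: the write completes first, then end() runs.
const deferredStream = new ToyStream();
sendMessage(deferredStream, "m1", () => deferredStream.end(), runLater);
for (const task of queue.splice(0)) task();
```

In this model the synchronous variant logs `end` followed by a write-after-end error, while the deferred variant logs the write and then `end`.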

I'll try that out. Another solution came to mind that might simplify the changes:

Since this issue happens with unary RPC calls, we could use sendMessageWithContext here and pass a new flag such as WriteFlags.End with the context, so that retrying-call calls halfClose right after sending the message when the flag is set. This would be unrelated to the current solution and would introduce a different set of changes.

Sent a commit to use nextMessageToSend and defer callback execution.

I think that will make it more complicated, because it adds an extra case to consider without removing the need to handle any other case.

murgatroid99

This looks good to me now. Thank you for your contribution.

murgatroid99 · 2025-12-11T15:17:29Z

I have published this in version 1.14.3

half close right after write

68a9a6d

serkanerip added 2 commits December 8, 2025 20:50

revert changes

298e39a

Send halfClose immediately after messages to prevent late halfClose issues with Envoy

1d546ee

serkanerip changed the title ~~Issue 2569 root cause demonstration~~ Send halfClose immediately after messages to prevent late halfClose issues with Envoy Dec 9, 2025

serkanerip marked this pull request as ready for review December 9, 2025 19:43

serkanerip mentioned this pull request Dec 9, 2025

@grpc/grpc-js throw 'Received RST_STREAM with code 0' with retry enabled #2569

Open

murgatroid99 requested changes Dec 9, 2025


serkanerip added 2 commits December 10, 2025 19:05

remove halfCloseSent field

699ca49

Use nextMessageToSend for early half-close

b62f609

serkanerip requested a review from murgatroid99 December 10, 2025 17:05

murgatroid99 added the kokoro:run label Dec 10, 2025

kokoro-team removed the kokoro:run label Dec 10, 2025

murgatroid99 approved these changes Dec 10, 2025


murgatroid99 merged commit 164d14f into grpc:master Dec 10, 2025
4 of 5 checks passed

murgatroid99 mentioned this pull request Dec 10, 2025

Backport "Send halfClose immediately after messages to prevent late halfClose issues with Envoy" to 1.14.x #3032

Merged
