TCP connection stall, avoided by disabling AutoTuneSndBuf #10243

djs55 · 2024-04-02T13:50:50Z

Description

[First of all, thanks for writing this awesome piece of software!]

Using the gvisor tcpip library on macOS (across a range of versions, including the latest I could find i.e. gvisor.dev/gvisor v0.0.0-20240331093445-9d995324d058) I see a stall where:

in a macOS terminal: iperf3 -c localhost -t 3600 (the 3600 is aspirational)
connects to localhost:5201 which is accepted by a proxy which calls gonet.DialContextTCP
which uses gvisor to send ethernet frames over virtio to a Linux 6.6 kernel running in a VM
where iperf3 -s is running (in a docker container)

Typically after a few minutes the stream stalls (usually < 3 minutes). If I apply this patch it doesn't seem to stall:

diff --git a/vendor/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/endpoint.go b/vendor/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/endpoint.go
index 7b0af1759d..a21f477a14 100644
--- a/vendor/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/endpoint.go
+++ b/vendor/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/endpoint.go
@@ -865,6 +865,7 @@ func newEndpoint(s *stack.Stack, protocol *protocol, netProto tcpip.NetworkProto
                windowClamp:   DefaultReceiveBufferSize,
                maxSynRetries: DefaultSynRetries,
        }
+       e.sndQueueInfo.TCPSndBufState.AutoTuneSndBufDisabled.Store(1)
        e.ops.InitHandler(e, e.stack, GetTCPSendBufferLimits, GetTCPReceiveBufferLimits)
        e.ops.SetMulticastLoop(true)
        e.ops.SetQuickAck(true)

I can also achieve the same effect by creating my endpoint manually (not using gonet) and using:

// This sets AutoTuneSndBufDisabled. Without this iperf3 stalls writing.
ep.SocketOptions().SetSendBufferSize(ep.SocketOptions().GetSendBufferSize(), true)

Attached is a .pcap of a TCP session which was transferring data quickly until it stalled:
gvisor-report-s100.zip

Let me know what you think! Thanks again.

Related to docker/for-mac#7207

Steps to reproduce

The bug reproduces with Docker Desktop 4.28 on Mac (which uses gvisor internally).

In one terminal:

docker run -it --rm -p 5201:5201/tcp -p 5201:5201/udp dockerpinata/iperf3:2.1 iperf3 -s

In another terminal:

iperf3 -t 3600 -c localhost

runsc version

No response

docker version (if using docker)

No response

uname

Darwin foo.local 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:41 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T8103 arm64

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

The text was updated successfully, but these errors were encountered:

kevinGC · 2024-04-02T18:56:16Z

Thanks for the report. Let me see whether we can repro this.

Also: I don't think I knew that gVisor's netstack was used by Docker on Mac. That's awesome.

kevinGC · 2024-04-02T21:38:53Z

I can repro this on Mac. On Linux it runs for the full hour without issue. I'll look at whether there's anything obviously platform-dependent, but it would be really helpful if I could build docker for mac to enable debug logs and add bits that'll help me track down the issue. Is there any way for me to do so?

kevinGC · 2024-04-02T22:25:34Z

A couple first observations about the pcap:

There are a bunch of [TCP ACKed unseen segment] that I can't explain given that they occur when the connection looks otherwise healthy. Maybe this just has to do with the packet capture method?
The packet just before throughput drops to zero is the server (running netstack) saying its receive window is zero. It's weird that the fix -- which affects the send buffer -- resolves an issue with the receive buffer.

kevinGC · 2024-04-02T22:45:13Z

I misunderstood the setup. gVisor is sending packets that are originally from the client, and the server is running in a Linux VM. So it's not weird that messing with the send buffer affects behavior.

That means that the server is advertising a zero window, which is strange.

kevinGC · 2024-04-02T23:22:29Z

And another question: is the pcap between the client and proxy, or the proxy and iperf server?

djs55 · 2024-04-03T07:28:08Z

Hi Kevin, thanks for taking a look!

It would be really helpful if I could build docker for mac to enable debug logs and add bits that'll help me track down the issue. Is there any way for me to do so?

The only way I can suggest is if there are particular things you'd like to add or enable, if you point me at them I can publish a developer build via our internal CI. (Sorry I realise this is very inconvenient!)

gVisor is sending packets that are originally from the client, and the server is running in a Linux VM. So it's not weird that messing with the send buffer affects behavior.

Yes that's it -- the iperf3 -c is on the Mac and is sending as much data as it can via gvisor, as quickly as it can. I saw the (Linux kernel) server advertising a zero window too, which is definitely confusing.

I generated the pcap on the Linux side with

docker run -it -v ~/bugs/receive-window-closed/:/foo --net=host  djs55/tcpdump -n -i eth0 -w /foo/iperf.pcap

On Linux by default Docker Desktop is using qemu's builtin networking. I'd like to switch this to gvisor by default but there's a different bug in play (which I suspect is a lower-level qemu virtio-net problem) If you'd like to play with it, you can edit .docke/desktop/settings.json and set networkType to gvisor.

Back on Mac, the default ethernet layer is a SOCK_DGRAM socketpair shared with the Apple virtualization.framework which implements virtio-net. Perhaps there's some issue there which is triggering bad behaviour on the higher layer. Under stress it does drop datagrams. I don't know if it re-orders things too. I notice that if I switch the hypervisor to qemu (from the whale menu -> Settings -> uncheck "Use Virtualisation framework") which on Mac does use gvisor (sorry, too many configuration combinations!) then the iperf3 seems slower (10x) but healthier.

Thanks again for all your help so far

djs55 · 2024-04-03T10:43:42Z

Another data point: I shared a build with the AutoTuneSndBuf tweak on docker/for-mac#7207 (comment) and the user reports it seems to help, but they still see the zero window stall eventually. So it sounds like a lower level issue, one which the AutoTuneSndBuf is good at triggering, but doesn't cause directly.

djs55 · 2024-04-03T16:26:48Z

A nice thing about this kind of bug is that it gives me an opportunity to increase debuggability a bit. I've created a dev build of Docker Desktop Mac arm64 Mac amd64 which, if this file exists:

touch /tmp/docker-desktop-gvisor.pcap

and then the app is started, it will record the last 1024 packets sent to the VM in a ring. When the app quits it writes the ring to disk. So this gives the trace more from the gvisor point of view (rather than using docker run tcpdump to see it from the VM's point of view) Here's an example run with a stall:
docker-desktop-gvisor.zip

It looks similar to the previous trace

Edit: the pcap is has a bug where the frame length == capture length == 100

djs55 · 2024-04-03T18:08:27Z

[ Sorry for the spam, here's another trace: docker-desktop-gvisor.zip taken on the host with the frame length field fixed, so Wireshark can see the payload length. I'll refresh the DD builds in case that's useful. ]

kevinGC · 2024-04-03T18:09:45Z

Can you try a build with this patch? My current theory is that the proxy is blocked waiting to write out some data, but the send buffer shrinks and we forget to notify the proxy that it can write again. That would show up in the pcap as a a receive buffer zero window because packets sent by iperf would queue up at the proxy/netstack waiting for the send.

I have a theory+test for how AutoTuneSndBuf could cause the issue. I'm less sure about how the user is getting the stall without auto tuning.

djs55 · 2024-04-03T18:42:23Z

I refreshed to gvisor.dev/gvisor v0.0.0-20240403010941-979aae3d2c21 and applied the patch (minus the test parts which weren't vendored):

% git show
commit 11931d7816e1c29428300ad9206ada8a26768e33 (HEAD -> gvisor-debug)
Author: David Scott <[email protected]>
Date:   Wed Apr 3 19:14:04 2024 +0100

    https://github.com/google/gvisor/pull/10245/commits/ca92de4400d4351ba3b505b3ced2e42d9fd6323f

diff --git a/vendor/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/endpoint.go b/vendor/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/endpoint.go
index cf0b0093b9..6f3cc457a8 100644
--- a/vendor/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/endpoint.go
+++ b/vendor/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/endpoint.go
@@ -2974,11 +2974,11 @@ func (e *Endpoint) updateSndBufferUsage(v int) {
        notify = notify && e.sndQueueInfo.SndBufUsed < int(newSndBufSz)>>1
        e.sndQueueInfo.sndQueueMu.Unlock()
 
-       if notify {
-               // Set the new send buffer size calculated from auto tuning.
-               e.ops.SetSendBufferSize(newSndBufSz, false /* notify */)
-               e.waiterQueue.Notify(waiter.WritableEvents)
-       }
+       // if notify {
+       // Set the new send buffer size calculated from auto tuning.
+       e.ops.SetSendBufferSize(newSndBufSz, false /* notify */)
+       e.waiterQueue.Notify(waiter.WritableEvents)
+       // }
 }
 
 // readyToRead is called by the protocol goroutine when a new segment is ready

It still stalled with a zero window :( Here's a pcap of the last ~1000 packets from the host's point of view: docker-desktop-gvisor.zip

djs55 · 2024-04-03T19:06:42Z

The part I don't understand from the pcap, (looking at the latest one), Linux (receiving the iperf stream) sends an ACK

924	0.017288	192.168.65.3	192.168.65.1	TCP	66	5201 → 16816 [ACK] Seq=1980864929 Ack=814060412 Win=10142 Len=0 TSval=291954484 TSecr=312347615

assuming the window scaling is 7, it should be fine to send up to 814060412 + 10142 * 128 == 815338364. gvisor happily sends

961	0.017890	192.168.65.1	192.168.65.3	TCP	32834	16816 → 5201 [PSH, ACK] Seq=815305596 Ack=1980864929 Win=4096 Len=32768 TSval=312347616 TSecr=291954484[Packet size limited during capture]

which ends at 815305596 + 32768 == 815338364 which means the window has genuinely hit zero. Then Linux doesn't open the window

962	0.019635	192.168.65.3	192.168.65.1	TCP	66	[TCP ZeroWindow] 5201 → 16816 [ACK] Seq=1980864929 Ack=815272828 Win=0 Len=0 TSval=291954486 TSecr=312347615

so gvisor sends a series of ACKs (window probes?)

965	0.220037	192.168.65.1	192.168.65.3	TCP	66	16816 → 5201 [ACK] Seq=815272827 Ack=1980864929 Win=4096 Len=0 TSval=312347818 TSecr=291954486

but Linux never opens the window. I don't see any more traffic about this connection from Linux in fact. Maybe it's really some kind of connection tracking / firewall issue on my end preventing Linux from opening the window? I'll try to rule that out just in case...

kevinGC · 2024-04-03T19:31:46Z

Agreed that those are window probes sent with exponential backoff. And thanks for looking into the conntrack/firewall stuff.

TCP checksums in the gVisor -> Linux direction are all zero. Maybe virtio expects real checksums and is dropping packets before they hit tcpdump? Try turning off RXChecksumOffload. Doesn't make a lot of sense given that the stream works for a while first.
We sometimes run into issues with os.File (e.g. where, if we don't dup() the file descriptor ourselves, Go will close the file descriptor when the os.File is garbage collected. Are you using os.File anywhere to connect things? Again it doesn't seem that likely -- if this were the issue I'd expect we wouldn't seen window probes.

hbhasker · 2024-04-04T13:28:48Z

@kevinGC why are TCP checksums zero when sending from gvisor? If we don't have GSO offload then they should be set by gvisor? Is it possible that somehow window probes are being sent with a zero checksum incorrectly? If so Linux would drop them and never see the zero window probe resulting in this behavior.

Given that the proxy is a go program not running gvisor maybe the stack is not being setup the same way as gvisor so could behave differently.

hbhasker · 2024-04-04T13:31:21Z

gvisor/pkg/tcpip/transport/tcp/snd.go

Line 908 in 5b30c4a

func (s *sender) sendZeroWindowProbe() {

is the code that sends ZWP.

hbhasker · 2024-04-04T13:34:47Z

gvisor/pkg/tcpip/transport/tcp/connect.go

Line 839 in 5b30c4a

if gso.Type != stack.GSONone && gso.NeedsCsum {

is where we decide if we need to set a checksum.

djs55 · 2024-04-04T13:36:06Z

@kevinGC thanks for your thoughts. I'll dig in a bit more on my side (once I'm back from PTO next week). Thanks again for all your help so far.

FYI the checksums are off because I've set stack.CapabilityTXChecksumOffload in gvisor (which is sending to Linux) and modified Linux to not expect the checksums at the virtio-net layer (a bit of a hack because there's no way to negotiate this capability currently). I can certainly try turning the checksums back on to see what happens.

hbhasker · 2024-04-04T13:48:41Z

If Linux is configured to ignore checksums it should not matter. But worth giving it a shot anyway. That said somethings about how send/receive tuning works.

Send buffer tuning relies on how much data is acked by receiver to grow send buffers. Receive buffer tuning depends on how quickly the data is read by the application in this case your proxy code. If your receive buffers are not growing it's probably because the proxy is not reading fast enough.

If send buffers are not growing it's probably because you are not getting timely acks which move the sliding window forward fast enough.

If Linux is not opening it's receive buffer it would mean that iperf is not reading the data which seems highly unlikely. The other reason it could go to zero is if there are too many out of order packets being received and gvisor isnt sending the packet that closes the gap. In such a case then out of order packets will fill up the receive buffer on the Linux side resulting in a stall till the packet that moves the window forward is sent by gVisor.

hbhasker · 2024-04-04T14:01:16Z

Something I missed. I noticed you are using a SOCK_DGRAM unix socket, I would change it to SOCK_SEQPACKET which guarantees in order delivery.

hbhasker · 2024-04-04T14:07:01Z

Just saw the pcap the whole ACKing a packet not seen seems to indicate Out of Order delivery of packets. Your issue might be the use of SOCK_DGRAM vs SOCK_SEQPACKET.

djs55 · 2024-04-04T14:11:38Z

@hbhasker thanks, I’ll investigate and report back :)

Also since there’s no evidence of gvisor malfunctioning (the link to the auto tune buffer option seems more like a coincidence now) feel free to close this ticket in the meantime!

kevinGC · 2024-04-05T21:58:24Z

Thanks for all the help Bhasker. A few thoughts:

ACKing a packet not seen seems to indicate Out of Order delivery of packets

My understanding is that ACKing unseen packets usually comes from capture not keeping up with transmission. I'm not sure I follow how unseen packet ACKs indicates out of order delivery.

gvisor isnt sending the packet that closes the gap

gVisor is sending window probes and getting nothing back; if if there's a gap we should be getting back an ACK for a sent-and-dropped packet. Even with a zero window, we should be getting some kind of response.

And I'll just leave this open until I hear whether SOCK_SEQPACKET does the trick.

kevinGC · 2024-04-05T22:17:36Z

Oh, here's a thought: our zero window probes are broken? IIUC they should be sending the next byte in the probe. From RFC 9293 3.8.6.1:

The sending TCP peer must regularly transmit at least one octet of new data (if available), or retransmit to the receiving TCP peer even if the send window is zero, in order to "probe" the window. This retransmission is essential to guarantee that when either TCP peer has a zero window the reopening of the window will be reliably reported to the other.

Per the pcap, we're sending zero-length probes, which IIUC Linux doesn't have to ACK (why would it, it's just a duplicate and up-to-date ACK):

This seems to be supported by Wireshark calling the probes TCP Dup ACK instead of TCP ZeroWindowProbe, which it seems to do elsewhere (picture from unrelated wireshark Q&A post):

So the theory is we're filling up the Linux receive buffer, getting an ACK+zerowindow packet, and never realizing that the window has reopened because we don't properly probe and thus never know that iperf has consumed its bytes.

Per [RFC 9293 3.8.6.1], zero window probes need at least one byte of payload. Currently we send empty packets, which the receiver sees as an up-to-date, but useless, ACK. This can result in a stall when the receiver never sends data (only ACKs), as netstack will probe forever and the receiver won't know to respond. See #10243. [RFC 9293 3.8.6.1]: https://datatracker.ietf.org/doc/html/rfc9293#name-zero-window-probing PiperOrigin-RevId: 622301779

hbhasker · 2024-04-05T23:48:10Z

That's a good point. I wonder how we missed that. Maybe worth making a quick fix and seeing?

…

On Fri, Apr 5, 2024, 3:18 PM Kevin Krakauer ***@***.***> wrote: Oh, here's a thought: our zero window probes are broken? IIUC they should be sending the next byte in the probe. From RFC 9293 3.8.6.1 <https://datatracker.ietf.org/doc/html/rfc9293#name-zero-window-probing>: The sending TCP peer must regularly transmit at least one octet of new data (if available), or retransmit to the receiving TCP peer even if the send window is zero, in order to "probe" the window. This retransmission is essential to guarantee that when either TCP peer has a zero window the reopening of the window will be reliably reported to the other. Per the pcap, we're sending zero-length probes, which IIUC Linux doesn't have to ACK (why would it, it's just a duplicate and up-to-date ACK): Screenshot.2024-04-05.at.3.07.12.PM.png (view on web) <https://github.com/google/gvisor/assets/1718925/be6127ce-75f1-4778-84d1-906790dc330c> This seems to be supported by Wireshark calling the probes TCP Dup ACK instead of TCP ZeroWindowProbe, which it seems to do elsewhere (picture from unrelated wireshark Q&A <https://osqa-ask.wireshark.org/questions/21230/tcp-acked-unseen-segment-wireshark-bug/> post): [image: TCP ZeroWindowProbe] <https://camo.githubusercontent.com/a76d255879fa5de4eb8565af696a12ab503df7953e0fb039d2603fc8f57076aa/68747470733a2f2f6f7371612d61736b2e77697265736861726b2e6f72672f757066696c65732f436170747572655f31372e504e47> — Reply to this email directly, view it on GitHub <#10243 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFSYNOBMDMJMLQ6KF7YF43Y34PJNAVCNFSM6AAAAABFTOAPCGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBQG4YDAMZUGA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

kevinGC · 2024-04-05T23:49:27Z

#10252 has the change, although I'm still wrestling with tests.

hbhasker · 2024-04-06T00:14:37Z

Wait but that's sending snduna-1. So sending the last acked byte should be fine too since it will elicit a dup ack.

…

On Fri, Apr 5, 2024, 4:49 PM Kevin Krakauer ***@***.***> wrote: #10252 <#10252> has the change, although I'm still wrestling with tests. — Reply to this email directly, view it on GitHub <#10243 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFSYNK7RUXLHL6LKIECBHTY342B5AVCNFSM6AAAAABFTOAPCGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBQG43TSNJVHE> . You are receiving this because you commented.Message ID: ***@***.***>

kevinGC · 2024-04-06T00:21:49Z

@djs55 could you try #10252?

@hbhasker looks like Linux doesn't follow the RFC either: it basically sends a keepalive, which has the same effect. I.e. it sends a byte with an already-ACKed sequence number, not the next byte. That seems easier, so I'll do that when I get the chance. But I think #10252 should also fix the problem for now.

djs55 · 2024-04-08T09:47:46Z

Good news -- with #10252 I can't reproduce the stall any more! iperf3 has run for 3 hours so far today with no problem. To sanity-check I removed the patch and it stalled in under 60s. For fun I plotted a graph from the first 2 hours of the iperf output:

Regarding SEQPACKET: I think this will improve the situation on Linux (thanks for the tip!) Unfortunately I think it's not supported by macOS yet. macOS has the headers but rejects my attempt to use the socket type:

% cat main.c 
#include <sys/socket.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

void check(char *description, int family, int type){
  int s = socket(family, type, 0);
  if (s == -1) {
    printf("%s errno = %d (%s)\n", description, errno, strerror(errno));
  } else {
    close(s);
    printf("%s OK\n", description);
  }
}

int main(){
  check("AF_INET  SOCK_SEQPACKET", AF_INET, SOCK_SEQPACKET);
  check("AF_LOCAL SOCK_SEQPACKET", AF_LOCAL, SOCK_SEQPACKET);
  check("AF_INET  SOCK_DGRAM    ", AF_INET, SOCK_DGRAM);
  check("AF_LOCAL SOCK_DGRAM    ", AF_LOCAL, SOCK_DGRAM);
}
% cc -o seqpacket main.c
% ./seqpacket 
AF_INET  SOCK_SEQPACKET errno = 43 (Protocol not supported)
AF_LOCAL SOCK_SEQPACKET errno = 43 (Protocol not supported)
AF_INET  SOCK_DGRAM     OK
AF_LOCAL SOCK_DGRAM     OK

(on Linux it looks ok) I also found a comment in the chromium source

#if BUILDFLAG(IS_APPLE)
  // macOS does not support SEQPACKET.

which is a shame.

However the bug looks fixed to me which is great news. Thanks, both, for all your help!

kevinGC · 2024-04-08T20:50:33Z

Glad that worked! We're going to go with #10258 (don't need to segment split, more Linux-like, fewer test changes). Feel free to try it or wait for it to get merged. It should also fix the issue.

hbhasker · 2024-04-08T23:18:13Z

A few things I think we should verify.

a) I read the linux code and it doesn't seem to send any data at all. It's behaviour is only different when the urgent pointer is set where it sends two probes instead of 1. gVisor doesn't implement urgent pointer so we can just ignore that case.

b) As per RFC https://datatracker.ietf.org/doc/html/rfc9293#section-3.10.7.4 sending a packet with an ACK bit set and an out of window sequence number will generate an ACK even if there is no payload. I would write a small linux packet impact test to verify this. Also see (https://sourcegraph.com/github.com/torvalds/linux@fec50db7033ea478773b159e0e2efb135270e3b7/-/blob/net/ipv4/tcp_input.c?L5893) which will always send an out of window dup ack as long as its not rate limited.

c) The implementation in 0b36b49 is not what linux does and introduces a payload in the stream that is not even part of the original data. I would not do this as this would be very strange in TCP packet captures where suddenly ZWP carry a different byte for the same sequence number in probes vs a regular retransmit of data.

I think we are missing something else here. A packetimpact test should help figure it out better.

kevinGC · 2024-04-09T05:58:59Z

Good catches.

a: Agreed, and I ran a little manual test to check. Linux does not send any data; it basically just sends a keepalive. My Mac does send a byte, and the internet says Windows does too.

b: Right. In my manual test Linux does respond to the payload-less packet, so I expect we'll see the same from a packetimpact test. And poking around the kernel code it certainly looks like it should send an ACK.

c: I think this is harmless because the byte must be discarded for being out-of-window. Keepalives are even explicitly allowed to send garbage.

I'll modify the packetimpact tests to verify the payload size of ZWPs. I expect we'll see zero-length probes.

But as far as fixing this issue is concerned, I think there are two options:

We suspect there's a configuration/iptables/etc issue and work to find it.
We switch to netstack: zero window probes must have a payload #10252, which is not quite what Linux does but is RFC-compliant and avoids any possible issue with (c).

hbhasker · 2024-04-09T14:56:41Z

Reg 1. If it was an iptables issue we would observe same stall when using Linux as the client.

Sending a payload should not resolve the iptables issue.

I would also check if somehow the checksum for probes is wrong. I know it's unlikely but worth ruling it out.

Also the sending garbage payload is a MAY and only to support non compliant TCP implementations. I would suggest just store the last byte before the first unacked byte in the TCP struct separately and use that rather send junk.

kevinGC · 2024-04-10T00:05:16Z

I mean iptables in the particular spin+config of Linux that Docker Mac is using. I don't think we have a way to see "Linux as the client" because the proxy never used Linux's stack, it used Mirage's before gVisor.

I will check (heh) the checksum.

And if we're going to avoid the garbage byte, I'd rather just do #10252:

It's easier on the CPU: that PR adds no extra work in the common case and requires a segment split in sendZeroWindowProbe in the rare cases where we hit a zero window. But updating the last-ACKed byte has to happen for every ACK of every connection. It would probably happen in handleRcvdSegment, and getting that last byte involves dealing with the underlying buffer's linked lists. That seems like a bunch of extra CPU cycles in the hot path, i.e. when there's no zero window.
It's a simpler change: plumbing the last-ACKed byte is going to be single-purpose code that traverses the packages tcp->stack.PacketBuffer->buffer.Buffer vs. a few lines in sendZeroWindowProbe.
It's RFC-compliant.
Sending the last-ACKed byte isn't what Linux does, and if we're going to diverge from Linux behavior I'd rather pick the simpler implementation.

Certainly open to discussion, but IMO one way has most of the advantages.

hbhasker · 2024-04-11T18:33:16Z

So I think the garbage byte is a fine option to avoid splitting the segment etc. That said I am still curious on why it works for linux to send a zero byte probe but not for gvisor. Is the sequence number sent by linux any different? From reading the code it should be sndUna-1 but would be good to confirm.

Anyway +1 to using the garbage byte for now but maybe keep the issue open to investigate when you have time to see why it works for linux vs not for us. Should be easy to replicate a ZW in gvisor by just not reading the data at all and let it fill and see what linux does vs what a gvisor sender does in a packet impact test.

djs55 added the type: bug Something isn't working label Apr 2, 2024

kevinGC self-assigned this Apr 3, 2024

copybara-service bot mentioned this issue Apr 5, 2024

netstack: zero window probes must have a payload #10252

Closed

copybara-service bot mentioned this issue Apr 8, 2024

netstack: zero window probes must have a payload #10258

Closed

copybara-service bot closed this as completed in 91a283f Apr 8, 2024

kevinGC reopened this Apr 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TCP connection stall, avoided by disabling AutoTuneSndBuf #10243

TCP connection stall, avoided by disabling AutoTuneSndBuf #10243

djs55 commented Apr 2, 2024

kevinGC commented Apr 2, 2024

kevinGC commented Apr 2, 2024

kevinGC commented Apr 2, 2024

kevinGC commented Apr 2, 2024

kevinGC commented Apr 2, 2024

djs55 commented Apr 3, 2024

djs55 commented Apr 3, 2024

djs55 commented Apr 3, 2024 •

edited

Loading

djs55 commented Apr 3, 2024

kevinGC commented Apr 3, 2024

djs55 commented Apr 3, 2024

djs55 commented Apr 3, 2024

kevinGC commented Apr 3, 2024

hbhasker commented Apr 4, 2024

hbhasker commented Apr 4, 2024

hbhasker commented Apr 4, 2024

djs55 commented Apr 4, 2024

hbhasker commented Apr 4, 2024

hbhasker commented Apr 4, 2024

hbhasker commented Apr 4, 2024

djs55 commented Apr 4, 2024

kevinGC commented Apr 5, 2024 •

edited

Loading

kevinGC commented Apr 5, 2024 •

edited

Loading

hbhasker commented Apr 5, 2024 via email

kevinGC commented Apr 5, 2024

hbhasker commented Apr 6, 2024 via email

kevinGC commented Apr 6, 2024

djs55 commented Apr 8, 2024

kevinGC commented Apr 8, 2024

hbhasker commented Apr 8, 2024

kevinGC commented Apr 9, 2024

hbhasker commented Apr 9, 2024

kevinGC commented Apr 10, 2024

hbhasker commented Apr 11, 2024

TCP connection stall, avoided by disabling AutoTuneSndBuf #10243

TCP connection stall, avoided by disabling AutoTuneSndBuf #10243

Comments

djs55 commented Apr 2, 2024

Description

Steps to reproduce

runsc version

docker version (if using docker)

uname

kubectl (if using Kubernetes)

repo state (if built from source)

runsc debug logs (if available)

kevinGC commented Apr 2, 2024

kevinGC commented Apr 2, 2024

kevinGC commented Apr 2, 2024

kevinGC commented Apr 2, 2024

kevinGC commented Apr 2, 2024

djs55 commented Apr 3, 2024

djs55 commented Apr 3, 2024

djs55 commented Apr 3, 2024 • edited Loading

djs55 commented Apr 3, 2024

kevinGC commented Apr 3, 2024

djs55 commented Apr 3, 2024

djs55 commented Apr 3, 2024

kevinGC commented Apr 3, 2024

hbhasker commented Apr 4, 2024

hbhasker commented Apr 4, 2024

hbhasker commented Apr 4, 2024

djs55 commented Apr 4, 2024

hbhasker commented Apr 4, 2024

hbhasker commented Apr 4, 2024

hbhasker commented Apr 4, 2024

djs55 commented Apr 4, 2024

kevinGC commented Apr 5, 2024 • edited Loading

kevinGC commented Apr 5, 2024 • edited Loading

hbhasker commented Apr 5, 2024 via email

kevinGC commented Apr 5, 2024

hbhasker commented Apr 6, 2024 via email

kevinGC commented Apr 6, 2024

djs55 commented Apr 8, 2024

kevinGC commented Apr 8, 2024

hbhasker commented Apr 8, 2024

kevinGC commented Apr 9, 2024

hbhasker commented Apr 9, 2024

kevinGC commented Apr 10, 2024

hbhasker commented Apr 11, 2024

djs55 commented Apr 3, 2024 •

edited

Loading

kevinGC commented Apr 5, 2024 •

edited

Loading

kevinGC commented Apr 5, 2024 •

edited

Loading