In that case, it will be added to the list of blockedWatchers and will be given another chance to deliver an event after all nonblocking watchers have sent the event.
All watchers that have failed to deliver the event will be closed.
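
The nonblocking delivery described above is essentially a channel send inside a `select` with a `default` branch. A minimal sketch of the idea (simplified; `tryDeliver` and the `chan int` watchers are illustrative stand-ins, not the actual cacheWatcher code):

```go
package main

import "fmt"

// tryDeliver attempts a nonblocking send on a watcher's incoming channel.
// Watchers whose channel is full are collected as blocked and given another
// chance later; those that still cannot accept the event are then closed.
func tryDeliver(ch chan int, ev int) bool {
	select {
	case ch <- ev:
		return true
	default: // buffer full: the watcher is blocked
		return false
	}
}

func main() {
	fast := make(chan int, 1)
	slow := make(chan int) // unbuffered with no reader: always blocked

	var blocked []chan int
	for _, w := range []chan int{fast, slow} {
		if !tryDeliver(w, 42) {
			blocked = append(blocked, w)
		}
	}
	fmt.Println(len(blocked)) // prints "1": only the slow watcher is blocked
}
```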
Closing the watchers would force the clients to retry the requests and download the entire dataset again, even though they might have already received a complete list.
To mitigate the issue, we propose the following before sending the initial events:
Compare the bookmarkAfterResourceVersion (from Step 2) with the current RV the watchCache is on
and wait until the difference between the RVs is < 1000 (the buffer size).
If the difference is greater than that, there is no point in proceeding, since the buffer could fill up before we receive an event with the expected RV.
This assumes that all updates would be for the resource the watch request was opened for (which seems unlikely).
If the watchCache is unable to catch up to the bookmarkAfterResourceVersion within some timeout, hard-close the current connection (i.e. end it by tearing down the underlying TCP connection with the client) so that the client re-connects to a different API server with a more up-to-date cache.
Taking into account the baseline etcd performance numbers, waiting for 10 seconds will allow us to receive ~5K events, assuming ~500 QPS throughput (see https://etcd.io/docs/v3.4/op-guide/performance/).
Once we are past this step (i.e. we know the difference is smaller) and the buffer fills up, we:
- case-1: won’t close the connection immediately if the bookmark event with the expected RV exists in the buffer.
In that case, we will deliver the initial events, then any other events we have received whose RVs are <= bookmarkAfterResourceVersion, and finally the bookmark event; only then will we soft-close the current connection (i.e. simply end it without tearing down the TCP connection).
An informer will reconnect with the RV from the bookmark event.
Note that any new events received in the meantime are ignored, since the buffer is full.
- case-2: soft-close the connection if, for some reason, the bookmark event with the expected RV doesn't exist in the buffer.
An informer will reconnect, arriving first at the step that compares the RVs.
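
The two cases amount to scanning the full buffer for the expected bookmark and deciding what to deliver before soft-closing. A minimal sketch under those assumptions (the `event` type and `dispatchBufferedEvents` are hypothetical names, not the real cacheWatcher API):

```go
package main

import "fmt"

// event is a simplified stand-in for a watch event held in the buffer.
type event struct {
	rv       uint64
	bookmark bool
}

// dispatchBufferedEvents decides what to send once the buffer is full.
// It returns the events to deliver (bookmark last) and whether the expected
// bookmark was found (case-1) or not (case-2). In both cases the caller
// soft-closes the connection afterwards.
func dispatchBufferedEvents(buffer []event, bookmarkAfterRV uint64) (toDeliver []event, foundBookmark bool) {
	for _, e := range buffer {
		if e.bookmark && e.rv == bookmarkAfterRV {
			foundBookmark = true
		}
	}
	if !foundBookmark {
		// case-2: no expected bookmark in the buffer; soft-close and let
		// the informer reconnect and re-compare the RVs.
		return nil, false
	}
	// case-1: deliver the events whose RVs are <= bookmarkAfterRV,
	// then the bookmark event itself, then soft-close.
	for _, e := range buffer {
		if e.rv <= bookmarkAfterRV && !(e.bookmark && e.rv == bookmarkAfterRV) {
			toDeliver = append(toDeliver, e)
		}
	}
	toDeliver = append(toDeliver, event{rv: bookmarkAfterRV, bookmark: true})
	return toDeliver, true
}

func main() {
	buf := []event{{rv: 101}, {rv: 102}, {rv: 103, bookmark: true}, {rv: 104}}
	events, ok := dispatchBufferedEvents(buf, 103)
	fmt.Println(ok, len(events)) // prints "true 3": 101, 102, then the bookmark
}
```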
In the future we could improve the way the buffer is managed:
- cap the buffer's size (e.g. never allocate more than X MB of memory)
- make the buffer dynamic, especially when the difference between the RVs is > 1000
- inject new events directly into the initial list, i.e. have the initial-list loop consume the channel directly instead of waiting for the whole initial list to be processed first
- maybe even apply some compression techniques to the buffer
Note: The RV is effectively a global counter that is incremented every time an object is updated.
This imposes a global order of events. It is equivalent to a LIST followed by a WATCH request.