
clientv3: Canceling Watch() doesn't send &pb.WatchCancelRequest #9416

Closed
yudai opened this issue Mar 9, 2018 · 10 comments · Fixed by #11850

Comments

@yudai (Contributor) commented Mar 9, 2018

Summary

Canceling the context that was used to invoke Watch() is supposed to stop the subscription to events for that Watch(), meaning that canceling should also release the corresponding resources on the server side.

However, clientv3 doesn't send a &pb.WatchCancelRequest message to the server when a substream is canceled, which leads to a resource leak on the server side until the parent watch stream itself is closed.
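
For concreteness, here is a minimal sketch of that expected contract using the public clientv3 API (the endpoint and key are placeholders, and the comments describe the behavior discussed in this issue, not a documented guarantee):

package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithCancel(context.Background())
	wch := cli.Watch(ctx, "/some/key")

	// Canceling the context is the intended way to stop watching; the
	// expectation is that the server-side watcher is released as well.
	cancel()

	// The client-side channel is closed promptly...
	for range wch {
	}
	// ...but, as described below, no WatchCancelRequest reaches the server.
	log.Println("watch channel closed")
}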

Issue

Tested with the master branch (00b8423).

watchGrpcStream serves watch substreams in serveSubstream and also handles context cancellation for them there (using ws.initReq.ctx).
When ws.initReq.ctx is canceled, the ws is eventually sent to w.closingc and handled by watchGrpcStream.run(), which calls w.closeSubstream(ws) and removes the substream from the internal w.substreams. (The client side is fine: it releases its resources for the substream.)

However, this cancellation procedure for substreams doesn't send any cancel message to the server, so at this point the server doesn't know that the substream has been canceled by the client.

When the watched key of a substream is actively updated, an event for that key eventually resolves the inconsistency: on receiving a response for the already-closed substream, the client sends the cancel request (see the watch.go excerpt quoted later in this thread).

On the other hand, when the key is never updated, the server-side resources for the substream are kept until the watch stream itself is canceled or the client connection is dropped.

This behavior can be an issue for long-running clients that dynamically watch and unwatch different keys, when those keys are not actively updated after unwatching.

Reproducing

You can reproduce the situation with the following code.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Start")

	cli.Watch(context.Background(), "/") // this watch keeps the stream alive

	num := 1000
	for i := 0; i < num; i++ {
		ctx, cancel := context.WithCancel(context.Background())
		key := fmt.Sprintf("/%d", i)
		cli.Watch(ctx, key) // creates a substream on the shared watch stream
		cancel()            // closes the substream on the client side only
	}

	log.Printf("Done")

	time.Sleep(time.Hour)
	// etcd_debugging_mvcc_watch_stream_total 1
	// etcd_debugging_mvcc_watcher_total 1001   <- expected to be 1
}

Proposal

I think we can simply send a cancel request when w.closingc receives a closing substream.

https://github.com/coreos/etcd/blob/release-3.2/clientv3/watch.go#L513

Adding something like this?

		case ws := <-w.closingc:
			cr := &pb.WatchRequest_CancelRequest{
				CancelRequest: &pb.WatchCancelRequest{
					WatchId: ws.id,
				},
			}
			req := &pb.WatchRequest{RequestUnion: cr}
			wc.Send(req)
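
(One caveat, as an assumption on my part rather than something verified: the event-driven cancel path in run() already records canceled watch IDs in cancelSet, as shown in the watch.go excerpt quoted later in this thread, so a proactive cancel here would probably want to share that bookkeeping to avoid sending a duplicate cancel request when a response for the closed substream is already in flight.)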
@gyuho (Contributor) commented Mar 9, 2018

I think this is a valid use case for multiplexed watch stream requests + gRPC proxy. Right now, the only way to cancel mvcc watchers is to close the whole watch stream.

What do you think? @heyitsanthony @xiang90

@heyitsanthony (Contributor)

@yudai that's how it works today. A server-side mechanism to LRU-evict watches in the v3rpc layer might be easier and cover more cases than reworking the client's cancel path.

@yudai (Contributor, Author) commented Mar 9, 2018

Thanks for the comments. Please correct me if I misunderstand something below.

I think it's fine, and working as expected, that an mvcc.WatchStream is kept running while there is a corresponding watch stream on the client side. We need to close the whole watch stream to close the mvcc.WatchStream.

The issue I wanted to raise is that mvcc.WatchStream.Cancel(id WatchID) is not called when you cancel the context for a clientv3.Watch() call (which creates a substream). mvcc.WatchStream.Cancel(id WatchID) releases the mvcc.watcher (so there is already a way to cancel individual mvcc watchers without closing the whole watch stream), but since the client never triggers it, instances of mvcc.watcher pile up and are not freed until you close the parent watch stream.

In the v3rpc layer, v3rpc.serverWatchStream already calls mvcc.WatchStream.Cancel(id WatchID) properly when it receives a *pb.WatchRequest_CancelRequest (https://github.com/coreos/etcd/blob/master/etcdserver/api/v3rpc/watch.go#L265-L268).

I therefore think we already have a proper implementation for substream canceling on the server side, and the only missing part is on the client side: the client library simply doesn't send *pb.WatchRequest_CancelRequest when users cancel contexts.

I assume this issue could be critical for gRPC proxy as well. I actually noticed it while processing watch requests in our private gRPC proxy: since clients don't send substream cancel requests, our proxy cannot release its internal resources for watch substreams as long as the watch streams are alive. I expect the same thing happens with the official gRPC proxy.

The mvcc layer also keeps unnecessary mvcc.watchers and other resources for context-canceled watch substreams as long as the watch stream that carries them is alive, as you can see from etcd_debugging_mvcc_watcher_total in the reproduction code above. If a client repeatedly calls clientv3.Watch() and cancels it (which uses only one watch stream but generates many substreams), this can happen in any environment.

In my opinion, it would be better to make the client send cancel requests when users cancel contexts. It's confusing for users that canceling a context does not always cancel the watch. Sending cancel requests on context cancellation would be more straightforward and would comply better with the defined protocol.

@yudai (Contributor, Author) commented Apr 4, 2018

Created reproduction code:

package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Start")

	cli.Watch(context.Background(), "/") // keeps the watch stream alive

	for {
		// Each iteration creates a substream on the shared watch stream
		// and immediately cancels it on the client side.
		ctx, cancel := context.WithCancel(context.Background())
		cli.Watch(ctx, "/")
		cancel()
	}

	log.Printf("Done") // never reached; the loop above runs forever
}

With this code you can confirm that etcd_debugging_mvcc_watcher_total keeps increasing and never decreases (as long as you don't put anything to /).
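
For convenience (not part of the original report), here is a rough sketch of how one might poll that metric while the loop runs, assuming the client endpoint also serves Prometheus metrics at /metrics (adjust the URL if you use --listen-metrics-urls):

package main

import (
	"bufio"
	"log"
	"net/http"
	"strings"
	"time"
)

// Polls etcd's Prometheus metrics endpoint and prints the watcher counters.
func main() {
	for {
		resp, err := http.Get("http://localhost:2379/metrics") // placeholder URL
		if err != nil {
			log.Fatal(err)
		}
		sc := bufio.NewScanner(resp.Body)
		for sc.Scan() {
			line := sc.Text()
			if strings.HasPrefix(line, "etcd_debugging_mvcc_watcher_total") {
				log.Println(line)
			}
		}
		resp.Body.Close()
		time.Sleep(5 * time.Second)
	}
}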

@gyuho gyuho self-assigned this May 25, 2018
@gyuho (Contributor) commented May 25, 2018

@yudai @heyitsanthony I will run some benchmarks to measure the overhead of canceling each watcher on closing-channel receive, from the client side, or investigate whether it can be handled easily in the storage layer.

@scotwells

We seem to be experiencing a similar issue with our current production setup running etcd v3.2.24. The number of open watches on the server continues to climb until we hit between ~23k and ~32k open watchers. Once the number of open watches flatlines, etcd fails to deliver watch events to clients and we see a sharp rise in Slow Watchers.

While diving into the code, we noticed that the watch on the etcd server is not closed until the client receives an event from the server after the client context is canceled. As an attempted workaround, we configured the clients to use WithProgressNotify (a rough sketch follows). This change only seemed to delay the pile-up rather than resolve the issue.
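
The workaround amounts to something like the following (a simplified sketch rather than our exact code; the endpoint and key are placeholders):

package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// WithProgressNotify asks the server to send periodic progress
	// notifications; the idea was that these give canceled watches a
	// message to react to, so the client's lazy cancel path fires sooner.
	wch := cli.Watch(ctx, "/some/key", clientv3.WithProgressNotify())
	for resp := range wch {
		if resp.IsProgressNotify() {
			continue // no new events, just a progress notification
		}
		for _, ev := range resp.Events {
			log.Printf("%s %q = %q", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}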

Has there been any progress on determining if there is significant overhead in canceling a watch request when the context gets canceled on the client side?

Code executed when the client context is canceled

etcd/clientv3/watch.go

Lines 513 to 520 in e06761e

case ws := <-w.closingc:
	w.closeSubstream(ws)
	delete(closing, ws)
	if len(w.substreams)+len(w.resuming) == 0 {
		// no more watchers on this stream, shutdown
		return
	}
}

Code executed when an event is received from the server after the context is closed

etcd/clientv3/watch.go

Lines 489 to 496 in e06761e

cancelSet[pbresp.WatchId] = struct{}{}
cr := &pb.WatchRequest_CancelRequest{
	CancelRequest: &pb.WatchCancelRequest{
		WatchId: pbresp.WatchId,
	},
}
req := &pb.WatchRequest{RequestUnion: cr}
wc.Send(req)
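
Comparing the two excerpts: the closingc path tears down the client-side substream but never tells the server anything, while only the second path, which runs when a response arrives for an already-closed watcher, actually sends the WatchCancelRequest. That matches the leak described above: watchers on keys that never change are never canceled on the server.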

[Graph: Number of Open Watches vs Slow Watchers on the server]

@xiang90 xiang90 self-assigned this Feb 26, 2019
@gyuho gyuho modified the milestones: etcd-v3.4, etcd-v3.5 Aug 5, 2019
stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot closed this as completed Apr 27, 2020
fcvarela commented May 4, 2020

@xiang90 Any chance this is actively being worked on? This has caused us production issues several times before and we'll invest the time to fix it if no one else is working on it.

@spzala spzala reopened this May 4, 2020
@stale stale bot removed the stale label May 4, 2020
@spzala (Member) commented May 4, 2020

> @xiang90 Any chance this is actively being worked on? This has caused us production issues several times before and we'll invest the time to fix it if no one else is working on it.

Reopening the issue per your comment, thanks @fcvarela.

@jackkleeman (Contributor)

> I will run some benchmarks to measure the overhead of canceling each watcher on closing channel receive, from client-side

Can there be overhead? Once a watch is closed on the client, we will eventually send a cancel request anyway, when the next message arrives. So cancelling proactively has two possible outcomes:

  1. We end up cancelling earlier, as we don't wait for a message. Arguably this saves resources.
  2. We cancel where we otherwise wouldn't have, because no message would ever come. This is the case that is causing @monzo, and presumably others, production issues, and I don't know if leaving streams open indefinitely should be considered a performance improvement over closing them when they are no longer required.

I'm working on a PR to do what others above have suggested.

jackkleeman added a commit to jackkleeman/etcd that referenced this issue May 6, 2020
Currently, watch cancel requests are only sent to the server after a
message comes through on a watch where the client has cancelled. This
means that cancelled watches that don't receive any new messages are
never cancelled; they persist for the lifetime of the client stream.
This has negative connotations for locking applications where a watch
may observe a key which might never change again after cancellation,
leading to many accumulating watches on the server.

By cancelling proactively, in most cases we simply move the cancel
request to happen earlier, and additionally we solve the case where the
cancel request would never be sent.

Fixes etcd-io#9416
Heavy inspiration drawn from the solutions proposed there.
jackkleeman added a commit to jackkleeman/etcd that referenced this issue May 6, 2020 (same commit message as above)
jackkleeman added a commit to jackkleeman/etcd that referenced this issue May 7, 2020 (same commit message as above)
jackkleeman added a commit to jackkleeman/etcd that referenced this issue May 7, 2020 (same commit message as above)
jackkleeman added a commit to monzo/etcd that referenced this issue May 11, 2020 (same commit message as above)
jackkleeman added a commit to monzo/etcd that referenced this issue May 11, 2020 (same commit message as above)
tangcong pushed a commit to tangcong/etcd that referenced this issue Jun 23, 2020 (same commit message as above)
tangcong pushed a commit to tangcong/etcd that referenced this issue Jun 23, 2020 (same commit message as above)
tangcong pushed a commit to tangcong/etcd that referenced this issue Jun 23, 2020 (same commit message as above)
cierpuchaw pushed a commit to cloudwan/gohan that referenced this issue Aug 24, 2020
The new fork at github.com/cloudwan/etcd_for_gohan contains the fix for etcd-io/etcd#9416 backported to v3.3.18. We use watches heavily, and with that fix the memory usage in the etcd server no longer grows uncontrollably.

Regenerated go.sum while at it.
cierpuchaw pushed a commit to cloudwan/etcd_for_gohan that referenced this issue Aug 26, 2020 (same commit message as the jackkleeman commit above, with the addition: "This is a port to v3.3.8 of a change made by @jackkleeman in 87aa5a9.")
fcvarela pushed a commit to monzo/etcd that referenced this issue Aug 27, 2020 (same commit message as the jackkleeman commit above)
cierpuchaw added a commit to cloudwan/gohan that referenced this issue Sep 7, 2020 (same commit message as the Aug 24, 2020 cloudwan/gohan commit above)
suhailpatel pushed a commit to monzo/etcd that referenced this issue Feb 3, 2022 (same commit message as the jackkleeman commit above)
suhailpatel pushed a commit to monzo/etcd that referenced this issue Feb 14, 2022 (same commit message as the jackkleeman commit above)
suhailpatel pushed a commit to monzo/etcd that referenced this issue Feb 14, 2022 (same commit message as the jackkleeman commit above)
suhailpatel pushed a commit to monzo/etcd that referenced this issue Feb 14, 2022 (same commit message as the jackkleeman commit above)