Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data race when deferring a Close() #333

Closed
sb10 opened this issue Jan 29, 2020 · 5 comments · Fixed by #342
Closed

data race when deferring a Close() #333

sb10 opened this issue Jan 29, 2020 · 5 comments · Fixed by #342
Labels
Milestone

Comments

@sb10
Copy link

sb10 commented Jan 29, 2020

I am doing:

[myfunc]
mymutex.Lock()
defer mymutex.Unlock()

sshClient, _, err := s.SSHClient(ctx)
if err != nil {
	return err
}

client, err := sftp.NewClient(sshClient)
if err != nil {
	return err
}
defer client.Close()

[use client]

return err

(where s.SSHClient is creating/storing/reusing an *ssh.Client.)

On rare occasion I get this:

WARNING: DATA RACE
Write at 0x00c000c18d20 by goroutine 340:
  golang.org/x/crypto/ssh.(*channel).CloseWrite()
      /home/ubuntu/go/pkg/mod/golang.org/x/[email protected]/ssh/channel.go:542 +0x5e
  golang.org/x/crypto/ssh.(*sessionStdin).Close()
      /home/ubuntu/go/pkg/mod/golang.org/x/[email protected]/ssh/session.go:536 +0x55
  github.com/pkg/sftp.(*clientConn).recv.func1()
      /home/ubuntu/go/pkg/mod/github.com/pkg/[email protected]/conn.go:71 +0x6c
  github.com/pkg/sftp.(*clientConn).recv()
      /home/ubuntu/go/pkg/mod/github.com/pkg/[email protected]/conn.go:77 +0x392
  github.com/pkg/sftp.(*clientConn).loop()
      /home/ubuntu/go/pkg/mod/github.com/pkg/[email protected]/conn.go:60 +0x72

Previous write at 0x00c000c18d20 by goroutine 366:
  golang.org/x/crypto/ssh.(*channel).CloseWrite()
      /home/ubuntu/go/pkg/mod/golang.org/x/[email protected]/ssh/channel.go:542 +0x5e
  golang.org/x/crypto/ssh.(*sessionStdin).Close()
      /home/ubuntu/go/pkg/mod/golang.org/x/[email protected]/ssh/session.go:536 +0x55
  github.com/pkg/sftp.(*clientConn).Close()
      /home/ubuntu/go/pkg/mod/github.com/pkg/[email protected]/conn.go:55 +0x95
  github.com/pkg/sftp.(*Client).Close()
      <autogenerated>:1 +0x43
[...]

Both goroutines are inside mutex-protected myfunc.

So the issue here is that my own Close() call in myfunc's defer is competing with the internal Close() call that gets triggered in the event of a recvPacket() error.

I'm not sure what I should do about this.

@sb10 sb10 changed the title data race when deferring a close data race when deferring a *Client.Close() Jan 29, 2020
@sb10 sb10 changed the title data race when deferring a *Client.Close() data race when deferring a Close() Jan 29, 2020
@puellanivis
Copy link
Collaborator

Hm, it looks like recv uses a mutex calling Close() twice at the same time https://github.com/pkg/sftp/blob/master/conn.go#L70-L72 but the Close() function itself, does not: https://github.com/pkg/sftp/blob/master/conn.go#L54-L55

We could probably just add the c.conn,Lock() and c.conn.Unlock() into the Close() function.

@eikenb do you know of anything off the top of your head that could break as a result? If not, I’ll setup a PR.

@eikenb
Copy link
Member

eikenb commented Jan 31, 2020

Looking around there are many places where conn.Close() is called without the lock and I'm thinking that maybe these should all have locks.

So I mean overridding Close() on the conn struct with one that uses the lock.

Basically something like...

diff --git a/conn.go b/conn.go
index 62e585e..cfbc1d2 100644
--- a/conn.go
+++ b/conn.go
@@ -31,6 +31,12 @@ func (c *conn) sendPacket(m encoding.BinaryMarshaler) error {
 	return sendPacket(c, m)
 }
 
+func (c *conn) Close() error {
+	c.Lock()
+	defer c.Unlock()
+	return c.WriteCloser.Close()
+}
+
 type clientConn struct {
 	conn
 	wg         sync.WaitGroup
@@ -67,9 +73,7 @@ func (c *clientConn) loop() {
 // appropriate channel.
 func (c *clientConn) recv() error {
 	defer func() {
-		c.conn.Lock()
 		c.conn.Close()
-		c.conn.Unlock()
 	}()
 	for {
 		typ, data, err := c.recvPacket()

@eikenb eikenb added the bug label Jan 31, 2020
@sb10
Copy link
Author

sb10 commented Feb 5, 2020

FYI, my work-around was this:

 // SFTPClient is like sftp.NewClient(), but the underlying
 // clientConn.conn.WriteCloser is mutex protected to avoid data races between
 // closes due to errors and direct Close() calls on the *sftp.Client.
 func SFTPClient(conn *ssh.Client) (*sftp.Client, error) {
 	s, err := conn.NewSession()
 	if err != nil {
 		return nil, err
 	}
 	if err = s.RequestSubsystem("sftp"); err != nil {
 		return nil, err
 	}
 	pw, err := s.StdinPipe()
 	if err != nil {
 		return nil, err
 	}
 	pw = &threadSafeWriteCLoser{WriteCloser: pw}
 	pr, err := s.StdoutPipe()
 	if err != nil {
 		return nil, err
 	}

 	return sftp.NewClientPipe(pr, pw)
 }

 type threadSafeWriteCLoser struct {
 	io.WriteCloser
 	sync.Mutex
 }

 func (c *threadSafeWriteCLoser) Close() error {
 	c.Lock()
 	defer c.Unlock()
 	return c.WriteCloser.Close()
 }

But yes, it would be better to do this properly upstream of me.

@eikenb
Copy link
Member

eikenb commented Mar 7, 2020

Pushed up the PR to fix this. Probably overkill, but I thought I'd briefly fish for someone with the secret to reproducing data races. :)

@puellanivis
Copy link
Collaborator

Yeah, reproducing data races is actually super hard to setup. It’s why I usually run through exhaustive proofs of behavior, rather than relying upon any sort of testing. It’s a lot more work, but tests for data races are often almost impossible to actually build.

@eikenb eikenb added this to the v1.12.0 milestone Jul 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants