Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure final value arrives to DoneCh() always #4472

Closed
wants to merge 1 commit into from

Conversation

bonifaido
Copy link
Contributor

stopCh is already selected in renewAuth() and renewLease(), so this case block is not needed here and otherwise, it causes a race condition and sometimes the final result doesn't arrive in the DoneCh()

@jefferai
Copy link
Member

The only way that the condition you describe can occur is if doneCh is already full since it's a buffered channel of size one. But as we only ever write one value to it this should never block, which is why the stopCh check is there as a safety check.

@bonifaido
Copy link
Contributor Author

bonifaido commented Apr 27, 2018

Porbably I'm wrong here, but I think in some cases this nil: https://github.com/bonifaido/vault/blob/master/api/renewer.go#L187
doesn't arrive to the doneCh if one calls Stop() on the Renewer, and so a select block runs forever on the DoneCh().

@jefferai
Copy link
Member

Is this behavior you've actually seen or you think it might happen? Select blocks take the first non blocking channel so what you're describing should not happen.

@bonifaido
Copy link
Contributor Author

I have seen it actually, I can extract an example if you would like, but basically when you call a Stop() from another goroutine it might happen if you are watching done and renew channels like this:

renewer, err := client.NewRenewer(&RenewerInput{
	Secret: mySecret,
})

go renewer.Renew()

// Stop the renewer 2 minutes from now
go func() {
	time.Sleep(2 * time.Minutes)
	renewer.Stop()
}()

for {
	select {
	case err := <-renewer.DoneCh():
		if err != nil {
			log.Fatal(err)
		}
		// Renewal is now over, but sometimes this block doesn't get executed
		return
		
	case renewal := <-renewer.RenewCh():
		log.Printf("Successfully renewed: %#v", renewal)
	}
}

@jefferai
Copy link
Member

jefferai commented May 4, 2018

I would love to see an example extracted if you have been able to actually create this problem.

@bonifaido
Copy link
Contributor Author

bonifaido commented May 4, 2018

package main

import (
	"io/ioutil"
	"log"
	"os"
	"time"

	"github.com/hashicorp/vault/api"
)

func main() {

	token, err := ioutil.ReadFile(os.Getenv("HOME") + "/.vault-token")
	if err != nil {
		log.Fatal(err)
	}

	config := api.DefaultConfig()
	config.Address = "http://localhost:8200"

	client, err := api.NewClient(config)
	if err != nil {
		log.Fatal(err)
	}

	client.SetToken(string(token))

	secret, err := client.Auth().Token().Create(&api.TokenCreateRequest{TTL: "30s"})
	if err != nil {
		log.Fatal(err)
	}

	renewer, err := client.NewRenewer(&api.RenewerInput{
		Secret: secret,
	})

	go renewer.Renew()

	// Stop the renewer 5 seconds from now
	go func() {
		time.Sleep(5 * time.Second)
		renewer.Stop()
		time.Sleep(5 * time.Second)
		log.Fatal("DoneCh should have done the work already!")
	}()

	for {
		select {
		case err := <-renewer.DoneCh():
			if err != nil {
				log.Fatal(err)
			}
			// Renewal is now over, but sometimes this block doesn't get executed
			return

		case renewal := <-renewer.RenewCh():
			log.Printf("Successfully renewed: %#v", renewal)
		}
	}
}

You will need to start a Vault dev server for this (but I'm pretty sure that you know how to do that :-) )
If you this message, then the issue occured, for me it happens 3 times from 10 (go run main.go 10x):

2018/05/04 16:21:31 DoneCh should have done the work already!

stopCh is already selected in renewAuth() and renewLease(), so this case block is not needed here and otherwise, it causes a race condition and sometimes the final result doesn't arrive in the DoneCh()
@jefferai
Copy link
Member

jefferai commented May 4, 2018

Thanks for the sample program! It turns out that this is a combination of two issues, one in the API and one in your program:

API: We should close the doneCh when we're done. This will cause any reads from it to return nil.

Your program: You're making an assumption that you'll always get a value back from DoneCh when you call stop, but really the right call flow would be that when you are stopping the renewer, you also stop paying attention to it, because at that point it doesn't matter whether or not you get another value back. So my change to the API will mean that your program should work, but ideally it'd be architected differently.

Below is the program I used for testing which is adapted from yours. I didn't make the changes suggested above just to make it easy to show that the API change fixes the issue with your current code; I'll commit the change to the API separately:

package main

import (
        "io/ioutil"
        "log"
        "os"
        "sync"
        "time"

        "github.com/hashicorp/vault/api"
)

func main() {

        token, err := ioutil.ReadFile(os.Getenv("HOME") + "/.vault-token")
        if err != nil {
                log.Fatal(err)
        }

        config := api.DefaultConfig()
        config.Address = "http://localhost:8200"

        client, err := api.NewClient(config)
        if err != nil {
                log.Fatal(err)
        }

        client.SetToken(string(token))

        secret, err := client.Auth().Token().Create(&api.TokenCreateRequest{TTL: "30s"})
        if err != nil {
                log.Fatal(err)
        }

        renewer, err := client.NewRenewer(&api.RenewerInput{
                Secret: secret,
        })

        go renewer.Renew()

        stopCh := make(chan struct{})

        wait := &sync.WaitGroup{}
        wait.Add(2)

        // Stop the renewer 5 seconds from now
        go func() {
                defer wait.Done()
                time.Sleep(5 * time.Second)
                log.Println("stopping")
                renewer.Stop()
                time.Sleep(5 * time.Second)
                select {
                case <-stopCh:
                        log.Println("stopped")
                        return
                default:
                        log.Fatal("DoneCh should have done the work already!")
                }
        }()

        go func() {
                defer wait.Done()
                for {
                        select {
                        case err := <-renewer.DoneCh():
                                if err != nil {
                                        log.Fatal(err)
                                }
                                // Renewal is now over, but sometimes this block doesn't get executed
                                log.Print("renewer done")
                                close(stopCh)
                                return

                        case renewal := <-renewer.RenewCh():
                                log.Printf("Successfully renewed: %#v", renewal)
                        }
                }
        }()

        wait.Wait()
}

@bonifaido
Copy link
Contributor Author

bonifaido commented May 5, 2018

Thanks for the fix @jefferai, I think the API is more bulletproof this way.

jefferai added a commit that referenced this pull request May 7, 2018
Unlike switch, select will randomly choose from among multiple cases if
more than one are satisfiable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants