Skip to content

tablet_picker: keep trying to find a tablet until context expires#6546

Merged
deepthi merged 10 commits intovitessio:masterfrom
planetscale:ds-fix-tp-busyloop
Aug 14, 2020
Merged

tablet_picker: keep trying to find a tablet until context expires#6546
deepthi merged 10 commits intovitessio:masterfrom
planetscale:ds-fix-tp-busyloop

Conversation

@deepthi
Copy link
Copy Markdown
Collaborator

@deepthi deepthi commented Aug 6, 2020

There was a bug in the tablet_picker implementation which could cause it to loop forever.

  1. We were not checking whether the context is still valid.
  2. If there are non-zero tablets of any kind, but zero tablets of the desired type, we loop forever (see 1.)

Along with fixing the above problem, this change has a few other improvements

  • Always use a short context for topo calls
  • Do retry if a tablet is not found, but only after a delay.

Signed-off-by: deepthi deepthi@planetscale.com

Signed-off-by: deepthi <deepthi@planetscale.com>
@deepthi deepthi requested a review from sougou as a code owner August 6, 2020 23:24
@deepthi
Copy link
Copy Markdown
Collaborator Author

deepthi commented Aug 6, 2020

@sougou can you please review the tests and logging? We don't want to spam the logs but still produce useful information.

deepthi and others added 2 commits August 7, 2020 17:11
Signed-off-by: deepthi <deepthi@planetscale.com>
…teMainflow

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
candidates := tp.getMatchingTablets(ctx)
if len(candidates) == 0 {
// if no candidates were found, sleep and try again
log.Infof("No tablet found for streaming, sleeping for %d", tabletPickerRetryDelay)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I was planning to add something similar.

rohit-nayak-ps and others added 7 commits August 11, 2020 10:41
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: deepthi <deepthi@planetscale.com>
@deepthi deepthi modified the milestone: 7.0.1 Aug 11, 2020
@deepthi deepthi merged commit e16a6c5 into vitessio:master Aug 14, 2020
deepthi added a commit to planetscale/vitess that referenced this pull request Aug 14, 2020
…tessio#6546)

* tablet_picker: keep trying to find a tablet until context expires

Signed-off-by: deepthi <deepthi@planetscale.com>

* tablet_picker: keyspace/shard/cells must be provided

Signed-off-by: deepthi <deepthi@planetscale.com>

* tablet picker: return ks/cell/shard in test result for TestShardMigrateMainflow

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: fix wrangler tests by choosing a valid source per-test

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: fix worker and testlib tests

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet_picker: gofmted

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: fix test races

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: gofmt

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* tablet picker: gofmt

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

* healthcheck: cleanup test code where we were setting the same flag twice

Signed-off-by: deepthi <deepthi@planetscale.com>

Co-authored-by: Rohit Nayak <rohit@planetscale.com>
@deepthi deepthi deleted the ds-fix-tp-busyloop branch September 9, 2020 23:54
@askdba askdba added this to the v8.0 milestone Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants