Skip to content

Healthcheck - limit healthchecks by keyspace#5815

Merged
rafael merged 1 commit intovitessio:masterfrom
tinyspeck:sp-limit-hc-by-keyspace
Feb 21, 2020
Merged

Healthcheck - limit healthchecks by keyspace#5815
rafael merged 1 commit intovitessio:masterfrom
tinyspeck:sp-limit-hc-by-keyspace

Conversation

@spark4
Copy link
Copy Markdown
Contributor

@spark4 spark4 commented Feb 13, 2020

Overview

Currently, vtgates support a keyspaces_to_watch flag which can enable data isolation by keyspace. However, vtgates still perform healthchecks against all tablets regardless of the value provided to this flag.

This PR introduces changes to healthchecks such that when the keyspaces_to_watch flag is utilized, vtgates will only maintain healthchecks on tablets in keyspaces that the vtgate has access to.

--
This closes #5387.
Relevant PR: #4420

Implementation

  1. Introduce a new TabletRecorder filter, FilterByKeyspace, which will only add tablets from keyspaces that the vtgate has access to
  2. Update the discovery gateway to use this filter when values for keyspaces_to_watch is passed in

Testing

In addition to unit testing, I performed the following manual tests:

  • Confirmed that vtgates that use the keyspaces_to_watch flag were only performing healthchecks on tablets from the desired keyspace
  • Confirmed that vtgates that do not use the keyspaces_to_watch flag performed health checks on the expected tablets
  • Confirmed on a vtctld that various keyspace commands continue to work as expected
  • Confirmed that backups continue to work as expected

@spark4 spark4 requested a review from sougou as a code owner February 13, 2020 23:32
@spark4 spark4 requested a review from rafael February 14, 2020 00:06
@sougou sougou requested a review from deepthi February 14, 2020 16:04
Copy link
Copy Markdown
Member

@rafael rafael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serry, this is looking good to me. Added some minor comments.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for: fbk.isIncluded(new) might be redundant. A tablet shouldn't be able to change keyspaces. So if it was included in the old it should be included in new.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, maybe we should check for those failure conditions - isIncluded(new) && !isIncluded(old) and the reverse - and return a FAILED_PRECONDITION error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it was included in the old it should be included in new.

☝️ I'm probably missing some important context here -- can you expand more about what you mean here? My understanding is that given that tablets can't change keyspaces, the fbk.isIncluded(old) is the unnecessary check here (as opposed to fbk.isIncluded(new)).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I posted the above comment on a non-refreshed page - just now seeing @deepthi's comment.

@deepthi it looks like the callers of ReplaceTablet do not have mechanisms for handling these types of errors. Do we know what the expected behavior for FAILED_PRECONDITION errors during tablet replacement is?

Given the low likelihood of us hitting this case, would logging be sufficient here?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see.. the interface won't let you return an error.
Yes, logging it would be fine.

@deepthi
Copy link
Copy Markdown
Collaborator

deepthi commented Feb 14, 2020

Nicely written PR description!
Can you clarify the following?

  • Confirmed that vtgates that use the keyspaces_to_watch flag were only performing healthchecks on tablets from the desired keyspace.
  • Confirmed that vtgates that do use the keyspaces_to_watch flag performed health checks on the expected tablets.

These two look similar. Maybe the second case was for vtgates that don't use the keyspaces_to_watch flag?

@spark4
Copy link
Copy Markdown
Contributor Author

spark4 commented Feb 14, 2020

Hey @deepthi - good catch. Yes, the second one was meant to say do not use. I will update the PR description now!

@spark4 spark4 force-pushed the sp-limit-hc-by-keyspace branch 6 times, most recently from 4f58466 to 55f869c Compare February 20, 2020 15:42
Signed-off-by: Serry Park <serrypark@slack-corp.com>

Signed-off-by: Serry Park <me@serry.co>
@spark4 spark4 force-pushed the sp-limit-hc-by-keyspace branch from 87a63c4 to a4055f1 Compare February 21, 2020 19:51
Copy link
Copy Markdown
Collaborator

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@rafael rafael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rafael rafael merged commit edb59bd into vitessio:master Feb 21, 2020
@rafael rafael deleted the sp-limit-hc-by-keyspace branch February 21, 2020 21:27
@spark4 spark4 restored the sp-limit-hc-by-keyspace branch March 2, 2020 22:03
ajm188 pushed a commit to tinyspeck/vitess that referenced this pull request Mar 11, 2020
Healthcheck - limit healthchecks by keyspace
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

keyspaces_to_watch does not limit vtgate->tablet health checks

3 participants