Rebuild SrvVSchema during InitTablet. #6276

Merged
sougou merged 1 commit into vitessio:master from planetscale:rebuild-vschema
Jun 6, 2020

enisoc (Member) commented Jun 5, 2020

We used to rebuild SrvVSchema in all cells as part of running GetOrCreateShard from any cell. That caused VSchema problems to spread instantly to all cells at once, so we changed EnsureVSchema to only rebuild SrvVSchema in the local cell (#5930).
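To make the scope change from #5930 concrete, here is a toy Go model. It is not Vitess code: `topoModel` and `rebuildSrvVSchema` are hypothetical names, the global topo is reduced to a map of keyspace VSchemas, and each cell's SrvVSchema is a snapshot copied from it. Rebuilding every cell copies a bad VSchema everywhere at once; rebuilding only the local cell leaves the other cells on their previous snapshot.

```go
package main

// topoModel is a hypothetical, in-memory stand-in for the topology
// service; it is not the Vitess topo API.
type topoModel struct {
	// globalVSchema is the global topo: keyspace -> VSchema (simplified to a string).
	globalVSchema map[string]string
	// srvVSchema is the per-cell topo: cell -> (keyspace -> VSchema snapshot).
	srvVSchema map[string]map[string]string
}

// rebuildSrvVSchema overwrites the SrvVSchema of each listed cell with a
// fresh copy of the current global VSchemas.
func (t *topoModel) rebuildSrvVSchema(cells []string) {
	for _, cell := range cells {
		snapshot := make(map[string]string, len(t.globalVSchema))
		for keyspace, vschema := range t.globalVSchema {
			snapshot[keyspace] = vschema
		}
		t.srvVSchema[cell] = snapshot
	}
}
```

In this model the old behavior corresponds to calling `rebuildSrvVSchema` with every cell, and the behavior after #5930 to calling it with only the caller's local cell, so a VSchema change reaches another cell only when something in that cell triggers a rebuild.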

We believed that at least one tablet in a given cell would still end up rebuilding SrvVSchema in that cell, since they all call GetOrCreateShard at startup. However, I didn't realize that GetOrCreateShard short-circuits if the shard already exists, so it was skipping EnsureVSchema entirely. As a result, only one cell (the cell of the one tablet that won the race to create the shard) actually had its SrvVSchema rebuilt.
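The short-circuit can be reproduced in a few lines. In this hypothetical model (the names only mirror the description; this is not the real implementation), `getOrCreateShard` reaches `ensureVSchema` only on the path that creates the shard, so a tablet in a second cell returns early and its cell never gets a SrvVSchema.

```go
package main

// shardModel is a hypothetical, simplified model of the startup path
// described above; it is not the Vitess code.
type shardModel struct {
	shards     map[string]bool // global topo: "keyspace/shard" -> exists
	srvVSchema map[string]bool // per-cell topo: cell -> SrvVSchema rebuilt
}

// ensureVSchema stands in for EnsureVSchema after #5930: it rebuilds
// SrvVSchema only in the caller's local cell.
func (m *shardModel) ensureVSchema(localCell string) {
	m.srvVSchema[localCell] = true
}

// getOrCreateShard stands in for GetOrCreateShard. The early return
// when the shard already exists is the short-circuit that skips
// ensureVSchema for every tablet after the first.
func (m *shardModel) getOrCreateShard(keyspace, shard, localCell string) {
	key := keyspace + "/" + shard
	if m.shards[key] {
		return // shard exists: ensureVSchema is never reached
	}
	m.shards[key] = true
	m.ensureVSchema(localCell)
}
```

Calling it once from a tablet in `zone1` and then from a tablet in `zone2` leaves only `zone1` with a rebuilt SrvVSchema.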

This PR makes rebuilding SrvVSchema an explicit step in InitTablet, alongside the rebuild of the keyspace graph. GetOrCreateShard stays a task that only one tablet needs to perform globally, because it affects the global topo. Per-cell records, by contrast, should be rebuilt by a tablet in that cell.
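A sketch of the fixed ordering, again as a hypothetical model rather than the actual InitTablet code: the global step may still return early, but every tablet then checks the per-cell records for its own cell and builds any that are missing. Rebuilding only when a record is missing is an assumption of this sketch, and `initModel`, `rebuildIfMissing`, and the `rebuilds` counter are illustrative names.

```go
package main

// initModel is a hypothetical model of the startup path after this
// change; field and method names are illustrative, not the Vitess API.
type initModel struct {
	shards      map[string]bool // global topo: "keyspace/shard" -> exists
	srvKeyspace map[string]bool // per-cell: keyspace graph has been built
	srvVSchema  map[string]bool // per-cell: SrvVSchema has been built
	rebuilds    int             // number of per-cell rebuilds performed
}

// getOrCreateShard only touches global topo, so one tablet anywhere is
// enough; later callers return early.
func (m *initModel) getOrCreateShard(key string) {
	if m.shards[key] {
		return
	}
	m.shards[key] = true
}

// rebuildIfMissing builds the record for cell if it does not exist yet.
func (m *initModel) rebuildIfMissing(records map[string]bool, cell string) {
	if !records[cell] {
		records[cell] = true
		m.rebuilds++
	}
}

// initTablet runs the global step, then handles per-cell records in the
// tablet's own cell regardless of whether the shard already existed.
func (m *initModel) initTablet(key, localCell string) {
	m.getOrCreateShard(key)
	m.rebuildIfMissing(m.srvKeyspace, localCell) // keyspace graph
	m.rebuildIfMissing(m.srvVSchema, localCell)  // SrvVSchema
}
```

With two tablets in `zone1` and one in `zone2`, the shard is created once, and each cell gets exactly one keyspace-graph build and one SrvVSchema build.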

@enisoc enisoc requested review from deepthi and zmagg June 5, 2020 21:56
@enisoc enisoc requested a review from sougou as a code owner June 5, 2020 21:56
We used to rebuild SrvVSchema in all cells as part of running
GetOrCreateShard from any cell. That caused VSchema problems to spread
instantly to all cells at once, so we changed EnsureVSchema to only
rebuild SrvVSchema in the local cell.

We believed that at least one tablet in a given cell would still end up
rebuilding SrvVSchema in the cell since they all call GetOrCreateShard
at startup. However, I didn't realize that GetOrCreateShard
short-circuits if the shard already exists, so it was skipping
EnsureVSchema entirely.

This moves the rebuild of SrvVSchema to be an explicit step in
InitTablet alongside the rebuild of the keyspace graph. This keeps
GetOrCreateShard as a task that only needs to be done by one tablet
globally because it affects global topo. The rebuilding of per-cell
records should be done by a tablet in that cell.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc enisoc force-pushed the rebuild-vschema branch from b22d9d6 to ba4a04b Compare June 5, 2020 22:54
```go
switch {
case err == nil:
	// Check if vschema was rebuilt after the initial creation of the keyspace.
	if _, keyspaceExists := srvVSchema.GetKeyspaces()[*initKeyspace]; !keyspaceExists {
		// ... (rest of the diff hunk not shown)
	}
	// ... (other cases not shown)
}
```

Contributor:
Doesn't this also need a RebuildKeyspaceGraph?

enisoc (Member Author):
I don't understand. Are you saying if a keyspace is missing from SrvVSchema, it implies that RebuildKeyspace has not run for that keyspace?

Contributor:
That's what I thought.

enisoc (Member Author):
I thought that SrvKeyspace and SrvVSchema were completely independent. I'll double check.

Contributor:
You're probably right. I think they are independent.
But I also think that a missing SrvKeyspace is going to be a problem. And now that I think about it, there are some workflows where this has led to odd errors.
However, I think that's an orthogonal issue, and it's better to resolve that separately.

@sougou sougou merged commit 19bc9d4 into vitessio:master Jun 6, 2020
@enisoc enisoc deleted the rebuild-vschema branch June 8, 2020 23:35
@deepthi deepthi added this to the v7.0 milestone Jul 27, 2020

3 participants