helm: add InitShardMaster Job #3612

Merged
enisoc merged 13 commits into vitessio:master from derekperkins:init-shard-master
Feb 5, 2018

@derekperkins
Member

To make the Helm install process more seamless, this PR automatically initializes the shard master by running InitShardMaster. One Job is created per shard.

@derekperkins force-pushed the init-shard-master branch 3 times, most recently from f45ce67 to dc5b82b (February 1, 2018 00:32)

@derekperkins
Member Author

This ended up being surprisingly difficult: Go templates don't give me any way to add up the replicas on behalf of the user, so the total tablet count has to be supplied by hand. That's not ideal, because a wrong number will cause problems.

Otherwise I think this will be super useful to anyone booting up a Vitess cluster for the first time.

@derekperkins
Member Author

@enisoc this is ready for review

@derekperkins
Member Author

Here's a sample of the logs, and everything looks good, including the error retry logic.

+ shardTablets='zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ echo 'zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk '$4 == "master" {print $1}'
+ masterTablet=
+ '[' ']'
++ awk '{print $1}'
++ wc
++ echo 'zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
+ tabletCount=1
+ '[' 1 == 2 ']'
+ sleep 5
+ '[' ']'
++ vtctlclient -server vtctld.vitess:15999 ListAllTablets zone1
+ cellTablets='zone1-0575853000 sharded-db -80 replica zone1-sharded-db-x-80-replica-0.vttablet:15002 zone1-sharded-db-x-80-replica-0.vttablet:3306 []
zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ echo 'zone1-0575853000 sharded-db -80 replica zone1-sharded-db-x-80-replica-0.vttablet:15002 zone1-sharded-db-x-80-replica-0.vttablet:3306 []
zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk 'substr( $5,1,21 ) == "zone1-sharded-db-80-x" {print $0}'
+ shardTablets='zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ echo 'zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk '$4 == "master" {print $1}'
+ masterTablet=
+ '[' ']'
++ echo 'zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk '{print $1}'
++ wc
+ tabletCount=2
+ '[' 2 == 2 ']'
+ TABLETS_READY=true
+ '[' true ']'
++ echo 'zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk 'substr( $5,1,31 ) == "zone1-sharded-db-80-x-replica-0" {print $1}'
+ tablet_id=zone1-1104301100
+ vtctlclient -server vtctld.vitess:15999 InitShardMaster -force sharded-db/80- zone1-1104301100
W0201 00:52:40.112918     254 main.go:58] W0201 00:52:40.112648 reparent.go:181] master-elect tablet zone1-1104301100 is not the shard master, proceeding anyway as -force was used
W0201 00:52:40.113460     254 main.go:58] W0201 00:52:40.112720 reparent.go:187] master-elect tablet zone1-1104301100 is not a master in the shard, proceeding anyway as -force was used
E0201 00:52:40.129791     254 main.go:61] Remote error: rpc error: code = Unknown desc = Tablet zone1-1104301100 ResetReplication failed (either fix it, or Scrap it): rpc error: code = Unknown desc = TabletManager.ResetReplication on zone1-1104301100 error: net.Dial(/vtdataroot/tabletdata/mysql.sock) to local server failed: dial unix /vtdataroot/tabletdata/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000);Tablet zone1-1104301101 ResetReplication failed (either fix it, or Scrap it): rpc error: code = Unknown desc = TabletManager.ResetReplication on zone1-1104301101 error: net.Dial(/vtdataroot/tabletdata/mysql.sock) to local server failed: dial unix /vtdataroot/tabletdata/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)
+ sleep 5
+ vtctlclient -server vtctld.vitess:15999 InitShardMaster -force sharded-db/80- zone1-1104301100
W0201 00:52:45.216219     262 main.go:58] W0201 00:52:45.216317 reparent.go:181] master-elect tablet zone1-1104301100 is not the shard master, proceeding anyway as -force was used
W0201 00:52:45.216684     262 main.go:58] W0201 00:52:45.216392 reparent.go:187] master-elect tablet zone1-1104301100 is not a master in the shard, proceeding anyway as -force was used
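
The awk steps visible in the trace can be reproduced offline. The following is a minimal sketch, not the chart's actual script (which is not shown in this thread): the function names are hypothetical, the sample data is copied from the log above, and the tablet count uses awk where the trace shows `wc`.

```shell
# Sample ListAllTablets output, copied from the trace above.
cellTablets='zone1-0575853000 sharded-db -80 replica zone1-sharded-db-x-80-replica-0.vttablet:15002 zone1-sharded-db-x-80-replica-0.vttablet:3306 []
zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'

# shard_tablets PREFIX: keep the lines whose hostname (field 5) starts with PREFIX.
shard_tablets() {
  awk -v p="$1" 'substr($5, 1, length(p)) == p'
}

# master_tablet: print the alias (field 1) of any tablet whose type (field 4) is "master".
master_tablet() {
  awk '$4 == "master" {print $1}'
}

# tablet_count: count the non-empty lines on stdin.
tablet_count() {
  awk 'NF > 0 {n++} END {print n + 0}'
}

# tablet_alias PREFIX: print the alias of the tablet whose hostname starts with PREFIX.
tablet_alias() {
  awk -v p="$1" 'substr($5, 1, length(p)) == p {print $1}'
}

shardTablets=$(printf '%s\n' "$cellTablets" | shard_tablets "zone1-sharded-db-80-x")
printf '%s\n' "$shardTablets" | tablet_count                                    # prints 2
printf '%s\n' "$shardTablets" | tablet_alias "zone1-sharded-db-80-x-replica-0"  # prints zone1-1104301100
```

Because the match is on a hostname prefix, the `-80` shard (hostname `zone1-sharded-db-x-80-...`) is excluded from the `80-` shard's tablet list, which matches what the trace shows.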

@derekperkins
Member Author

My only hesitation with this is that it could cause an infinite loop if the total number of tablets is set up incorrectly or if the tablets are never healthy. I'm not sure of the best way to handle a timeout during each of the loops. (wishes he had Go context)

@derekperkins
Member Author

I just added a 10-minute timeout that will cause the Job to fail so it doesn't run indefinitely.
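
A deadline-bounded polling loop of the kind described here could look roughly like the sketch below. It is an illustration under assumptions, not the script from this PR: `wait_for` and `tablets_ready` are hypothetical names, and the deadline is only checked when the condition has not yet been met, just before sleeping.

```shell
# wait_for TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until it succeeds.
# Return 1 once TIMEOUT seconds have elapsed without success.
wait_for() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    # The condition is not met yet: give up if the deadline has passed,
    # otherwise sleep and try again.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical usage: give up after 10 minutes, polling every 5 seconds.
# wait_for 600 5 tablets_ready || exit 1
```

A non-zero exit from the container is what lets Kubernetes mark the Job's Pod as failed instead of leaving it looping forever.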

@derekperkins force-pushed the init-shard-master branch 2 times, most recently from d2ea052 to 8592fec (February 3, 2018 05:05)

@enisoc
Member

enisoc commented Feb 3, 2018

Nothing is impossible if you believe hard enough. Also Sprig is available in Helm.

{{- define "tablet-counts" -}}
{{- range . -}}
{{- repeat (int .vttablet.replicas) "x" -}}
{{- end -}}
{{- end -}}
{{ $totalTabletCount := len (include "tablet-counts" $shard.tablets) }}
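
The template above counts in unary: each tablet group renders one "x" per replica, the runs are concatenated, and the length of the resulting string is the sum. The same idea can be mimicked in shell purely for illustration (the function names are hypothetical, and a real shell script would simply add the numbers):

```shell
# repeat_x N: print "x" N times, standing in for Sprig's `repeat N "x"`.
repeat_x() {
  local n="$1" out=""
  while [ "$n" -gt 0 ]; do
    out="${out}x"
    n=$((n - 1))
  done
  printf '%s' "$out"
}

# total_replicas COUNT...: concatenate one run of x's per count,
# then report the length of the joined string, as `len (include ...)` does.
total_replicas() {
  local joined=""
  for count in "$@"; do
    joined="${joined}$(repeat_x "$count")"
  done
  printf '%s\n' "${#joined}"
}

total_replicas 2 1 3   # prints 6
```

The whitespace-trimming `{{-` and `-}}` markers in the template matter for the same reason: any stray newline in the rendered string would be counted as an extra tablet.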

@derekperkins
Member Author

I'm astounded by your ingenuity with that. I spent a good 30+ minutes trying to figure out how to append or set inside a map or something to get around that. I'm super impressed, while still being baffled that assigning to variables in Go templates isn't a thing.

Member

In the Operator, I have a few Jobs that stick around as a record that "this was already done". For example, if you delete them, they will run again every time you helm upgrade anyway.

If the Jobs were truly one-off, it would be important to delete them since the number of finished Jobs could grow without bound. However, since we are deterministically creating only one Job per shard, and uninstalling the chart should delete any Jobs we create, I don't think it's harmful to keep the Jobs around.

Note that the Pod GC will clean up the terminated Pods eventually, yet the Job will remember in its status that it already completed.

Member Author

Ok, I'll change the comment to reflect that.

Member

Even after InitShardMaster (ISM) has run, there could temporarily be no tablet of type master. We need to be careful not to run ISM again in this case (especially with -force), since it can cause data loss.

I think a somewhat more reliable signal would be an empty master_alias entry in the result of GetShard. Even if we transiently don't have a running master, the shard record should still contain the alias of the last known master. I'm not 100% sure of that, though, so we should still think about whether there's a better way to be sure the shard has never been initialized.

Member Author

I can do whatever you think is the most reliable. Should I implement your GetShard suggestion or are you looking for a better way?

Member

Should this go inside the else, before the sleep? If the tablets are ready, we don't need to check for timeout.

Member Author

Done.

-db-config-filtered-uname "vt_filtered"
-db-config-filtered-dbname "vt_{{$keyspace.name}}"
-db-config-filtered-charset "utf8"
{{ if gt (int $shard.tabletCount) 1 }}
Member

Can this use the computed $totalTabletCount?

Member Author

Yeah, that was an oversight

@derekperkins
Member Author

I made all the changes you requested except for the master tablet check, plus I added the semi-sync and heartbeat options.

@derekperkins
Member Author

I added a second master tablet check using GetShard in addition to the original ListAllTablets check. It will not perform InitShardMaster if either of those calls returns a master. As a part of that, I added jq to the vtctlclient docker image.

@enisoc This is ready for review
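
The double check described in this comment can be sketched as below. This is a sketch under assumptions rather than the merged script: the function names are hypothetical, and the `.master_alias.uid` jq path is an assumption about the shape of the GetShard JSON (only the `master_alias` field name comes from this thread).

```shell
# has_master_tablet: reads ListAllTablets-style lines on stdin and
# succeeds if any tablet has type "master" in field 4.
has_master_tablet() {
  awk '$4 == "master" {found = 1} END {exit !found}'
}

# alias_is_set VALUE: succeeds if VALUE looks like a real alias,
# i.e. it is neither empty nor the literal "null" that jq prints for a missing field.
alias_is_set() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

# should_init_shard TABLET_LINES MASTER_UID
# Succeeds only when NEITHER check reports a master, so InitShardMaster
# is skipped if either signal says the shard was already initialized.
should_init_shard() {
  local tablets="$1" master_uid="$2"
  if printf '%s\n' "$tablets" | has_master_tablet; then
    return 1
  fi
  if alias_is_set "$master_uid"; then
    return 1
  fi
  return 0
}

# Hypothetical wiring (assumed field path; keyspace/shard taken from the log):
# master_uid=$(vtctlclient -server vtctld.vitess:15999 GetShard sharded-db/80- | jq '.master_alias.uid')
# should_init_shard "$shardTablets" "$master_uid" || exit 0
```

Requiring both signals to be empty errs on the side of not re-running InitShardMaster, which is the safer failure mode given the data-loss concern raised in review.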

Member

@enisoc enisoc left a comment

Looks good other than one comment.

-enable_replication_reporter
{{ if $orc.enabled }}
{{ if $defaultVttablet.enableSemisync }}
{{ if gt $totalTabletCount 1 }}
Member

I forgot until you brought it up in another context that rdonly tablets don't send semi-sync ACKs. So we should actually check the replica count of only the replica-type tablets to avoid getting stuck during ISM.

Member Author

What would you think about eliminating that check altogether? It felt somewhat weird to not enable semisync when the user explicitly enabled it.

Member

Yeah, that sounds good to me. But maybe add a comment above enableSemisync in values.yaml that you need at least 2 replica-type (master-eligible) tablets.
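
The requirement in this comment can be expressed as a small pre-flight check. The sketch below is hypothetical and not part of the chart: it assumes the tablet groups are available as "TYPE REPLICAS" lines, and it counts only replica-type tablets because, per the review above, rdonly tablets don't ACK.

```shell
# replica_type_total: reads "TYPE REPLICAS" lines on stdin and prints
# the total number of replica-type tablets, ignoring rdonly and others.
replica_type_total() {
  awk '$1 == "replica" {n += $2} END {print n + 0}'
}

# semisync_safe: succeeds if at least 2 replica-type tablets are configured.
semisync_safe() {
  [ "$(replica_type_total)" -ge 2 ]
}

printf 'replica 2\nrdonly 3\n' | replica_type_total   # prints 2
```

With one replica-type tablet and any number of rdonly tablets, the check fails, which corresponds to the configuration that could get stuck during ISM.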

Member Author

Done

@enisoc
Member

enisoc commented Feb 5, 2018

LGTM

Approved with PullApprove

@enisoc merged commit 4d73828 into vitessio:master on Feb 5, 2018
@derekperkins deleted the init-shard-master branch on March 2, 2018 21:52