helm: add InitShardMaster Job by derekperkins · Pull Request #3612 · vitessio/vitess

derekperkins · 2018-01-31T18:23:01Z

To make the helm install process more seamless, this will automatically initialize the shard master. One job is created per shard.

derekperkins · 2018-02-01T00:35:04Z

This ended up being surprisingly difficult, and Go templates don't provide any way for me to add up the replicas on behalf of the user, so the total tablet count has to be supplied by hand. That's not ideal, because the wrong number will cause problems.

Otherwise I think this will be super useful to anyone booting up a Vitess cluster for the first time.

derekperkins · 2018-02-01T00:35:11Z

@enisoc this is ready for review

derekperkins · 2018-02-01T01:25:35Z

Here's a sample of the logs, and everything looks good, including the error retry logic.

+ shardTablets='zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ echo 'zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk '$4 == "master" {print $1}'
+ masterTablet=
+ '[' ']'
++ awk '{print $1}'
++ wc
++ echo 'zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
+ tabletCount=1
+ '[' 1 == 2 ']'
+ sleep 5
+ '[' ']'
++ vtctlclient -server vtctld.vitess:15999 ListAllTablets zone1
+ cellTablets='zone1-0575853000 sharded-db -80 replica zone1-sharded-db-x-80-replica-0.vttablet:15002 zone1-sharded-db-x-80-replica-0.vttablet:3306 []
zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ echo 'zone1-0575853000 sharded-db -80 replica zone1-sharded-db-x-80-replica-0.vttablet:15002 zone1-sharded-db-x-80-replica-0.vttablet:3306 []
zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk 'substr( $5,1,21 ) == "zone1-sharded-db-80-x" {print $0}'
+ shardTablets='zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ echo 'zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk '$4 == "master" {print $1}'
+ masterTablet=
+ '[' ']'
++ echo 'zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk '{print $1}'
++ wc
+ tabletCount=2
+ '[' 2 == 2 ']'
+ TABLETS_READY=true
+ '[' true ']'
++ echo 'zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'
++ awk 'substr( $5,1,31 ) == "zone1-sharded-db-80-x-replica-0" {print $1}'
+ tablet_id=zone1-1104301100
+ vtctlclient -server vtctld.vitess:15999 InitShardMaster -force sharded-db/80- zone1-1104301100
W0201 00:52:40.112918     254 main.go:58] W0201 00:52:40.112648 reparent.go:181] master-elect tablet zone1-1104301100 is not the shard master, proceeding anyway as -force was used
W0201 00:52:40.113460     254 main.go:58] W0201 00:52:40.112720 reparent.go:187] master-elect tablet zone1-1104301100 is not a master in the shard, proceeding anyway as -force was used
E0201 00:52:40.129791     254 main.go:61] Remote error: rpc error: code = Unknown desc = Tablet zone1-1104301100 ResetReplication failed (either fix it, or Scrap it): rpc error: code = Unknown desc = TabletManager.ResetReplication on zone1-1104301100 error: net.Dial(/vtdataroot/tabletdata/mysql.sock) to local server failed: dial unix /vtdataroot/tabletdata/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000);Tablet zone1-1104301101 ResetReplication failed (either fix it, or Scrap it): rpc error: code = Unknown desc = TabletManager.ResetReplication on zone1-1104301101 error: net.Dial(/vtdataroot/tabletdata/mysql.sock) to local server failed: dial unix /vtdataroot/tabletdata/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)
+ sleep 5
+ vtctlclient -server vtctld.vitess:15999 InitShardMaster -force sharded-db/80- zone1-1104301100
W0201 00:52:45.216219     262 main.go:58] W0201 00:52:45.216317 reparent.go:181] master-elect tablet zone1-1104301100 is not the shard master, proceeding anyway as -force was used
W0201 00:52:45.216684     262 main.go:58] W0201 00:52:45.216392 reparent.go:187] master-elect tablet zone1-1104301100 is not a master in the shard, proceeding anyway as -force was used
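
The trace above boils down to three text-processing steps on the `ListAllTablets` output: keep the rows for one shard, look for a row of type `master`, and count the rows. A minimal sketch of those steps follows. The function names are hypothetical and the sample rows are copied from the trace; the Job script in the chart may be structured differently.

```shell
#!/bin/sh
# Sample ListAllTablets rows, copied from the trace above.
cell_tablets='zone1-0575853000 sharded-db -80 replica zone1-sharded-db-x-80-replica-0.vttablet:15002 zone1-sharded-db-x-80-replica-0.vttablet:3306 []
zone1-1104301100 sharded-db 80- replica zone1-sharded-db-80-x-replica-0.vttablet:15002 zone1-sharded-db-80-x-replica-0.vttablet:3306 []
zone1-1104301101 sharded-db 80- replica zone1-sharded-db-80-x-replica-1.vttablet:15002 zone1-sharded-db-80-x-replica-1.vttablet:3306 []'

# Hypothetical helper: keep rows whose hostname (column 5) starts with PREFIX.
shard_rows() {
  awk -v prefix="$1" 'index($5, prefix) == 1'
}

# Hypothetical helper: print the alias (column 1) of any row typed "master".
master_alias() {
  awk '$4 == "master" { print $1 }'
}

# Hypothetical helper: count non-empty rows, printing 0 for empty input.
count_rows() {
  awk 'NF > 0 { n++ } END { print n + 0 }'
}

shard=$(printf '%s\n' "$cell_tablets" | shard_rows "zone1-sharded-db-80-x")
echo "tablets: $(printf '%s\n' "$shard" | count_rows)"
echo "master: $(printf '%s\n' "$shard" | master_alias)"
```

Matching on a hostname prefix rather than on the shard column mirrors the `substr( $5,1,21 )` comparison in the trace. An empty `master` line means no tablet in the shard currently reports the master type.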

derekperkins · 2018-02-01T23:32:08Z

My only hesitation with this is that it could cause an infinite loop if the total number of tablets is set up incorrectly or if the tablets are never healthy. I'm not sure of the best way to handle a timeout during each of the loops. (wishes he had Go context)

derekperkins · 2018-02-02T20:34:59Z

I just added a 10-minute timeout that will cause the job to fail so it doesn't run indefinitely.
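
One way to bound a polling loop in plain shell is to compute a deadline up front and compare against it only when the condition is not yet met. The sketch below is an assumption about how such a timeout could look, not the code from this PR; `wait_until` and its arguments are hypothetical names.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds. Returns 1 once TIMEOUT_SECONDS have
# elapsed without a success, so the caller (for example a Job) can fail.
wait_until() {
  timeout="$1"
  interval="$2"
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    # The deadline is only checked on the "not ready" path, before sleeping.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

With a 10-minute limit and the 5-second sleep seen in the trace, a call might look like `wait_until 600 5 tablets_ready || exit 1`, where `tablets_ready` is a hypothetical function that succeeds once the expected number of tablets is listed.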

enisoc · 2018-02-03T08:03:23Z

Nothing is impossible if you believe hard enough. Also Sprig is available in Helm.

{{- define "tablet-counts" -}}
{{- range . -}}
{{- repeat (int .vttablet.replicas) "x" -}}
{{- end -}}
{{- end -}}

{{ $totalTabletCount := len (include "tablet-counts" $shard.tablets) }}
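
The template above sums the replica counts without doing arithmetic on a variable: it emits one `x` per replica and then takes the length of the resulting string. The same idea is sketched below in shell, purely as an illustration. The function name is hypothetical, and the chart does this in the template, not in a script.

```shell
#!/bin/sh
# sum_by_repeat N1 N2 ...
# Appends one "x" per unit of each argument, then prints the string length,
# which equals the sum of the arguments.
sum_by_repeat() {
  marks=""
  for n in "$@"; do
    i=0
    while [ "$i" -lt "$n" ]; do
      marks="${marks}x"
      i=$((i + 1))
    done
  done
  echo "${#marks}"
}

# Three tablet groups with 2, 1 and 3 replicas.
sum_by_repeat 2 1 3
```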

derekperkins · 2018-02-03T08:34:47Z

I'm astounded by your ingenuity with that. I spent a good 30+ minutes trying to figure out how to append or set inside a map or something to get around that. I'm super impressed, while still being baffled that assigning to variables in Go templates isn't a thing.


enisoc · 2018-02-03T22:15:49Z

helm/vitess/templates/NOTES.txt

In the Operator, I have a few Jobs that stick around as a record that "this was already done". Here, too, if you delete the Jobs, they will simply run again every time you run helm upgrade.

If the Jobs were truly one-off, it would be important to delete them since the number of finished Jobs could grow without bound. However, since we are deterministically creating only one Job per shard, and uninstalling the chart should delete any Jobs we create, I don't think it's harmful to keep the Jobs around.

Note that the Pod GC will eventually clean up the terminated Pods, but the Job will still remember in its status that it already completed.

Ok, I'll change the comment to reflect that.

enisoc · 2018-02-03T22:18:06Z

helm/vitess/templates/_vttablet.tpl

It's possible that there could temporarily be no tablet of type master, even after InitShardMaster (ISM) has run. We need to be careful not to run ISM again in this case (especially with -force), since it can cause data loss.

I think a somewhat more reliable signal would be an empty master_alias entry in the result of GetShard. I think even if we transiently don't have a running master, the shard record should still contain the alias of the last known master. I'm not 100% sure of that though, so we should still try to think if there's a better way to be sure the shard has never been initialized.

I can do whatever you think is the most reliable. Should I implement your GetShard suggestion or are you looking for a better way?

enisoc · 2018-02-03T22:20:52Z

helm/vitess/templates/_vttablet.tpl

Should this go inside the else, before the sleep? If the tablets are ready, we don't need to check for timeout.

enisoc · 2018-02-03T22:24:05Z

helm/vitess/templates/_vttablet.tpl

        -db-config-filtered-uname "vt_filtered"
        -db-config-filtered-dbname "vt_{{$keyspace.name}}"
        -db-config-filtered-charset "utf8"
+{{ if gt (int $shard.tabletCount) 1 }}


Can this use the computed $totalTabletCount?

Yeah, that was an oversight


derekperkins · 2018-02-03T22:53:51Z

I made all the changes you requested except for the master tablet check, plus I added the semi-sync and heartbeat options.

derekperkins · 2018-02-04T02:53:35Z

I added a second master tablet check using GetShard in addition to the original ListAllTablets check. It will not perform InitShardMaster if either of those calls returns a master. As a part of that, I added jq to the vtctlclient docker image.
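
The double check described above amounts to a simple rule: run InitShardMaster only when neither source reports a master. A minimal sketch of that decision follows. The function name is hypothetical, and the commented wiring is an assumption, in particular the `master_alias` field in the `GetShard` JSON and the `jq` expression used to read it.

```shell
#!/bin/sh
# should_init_shard LIST_MASTER SHARD_MASTER
# Succeeds only when both inputs are empty, i.e. neither check found a master.
should_init_shard() {
  list_master="$1"   # alias of a master-type tablet from ListAllTablets, or empty
  shard_master="$2"  # master recorded in the shard record from GetShard, or empty
  [ -z "$list_master" ] && [ -z "$shard_master" ]
}

# Hypothetical wiring (not executed here; field name and jq filter are assumptions):
#   list_master=$(vtctlclient -server vtctld.vitess:15999 ListAllTablets zone1 \
#     | awk '$2 == "sharded-db" && $3 == "80-" && $4 == "master" {print $1}')
#   shard_master=$(vtctlclient -server vtctld.vitess:15999 GetShard sharded-db/80- \
#     | jq -r '.master_alias.uid // empty')
#   should_init_shard "$list_master" "$shard_master" && echo "safe to run ISM"
```

Requiring both checks to come back empty errs on the side of not running ISM, which matches the data-loss concern raised in the review.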

@enisoc This is ready for review

enisoc

Looks good other than one comment.

enisoc · 2018-02-05T21:52:31Z

helm/vitess/templates/_vttablet.tpl

        -enable_replication_reporter
-{{ if $orc.enabled }}
+{{ if $defaultVttablet.enableSemisync }}
+  {{ if gt $totalTabletCount 1 }}


I forgot until you brought it up in another context that rdonly tablets don't ACK. So we should actually check the replica count of only the replica type tablets to avoid getting stuck during ISM.

What would you think about eliminating that check altogether? It felt somewhat weird to not enable semisync when the user explicitly enabled it.

Yeah, that sounds good to me. But maybe add a comment above enableSemisync in values.yaml that you need at least 2 replica-type (master-eligible) tablets.

enisoc · 2018-02-05T22:21:10Z

LGTM

googlebot added the cla: yes label Jan 31, 2018

derekperkins force-pushed the init-shard-master branch 3 times, most recently from f45ce67 to dc5b82b Compare February 1, 2018 00:32

derekperkins force-pushed the init-shard-master branch from 4326045 to 5b0366a Compare February 2, 2018 20:11

derekperkins force-pushed the init-shard-master branch 2 times, most recently from d2ea052 to 8592fec Compare February 3, 2018 05:05

derekperkins added 5 commits February 3, 2018 15:20

helm: add InitShardMaster Job

f109905

helm: disable semi_sync if only one tablet

c0a8162

helm: add note to delete init-shard-master jobs

3b94173

helm: timeout InitShardMaster after 10 minutes

b2ea065

helm: calculate total tablets in helm chart

224a6bd

Necessary for conditionally enabling semi-sync if the count is greater than one. Also used when waiting to run InitShardMaster, to make sure we don't orphan any slaves.

derekperkins force-pushed the init-shard-master branch from eb73a0c to 224a6bd Compare February 3, 2018 22:21

enisoc reviewed Feb 3, 2018


derekperkins added 5 commits February 3, 2018 15:25

helm: add enable heartbeat option to vttablet

237a4bb

helm: add enable semi-sync option to vttablet

37e28dd

Also use calculated totalTabletCount

helm: check ISM timeout if tablets are not ready

a7645aa

helm: remove recommendation to remove ISM Job

57046f6

helm: use heartbeat enabled option in Orc config

1d987cb

derekperkins added 2 commits February 3, 2018 18:28

docker: add jq to vtctlclient Dockerfile

6627aeb

helm: check GetShard for master tablet before ISM

820081c

enisoc reviewed Feb 5, 2018


helm: change enableSemisync to >2 replica warning

d246e8e

enisoc merged commit 4d73828 into vitessio:master Feb 5, 2018

derekperkins deleted the init-shard-master branch March 2, 2018 21:52

enisoc commented Feb 5, 2018 (edited by alainjobart)
