Multi-rack support #192

Merged
34 commits merged into instaclustr:superdupertopsecretrewrite on Aug 21, 2019

Conversation

@alourie (Contributor) commented Jul 5, 2019

First MVP.

Seems to be working OK for creating clusters; should be OK for scaling up, but probably not for scaling down.

* Change project structure to follow the same pattern as the CDC/Backup controllers
* Refactor code for better naming and convenience
* Add StatefulSet handling
* Add stub backup code
* Add a backupType field to BackupOperation on the Java/Golang sidecar

Signed-off-by: Alex Lourie <[email protected]>
* Support multi-racks
* Probably doesn't support scaling down properly

Signed-off-by: Alex Lourie <[email protected]>
@alourie (author) commented Jul 5, 2019

I'll squash this when merging, don't worry

@alourie (author) commented Jul 25, 2019

Should be ok now for deploying and scaling, as well as some basic e2e testing.

@zegelin (Contributor) commented Jul 26, 2019

Some general comments:

  • Instead of a ConfigMap per rack (yuck!), let's implement a custom Snitch class that can somehow infer the rack and DC from the environment.

  • Some places deal with Rack structs, others with map[string]int32, and others with plain string. Let's switch everything to Rack for consistency (see the sketch after this list).

  • Maybe I missed it somewhere, but if n-racks = n-statefulsets, there should be a loop somewhere that iterates over each rack and tries to reconcile its statefulset, no?
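
On the second bullet, a minimal sketch of what a unified Rack type could look like. The package placement and field names (Name, Replicas, NodeLabels) are assumptions for illustration, not the types actually introduced in this PR:

```go
package cassandradatacenter

// Rack is a sketch of a single type that could replace the mix of
// map[string]int32 and plain-string usages. Field names are illustrative.
type Rack struct {
	Name       string            // rack name, e.g. "rack1"
	Replicas   int32             // desired number of Cassandra pods in this rack
	NodeLabels map[string]string // node labels pinning the rack to a failure domain
}

// Racks is a helper collection so callers pass []Rack around instead of maps.
type Racks []Rack

// Find returns the rack with the given name, if present.
func (rs Racks) Find(name string) (Rack, bool) {
	for _, r := range rs {
		if r.Name == name {
			return r, true
		}
	}
	return Rack{}, false
}
```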

pkg/controller/cassandradatacenter/configmap.go (outdated review thread, resolved)
pkg/controller/cassandradatacenter/reconciler.go (outdated review thread, resolved)
}

// fine, so we have a mismatch. Will seek to reconcile.
rackSpec, err := getRackSpec(rctx, request)

@zegelin (Contributor):
Looking into this more -- we only reconcile one Rack per reconcile? What triggers the additional reconciles?

@alourie (author):
Yes, we now only reconcile one rack per reconcile. You are correct that at the moment it doesn't support config changes/upgrades, but scaling up/down will work because k8s will keep reconciling until the total number of pods matches the spec.
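
To make that concrete, here is a small, self-contained sketch of picking the single rack to reconcile on each pass; Kubernetes re-queues the request, so later reconciles pick up the remaining racks. This is an illustration of the idea, not the PR's actual code, and the Rack type and helper name are assumptions:

```go
package main

import "fmt"

// Rack describes the desired state of one rack (illustrative, not the PR's type).
type Rack struct {
	Name     string
	Replicas int32
}

// pickRackToReconcile returns the first rack whose currently observed
// StatefulSet replica count differs from the desired count, or false when
// every rack already matches. The controller would then create or resize only
// that rack's StatefulSet and rely on the next reconcile for the rest.
func pickRackToReconcile(desired []Rack, observed map[string]int32) (Rack, bool) {
	for _, r := range desired {
		if observed[r.Name] != r.Replicas {
			return r, true
		}
	}
	return Rack{}, false
}

func main() {
	desired := []Rack{{Name: "rack1", Replicas: 3}, {Name: "rack2", Replicas: 3}}
	observed := map[string]int32{"rack1": 3, "rack2": 2}
	if r, ok := pickRackToReconcile(desired, observed); ok {
		fmt.Printf("reconcile %s: scale to %d replicas\n", r.Name, r.Replicas)
	}
}
```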

@alourie (author) commented Jul 27, 2019

Thanks @zegelin.

  1. I will look into dropping the rack/DC data into the pod/C* container and configuring it either via script or snitch. This will eliminate the need for multiple ConfigMaps and will get us a bit closer to doing it all with one StatefulSet (not yet really possible).
  2. Yes, Rack is better; I only recently introduced it as a convenience and haven't gotten around to fixing all the places yet. Will do so.
  3. We don't need to loop over racks to reconcile because we get that for free from k8s, which will keep calling the Reconciler until all the racks (StatefulSets) are reconciled. So we only need to find any one rack that needs reconciling and handle it.

Also, at the moment none of this would support upgrades or config changes (such as a different image URL/version, hardware changes, or whatever): if there is no change in replica numbers, nothing will be done. I will remove that check and see how much work is needed to take care of that (considering that for some of those changes we would need rolling restarts of the C* containers). Should we support it for the "first cut"?
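
On that last point, a rough sketch of the broader check being described: comparing more than just replica counts (for example the container image) when deciding whether a rack needs reconciling. The type and field names here are illustrative assumptions, not the operator's real API:

```go
package main

import "fmt"

// rackState is an illustrative snapshot of one rack's StatefulSet; a real
// controller would read these values from the StatefulSet's pod template.
type rackState struct {
	Replicas int32
	Image    string
}

// needsReconcile reports whether a rack differs from the desired state in
// replica count *or* image, so image/config changes aren't silently skipped
// just because the replica numbers happen to match.
func needsReconcile(desired, observed rackState) bool {
	return desired.Replicas != observed.Replicas || desired.Image != observed.Image
}

func main() {
	desired := rackState{Replicas: 3, Image: "cassandra:3.11.4"}
	observed := rackState{Replicas: 3, Image: "cassandra:3.11.3"}
	fmt.Println("needs reconcile:", needsReconcile(desired, observed)) // true
}
```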

Signed-off-by: Alex Lourie <[email protected]>
@@ -12,4 +12,7 @@ do
done
)

# Update the rack settings from env
sed -i'' "s/rack=.*$/rack=${CASSANDRA_RACK}/g" /etc/cassandra/cassandra-rackdc.properties

@alourie (author):
This is just one possible solution, to see whether it works. It does.

@@ -46,6 +46,9 @@ spec:
type: integer
prometheusSupport:
type: boolean
racks:

@benbromhead (Collaborator):
How do we know which racks the C* cluster is using? We need to make sure we can assign to the correct user-defined fault domains in Kubernetes. See https://kubernetes.io/docs/setup/best-practices/multiple-zones/

Each StatefulSet should apply labels via the pod template to assign which failure domain the StatefulSet is applied to. See https://kubernetes.io/docs/reference/kubernetes-api/labels-annotations-taints/#failure-domainbetakubernetesiozone

In terms of the CRD definition, the user should at a minimum be able to specify which failure domains are available to the cluster. That way we can create the right number of StatefulSets and ensure the correct labels are applied to them.
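
As a hedged illustration of that suggestion (using the k8s.io/api types; the label key follows the docs linked above, while the builder function and cluster/rack labels are assumptions rather than the operator's real code), each rack's StatefulSet could pin its pods to a failure domain via a node selector:

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// rackStatefulSet sketches how a per-rack StatefulSet could carry rack labels
// and a node selector on the zone label so its pods land in one failure domain.
// Containers, volumes, service name, etc. are omitted for brevity.
func rackStatefulSet(cluster, rack, zone string, replicas int32) *appsv1.StatefulSet {
	labels := map[string]string{
		"cassandra-cluster": cluster,
		"cassandra-rack":    rack,
	}
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: cluster + "-" + rack, Labels: labels},
		Spec: appsv1.StatefulSetSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					// Pin this rack's pods to the requested failure domain.
					NodeSelector: map[string]string{
						"failure-domain.beta.kubernetes.io/zone": zone,
					},
				},
			},
		},
	}
}

func main() {
	sts := rackStatefulSet("my-cluster", "rack1", "us-east-1a", 3)
	fmt.Println(sts.Name, "->", sts.Spec.Template.Spec.NodeSelector)
}
```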

Signed-off-by: Alex Lourie <[email protected]>
@alourie (author) commented Aug 9, 2019

@z @benbromhead the latest version seems to be a working model with one ConfigMap and StatefulSet per rack. Rack selection is performed based on user input and used as label selectors on the nodes; the replica distribution is calculated automatically and applied separately. Rack scale-up/decommission is performed with a round-robin algorithm.
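
A minimal, self-contained sketch of that round-robin distribution idea, spreading N Cassandra nodes across the user-supplied racks so counts never differ by more than one. This illustrates the approach described above, not the PR's actual code:

```go
package main

import "fmt"

// distributeReplicas spreads `total` Cassandra nodes across racks round-robin,
// so no rack ends up with more than one node above any other. Scaling down
// would walk the same order in reverse to pick the rack to decommission from.
func distributeReplicas(racks []string, total int32) map[string]int32 {
	counts := make(map[string]int32, len(racks))
	if len(racks) == 0 {
		return counts
	}
	for i := int32(0); i < total; i++ {
		counts[racks[int(i)%len(racks)]]++
	}
	return counts
}

func main() {
	fmt.Println(distributeReplicas([]string{"rack1", "rack2", "rack3"}, 7))
	// prints: map[rack1:3 rack2:2 rack3:2]
}
```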

@alourie (author) commented Aug 9, 2019

Merge conflict resolved; ready to merge.

@alourie (author) commented Aug 20, 2019

Merge conflicts resolved again; this can be squashed and merged.

@benbromhead (Collaborator):
@alourie can you rebase and I'll merge

@benbromhead (Collaborator):
Due to the PR for secrets, etc.

@benbromhead merged commit 1d075a2 into instaclustr:superdupertopsecretrewrite on Aug 21, 2019
@alourie mentioned this pull request on Aug 22, 2019