Conversation
Other changes:
Another consideration: I don't think either the gce or azure variant outputs a user-friendly kubeconfig at the end for connectivity outside of the cluster.
for reference, here's a kube-proxy daemonset yaml that works with this: https://github.com/colemickens/polykube/blob/e5e6f463e0aae8e4dee204f5496a066f932f438b/kubernetes/kube-proxy.yaml
Thanks Cole! This looks great, I will give this a review soon. cc @errordeveloper
(Another random question I've had -- is it completely safe to run kubelet in a container now with Docker 1.11.x? Like volume/secret mounts will work?)
Secrets work for sure. Formal validation is still required, but it…
(I guess I should've known as much -- I deployed my test app using secrets to the cluster I brought up with this.)
…as a SAN. Let kubelet restart. Add set-kubeconfig helper for configuring kubeconfig after deployment. Add note about extra-sans-ip being unused.
Flakiness is fixed as far as I can tell - etcd had restarted, it hadn't been persisting anywhere on the host, and the apiserver/kubelets did not get along. I suspect that kubelets weren't completely re-registering and so were just failing to update their status since the apiserver didn't know about them. Either way, etcd is persisting now and I haven't had problems with nodes disappearing anymore.

I also had to add the cluster ip for the kubernetes service as a SAN so that the certs would be valid for in-cluster-config usages (kubernetes-dashboard was complaining. I don't know if it pre-resolves the kubernetes service clusterip or if it's made to not rely on dns being available. Regardless...)

Added a helper for configuring kubectl after deployment. It probably should be temporary or should be improved, as it requires the user to manually get the MASTER_IP and set CLUSTER_NAME even though those things can be acquired through Terraform and/or by inspecting …
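For context, a helper like that presumably boils down to writing a kubeconfig along these lines. This is only a sketch (expressed as jsonnet so it can be evaluated standalone); the crypto file names are placeholders, and MASTER_IP/CLUSTER_NAME are the values the user currently has to supply by hand:

```jsonnet
// Sketch of the kubeconfig such a set-kubeconfig helper effectively produces.
// MASTER_IP and CLUSTER_NAME are passed in explicitly (std.extVar) to mirror
// the manual inputs described above; the cert paths are hypothetical.
local master_ip = std.extVar("MASTER_IP");
local cluster_name = std.extVar("CLUSTER_NAME");

{
  apiVersion: "v1",
  kind: "Config",
  clusters: [{
    name: cluster_name,
    cluster: {
      server: "https://" + master_ip,
      "certificate-authority": "phase1b/crypto/ca.pem",  // placeholder path
    },
  }],
  users: [{
    name: cluster_name + "-admin",
    user: {
      "client-certificate": "phase1b/crypto/admin.pem",      // placeholder path
      "client-key": "phase1b/crypto/admin-key.pem",          // placeholder path
    },
  }],
  contexts: [{
    name: cluster_name,
    context: { cluster: cluster_name, user: cluster_name + "-admin" },
  }],
  "current-context": cluster_name,
}
```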
"--etcd-servers=http://127.0.0.1:2379", | ||
"--cloud-provider={{ phase1['cloud_provider'] }}", | ||
{% if phase1['cloud_provider'] == "azure" %} | ||
"--cloud-config=/etc/kubernetes/azure.json", |
For comparison, this could be done with jsonnet like this:
local build_params(arr) =
  std.flattenArrays(std.filter(function(a) a != null, arr));

build_params([
  [
    "--this",
    "--that",
  ],
  if "gce" == "gce" then
    ["--life"],
  if "azure" == "gce" then
    ["--death"]
  else
    ["--parties"],
])
produces:
[
"--this",
"--that",
"--life",
"--parties"
]
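For instance, the j2 conditional above could become something like the following. This is purely illustrative, not code from the repo; `phase1` is hard-coded here so the snippet evaluates on its own with the `jsonnet` CLI, and the flags are just the ones from the diff hunk above:

```jsonnet
// Illustrative only: building the apiserver flag list with build_params
// instead of the j2 {% if %} block. In real use, `phase1` would be the
// config object the template already receives.
local build_params(arr) =
  std.flattenArrays(std.filter(function(a) a != null, arr));

// stand-in for the real phase1 config object
local phase1 = { cloud_provider: "azure" };

build_params([
  [
    "--etcd-servers=http://127.0.0.1:2379",
    "--cloud-provider=" + phase1.cloud_provider,
  ],
  // evaluates to null (and is filtered out) for non-azure providers
  if phase1.cloud_provider == "azure" then
    ["--cloud-config=/etc/kubernetes/azure.json"],
])
```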
Hm, do you think that is better? I think the j2 version is much easier to parse/grok at a glance, albeit with slightly more json cruft.
I want to avoid a templating language with a python dependency because of kubernetes/kubernetes#27547. I prefer to use a language with an interpreter that can be statically linked, e.g. go templates or jsonnet. I don't like using text templating languages to template data formats, so I lean towards jsonnet.
Isn't this running from within the ansible container anyway? I'm fine with changing it, but I think it's orthogonal to whether or not python is on the host, no?
It's not so much that we can't possibly get it to run on these OSes. It's more about the size of the dependency. python:2.7-slim was at 200MB when I looked. This is larger than the GCI vm image. This hits us on time to boot and is prohibitive. So now we have code like this: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L813-L845. There's a benefit to sharing templates of some form for the kubernetes components, but I don't think it's long-term viable to do this with jinja. In comparison, jsonnet is a couple of MBs and only has a dependency on libc (when it's not statically compiled). Go templates would be similarly lean.
Sounds good, I've also noticed that pulling the install-k8s container adds noticeable time to the deployment.
I'll convert it back to jsonnet in my kubernetes-anywhere branch.
So we were using /var/etcd instead of /var/lib/etcd before this change, which caused data to not be saved between etcd container restarts?
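For clarity, the persistence fix described above amounts to a hostPath volume on the etcd pod, roughly like the fragment below. This is a sketch only (the exact path is whatever the manifest in this PR uses; see the clarification later in the thread), not the PR's actual manifest:

```jsonnet
// Sketch of the relevant part of an etcd pod manifest: the data dir is
// backed by a hostPath volume so it survives container restarts.
// The /var/etcd/data path is illustrative.
{
  spec: {
    containers: [{
      name: "etcd",
      command: ["/usr/local/bin/etcd", "--data-dir=/var/etcd/data"],
      volumeMounts: [{ name: "etcd-data", mountPath: "/var/etcd/data" }],
    }],
    volumes: [{ name: "etcd-data", hostPath: { path: "/var/etcd/data" } }],
  },
}
```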
DNS.4 = kubernetes.default.svc.cluster.local
DNS.5 = names.master_vm
IP.1 = ${azurerm_public_ip.pip.ip_address}
IP.2 = 10.3.0.1 |
We should probably make node cidr, service cidr, pod cidr configurable ASAP.
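A sketch of what that could look like in the phase1 config object (field names here are hypothetical, not from this PR; the point is just that templates would reference the config instead of literals like 10.3.0.1):

```jsonnet
// Illustrative only: hoist the currently hard-coded ranges into the config.
local cfg = {
  cidrs: {
    node: "10.0.0.0/16",      // VNET / node range
    pod: "10.2.0.0/16",       // handed to the controller-manager
    service: "10.3.0.0/16",   // --service-cluster-ip-range
  },
  // First address of the service range, used as IP.2 in the cert SANs.
  // jsonnet can't easily derive this from the CIDR, so it stays explicit.
  kubernetes_service_ip: "10.3.0.1",
};

{
  apiserver_flags: ["--service-cluster-ip-range=" + cfg.cidrs.service],
  san_extras: { "IP.2": cfg.kubernetes_service_ip },
}
```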
+1
It seems like a combination of template and file provisioner could probably be better here... (Please ignore, that won't work for various reasons.) Anyhow, it'd be great to investigate whether the TLS provider would do the job (see #35).
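For reference, a loose sketch of what the TLS-provider route might look like, written in the same jsonnet-to-Terraform-JSON style as the rest of phase1. This is not code from this PR: the resource names are made up, and the CA signing step (tls_locally_signed_cert) is omitted; only the SAN list is taken from the diff above:

```jsonnet
// Rough sketch: generate the apiserver key and cert request with terraform's
// TLS provider, carrying the same SANs as the san-extras file above.
{
  resource: {
    tls_private_key: {
      apiserver: { algorithm: "RSA", rsa_bits: 2048 },
    },
    tls_cert_request: {
      apiserver: {
        key_algorithm: "RSA",
        private_key_pem: "${tls_private_key.apiserver.private_key_pem}",
        subject: [{ common_name: "kube-apiserver" }],
        dns_names: [
          "kubernetes",
          "kubernetes.default",
          "kubernetes.default.svc",
          "kubernetes.default.svc.cluster.local",
        ],
        ip_addresses: [
          "${azurerm_public_ip.pip.ip_address}",
          "10.3.0.1",  // kubernetes service cluster IP
        ],
      },
    },
    // The request would then be signed against the cluster CA with
    // tls_locally_signed_cert (not shown here).
  },
}
```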
Using terraform to build the certs SGTM
It sounds good to me, but I don't want to try to add that on top of this PR. I can potentially look at it in the future (but that would be after azure-cloudprovider).
@colemickens what are your thoughts after implementing a phase 1? How was it better than and worse than implementing a kube-up.sh provider? Excluding the rewrite of the manifest files, the diff seems fairly small.
Thank you both for the review and comments. They all make sense and I'll address these ASAP, but it may take a few days - I'm out of town right now for my brother's wedding and it's taking more of my attention than I'd expected. (I'm also simultaneously trying to get the cloudprovider itself out for review.)
@mikedanese I muddled the etcd/flakiness discussion, sorry. I had removed the hostPath volume entirely for Azure since it was using the GCE PD mount path. The fix was that I re-added the volume, not that I changed the path (that was unintentional). I've actually reverted the path to …
I've responded to or fixed all of the comments and pushed a new commit. Note that I didn't revert j2->jsonnet or fix the short abbreviated names in Terraform, but I did offer my comments about those suggestions above. Overall, yes, this was more straightforward and less painful than diving into the Salt mine. I don't have a lot else to say other than that, though the thought in the back of my mind is that the Salt kube-up.sh did more than this is doing. That might just be addons though, I can't think of much else. I have several questions though:
And then these additional concerns:
Unfortunately terraform rc2 seems to have a blocking regression from rc1: hashicorp/terraform#7248
After seeing how far ahead … https://github.com/colemickens/kubernetes-anywhere/commits/azure

There are some hacks in there to work around Terraform bugs and some weird bug I'm now experiencing in Docker. They're noted in comments with links to the upstream bugs. It's not a super nice git history. I have fixed some typos and made some slight, but hopefully appropriate, changes to the GCE phase1. I can split this into smaller commits if needed. That's on top of the new, unmerged …

The Azure deployment has an extra feature: it writes out a …
Also you need to run make fmt to format the jsonnet. |
Code is gone. |
I changed:

- `--service-cluster-ip-range` to not overlap with the ip range that will be used for the actual nodes.
- `--cloud-config` flag
- `hyperkube-amd64` instead of the `kube-{apiserver,scheduler,controller-manager}` images. `kubelet.service` was already using `hyperkube-amd64`, so it's already on box, plus it makes it easier for me to test changes to cloudprovider if I only have to rebuild and upload hyperkube instead of 4+ containers.
- `Dockerfile` that has `terraform`, `jq`, `jsonnet`, etc. (especially since this relies on a pre-release version of terraform: "azurerm: Can't enable ipv4_forwarding on a NIC", hashicorp/terraform#6803).

This works around Azure's lack of a metadata service by using terraform's `template_file` functionality. I use terraform's `base64encode` and `file` functions to access the contents of the crypto assets and use `template_file` to interpolate them into a `configure-vm.sh` template that is rendered and passed as `custom_data` to the VM. Similarly, it passes through some environment information and Active Directory credentials that are required by the `cloudprovider` implementation. The `template_file` approach was chosen instead of using the `remote-exec` and `file` provisioners in terraform because they don't work easily for Azure currently: hashicorp/terraform#7122.
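As a rough illustration of that flow, in the same jsonnet-to-Terraform-JSON style as the rest of phase1 (this is an approximation, not the PR's actual code; the file names, credential values, and some attribute details are placeholders and may not match the provider versions in use):

```jsonnet
// Sketch only: crypto assets and AD credentials are interpolated into a boot
// script and handed to the VM as custom_data, standing in for the metadata
// service Azure lacks. cfg values here are placeholders.
local cfg = { azure: { tenant_id: "TENANT-ID", client_id: "CLIENT-ID", client_secret: "CLIENT-SECRET" } };

{
  resource: {
    template_file: {
      configure_master: {
        template: '${file("configure-vm.sh.in")}',
        vars: {
          // base64encode/file let the rendered script carry the certs
          apiserver_cert: '${base64encode(file("../../phase1b/crypto/apiserver.pem"))}',
          apiserver_key: '${base64encode(file("../../phase1b/crypto/apiserver-key.pem"))}',
          tenant_id: cfg.azure.tenant_id,
          client_id: cfg.azure.client_id,
          client_secret: cfg.azure.client_secret,
        },
      },
    },
    azurerm_virtual_machine: {
      master: {
        os_profile: {
          computer_name: "master",
          admin_username: "kube",
          // the rendered script becomes the VM's custom_data
          custom_data: '${base64encode(template_file.configure_master.rendered)}',
        },
        // remaining VM attributes omitted
      },
    },
  },
}
```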
azure
cloudprovider
, so it current defaults to my ownhyperkube-amd
repo/image. (I'm blocked on sending the PR for thecloudprovider
for reasons... but hopefully that will stop being a thing soon)It should probably include a script to help create the Azure Service Principal that could then be used by Terraform and passed through for
cloudprovider
.So, it's not ready to merge yet for reasons mentioned above, but I wanted to get this out for any comments. I was able to deploy a 20 cluster on the first try after getting the single master/node working: http://i.imgur.com/boyFZsw.png.
The biggest issue I see is that
addons
seem to be unaddressed. For now I haven't addedkube-proxy
as a manifest pod since I assume we're going to run it via aDaemonSet
addon....
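For reference, such an addon would have roughly the shape below. This is only a sketch in the spirit of the kube-proxy DaemonSet linked earlier in the thread, not that manifest itself; the image tag, master address, and kubeconfig path are placeholders:

```jsonnet
// Rough sketch of a kube-proxy DaemonSet addon (extensions/v1beta1 was the
// DaemonSet API group at the time). Values marked as placeholders would come
// from the deployment's real config.
{
  apiVersion: "extensions/v1beta1",
  kind: "DaemonSet",
  metadata: { name: "kube-proxy", namespace: "kube-system" },
  spec: {
    template: {
      metadata: { labels: { app: "kube-proxy" } },
      spec: {
        hostNetwork: true,
        containers: [{
          name: "kube-proxy",
          image: "gcr.io/google_containers/hyperkube-amd64:v1.3.0",  // placeholder tag
          command: [
            "/hyperkube",
            "proxy",
            "--kubeconfig=/etc/kubernetes/kubeconfig",  // placeholder path
          ],
          securityContext: { privileged: true },
          volumeMounts: [{ name: "kubeconfig", mountPath: "/etc/kubernetes", readOnly: true }],
        }],
        volumes: [{ name: "kubeconfig", hostPath: { path: "/etc/kubernetes" } }],
      },
    },
  },
}
```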