
Multi-platform clusters #163


Closed
malaskim opened this issue Feb 13, 2017 · 15 comments

Comments

@malaskim

Hi everyone,

I'm trying to set up a multi-platform cluster with a master node on Ubuntu (amd64) and some worker nodes on Raspberry Pi 3 (ARM, with HypriotOS).
I can get the cluster set up with the tutorial here (https://kubernetes.io/docs/getting-started-guides/kubeadm/, awesome work by the way!!)

However, when I tried to start a container with
kubectl run hypriot --image=hypriot/rpi-busybox-httpd --replicas=1 --port=80
I got the following error: Error syncing pod, skipping: failed to "SetupNetwork" for "hypriot-1452852107-f11tt_default" with SetupNetworkError: "Failed to setup network for pod "hypriot-1452852107-f11tt_default(527e63f1-f1f4-11e6-b5e5-080027eb4e93)" using network plugins "cni": cni config unintialized; Skipping pod"

Meanwhile, I've successfully started flannel on the master node with this command:
kubectl create -f https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml
All the services (DNS, flannel, proxy, etc.) are running on the master node.

If I reproduce this setup with the master node on a Raspberry Pi (so the whole cluster is on ARM), everything works :)

As suggested in the "Multi-platform clusters" section of https://porter.io/github.com/luxas/kubernetes-on-arm, I've also tried to run the command for every architecture, but I got the following errors:
Error from server (AlreadyExists): error when creating "STDIN": serviceaccounts "flannel" already exists
Error from server (AlreadyExists): error when creating "STDIN": configmaps "kube-flannel-cfg" already exists
Error from server (AlreadyExists): error when creating "STDIN": daemonsets.extensions "kube-flannel-ds" already exists

I also figured out that after running the "kubeadm join" command on one of my Raspberry Pis, no kube-proxy or kube-flannel pod is created on the RPi3 when the master node is amd64, whereas they are indeed created if the master node is on the same platform (i.e. another RPi3).

Looks like I'm missing something about the multi-architecture configuration here... does anyone have any idea?

Thank you for your help!

@GheRivero

The main issue is that the flannel manifest has the amd64 arch hard-coded into it:

  • quay.io/coreos/flannel:v0.7.0-amd64

I know there are some efforts to support multi-platform images (also called "manifest lists" in Docker) that will allow the proper image to be downloaded using a single common name, so there is no need for one manifest per arch... but it's a WIP.

As a workaround: modify the flannel manifest locally to match the arm64 arch and see what happens (it should fail on the master node, though...)
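For reference, a minimal sketch of the edit (only the image line is shown; the surrounding DaemonSet fields are omitted, and the v0.7.0 tags are the ones mentioned in this thread):

```yaml
# kube-flannel.yml excerpt (sketch) -- the DaemonSet pins an arch-specific
# image, so workers of another architecture pull a binary they cannot run.
containers:
- name: kube-flannel
  # image: quay.io/coreos/flannel:v0.7.0-amd64   # original: runs only on amd64 nodes
  image: quay.io/coreos/flannel:v0.7.0-arm       # swapped in for 32-bit ARM workers
```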

@malaskim
Author

Thanks for your answer!

I manually changed the flannel manifest to start one instance on the master node (amd64, with quay.io/coreos/flannel:v0.7.0-amd64)
and one on the worker (ARM, with quay.io/coreos/flannel:v0.7.0-arm), but I still get an "Error syncing pod" error.

I find it weird that when I do a join from an ARM node to an amd64 master node, no kube-proxy is started on the worker node, whereas when I repeat the same on a single-architecture setup (both on ARM or both on amd64), a kube-proxy is always started on the joining worker node.
Is it possible that there is a similar hard-coding problem with the kube-proxy service when trying to deploy cross-platform?

Thanks again for your help!

@GheRivero

GheRivero commented Feb 14, 2017 via email

@malaskim
Author

Ah yes, understandable...
I can change the flannel one manually, since I can directly specify the .yml file, but the kube-proxy one is automated in the kubeadm join process, I guess. If you have any idea how I could change it, even manually, just to make it work... I'm buying :)
Thx!

PS: should I understand that this issue is related to #51, which should be solved in the 1.6 version? Apologies in advance if I'm mistaken; I'm still discovering the GitHub world...

@GheRivero

GheRivero commented Feb 14, 2017 via email

@sudhagarc

+1

Looks like support for this was pushed out of the 1.6 release. Is there any update on this issue? If anyone has solved it manually, that works for me.

@sudhagarc

Found a hack/workaround for this issue:

Before running kubeadm join from a node with a different architecture, I pulled the correct Docker image for that architecture and renamed (re-tagged) it to match the master's architecture. Then the kube-proxy container started fine.

E.g., in my case the node is a Raspberry Pi 3 (arm) and my master is x86 (amd64). I executed the following command on my node before invoking kubeadm join:
docker tag gcr.io/google_containers/kube-proxy-arm:v1.6.0 gcr.io/google_containers/kube-proxy-amd64:v1.6.0

Hope this helps someone keep going. Hoping for an official fix soon :)

@codesnk

codesnk commented May 3, 2017

@sudhagarc Is the hack still working for you? The false tagging on the Pi 3 does not work for me: the proxy and Weave Net pods go into CrashLoopBackOff. For me, the master still downloads the original kube-proxy-amd64 image, as its image ID differs from the re-tagged ARM image.

UPDATE: It works, although a bit inconsistently. I was making a mistake in tagging. The amd64 version image has an underscore (_) in its tag while the arm version has a minus (-).

@aitorhh

aitorhh commented May 3, 2017

I can summarize the problems and how I solved them to get a multi-platform cluster:

  • the architecture of the images (kube-proxy) is set based on the master node (cmd/kubeadm/app/images/images.go)
  • multi-arch images are not supported by Docker yet, but are available in the registry

So, to solve the multi-platform problem with kubeadm:

  • use a manifest tool to support Docker images with manifest schema V2; a prototype tool is available here
  • create manifests for all the kube images ("etcd" "k8s-dns-sidecar" "k8s-dns-kube-dns" "k8s-dns-dnsmasq-nanny" "kubedns" "dnsmasq-metrics" "kube-dnsmasq" "flannel" "defaultbackend" "kubernetes-dashboard" "kube-apiserver" "kube-controller-manager" "kube-scheduler" "kube-proxy") and push them to a private registry
  • modify kubeadm (cmd/kubeadm/app/images/images.go) to remove the architecture dependency, and compile it
  • use the newly compiled kubeadm (only on the master) and execute it with KUBE_REPO_PREFIX=<url-private-registry> ./kubeadm init ... (a sketch follows this list)
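As a rough sketch of the manifest step, assuming the estesp/manifest-tool prototype and a placeholder registry (the per-arch images must already be pushed there, and the version tag is illustrative):

```sh
# Sketch: stitch the per-arch kube-proxy images into one multi-arch name with
# manifest-tool (https://github.com/estesp/manifest-tool). ARCH in --template
# is expanded once per platform; myregistry.example.com is a placeholder.
manifest-tool push from-args \
  --platforms linux/amd64,linux/arm \
  --template myregistry.example.com/kube-proxy-ARCH:v1.6.0 \
  --target myregistry.example.com/kube-proxy:v1.6.0

# Then, with the patched kubeadm from the previous step:
KUBE_REPO_PREFIX=myregistry.example.com ./kubeadm init ...
```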

In the end I ended up using Weave as the virtual network backend instead of flannel; it is multi-platform out of the box.

@luxas
Member

luxas commented May 29, 2017

Please see https://github.com/luxas/kubeadm-workshop.
There's a lot of information about this there.

Also see #51, which is tracking multi-platform support for kubeadm.

@squidpickles

I was able to make this work without needing a custom image for kube-proxy, using the steps here.
Briefly, it involves modifying the DaemonSet containing kube-proxy so that it applies only to hosts matching the master node's architecture, then creating a duplicate DaemonSet that targets the worker nodes' architecture with the correct image.

It's definitely a workaround, but I like being able to use the official images for kube-proxy.
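The core of that approach is a nodeSelector on the architecture label kubelets populated on clusters of this era (beta.kubernetes.io/arch). A minimal sketch, with image names taken from earlier in this thread:

```yaml
# Sketch: one kube-proxy DaemonSet per architecture, each pinned to matching
# nodes via nodeSelector. The master-arch copy uses "amd64" and the amd64 image.
spec:
  template:
    spec:
      nodeSelector:
        beta.kubernetes.io/arch: arm
      containers:
      - name: kube-proxy
        image: gcr.io/google_containers/kube-proxy-arm:v1.6.0
```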

@sbisiaux

sbisiaux commented May 4, 2018

Any updates on this? I tried the above, editing the kube-proxy DaemonSet, but I keep getting a kernel: [ 126.609412] Internal error: Oops: 80000007 [#1] SMP ARM when the Pod starts up.

Linux kb06 4.14.34-v7+ #1110 SMP Mon Apr 16 15:18:51 BST 2018 armv7l GNU/Linux

Client:
 Version: 18.04.0-ce
 API version: 1.37
 Go version: go1.9.4
 Git commit: 3d479c0
 Built: Tue Apr 10 18:25:24 2018
 OS/Arch: linux/arm
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version: 18.04.0-ce
  API version: 1.37 (minimum version 1.12)
  Go version: go1.9.4
  Git commit: 3d479c0
  Built: Tue Apr 10 18:21:25 2018
  OS/Arch: linux/arm
  Experimental: false

kubeadm version: &version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/arm"}

@drake7707

I circumvented this issue by deploying an additional DaemonSet for each architecture, taken from https://gist.github.com/ssplatt/3d2f68a42e619f88dbed3244ad447708

Make sure to change the version in the image to match your deployment; a sketch follows.
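For example, something along these lines, assuming the two YAML files have been downloaded from the gist under the names used later in this thread (the sed pattern and the v1.11.0 tag are illustrative):

```sh
# Sketch: bump the kube-proxy image tag in the per-arch DaemonSets to match
# the cluster version, then apply both. Adjust file names and version as needed.
sed -i 's|\(kube-proxy-[a-z0-9]*\):v[0-9.]*|\1:v1.11.0|' kube-proxy-amd64.yaml kube-proxy-arm.yaml
kubectl apply -f kube-proxy-amd64.yaml -f kube-proxy-arm.yaml
```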

@mrpaws

mrpaws commented Jul 9, 2018

This is a known limitation, and the current official solution is documented in the bootstrapping documentation for a single-master cluster at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#instructions :

If you join a node with a different architecture to your cluster, create a separate Deployment or DaemonSet for kube-proxy and kube-dns on the node. This is because the Docker images for these components do not currently support multi-architecture.

This will manifest itself for any non-multi-arch container image, but Kubernetes provides the flexibility to work around the problem without a hack.

You can find info on writing DaemonSets below; I'm also reposting @drake7707's gist link, which helps make the documentation more applicable to this context:

https://gist.github.com/ssplatt/3d2f68a42e619f88dbed3244ad447708
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#writing-a-daemonset-spec

There are a few ways to handle this, but here are the step-by-step instructions I just performed to support both arm64 and amd64 using DaemonSets and the pre-populated labels:

  1. Save the currently running kube-proxy DaemonSet configuration:
    $ kubectl get ds --namespace=kube-system kube-proxy --export -o yaml > ~/original_kube-proxy.yaml

  2. Download both the kube-proxy-amd64.yaml and kube-proxy-arm.yaml files from the gist: https://gist.github.com/ssplatt/3d2f68a42e619f88dbed3244ad447708

  3. Update the image specification line in each file to the latest version (currently 1.9.9), respectively:

image: gcr.io/google_containers/kube-proxy-arm:v1.9.9
image: gcr.io/google_containers/kube-proxy-amd64:v1.9.9

  4. Create both DaemonSets:
    $ kubectl create -f ~/kube-proxy-amd64.yaml
    $ kubectl create -f ~/kube-proxy-arm.yaml

  5. If the last step succeeded, delete the original kube-proxy DaemonSet:
    $ kubectl delete daemonset kube-proxy --namespace=kube-system

  6. If you need to roll back, recreate the original kube-proxy DaemonSet:
    $ kubectl create -f ~/original_kube-proxy.yaml

So we have basically replaced the existing kube-proxy DaemonSet, which was tied to the master's architecture, with architecture-specific kube-proxy DaemonSets.
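A quick way to sanity-check the result (standard kubectl, nothing specific to this setup): the pod counts of the two DaemonSets should add up to your node count, and each pod should land on a node of the matching architecture.

```sh
# List the DaemonSets and see where the kube-proxy pods were scheduled.
kubectl get daemonsets --namespace=kube-system -o wide
kubectl get pods --namespace=kube-system -o wide | grep kube-proxy
```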

@drake7707

drake7707 commented Jul 9, 2018

A couple of things I encountered after applying this in my v1.11.0 cluster (both are sketched below):

  • The kube-proxy DaemonSet from the gist does not set priorityClassName: system-node-critical, so it can get evicted when resource pressure is present (which I encountered on a small SD card in an RPi).

  • When I compared the latest kube-proxy deployed in my cluster by kubeadm with the one from the gist, I noticed that the --config=/var/lib/kube-proxy/config.conf argument was present in the original kube-proxy but missing from the gist version. I'm not entirely sure whether it is necessary, but during networking issues I saw the error 'can't distinguish internal and external traffic' in the kube-proxy container logs, which went away when I added this arg. My networking issues turned out to be caused by stale CNI configuration files, so I don't know whether this had any effect.
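For what it's worth, a sketch of those two additions merged into the gist's DaemonSet pod spec (field placement follows a kubeadm-generated kube-proxy of roughly this era; treat it as illustrative):

```yaml
spec:
  template:
    spec:
      priorityClassName: system-node-critical   # protects the pod from eviction under pressure
      containers:
      - name: kube-proxy
        command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf   # config mounted from the kubeadm ConfigMap
```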
