
Conversation

@abhinavdahiya
Contributor

openshift/origin#21274 was merged, which allows the kubelet to run static pods even while it is still bootstrapping its certificates with the apiserver.
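
For context, the kubelet reads static pod manifests from a local directory and starts them without talking to the apiserver, which is why they can come up while certificate bootstrapping is still in flight. A minimal sketch of the relevant upstream KubeletConfiguration field, assuming the conventional manifest path (the path is an assumption here, not taken from this PR):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# directory the kubelet scans for static pod manifests (path assumed)
staticPodPath: /etc/kubernetes/manifests

Once the kubelet can reach the apiserver, it creates a read-only mirror pod for each static pod, which is what oc get pods reports.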

/hold

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Oct 29, 2018
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 29, 2018
@abhinavdahiya
Contributor Author

abhinavdahiya commented Oct 30, 2018

Testing locally, using the latest version of RHCOS from https://rhcos-release-browser-coreos.int.open.paas.redhat.com/html/:

[core@adahiya-0-master-0 ~]$ cat /etc/os-release
NAME="Red Hat CoreOS"
VERSION="4.0"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION_ID="4.0"
PRETTY_NAME="Red Hat CoreOS 4.0"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat 7"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.0"
REDHAT_SUPPORT_PRODUCT="Red Hat"
REDHAT_SUPPORT_PRODUCT_VERSION="4.0"
OSTREE_VERSION=47.29

etcd is running as a static pod:

oc -n kube-system get pods etcd-member-adahiya-0-master-0
NAME                             READY     STATUS    RESTARTS   AGE
etcd-member-adahiya-0-master-0   1/1       Running   0          8m

The full pod spec, as mirrored to the apiserver:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/config.hash: c26521395dbc1da7d5ef74aca0694d11
    kubernetes.io/config.mirror: c26521395dbc1da7d5ef74aca0694d11
    kubernetes.io/config.seen: 2018-10-29T23:56:49.011541332Z
    kubernetes.io/config.source: file
  creationTimestamp: 2018-10-29T23:59:58Z
  labels:
    k8s-app: etcd
  name: etcd-member-adahiya-0-master-0
  namespace: kube-system
  resourceVersion: "524"
  selfLink: /api/v1/namespaces/kube-system/pods/etcd-member-adahiya-0-master-0
  uid: bd5393bf-dbd6-11e8-8085-62c04d92e4a1
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - |
      #!/bin/sh
      set -euo pipefail

      source /run/etcd/environment

      /usr/local/bin/etcd \
        --discovery-srv tt.testing \
        --initial-advertise-peer-urls=https://${ETCD_IPV4_ADDRESS}:2380 \
        --cert-file=/etc/ssl/etcd/system:etcd-server:${ETCD_DNS_NAME}.crt \
        --key-file=/etc/ssl/etcd/system:etcd-server:${ETCD_DNS_NAME}.key \
        --trusted-ca-file=/etc/ssl/etcd/ca.crt \
        --client-cert-auth=true \
        --peer-cert-file=/etc/ssl/etcd/system:etcd-peer:${ETCD_DNS_NAME}.crt \
        --peer-key-file=/etc/ssl/etcd/system:etcd-peer:${ETCD_DNS_NAME}.key \
        --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \
        --peer-client-cert-auth=true \
        --advertise-client-urls=https://${ETCD_IPV4_ADDRESS}:2379 \
        --listen-client-urls=https://0.0.0.0:2379 \
        --listen-peer-urls=https://0.0.0.0:2380 \
    env:
    - name: ETCD_DATA_DIR
      value: /var/lib/etcd
    - name: ETCD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    image: quay.io/coreos/etcd:v3.2.14
    imagePullPolicy: IfNotPresent
    name: etcd-member
    ports:
    - containerPort: 2380
      hostPort: 2380
      name: peer
      protocol: TCP
    - containerPort: 2379
      hostPort: 2379
      name: server
      protocol: TCP
    resources: {}
    securityContext: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /run/etcd/
      name: discovery
    - mountPath: /etc/ssl/etcd
      name: certs
    - mountPath: /var/lib/etcd
      name: data-dir
  dnsPolicy: ClusterFirst
  hostNetwork: true
  initContainers:
  - args:
    - --discovery-srv=tt.testing
    - --output-file=/run/etcd/environment
    - --v=4
    image: docker.io/abhinavdahiya/origin-setup-etcd-environment
    imagePullPolicy: Always
    name: discovery
    resources: {}
    securityContext: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /run/etcd/
      name: discovery
  - command:
    - /bin/sh
    - -c
    - |
      #!/bin/sh
      set -euo pipefail

      source /run/etcd/environment

      [ -e /etc/ssl/etcd/system:etcd-server:${ETCD_DNS_NAME}.crt -a \
        -e /etc/ssl/etcd/system:etcd-server:${ETCD_DNS_NAME}.key ] || \
        /usr/local/bin/kube-client-agent \
          request \
            --orgname=system:etcd-servers \
            --cacrt=/etc/ssl/etcd/root-ca.crt \
            --assetsdir=/etc/ssl/etcd \
            --address=https://adahiya-0-api.tt.testing:6443 \
            --dnsnames=localhost,etcd.kube-system.svc,etcd.kube-system.svc.cluster.local,${ETCD_DNS_NAME} \
            --commonname=system:etcd-server:${ETCD_DNS_NAME} \
            --ipaddrs=${ETCD_IPV4_ADDRESS},127.0.0.1 \

      [ -e /etc/ssl/etcd/system:etcd-peer:${ETCD_DNS_NAME}.crt -a \
        -e /etc/ssl/etcd/system:etcd-peer:${ETCD_DNS_NAME}.key ] || \
        /usr/local/bin/kube-client-agent \
          request \
            --orgname=system:etcd-peers \
            --cacrt=/etc/ssl/etcd/root-ca.crt \
            --assetsdir=/etc/ssl/etcd \
            --address=https://adahiya-0-api.tt.testing:6443 \
            --dnsnames=${ETCD_DNS_NAME},tt.testing \
            --commonname=system:etcd-peer:${ETCD_DNS_NAME} \
            --ipaddrs=${ETCD_IPV4_ADDRESS} \
    image: quay.io/coreos/kube-client-agent:678cc8e6841e2121ebfdb6e2db568fce290b67d6
    imagePullPolicy: IfNotPresent
    name: certs
    resources: {}
    securityContext: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /run/etcd/
      name: discovery
    - mountPath: /etc/ssl/etcd
      name: certs
  nodeName: adahiya-0-master-0
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    operator: Exists
  volumes:
  - emptyDir: {}
    name: discovery
  - hostPath:
      path: /etc/ssl/etcd
      type: ""
    name: certs
  - hostPath:
      path: /var/lib/etcd
      type: ""
    name: data-dir
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-10-29T23:57:17Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-10-29T23:57:26Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2018-10-29T23:56:49Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://93397f616d48322fbd1130d0921543f1971fa88f42e1ec25f078cd9c78937deb
    image: quay.io/coreos/etcd:v3.2.14
    imageID: quay.io/coreos/etcd@sha256:688e6c102955fe927c34db97e6352d0e0962554735b2db5f2f66f3f94cfe8fd1
    lastState: {}
    name: etcd-member
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2018-10-29T23:57:25Z
  hostIP: 192.168.126.11
  initContainerStatuses:
  - containerID: cri-o://d562d128a0c6f1f7933358d8cf8428f97497959aeffda587de3725e0bb0ef15e
    image: docker.io/abhinavdahiya/origin-setup-etcd-environment:latest
    imageID: docker.io/abhinavdahiya/origin-setup-etcd-environment@sha256:1535b9373527ded931114a497bcd61c3ebe5385e4eda05c3abcbc638366023cd
    lastState: {}
    name: discovery
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://d562d128a0c6f1f7933358d8cf8428f97497959aeffda587de3725e0bb0ef15e
        exitCode: 0
        finishedAt: 2018-10-29T23:57:09Z
        reason: Completed
        startedAt: 2018-10-29T23:57:09Z
  - containerID: cri-o://c1788ae91c7ebc348e9fd3905091b2b853bfe8544efd5aee66c0bbd007ed9a34
    image: quay.io/coreos/kube-client-agent:678cc8e6841e2121ebfdb6e2db568fce290b67d6
    imageID: quay.io/coreos/kube-client-agent@sha256:8564ab65bcb1064006d2fc9c6e32a5ca3f4326cdd2da9a2efc4fb7cc0e0b6041
    lastState: {}
    name: certs
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://c1788ae91c7ebc348e9fd3905091b2b853bfe8544efd5aee66c0bbd007ed9a34
        exitCode: 0
        finishedAt: 2018-10-29T23:57:17Z
        reason: Completed
        startedAt: 2018-10-29T23:57:16Z
  phase: Running
  podIP: 192.168.126.11
  qosClass: BestEffort
  startTime: 2018-10-29T23:56:49Z
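
The kubernetes.io/config.source: file and kubernetes.io/config.mirror annotations above confirm this is the mirror of a file-based static pod rather than an apiserver-scheduled one. A quick node-side sanity check, assuming the conventional static pod directory and CRI-O as the runtime (both assumptions, not verified in this PR):

# list static pod manifests on the node (path assumed)
ls /etc/kubernetes/manifests
# confirm the etcd container is running under CRI-O
sudo crictl ps --name etcd-member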

@abhinavdahiya abhinavdahiya changed the title from "WIP: deploy etcd as static pod" to "deploy etcd as static pod" Oct 30, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 30, 2018
@abhinavdahiya
Contributor Author

/hold cancel

/cc @crawford @aaronlevy @deads2k

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 30, 2018
@abhinavdahiya
Contributor Author

From one of the masters installed by CI:

[core@ip-10-0-11-127 ~]$ cat /etc/os-release
NAME="Red Hat CoreOS"
VERSION="4.0.6953"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION_ID="4.0"
PRETTY_NAME="Red Hat CoreOS 4.0.6953"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat 7"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.0"
REDHAT_SUPPORT_PRODUCT="Red Hat"
REDHAT_SUPPORT_PRODUCT_VERSION="4.0"
OSTREE_VERSION=4.0.6953

This build does not yet have the updated kubelet from openshift/origin#21274.
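
A quick way to check which kubelet a node carries is the standard version flag; the reported build can then be matched against that merge:

kubelet --version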

@ashcrow
Member

ashcrow commented Oct 30, 2018

/test e2e-aws

@ashcrow
Member

ashcrow commented Oct 30, 2018

@abhinavdahiya I think this was brought up in chat, but RHCOS pipeline v1 is in a frozen state (though still producing images); RHCOS pipeline v2 is picking up the latest kubelet changes a bit faster.

@abhinavdahiya
Contributor Author

Requires the installer moving to the new RHCOS pipeline :( openshift/installer#554

@abhinavdahiya
Contributor Author

From the bootstrap node in CI:

[core@ip-10-0-9-10 ~]$ sudo oc --config /opt/tectonic/auth/kubeconfig get pods -n kube-system
NAME                                                 READY     STATUS    RESTARTS   AGE
etcd-member-ip-10-0-1-182.ec2.internal               1/1       Running   0          5m
etcd-member-ip-10-0-16-168.ec2.internal              1/1       Running   0          5m
etcd-member-ip-10-0-35-16.ec2.internal               1/1       Running   0          5m
kube-controller-manager-8mmzl                        1/1       Running   0          6m
kube-controller-manager-qxft8                        1/1       Running   0          6m
kube-controller-manager-vvxr5                        1/1       Running   0          6m
kube-dns-787c975867-lrbxm                            3/3       Running   0          6m
kube-flannel-9m6v7                                   2/2       Running   0          4m
kube-flannel-cxcxn                                   2/2       Running   0          4m
kube-flannel-lvlcb                                   2/2       Running   0          4m
kube-proxy-8kfhr                                     1/1       Running   0          6m
kube-proxy-c6m62                                     1/1       Running   0          6m
kube-proxy-fddw4                                     1/1       Running   0          6m
kube-scheduler-7msvs                                 1/1       Running   0          6m
kube-scheduler-8lrkr                                 1/1       Running   0          6m
kube-scheduler-zdpxr                                 1/1       Running   0          6m
metrics-server-5767bfc576-7c845                      0/2       Pending   0          1m
pod-checkpointer-7wp2b                               1/1       Running   0          6m
pod-checkpointer-7wp2b-ip-10-0-16-168.ec2.internal   1/1       Running   0          6m
pod-checkpointer-cdg5w                               1/1       Running   0          6m
pod-checkpointer-cdg5w-ip-10-0-1-182.ec2.internal    1/1       Running   0          6m
pod-checkpointer-hwgwr                               1/1       Running   0          6m
pod-checkpointer-hwgwr-ip-10-0-35-16.ec2.internal    1/1       Running   0          6m
tectonic-network-operator-k8qx6                      1/1       Running   0          6m
tectonic-network-operator-s46sd                      1/1       Running   0          6m
tectonic-network-operator-vhj7s                      1/1       Running   0          6m
[core@ip-10-0-9-10 ~]$ cat /etc/os-release
NAME="Red Hat CoreOS"
VERSION="4.0"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION_ID="4.0"
PRETTY_NAME="Red Hat CoreOS 4.0"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat 7"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.0"
REDHAT_SUPPORT_PRODUCT="Red Hat"
REDHAT_SUPPORT_PRODUCT_VERSION="4.0"
OSTREE_VERSION=47.38

etcd is running as static pods.
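
For anyone reproducing this, membership can also be verified with etcdctl using the CA and certs mounted at /etc/ssl/etcd in the manifest above. The certificate file names embed the member's ETCD_DNS_NAME, so the paths below are illustrative placeholders, and this assumes the server certificate also permits client auth (any client cert signed by the etcd CA would do otherwise):

# illustrative sketch; substitute the member's actual cert/key file names
export ETCDCTL_API=3
sudo etcdctl \
  --endpoints=https://localhost:2379 \
  --cacert=/etc/ssl/etcd/ca.crt \
  --cert='/etc/ssl/etcd/system:etcd-server:<ETCD_DNS_NAME>.crt' \
  --key='/etc/ssl/etcd/system:etcd-server:<ETCD_DNS_NAME>.key' \
  member list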

@abhinavdahiya
Contributor Author

The workers cannot reach the Ignition server due to an AWS load balancer error.
cc @crawford

@ashcrow
Member

ashcrow commented Oct 31, 2018

Looks like the same issue we keep hitting with #126

@ashcrow
Member

ashcrow commented Oct 31, 2018

Yeah, we're hitting the same thing here with e2e. Mine was able to get through after ~7 retries.

@ashcrow
Member

ashcrow commented Oct 31, 2018

/test e2e-aws

@aaronlevy left a comment

Couple questions and some minor nits about follow-up tasks.


nit: Is there a location where this can be reviewed? It's somewhat opaque here. I feel like this should be in its final location and reviewed there as well before landing. If we merge as-is, please open issues tracking that we need to fix this.

Contributor

Sorry for the late nit, but can we change the name to something that indicates it's scoped to bootstrap, installer, or machine? "installer-bootstrap-etcd" or "machine-master-etcd"? I would just prefer a bit of scoping.

Contributor Author

@smarterclayton I opened an issue to track the rename: #152

I will fix this 😇


nit: we can't release with this, so it should be tracked as an open issue.


Does this need to be privileged?


Does this need to be privileged?


nit: open issue to fix this.


Do we ever want ETCD_DNS_NAME to differ from the node name? The node name already has to resolve to a routable address.

Contributor Author


ETCD_DNS_NAME is the name in the SRV record that resolves to this machine's IP.
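
Concretely, etcd's DNS discovery resolves the _etcd-server-ssl._tcp (and _etcd-server._tcp) SRV records under the --discovery-srv domain, and each SRV target is a member's ETCD_DNS_NAME. A quick lookup against the domain used in this test:

dig +short SRV _etcd-server-ssl._tcp.tt.testing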


ack

@aaronlevy

I'll give my unofficial lgtm (I'm not in OWNERS). It would be nice if we could reduce the need for privileged containers, but we don't need to block on that.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2018
@abhinavdahiya
Contributor Author

/hold

need to wait for approval from @crawford and/or @smarterclayton

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2018
@aaronlevy

I was not aware that a rando (me) could lgtm and it would merge without another approver...

@abhinavdahiya
Contributor Author

/retest

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Nov 1, 2018
@crawford
Contributor

crawford commented Nov 2, 2018

/lgtm
/hold cancel

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Nov 2, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aaronlevy, abhinavdahiya, crawford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,crawford]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhinavdahiya
Contributor Author

/refresh

@abhinavdahiya abhinavdahiya added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Nov 2, 2018
@openshift-merge-robot openshift-merge-robot merged commit 79a629c into openshift:master Nov 2, 2018
osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021
…anup

Bug 1705753: some godoc changes on the config CRD type for better oc explain behavior