Conversation

@flaper87
Contributor

@flaper87 flaper87 commented Aug 17, 2018

This PR adds an OpenStack module and the respective steps to run tectonic against an OpenStack cloud.

Fully implemented steps:

  • Assets creation
  • Infra
  • Masters
  • Workers

Implemented features:

  • Networks creation
  • Subnets creation
  • IPs allocation and assignment
  • Security groups management
  • Ignition config upload/download from the object store (swift)
  • Instance creation
  • Installation/Configuration of the OpenStack provider (currently disabled)
  • LB creation
  • DNS Records creation

@yifan-gu r?

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 17, 2018
@coreosbot

Can one of the admins verify this patch?

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 17, 2018
@hardys

hardys commented Aug 17, 2018

One initial comment - the patch series contains several fixup patches (variable names etc); it might be worth rebasing to squash those into the related feature patches. I do agree, though, that a series of incremental additions is better than one huge patch.

@flaper87 flaper87 force-pushed the openstack branch 2 times, most recently from e9f9165 to fcf08dc Compare August 17, 2018 08:21
@flaper87
Contributor Author

@hardys good point. I had actually done that after submitting the PR and thought the push had gone through.


data "ignition_user" "ssh_authorized_key" {
name = "core"
ssh_authorized_keys = ["${data.openstack_compute_keypair_v2.openstack_ssh_key.public_key}"]

I wonder if we can simplify this a bit by adding support for metadata to the ignition openstack provider?

E.g. at the moment it only reads the user_data from the metadata service/config-drive, not http://169.254.169.254/openstack/2012-08-10/meta_data.json, which will contain some predictable data like the hostname and ssh key?

If we added that it'd also make the initial boot of nodes with empty user_data easier as at least the ssh keys etc could be set up without explicit configuration.
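For reference, the 2012-08-10 OpenStack metadata document mentioned above is a small JSON file served by the metadata service/config-drive; the sketch below shows its general shape with purely illustrative values (the exact key set varies by deployment):

```json
{
  "uuid": "d8e02d56-2648-49a3-bf97-6be8f1204f38",
  "name": "master-0",
  "hostname": "master-0.example.com",
  "availability_zone": "nova",
  "public_keys": {
    "mykey": "ssh-rsa AAAA... user@host"
  }
}
```

If the ignition openstack provider parsed this, the hostname and ssh keys could be applied even when user_data is empty.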

@wking
Member

wking commented Aug 17, 2018

I see golint failing, so:

/lint

^this should give us inline comments for any issues. [Edit: looks like I still need to enable that plugin for this repo.]

@wking
Member

wking commented Aug 20, 2018

openshift/release#1205 is live now, so trying again:

/lint

Contributor

@openshift-ci-robot openshift-ci-robot left a comment


@wking: 8 warnings.

Details

In response to this:

openshift/release#1205 is live now, so trying again:

/lint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Password string `json:"tectonic_openstack_credentials_password,omitempty" yaml:"password,omitempty"`
Token string `json:"tectonic_openstack_credentials_token,omitempty" yaml:"token,omitempty"`
UserDomainName string `json:"tectonic_openstack_credentials_user_domain_name,omitempty" yaml:"userDomainName,omitempty"`
UserDomainId string `json:"tectonic_openstack_credentials_user_domain_id,omitempty" yaml:"userDomainId,omitempty"`

Golint naming: struct field UserDomainId should be UserDomainID. More info.

UserDomainName string `json:"tectonic_openstack_credentials_user_domain_name,omitempty" yaml:"userDomainName,omitempty"`
UserDomainId string `json:"tectonic_openstack_credentials_user_domain_id,omitempty" yaml:"userDomainId,omitempty"`
ProjectDomainName string `json:"tectonic_openstack_credentials_project_domain_name,omitempty" yaml:"projectDomainName,omitempty"`
ProjectDomainId string `json:"tectonic_openstack_credentials_project_domain_id,omitempty" yaml:"projectDomainId,omitempty"`

Golint naming: struct field ProjectDomainId should be ProjectDomainID. More info.

UserDomainId string `json:"tectonic_openstack_credentials_user_domain_id,omitempty" yaml:"userDomainId,omitempty"`
ProjectDomainName string `json:"tectonic_openstack_credentials_project_domain_name,omitempty" yaml:"projectDomainName,omitempty"`
ProjectDomainId string `json:"tectonic_openstack_credentials_project_domain_id,omitempty" yaml:"projectDomainId,omitempty"`
DomainId string `json:"tectonic_openstack_credentials_domain_id,omitempty" yaml:"domainId,omitempty"`

Golint naming: struct field DomainId should be DomainID. More info.

@@ -0,0 +1,115 @@
package openstack

Golint comments: should have a package comment, unless it's in another file for this package. More info.

Type string `json:"tectonic_openstack_worker_root_volume_type,omitempty" yaml:"type,omitempty"`
}

type Credentials struct {

Golint comments: exported type Credentials should have comment or be unexported. More info.

}

type Credentials struct {
AuthUrl string `json:"tectonic_openstack_credentials_auth_url,omitempty" yaml:"authUrl,omitempty"`

Golint naming: struct field AuthUrl should be AuthURL. More info.

Cloud string `json:"tectonic_openstack_credentials_cloud,omitempty" yaml:"cloud,omitempty"`
Region string `json:"tectonic_openstack_credentials_region,omitempty" yaml:"region,omitempty"`
UserName string `json:"tectonic_openstack_credentials_auth_url,omitempty" yaml:"userName,omitempty"`
UserId string `json:"tectonic_openstack_credentials_user_id,omitempty" yaml:"userId,omitempty"`

Golint naming: struct field UserId should be UserID. More info.

Region string `json:"tectonic_openstack_credentials_region,omitempty" yaml:"region,omitempty"`
UserName string `json:"tectonic_openstack_credentials_auth_url,omitempty" yaml:"userName,omitempty"`
UserId string `json:"tectonic_openstack_credentials_user_id,omitempty" yaml:"userId,omitempty"`
TenantId string `json:"tectonic_openstack_credentials_tenant_id,omitempty" yaml:"tenantId,omitempty"`

Golint naming: struct field TenantId should be TenantID. More info.

@tomassedovic
Contributor

When I tried this, the kubelet service (i.e. origin-node) fails to start. The container appears for a second and then disappears again and this repeats over and over.

# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Wed 2018-08-22 15:38:25 UTC; 2s ago
  Process: 6772 ExecStart=/usr/bin/docker run --rm --net host --pid host --privileged --volume /dev:/dev:rw --volume /sys:/sys:ro --volume /var/run:/var/run:rw --volume /var/lib/cni/:/var/lib/cni:rw --volume /var/lib/docker/:/var/lib/docker:rw --volume /var/lib/kubelet/:/var/lib/kubelet:shared --volume /var/log:/var/log:shared --volume /etc/kubernetes:/etc/kubernetes:ro --entrypoint /usr/bin/hyperkube openshift/origin-node:latest kubelet --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --rotate-certificates --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/var/lib/cni/bin --network-plugin=cni --lock-file=/var/run/lock/kubelet.lock --exit-on-lock-contention --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged --node-labels=node-role.kubernetes.io/master --minimum-container-ttl-duration=6m0s --cluster-dns=10.89.0.10 --cluster-domain=cluster.local --client-ca-file=/etc/kubernetes/ca.crt --cloud-provider=openstack --anonymous-auth=false --register-with-taints=node-role.kubernetes.io/master=:NoSchedule $CGROUP_DRIVER_FLAG (code=exited, status=255)
  Process: 6768 ExecStartPre=/usr/bin/bash -c gawk '/certificate-authority-data/ {print $2}' /etc/kubernetes/kubeconfig | base64 --decode > /etc/kubernetes/ca.crt (code=exited, status=0/SUCCESS)
  Process: 6765 ExecStartPre=/bin/mkdir --parents /var/lib/kubelet/pki (code=exited, status=0/SUCCESS)
  Process: 6763 ExecStartPre=/bin/mkdir --parents /var/lib/cni (code=exited, status=0/SUCCESS)
  Process: 6762 ExecStartPre=/bin/mkdir --parents /run/kubelet (code=exited, status=0/SUCCESS)
  Process: 6760 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/cni/net.d (code=exited, status=0/SUCCESS)
  Process: 6757 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/checkpoint-secrets (code=exited, status=0/SUCCESS)
  Process: 6754 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 6772 (code=exited, status=255)

Aug 22 15:38:25 host-10-89-0-7 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Aug 22 15:38:25 host-10-89-0-7 systemd[1]: Unit kubelet.service entered failed state.
Aug 22 15:38:25 host-10-89-0-7 systemd[1]: kubelet.service failed.


# journalctl -u kubelet
...
Aug 22 15:35:02 host-10-89-0-7 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Aug 22 15:35:02 host-10-89-0-7 systemd[1]: Unit kubelet.service entered failed state.
Aug 22 15:35:02 host-10-89-0-7 systemd[1]: kubelet.service failed.
Aug 22 15:35:12 host-10-89-0-7 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Aug 22 15:35:12 host-10-89-0-7 systemd[1]: Starting Kubernetes Kubelet...
Aug 22 15:35:12 host-10-89-0-7 systemd[1]: Started Kubernetes Kubelet.
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: Flag --rotate-certificates has been deprecated, This parameter should be set via the config file specified by the Kubelet'
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's 
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: Flag --allow-privileged has been deprecated, will be removed in a future version
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --c
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: Flag --client-ca-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --c
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: Flag --anonymous-auth has been deprecated, This parameter should be set via the config file specified by the Kubelet's --c
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --co
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: I0822 15:35:13.385158    5182 server.go:418] Version: v1.11.0+d4cacc0
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: I0822 15:35:13.385476    5182 server.go:496] acquiring file lock on "/var/run/lock/kubelet.lock"
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: I0822 15:35:13.385546    5182 server.go:501] watching for inotify events for: /var/run/lock/kubelet.lock
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: W0822 15:35:13.385956    5182 plugins.go:112] WARNING: openstack built-in cloud provider is now deprecated. Please use 'ex
Aug 22 15:35:13 host-10-89-0-7 docker[5157]: F0822 15:35:13.385996    5182 server.go:262] failed to run Kubelet: could not init cloud provider "openstack": no OpenStac
Aug 22 15:35:13 host-10-89-0-7 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Aug 22 15:35:13 host-10-89-0-7 systemd[1]: Unit kubelet.service entered failed state.
Aug 22 15:35:13 host-10-89-0-7 systemd[1]: kubelet.service failed.
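The truncated fatal line above appears to complain about a missing cloud-provider configuration: with --cloud-provider=openstack, the in-tree provider also needs a config file passed via --cloud-config. A minimal illustrative cloud.conf for the in-tree OpenStack provider might look like this (all values hypothetical):

```ini
[Global]
auth-url = https://keystone.example.com:5000/v3
username = tectonic
password = secret
tenant-id = 0123456789abcdef
domain-name = Default
region = RegionOne
```

The file would also need to be mounted into the hyperkube container (e.g. under /etc/kubernetes, which is already a volume in the unit above).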

@tomassedovic
Contributor

So in addition to the merge conflict, now that the etcd stuff got removed, this no longer works when rebased on master.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 24, 2018
@tomassedovic
Contributor

Okay, that last commit resolves the crash from earlier. But we still need to rebase & fix the etcd situation.

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2018
@tomassedovic
Contributor

This latest batch rebases on top of master. That means removal of the etcd nodes and addition of the bootstrap step. The tectonic deployment finishes successfully, but Ignition is failing with:

[    8.689726] ignition[705]: INFO     : GET https://okd-4.0-tnc.openshift.example.com:80/config/master?etcd_index=0: attempt #1
[    8.744332] ignition[705]: INFO     : GET error: Get https://okd-4.0-tnc.openshift.example.com:80/config/master?etcd_index=0: dial tcp: lookup okd-4.0-tnc.openshift.example.com on 10.89.0.2:53: no such host

So we'll need to investigate that further.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 28, 2018
@tomassedovic
Contributor

@yifan-gu the provider still needs work to be fully functional, but we have enough here to create the OpenStack resources and finish ignition.

Could we consider merging it as is and improving the code in followup PRs? Having to constantly rebase this is kind of painful.

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 29, 2018
@flaper87
Contributor Author

@tomassedovic thanks for rebasing the PR.

It sounds like we do have enough functionality implemented to consider merging this PR. @yifan-gu @wking what do you think?

It'll be easier for us to contribute and for you to review if we can create smaller PRs for the remaining functionality that needs to be implemented.

@tomassedovic
Contributor

Rebased again. @wking @yifan-gu @crawford I would really appreciate it if we could discuss what it would take to merge this.

@flaper87
Contributor Author

flaper87 commented Sep 5, 2018

Thanks for rebasing it, @tomassedovic

It'd be awesome to merge this PR and be able to break the remaining work into smaller pieces.

Member

@wking wking left a comment


There are a lot of commits here, and some (e.g. 469c1fb202) seem to be fixups for earlier commits. If we're supposed to review commit by commit, can you squash those fixups into the commits they're fixing (this makes review easier, but can be a lot of work for the submitters)? Or, if you'd rather this be reviewed as a monolithic unit, can you squash it down to a single commit?

config.tf Outdated

This is stale since #168, no?


@bogdando bogdando left a comment


I wanted to ask if there is a chance to adopt the alternative Terraform solution https://github.com/kubernetes-incubator/kubespray/tree/master/contrib/terraform ?

Would be really nice to collaborate with kubespray community on that.

@hardys

hardys commented Oct 1, 2018

@bogdando I'm not sure if that would make sense here, since AFAICS the kubespray terraform templates model the entire cluster, but this codebase is just installing a minimum bootstrap environment then using the cluster itself to scale out and add more nodes - so the two approaches atm are kind of different.

Also I note that destroy is no longer handled via terraform - presumably that is so that the cluster-created resources can also be cleaned up on destroy, but I'm unclear whether that's part of a longer-term move away from terraform for deploy as well; perhaps @wking can comment on the plan there.

@hardys

hardys commented Oct 1, 2018

Also what's the plan for destroy?

I looked at how the AWS destroy works now that terraform doesn't destroy things - AFAICS awstagdeprovision is used to delete all-the-things in parallel based on a resource tag, i.e. it's no longer really orchestrated, but we expect the tags and retries to eventually enable cleanup.

I think we can probably do the same for openstack, the server resources already have properties set related to the cluster, but we'll have to add tag data to all the network/subnet/port/router resources.

I'm not aware of any similar code to awstagdeprovision for OpenStack, so unless anyone can suggest something I'll look at implementing something in pkg/destroy/openstack, similar to the libvirt one but I'll use tags/properties instead of a prefix.

As mentioned by @russellb it'd be great if we can provide that via a follow-up, as rebasing this current large patch has been a massive time-sink.

@crawford
Contributor

crawford commented Oct 1, 2018

As mentioned by @russellb it'd be great if we can provide that via a follow-up, as rebasing this current large patch has been a massive time-sink.

That seems fine. We don't have the libvirt destroyer enabled yet either.

@tomassedovic
Contributor

Note that currently, bootstrap doesn't get the ignition config because the router we create is missing the external gateway. You can fix it after the fact with:

openstack router set --external-gateway public openshift-external-router

But we'll want to add a prompt/env var to configure it.
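Rather than patching the router after the fact, the gateway could be attached at creation time in the Terraform config. A rough sketch, assuming the external network is named "public" and using hypothetical resource names (the exact attribute may vary by openstack provider version):

```hcl
data "openstack_networking_network_v2" "external" {
  name = "public"
}

resource "openstack_networking_router_v2" "external_router" {
  name                = "openshift-external-router"
  external_network_id = "${data.openstack_networking_network_v2.external.id}"
}
```

The external network name would then be the natural candidate for the prompt/env var mentioned above.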

@wking
Member

wking commented Oct 1, 2018

Now that #373 has landed, can you squash in b1d6cf2?

@russellb
Contributor

russellb commented Oct 1, 2018 via email

@wking
Member

wking commented Oct 1, 2018

/retest

This looks good to me. Any last words before I /lgtm?

@russellb
Contributor

russellb commented Oct 1, 2018 via email

@russellb russellb force-pushed the openstack branch 2 times, most recently from c38a45b to 0dd454a Compare October 1, 2018 18:39
@crawford crawford dismissed their stale review October 1, 2018 20:12

I haven't reviewed this since all of the recent changes.

@crawford
Contributor

crawford commented Oct 1, 2018

Sweet! This looks really close. I'll let @wking /lgtm this once he's happy.

This commit includes support for OpenStack as a target deployment
platform.  There are still some things to implement, such as DNS and
destroy support, that will come in future PRs.

Contributors (in alphabetical order) include:

Co-authored-by: Flavio Percoco <[email protected]>
Co-authored-by: Jeremiah Stuever <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Steven Hardy <[email protected]>
Co-authored-by: Tomas Sedovic <[email protected]>
Co-authored-by: W. Trevor King <[email protected]>
@russellb
Contributor

russellb commented Oct 1, 2018

/test shellcheck

@wking
Member

wking commented Oct 1, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 1, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flaper87, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 1, 2018
@russellb
Contributor

russellb commented Oct 2, 2018

/test e2e-aws

@wking
Member

wking commented Oct 2, 2018

e2e-aws:

error: timed out waiting for the condition
error deploy/router did not come up
error: timed out waiting for the condition
timeout waiting for router to be available
2018/10/02 01:25:17 Container test in pod e2e-aws failed, exit code 1, reason Error

/retest

@openshift-merge-robot openshift-merge-robot merged commit 1c50820 into openshift:master Oct 2, 2018
@hardys

hardys commented Oct 2, 2018

Note that destroy support has been started via #391
