This repository was archived by the owner on Dec 9, 2020. It is now read-only.

Azure PR#115

Closed
cooktheryan wants to merge 1 commit into openshift:master from cooktheryan:azure-code

Conversation

@cooktheryan
Contributor

No description provided.

Contributor

@detiber left a comment

I see both a monolithic ARM template and more separated templates. Are both meant to be included here?

Attribution:
Thanks to:
Daniel Falkner - Microsoft Germany - for the original templates
Harold Wong <Harold.Wong@microsoft.com> for his great support.
Contributor

Should Ivan McKinley also be in the Attribution list?

}
},
"sshPrivateData" : {
"type" : "string",
Contributor

Should this be securestring?
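A securestring version of the parameter might look like this (a sketch, not the PR's template; the description text is assumed). Unlike string, securestring values are not echoed back in deployment history or logs:

```json
{
  "sshPrivateData": {
    "type": "securestring",
    "metadata": {
      "description": "Base64-encoded SSH private key for the cluster hosts"
    }
  }
}
```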

"defaultValue" : "Standard_DS3_v2",
"allowedValues" : [
"Standard_DS3_v2",
"Standard_DS12\t"
Contributor

Any reason to limit this host to just these options? Other hosts seem to offer more VM size options.

Contributor

Premium storage should be used, as standard storage is not recommended.

"nodeSubnetRef" : "[concat(variables('vnetId'), '/subnets/', variables('nodesubnetName'))]",
"masterSubnetRef" : "[concat(variables('vnetId'), '/subnets/', variables('mastersubnetName'))]",
"centos" : {
"publisher" : "Openlogic",
Contributor

Is there a more official CentOS image that can be used?

Contributor

Actually, there is no CentOS support at all.

}
},
{
"name" : "logging",
Contributor

What is the logging host used for?

Contributor

It's to be used for infrastructure logging. Details are pending from the performance group.

- hosts: all
vars:
description: "Subscribe OSE"
tasks:
Contributor

Can this playbook be stored in the repo and transferred in some way? It would make it easier to re-use between cloud providers rather than having it hard coded in the script.

Contributor

The overall azure architecture is one bash sh per host, and that sh creates all assets; this is true all the way through. I think you are right if: subscription-manager is fixed, RHN is fixed, and channels are disabled properly. I even found that the Ansible "subscription-manager" module, which is supposed to handle things well, changes functionality from time to time. So short term, I'd say it decreases reliability. Also note this may get better as RHN is undergoing massive changes.

I'm very comfortable, after lots of runs, that the current code recovers from all observed "variants".

My biggest cause of failed runs was the subscription manager/subscription module, through changes in Ansible.

This is based on more than 220 test runs. I really wish it were simpler, but reality seems to challenge simplicity.

Contributor

"The overall azure architecture is one bash sh per host, and that sh creates all assets..."

@glennswest can you briefly describe what "one bash sh per host" means? Do you mean one tty, and therefore one ssh session? Are you using Bash in all places? I noticed that some scripts are started using "sh <script_file>" with no shebang ("#!/bin/bash") in the script, which causes Bash to run in a minimalist POSIX mode; many scripts will fail that way, even though they work fine when called with /bin/bash.

Contributor

In Azure ARM, an "extension" is used to download and launch a shell script. This is the bridge between ARM (Azure Resource Manager) and the resultant host. So bastion has "bastion.sh", the store VM has "store.sh", and so on; there is a one-to-one match for each VM type. The most complex, of course, is bastion, as it is running Ansible.

Contributor

The deployment is run from a deployment created via the Azure Dashboard, by clicking on the link and launching an ARM template, which is a deployment language that uses JSON.
There is no "outside" ssh involved until deployment completes.

vars:
description: "Subscribe OSE"
tasks:
- name: Install iscsi initiator utils
Contributor

This seems like a good candidate for a role in openshift-ansible-contrib rather than a playbook written out by a script.

Contributor

General architecture: all assets are generated by the vmtype shell script.

- name: Create Master Directory
file: path=/etc/origin/master state=directory
- name: add initial user to OpenShift Enterprise
shell: htpasswd -c -b /etc/origin/master/htpasswd ${AUSERNAME} ${PASSWORD}
Contributor

This can be done during install using openshift_master_htpasswd_users or openshift_master_htpasswd_file
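This suggestion, sketched as a hypothetical inventory snippet (user name and hash are placeholders, and this assumes the htpasswd identity provider is configured in the same inventory):

```ini
[OSEv3:vars]
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'kind': 'HTPasswdPasswordIdentityProvider', 'login': 'true', 'challenge': 'true', 'filename': '/etc/origin/master/htpasswd'}]
; Dict of user -> htpasswd-style hash, laid down by the installer:
openshift_master_htpasswd_users={'admin': '<htpasswd-hash>'}
```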

control_path = ~/.ansible/cp/ssh%%h-%%p-%%r
ssh_args = -o ControlMaster=auto -o ControlPersist=600s -o ControlPath=~/.ansible/cp-%h-%p-%r
EOF
chown ${AUSERNAME} /home/${AUSERNAME}/.ansible.cfg
Contributor

I would suggest putting the ansible.cfg file in the repo (and running ansible from that working directory).

Contributor

Ansible was a bit non-deterministic, so I made sure this was in all the places it would look. This is a workaround for an upstream blocker bug. Also note that it's an intentional choice to keep everything per host type in one place. I'd prefer, and hope, that the need for this goes away entirely at some point, since it's an Ansible bug, but I didn't want to be blocked waiting for the fix to flow downward.

On the next Ansible release we should see if we can get rid of it.

Contributor

Right, I get wanting to set the params in an ansible.cfg file; I was just recommending that the file exist as ansible.cfg in the working directory you run Ansible from, rather than setting it for the users. If users are going to write their own playbooks to augment the ones put in place, it would be a poor experience if the .ansible.cfg files laid down break some of their assumptions.

Contributor

True, but considering the way these are "parsed" has changed quite a bit, I'd say it needs to be in several places for now, as I ran into issues with Ansible not reading parameters. Actually, this whole issue is that a default buried deep in Ansible was not set properly. There is already a bug on it upstream, but I didn't see when the fix would flow downward, so until the next Ansible release this is a workaround. (And over two weeks of time tracking down the cause.)

Contributor

The lookup order for the ansible config is as follows:

  • cwd
  • ~/.ansible.cfg
  • /etc/ansible/ansible.cfg

The first file found is the one used, there is no merging of config files.
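So the suggestion amounts to shipping something like this as ansible.cfg in the repo's working directory, where (per the lookup order above) it shadows ~/.ansible.cfg without touching the user's home directory. A sketch mirroring the settings the script writes:

```ini
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=600s
; %% escapes % for the ini parser; expands to host/port/user at runtime
control_path = ~/.ansible/cp/ssh%%h-%%p-%%r
```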

parted --script -a optimal /dev/sdg mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
parted --script -a optimal /dev/sdh mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
parted --script -a optimal /dev/sdi mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
parted --script -a optimal /dev/sdj mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
Contributor

shouldn't these be xfs and not ext2?

Contributor

Actually I can change this, but the reality is this is a label on an unused partition; note the size.
The actual PVs used are created by pv_create_lun later in the script.
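As an aside, the repeated parted/pvcreate/vgcreate lines could be driven from a loop, so adding or removing a data disk is a one-line change. A sketch, not the PR's code; it defaults to a dry run since these commands rewrite disk labels:

```shell
#!/bin/sh
# Data disks to initialize (assumed names, matching the script above).
DISKS="sdc sdd sde sdf sdg sdh sdi sdj"
DRY_RUN=1   # set to 0 on a real store host to actually run the commands
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

for d in $DISKS; do
  run parted --script -a optimal "/dev/$d" mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
done
# Build the list of first partitions and hand it to LVM in one go.
parts=$(for d in $DISKS; do printf '/dev/%s1 ' "$d"; done)
run pvcreate $parts
run vgcreate vg1 $parts
```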

Contributor

Resolved.

parted --script -a optimal /dev/sdj mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
pvcreate /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
vgcreate vg1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
cat <<EOF | base64 --decode > /root/ose_pvcreate_lun
Member

Putting on my ops hat for a second, this would freak me out until I decoded and reviewed what it was I was about to run...

Contributor

Agreed, any reason to not just embed the decoded output, or include a separate file containing the content?

Contributor

It's actually a rather gnarly bash script that needs lots of escaping. It's also developed separately.
I do agree that it's a bit scary, but the resultant file sits directly in plain sight and can be reused. This just made it easy to lift and shift the script without the arguments etc getting translated.

We can also fully discuss the script and its usage in the doc, as it lets you create and register new PVs of any size.

Contributor

I've read the script https://paste.fedoraproject.org/483513/36016614/ and it doesn't seem very "gnarly". It's pretty straightforward and could easily be included in this repo and cloned to the host, or pulled using wget, without causing any "arguments, etc getting translated".

Contributor

Found the trick; it's now in clear text, so totally transparent. Resolved. (Note: everything is going up to my github until I can test it all.)

Contributor

@themurph left a comment

Added items to be addressed.

## Create the cluster
### Create the cluster on the Azure Portal

<a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fopenshift%2Fopenshift-ansible-contrib%2Fmaster%2Fazure-ansible%2Fazuredeploy.json" target="_blank">
Contributor

Links from line 14 to line 28 are incorrect, they are missing ../openshift-ansible-contrib/master/reference-architecture/...

"value" : "[reference(concat(parameters('vmName'), 'pip')).ipAddress]"
}
}
} No newline at end of file
Contributor

No newline at end of file.

export RESOURCEGROUP=$1
export AUSERNAME=$2
export PASSWORD=$3
export HOSTNAME=$4
Contributor

$HOSTNAME is a Bourne shell reserved variable. You may want to use $HOST or something similar.
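A minimal sketch of the rename (TARGETHOST is one possible replacement; the argument values here are illustrative only):

```shell
#!/bin/sh
# Same positional arguments, but the 4th one no longer clobbers the
# shell-maintained HOSTNAME variable.
set -- mygroup admin secret master1   # example values, not from the PR
export RESOURCEGROUP=$1
export AUSERNAME=$2
export PASSWORD=$3
export TARGETHOST=$4
echo "deploying $TARGETHOST in $RESOURCEGROUP"
```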

@@ -0,0 +1,75 @@
# RedHat Openshift Enterprise cluster on Azure

When creating the RedHat Openshift Enterprise Cluster on Azure, you will need a SSH RSA key for access.
Contributor

"RedHat Openshift Enterprise Cluster" should be changed to "Red Hat OpenShift Container Platform 3"

"value" : ""
}
}
} No newline at end of file
Contributor

No newline at end of file.

],
"defaultValue" : "rhel",
"metadata" : {
"description" : "OS to use. Centos or Redhat Enterprise Linux"
Contributor

"Centos or Redhat" -> "CentOS or Red Hat"

chown root /root/.ssh/id_rsa.pub
chmod 600 /root/.ssh/id_rsa.pub
chown root /root/.ssh/id_rsa
chmod 600 /root/.ssh/id_rsa
Contributor

Is there a better way of distributing ssh keys to your hosts? ssh-copy-id may be a better alternative.
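The ssh-copy-id alternative could look like this (host names are hypothetical, and the echo keeps it a dry run; drop it to actually push the key, assuming password auth is available during bootstrap):

```shell
#!/bin/sh
# Push the root public key to each host instead of copying private key
# material around and fixing ownership by hand.
for host in master1 node1 node2; do   # hypothetical host list
  echo ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$host"
done
```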

delay: 30
ignore_errors: yes
- name: disable all repos
shell: subscription-manager repos --disable="*"
Contributor

We have a common subscription-manager ansible role that can replace these 4 shell module tasks. @cooktheryan can help you integrate the role so that there will be better continuity between the playbooks.

Contributor

Guys, they don't work. The subscription-manager Ansible module is unreliable; this is done this way for reliability, based on 200+ runs. Even disable "*" does not work. It's wonderful to be "clean", but the reality is that to get things to run every time with Ansible, this is the way it is. Changing it to the common role is where I started from, and I wasted lots of time.
Changing this will "look" cleaner and fail often.
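For reference, the retry-wrapped shell pattern being defended looks roughly like this (as quoted from bastion.sh elsewhere in this review):

```yaml
# Re-run the command until it exits 0, up to 10 tries, 30s apart.
- name: register hosts
  shell: subscription-manager register --username ${RHNUSERNAME} --password ${RHNPASSWORD}
  register: task_result
  until: task_result.rc == 0
  retries: 10
  delay: 30
  ignore_errors: yes
```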

Member

@glennswest if there isn't an issue for that already created do you mind reporting it against the appropriate installer roles?

Contributor

So, there are no official "installer roles" for this specifically because of the issues @glennswest mentions.

That said, I think there will be a certain scale where this approach will no longer work.

I've been working with the subscription-manager team, and there is work going on to help address the problems, but there are changes needed both on the server and client side to make this much more reliable at scale.

Contributor

Don't forget there's a major rewrite of RHN as well, plus the ongoing general transition to certificates. I will review this on the next release of Ansible, as I expect it will improve.

ansible-playbook /home/${AUSERNAME}/postinstall.yml
ansible-playbook /home/${AUSERNAME}/setupiscsi.yml
cd /root
mkdir .kube
Contributor

No need to "cd /root". Lines 300 and 301 can be combined into "mkdir -p /root/.kube"

parted --script -a optimal /dev/sdj mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
pvcreate /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
vgcreate vg1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
cat <<EOF | base64 --decode > /root/ose_pvcreate_lun
Contributor

Why is there a base64 encoded file in the middle of this script? Can it be added to the git repo and retrieved from there during the install?

Side note: this will get flagged by security scanning tools.

Contributor

Actually I found the way to avoid escaping the world; it's pretty easy, I just didn't realize you could do it that way. So now the whole script is clear text; testing now.
Actually, time is more "time of day" on Azure, i.e. if you hit it right, it's really fast:
45 minutes.

Also, I suspect it would be quicker if I cached the rpms.

Contributor

Actually it's Azure getting their VMs up; after the paper I suspect I can speed things up. I have two concepts that could make it a lot faster.

@glennswest
Contributor
Contributor

Then the store host is done differently than all the hosts, and the script is only used in store.sh.

I prefer consistency in the way things are done.

Each host type has one script.

Sent from my iPhone

On 17 Nov 2016, at 1:32 PM, Chris Murphy <notifications@github.com> wrote:

@themurph commented on this pull request.

In reference-architecture/azure-ansible/store.sh:

+systemctl restart target.service
+firewall-cmd --permanent --add-port=3260/tcp
+firewall-cmd --reload
+touch /root/.updateok
+pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
+parted --script -a optimal /dev/sdc mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
+parted --script -a optimal /dev/sdd mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
+parted --script -a optimal /dev/sde mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
+parted --script -a optimal /dev/sdf mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
+parted --script -a optimal /dev/sdg mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
+parted --script -a optimal /dev/sdh mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
+parted --script -a optimal /dev/sdi mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
+parted --script -a optimal /dev/sdj mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on
+pvcreate /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
+vgcreate vg1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
+cat <<EOF | base64 --decode > /root/ose_pvcreate_lun
I've read the script https://paste.fedoraproject.org/483513/36016614/ and it doesn't seem very "narly". It's pretty straight forward and could easily be included in this repo and cloned to the host or pulled using wget without causing any "arguments, etc getting translated".


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@glennswest
Contributor
Contributor

I thought you had a favourite one, vs. the ones already in my documentation.

Sent from my iPhone

On 17 Nov 2016, at 1:49 PM, Chris Murphy <notifications@github.com> wrote:

@themurph commented on this pull request.

In reference-architecture/azure-ansible/README.md:

@@ -0,0 +1,75 @@
+# RedHat Openshift Enterprise cluster on Azure
+
+When creating the RedHat Openshift Enterprise Cluster on Azure, you will need a SSH RSA key for access.
+
+## SSH Key Generation
+
+1. Windows - https://www.digitalocean.com/community/tutorials/how-to-create-ssh-keys-with-putty-to-connect-to-a-vps
+2. Linux - https://help.ubuntu.com/community/SSH/OpenSSH/Keys#Generating_RSA_Keys
https://lmgtfy.com/?q=ssh+keys+site%3Aredhat.com


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@glennswest closed this Nov 22, 2016
@glennswest
Contributor
Contributor

Actually my intention is to limit a single greenfield install to 30 nodes and then grow beyond that with add-node, ten at a time.

This is based on feedback from Scot on OpenStack testing.

Sent from my iPhone

On 22 Nov 2016, at 12:26 AM, Jason DeTiberus <notifications@github.com> wrote:

@detiber commented on this pull request.

In reference-architecture/azure-ansible/bastion.sh:

+    - name: register hosts
+      shell: subscription-manager register --username ${RHNUSERNAME} --password ${RHNPASSWORD}
+      register: task_result
+      until: task_result.rc == 0
+      retries: 10
+      delay: 30
+      ignore_errors: yes
+    - name: attach sub
+      shell: subscription-manager attach --pool=$RHNPOOLID
+      register: task_result
+      until: task_result.rc == 0
+      retries: 10
+      delay: 30
+      ignore_errors: yes
+    - name: disable all repos
+      shell: subscription-manager repos --disable="*"
    So, there are no official "installer roles" for this specifically because of the issues @glennswest mentions.

That said, I think there will be a certain scale where this approach will no longer work.

I've been working with the subscription-manager team, and there is work going on to help address the problems, but there are changes needed both on the server and client side to make this much more reliable at scale.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@cooktheryan
Contributor Author
Contributor Author

@glennswest what are your thoughts on me merging #125 into this PR? That way we have a burn-down list of the concerns.

@glennswest
Contributor
Contributor

I did try to do a brief summary from this PR's point of view. I'm not sure merging it helps long term, as there are so many changes already cooked in, and I'd like to have a clean starting point in contrib.
Also, there is a complete commit history upstream, which I reference in the README.

So, in summary, I'd like to have a nice clean base to work from and test, and then layer any additional small changes on top.
