Conversation
detiber
left a comment
I see both a monolithic ARM template and more separated templates. Are both meant to be included here?
| Attribution: | ||
| Thanks to: | ||
| Daniel Falkner - Microsoft Germany - For original templates | ||
| Harold Wong<Harold.Wong@microsoft.com> for his great support. |
Should Ivan McKinley also be in the Attribution list?
| } | ||
| }, | ||
| "sshPrivateData" : { | ||
| "type" : "string", |
Should this be securestring?
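For reference, a securestring parameter keeps the value out of deployment history and portal output. A minimal sketch of the suggested change (the metadata description is illustrative):

```json
"sshPrivateData" : {
    "type" : "securestring",
    "metadata" : {
        "description" : "Private SSH key material (securestring keeps it out of deployment logs)"
    }
}
```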
| "defaultValue" : "Standard_DS3_v2", | ||
| "allowedValues" : [ | ||
| "Standard_DS3_v2", | ||
| "Standard_DS12\t" |
Any reason to limit this host to just these options? Other hosts seem to offer more VM size options.
Premium storage as standard storage is not recommended.
| "nodeSubnetRef" : "[concat(variables('vnetId'), '/subnets/', variables('nodesubnetName'))]", | ||
| "masterSubnetRef" : "[concat(variables('vnetId'), '/subnets/', variables('mastersubnetName'))]", | ||
| "centos" : { | ||
| "publisher" : "Openlogic", |
Is there a more official CentOS image that can be used?
Actually, there is no CentOS support at all.
| } | ||
| }, | ||
| { | ||
| "name" : "logging", |
What is the logging host used for?
It's to be used for infrastructure logging. Pending input from the performance group.
| - hosts: all | ||
| vars: | ||
| description: "Subscribe OSE" | ||
| tasks: |
Can this playbook be stored in the repo and transferred in some way? It would make it easier to re-use between cloud providers rather than having it hard coded in the script.
The overall Azure architecture is one bash script per host, and that script creates all assets; this holds true all the way through. I think you are right, if: subscription-manager is fixed, RHN is fixed, and channels are disabled properly. I even found that the Ansible "subscription-manager" module, which is supposed to handle things well, changes functionality from time to time. So short term, I'd say it decreases reliability. Also note this may get better, as RHN is undergoing massive changes.
I'm very comfortable, after lots of runs, that the current code recovers from all observed "variants".
My biggest cause of failed runs was the subscription manager/subscription module, through changes in Ansible.
This is based on more than 220 test runs. I really wish it were simpler, but reality seems to challenge simplicity.
"The overall azure architecture is one bash sh per host, and that sh creates all assets..."
@glennswest can you briefly describe what "one bash sh per host" means? Do you mean one tty, and therefore one ssh session? Are you using Bash in all places? I noticed that some scripts are started using "sh <script_file>" with no shebang ("#!/bin/bash") in the script, which causes the script to run in a minimalist POSIX mode where many scripts will fail, even though they work fine when called with /bin/bash.
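A small sketch of the sh-vs-bash point (file paths are illustrative): a script that uses a bash-only construct works when the shebang, or an explicit bash invocation, guarantees bash, but may fail under a strict POSIX sh.

```shell
# Create a script that uses the bash-only [[ ]] test.
cat > /tmp/check.sh <<'EOF'
#!/bin/bash
if [[ "$1" == ok* ]]; then
  echo "matched"
fi
EOF
chmod +x /tmp/check.sh

# Invoked directly, the shebang selects bash, so the construct works:
/tmp/check.sh okay    # prints: matched
# By contrast, "sh /tmp/check.sh okay" ignores the shebang and may fail
# on systems where sh is a strict POSIX shell (e.g. dash).
```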
In Azure ARM, an "extension" is used to download and launch a shell script. This is the bridge
between ARM (Azure Resource Manager) and the resultant host. So bastion has "bastion.sh", the store VM has "store.sh", and so on; there is a one-to-one match for each VM type. The most complex, of course, is bastion, as it runs Ansible.
The deployment is run from a deployment created via the Azure Dashboard, by clicking on the link and launching an ARM template, which is a deployment language that uses JSON.
There is no "outside" SSH involved until deployment completes.
| vars: | ||
| description: "Subscribe OSE" | ||
| tasks: | ||
| - name: Install iscsi initiator utils |
This seems like a good candidate for a role in openshift-ansible-contrib rather than a playbook written out by a script.
General architecture: all assets are generated by the vm-type shell script.
| - name: Create Master Directory | ||
| file: path=/etc/origin/master state=directory | ||
| - name: add initial user to OpenShift Enterprise | ||
| shell: htpasswd -c -b /etc/origin/master/htpasswd ${AUSERNAME} ${PASSWORD} |
This can be done during install using openshift_master_htpasswd_users or openshift_master_htpasswd_file
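As a sketch, the installer variables mentioned above would replace the htpasswd shell task with inventory entries like the following (the hash is a placeholder; one can be generated with `htpasswd -nb <user> <password>`):

```ini
[OSEv3:vars]
# seed the initial user at install time instead of running htpasswd by hand
openshift_master_htpasswd_users={'admin': '$apr1$PLACEHOLDERHASH'}
# or point the installer at a pre-generated htpasswd file:
# openshift_master_htpasswd_file=/path/to/htpasswd
```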
| control_path = ~/.ansible/cp/ssh%%h-%%p-%%r | ||
| ssh_args = -o ControlMaster=auto -o ControlPersist=600s -o ControlPath=~/.ansible/cp-%h-%p-%r | ||
| EOF | ||
| chown ${AUSERNAME} /home/${AUSERNAME}/.ansible.cfg |
I would suggest putting the ansible.cfg file in repo (and running ansible from that working directory).
Ansible was a bit "non-deterministic", so I made sure this was in all the places it would look. This is a workaround for a blocking upstream bug. Also note that it's an intentional choice to keep everything for each host type in one place. I'd prefer, and hope, that the need for this totally goes away at some point, as it's an Ansible bug, but I didn't want to be blocked waiting for the "fix" to flow downward.
On the next Ansible release we should see if we can get rid of it.
Right, I get wanting to set the params in an ansible.cfg file, I was just recommending the file exist as ansible.cfg in the working directory that you are running ansible from rather than setting it for the users. If users are going to write their own playbooks to augment the ones put in place, then it would be a poor experience if the .ansible.cfg files laid down break some of their assumptions.
True, but considering that the way these are "parsed" has changed quite a bit, I'd say it needs to be in several places for now, as I ran into issues with Ansible not reading parameters. Actually, this whole issue is that a default, buried deep in Ansible, was not set properly. There is already a "bug" on it upstream, but I didn't see when the fix would "flow" downwards, so until the next Ansible release this is a workaround. (And over two weeks of time were spent tracking down the cause.)
The lookup order for the Ansible config is as follows:
- ANSIBLE_CONFIG environment variable (if set)
- ansible.cfg in the cwd
- ~/.ansible.cfg
- /etc/ansible/ansible.cfg
The first file found is the one used; there is no merging of config files.
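Given that lookup order, one way to reconcile both suggestions is to commit an ansible.cfg next to the playbooks, since the working-directory copy wins. A sketch mirroring the ControlPersist settings from the script under review (the `%%` escapes are required by the cfg parser):

```ini
# ./ansible.cfg - found first when ansible is run from this directory
[ssh_connection]
control_path = ~/.ansible/cp/ssh%%h-%%p-%%r
ssh_args = -o ControlMaster=auto -o ControlPersist=600s
```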
| parted --script -a optimal /dev/sdg mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on | ||
| parted --script -a optimal /dev/sdh mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on | ||
| parted --script -a optimal /dev/sdi mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on | ||
| parted --script -a optimal /dev/sdj mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on |
shouldn't these be xfs and not ext2?
Actually, I can change this, but the reality is that this is just a label and an unused partition; note the size.
The actual PVs used are created by pv_create_lun later in the script.
| parted --script -a optimal /dev/sdj mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on | ||
| pvcreate /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 | ||
| vgcreate vg1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 | ||
| cat <<EOF | base64 --decode > /root/ose_pvcreate_lun |
Putting on my ops hat for a second, this would freak me out until I decoded and reviewed what it was I was about to run...
Agreed, any reason to not just embed the decoded output, or include a separate file containing the content?
It's actually a rather gnarly bash script that needs lots of escaping. It's also developed separately.
I do agree that it's a bit scary, but the resultant file sits directly in clear sight and can be reused. This just made it easy to lift and shift the script without the arguments etc. getting translated.
We can also fully discuss the script and its usage in the doc, as it lets you create and register new PVs of any size.
I've read the script https://paste.fedoraproject.org/483513/36016614/ and it doesn't seem very "gnarly". It's pretty straightforward and could easily be included in this repo and cloned to the host, or pulled using wget, without any "arguments, etc. getting translated".
Found the trick; it's now in clear text, so totally transparent. Resolved. (Note: everything is going up to my GitHub until I can test it all.)
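The "trick" is presumably the quoted heredoc delimiter: writing `<<'EOF'` instead of `<<EOF` stops the outer shell from expanding `$vars` and backticks, so an embedded script can ship in clear text byte-for-byte. A small sketch (paths and the demo script are illustrative, not the actual pv_create_lun contents):

```shell
# Quoting the delimiter ('EOF') disables expansion inside the heredoc,
# so "$1" below reaches the file literally instead of being expanded
# by the outer deployment script.
cat <<'EOF' > /tmp/ose_pvcreate_demo
#!/bin/bash
echo "creating pv of size $1"
EOF
chmod +x /tmp/ose_pvcreate_demo

/tmp/ose_pvcreate_demo 10G    # prints: creating pv of size 10G
```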
themurph
left a comment
Added items to be addressed.
| ## Create the cluster | ||
| ### Create the cluster on the Azure Portal | ||
|
|
||
| <a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fopenshift%2Fopenshift-ansible-contrib%2Fmaster%2Fazure-ansible%2Fazuredeploy.json" target="_blank"> |
Links from line 14 to line 28 are incorrect; they are missing ../openshift-ansible-contrib/master/reference-architecture/...
| "value" : "[reference(concat(parameters('vmName'), 'pip')).ipAddress]" | ||
| } | ||
| } | ||
| } No newline at end of file |
No newline at end of file.
| export RESOURCEGROUP=$1 | ||
| export AUSERNAME=$2 | ||
| export PASSWORD=$3 | ||
| export HOSTNAME=$4 |
$HOSTNAME is set automatically by the shell, so overriding it can have side effects. You may want to use $HOST or something similar.
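A sketch of the suggested rename (variable names mirror the script under review; the values are illustrative stand-ins): assigning the fourth argument to a different name such as HOST avoids clobbering the shell-provided HOSTNAME.

```shell
set -- mygroup admin secret node1   # stand-ins for the script's $1..$4

export RESOURCEGROUP=$1
export AUSERNAME=$2
export PASSWORD=$3
export HOST=$4        # rather than: export HOSTNAME=$4

echo "$HOST"          # prints: node1
```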
| @@ -0,0 +1,75 @@ | |||
| # RedHat Openshift Enterprise cluster on Azure | |||
|
|
|||
| When creating the RedHat Openshift Enterprise Cluster on Azure, you will need a SSH RSA key for access. | |||
"RedHat Openshift Enterprise Cluster" should be changed to "Red Hat OpenShift Container Platform 3"
| "value" : "" | ||
| } | ||
| } | ||
| } No newline at end of file |
No newline at end of file.
| ], | ||
| "defaultValue" : "rhel", | ||
| "metadata" : { | ||
| "description" : "OS to use. Centos or Redhat Enterprise Linux" |
"Centos or Redhat" -> "CentOS or Red Hat"
| chown root /root/.ssh/id_rsa.pub | ||
| chmod 600 /root/.ssh/id_rsa.pub | ||
| chown root /root/.ssh/id_rsa | ||
| chmod 600 /root/.ssh/id_rsa |
Is there a better way of distributing ssh keys to your hosts? ssh-copy-id may be a better alternative.
| delay: 30 | ||
| ignore_errors: yes | ||
| - name: disable all repos | ||
| shell: subscription-manager repos --disable="*" |
We have a common subscription-manager ansible role that can replace these 4 shell module tasks. @cooktheryan can help you integrate the role so that there will be better continuity between the playbooks.
Guys, they don't work. The subscription-manager Ansible module is unreliable. This is done this way for reliability, based on 200+ runs. Even disable "*" does not work. It's wonderful to be "clean", but the reality is that to get things to run every time with Ansible, this is the way it is. Changing it to the common role is where I started from, and I wasted lots of time.
Changing this will "look" cleaner and fail often.
@glennswest if there isn't an issue for that already created do you mind reporting it against the appropriate installer roles?
So, there are no official "installer roles" for this specifically because of the issues @glennswest mentions.
That said, I think there will be a certain scale where this approach will no longer work.
I've been working with the subscription-manager team, and there is work going on to help address the problems, but there are changes needed both on the server and client side to make this much more reliable at scale.
Don't forget there's a major rewrite of RHN as well, plus the ongoing general transition to certificates. Will review this on the next release of Ansible, as I expect it will improve.
| ansible-playbook /home/${AUSERNAME}/postinstall.yml | ||
| ansible-playbook /home/${AUSERNAME}/setupiscsi.yml | ||
| cd /root | ||
| mkdir .kube |
No need to "cd /root". Lines 300 and 301 can be combined into "mkdir -p /root/.kube"
| parted --script -a optimal /dev/sdj mklabel gpt mkpart primary ext2 1M 100% set 1 lvm on | ||
| pvcreate /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 | ||
| vgcreate vg1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 | ||
| cat <<EOF | base64 --decode > /root/ose_pvcreate_lun |
Why is there a base64 encoded file in the middle of this script? Can it be added to the git repo and retrieved from there during the install?
Side note: this will get flagged by security scanning tools.
Actually, I found the way to avoid escaping the world; it's pretty easy, I just didn't realize you could do it that way. So now the whole script is clear text; testing now.
Actually, time is more a matter of "time of day" on Azure, i.e. if you hit it right, it's really fast.
45 minutes.
Also suspect it would be quicker if I cached the RPMs.
Actually it's Azure getting their VMs up; after the paper I suspect I can speed things up. I have two concepts that could make it a lot faster.
The store host is done differently than all the other hosts, and the script is only used in store.sh. I prefer consistency in the way things are done: each host type has one script.
Thought you had a favourite one vs. the ones already in my documentation.
Actually my intention is to limit a single greenfield install to 30 nodes, and then grow beyond that with add-node, ten at a time. This is based on feedback from scot on OpenStack testing.
@glennswest what are your thoughts on me merging #125 into this PR? That way we have a burn-down list of the concerns.
I did try to do a brief summary from this PR's point of view. I'm not sure merging it helps long term, as there are so many changes already cooked in, and I'd like to have a clean starting point in contrib. So, in summary, I'd like to have a nice clean base to work from and test, and then layer any additional small changes on top.
No description provided.