Skip to content

Conversation

@staebler
Copy link
Contributor

@staebler staebler commented Mar 8, 2019

Add a hyperthreading field to machine pools with options of Enabled or Disabled. The default is Enabled.

When a machine pool has hyperthreading disabled, the Machines asset for that pool creates a MachineConfig manifest for that pool. The ignition config lays down a file at /etc/default/rhcos/karg/nosmt with no contents. The machine-config-daemon is responsible for reading that file and setting the nosmt karg for the machine.

@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 8, 2019
@runcom
Copy link
Member

runcom commented Mar 12, 2019

is there a design doc for this and/or has this been discussed somewhere with the MCO team? Could you also CC us for work related to the MCO

@staebler
Copy link
Contributor Author

is there a design doc for this and/or has this been discussed somewhere with the MCO team? Could you also CC us for work related to the MCO

There is a diagram that Steven Milner created after the design meeting last week with Ian McLeod and Abhinav Dahiya. You received and responded to the email with that diagram. If there are comments on the design, I suppose that that diagram would be the best place to put them. I will adjust this PR to reflect any changes made to that design.

@runcom
Copy link
Member

runcom commented Mar 12, 2019

There is a diagram that Steven Milner created after the design meeting last week with Ian McLeod and Abhinav Dahiya. You received and responded to the email with that diagram. If there are comments on the design, I suppose that that diagram would be the best place to put them. I will adjust this PR to reflect any changes made to that design.

Thanks! I have that in my backlog still...

@cgwalters
Copy link
Member

We have an MCD issue here openshift/machine-config-operator#388 - I would have expected any work on this to show up there personally.

Copy link
Member

@ashcrow ashcrow Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be /etc/default/rhcos/kargs ... where kargs is a file with requested changes to operate on.

Out of date.

Copy link
Member

@ashcrow ashcrow Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL https://url.corp.redhat.com/0d85a22 for what's expected in terms of the local service. There's a POC there that's turning into a usable service.

Out of date

abhinavdahiya added a commit to abhinavdahiya/machine-config-operator that referenced this pull request Mar 26, 2019
Users can push manifests during bootstrap that of the form:

```yaml
---
```

Especially for the installer: setting authorizes_keys [1] and setting hyperthreading [2] will push a manifest that includes multiple machineconfig objects for
control-plane (master) and compute (worker) roles.

Single file with multiple k8s objects separated by `---` is also a supported structure for `oc create|apply` ie. there is a high chance that users trying to push machineconfigs at
install time might create such files.

This commit allows bootstrap controller to read all k8s objects, even ones described above to find all the `machineconfiguration.openshift.io` Objects.

[1]: openshift/installer#1150
[2]: openshift/installer#1392
@abhinavdahiya
Copy link
Contributor

@ashcrow
Copy link
Member

ashcrow commented Mar 26, 2019

See: openshift/pivot#47

@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 2, 2019
@abhinavdahiya
Copy link
Contributor

@staebler @wking @ashcrow rebased and ready for review.

Copy link
Member

@wking wking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To naming nits, otherwise this looks good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decodes to: ADD nosmt

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decodes to: ADD nosmt

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decodes to: ADD nosmt

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decodes to: ADD nosmt

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these assembled if other folks add MachineConfigs with their own kernel options? Do we need to set Append on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/etc/pivot/kernel-args is a RHCOS impl and it only accepts the ADD nosmt so no append.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/etc/pivot/kernel-args is a RHCOS impl and it only accepts the ADD nosmt so no append.

Oh, so there's no ability for ADD some-other-flag at the moment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone attempts to ADD or DELETE any other argument it will skip it and log. Only items in the whitelist inside pivot are allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File writing and content looks good to me!

@ashcrow
Copy link
Member

ashcrow commented Apr 3, 2019

--- FAIL: TestMasterGenerateMachineConfigs (0.04s)
    --- FAIL: TestMasterGenerateMachineConfigs/no_key_hyperthreading_disabled (0.00s)

@wking
Copy link
Member

wking commented Apr 4, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 4, 2019
@abhinavdahiya
Copy link
Contributor

/hold

need to get approval on the hyperthreading exception.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 5, 2019
@eparis
Copy link
Member

eparis commented Apr 8, 2019

When this PR is ready, @abhinavdahiya you can remove the hold.

@wking
Copy link
Member

wking commented Apr 8, 2019

I'm still happy with this as it stands. But I'll leave hold-pulling to @abhinavdahiya ;).

@abhinavdahiya
Copy link
Contributor

Waiting on RHCOS bootimage that contains the pivot changes mentioned here https://github.com/openshift/pivot/blob/master/cmd/root.go#L43-L45

This PR only works if we bump rhcos.json to correct RHCOS version that includes the changes mentioned ^^

Waiting on OS team
cc @ashcrow @cgwalters

@ashcrow
Copy link
Member

ashcrow commented Apr 9, 2019

cc @darkmuggle

@ashcrow
Copy link
Member

ashcrow commented Apr 9, 2019

In chat I passed over an updated rhcos.json from ART builds that contain the latest pivot, rpm-ostree, and cri-o.

@cgwalters
Copy link
Member

We must have a publicly-accessible location for the images (currently libvirt but really we need all of the openstack, bare metal etc.), i.e. something like this patch:

git diff
diff --git a/data/data/rhcos.json b/data/data/rhcos.json
index 60878c249..04224bb09 100644
--- a/data/data/rhcos.json
+++ b/data/data/rhcos.json
@@ -46,7 +46,7 @@
             "hvm": "ami-0eac581fbaa9fa9c6"
         }
     },
-    "baseURI": "https://releases-rhcos.svc.ci.openshift.org/storage/releases/ootpa/410.8.20190325.0/",
+    "baseURI": "https://download.redhat.com/rhel/coreos/410.8.20190325.0/",
     "buildid": "410.8.20190325.0",
     "images": {
         "metal-bios": {

Or replace download.redhat.com with try.openshift.com/rhcos-bootimages/ or whatever.

@ashcrow
Copy link
Member

ashcrow commented Apr 9, 2019

@imcleod is helping make this available.

@ashcrow
Copy link
Member

ashcrow commented Apr 9, 2019

FWIW AMI's can be used to test now, and the payload has the proper content if one wants to test manually.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Apr 11, 2019
staebler and others added 2 commits April 10, 2019 17:17
enable or disable hyperthreading for machines. The default is for
hyperthreading to be enabled.

RHCOS ships with pivot.service that uses the `/etc/pivot/kernel-args` to override the kernel arguments for hosts. Adding `nosmt` kernel argument switches hyperthreading off.

Add MachineConfig to disable hyperthreading for control plane and compute that have the hyperthreading option disabled.
this new version of RHCOS include the latest pivot `0.0.4-2.el8` that includes the changes required to act on the hyperthreading disabled
kernel arg at /etc/pivot/kernel-args
@sdodson
Copy link
Member

sdodson commented Apr 11, 2019

/lgtm
only change was the addition of RHCOS data

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 11, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson, staebler, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhinavdahiya
Copy link
Contributor

created an AWS cluster with:

apiVersion: v1beta4
baseDomain: devcluster.openshift.com
controlPlane:
  name: master
  hyperthreading: Disabled
  replicas: 3
  platform:
    aws:
      zones:
      - us-east-1a
      - us-east-1c
      - us-east-1d
compute:
- name: worker
  replicas: 3
  platform:
    aws:
      zones:
      - us-east-1a
      - us-east-1c
      - us-east-1d
metadata:
  name: adahiya-1
networking:
  clusterNetworks:
  - cidr: 10.128.0.0/14
    hostSubnetLength: 9
  machineCIDR: 10.0.0.0/16
  serviceCIDR: 172.30.0.0/16
  type: OpenShiftSDN
platform:
  aws:
    region: us-east-1

on master machine:

[core@ip-10-0-130-101 ~]$ sudo cat /etc/pivot/kernel-args
ADD nosmt[core@ip-10-0-130-101 ~]$ cat /proc/cmdline
BOOT_IMAGE=/ostree/rhcos-59aadfa277d7d06047d323e53ccf6e937589fc3412e3d1ef7a52c2f2086ac29d/vmlinuz-4.18.0-80.el8.x86_64 no_timer_check console=tty0 console=ttyS0,115200n8 rootflags=defaults,prjquota rw root=UUID=0dc164db-5750-41e7-ba19-b9489acc69b6 ostree=/ostree/boot.1/rhcos/59aadfa277d7d06047d323e53ccf6e937589fc3412e3d1ef7a52c2f2086ac29d/0 coreos.oem.id=ec2 ignition.platform.id=ec2 nosmt
[core@ip-10-0-130-101 ~]$

seems to me as working :yay:

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 11, 2019
@abhinavdahiya
Copy link
Contributor

e2e-failure

FLAG: --version=\"false\"\nI0411 23:01:04.426031       1 flags.go:33] FLAG: --vmodule=\"\"\nI0411 23:01:04.426038       1 flags.go:33] FLAG: --write-config-to=\"\"\nI0411 23:01:05.621527       1 serving.go:312] Generated self-signed cert (/var/run/kubernetes/kube-scheduler.crt, /var/run/kubernetes/kube-scheduler.key)\nfailed to create listener: failed to listen on 0.0.0.0:10251: listen tcp 0.0.0.0:10251: bind: address already in use\n"

Failing tests:

[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]

Writing JUnit report to /tmp/artifacts/junit/junit_e2e_20190411-230420.xml

error: 2 fail, 569 pass, 548 skip (26m28s)

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.