Skip to content

Conversation

@cgwalters
Copy link
Member

Our new thought around this is that really FIPS should be a "day 1"
operation, and we don't want to make it really easy to undo.
See also openshift/installer#2594
Anyone who wants to force this can change the MC flag, then
oc debug node and run the disable command by hand, then reboot.

Our MachineConfig merge semantics should make it hard for this
to happen unless the admin explicitly deletes the installer-generated MC,
but still.

Since we don't support it and don't want customers
to do it by accident, let's disable it and also stop wasting compute
hours testing it.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 31, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2019
@kikisdeliveryservice kikisdeliveryservice requested review from runcom and removed request for LorbusChris October 31, 2019 18:23
@ashcrow
Copy link
Member

ashcrow commented Oct 31, 2019

This makes sense to me. Should we refuse to let them change FIPS at all since Day 2 wouldn't be valid?

@runcom
Copy link
Member

runcom commented Oct 31, 2019

This makes sense to me. Should we refuse to let them change FIPS at all since Day 2 wouldn't be valid?

yeah, I was actually thinking if day2 isn't valid at all we might want to go the route to completely drop this FIPS from MC and either you install with fips or not. That would be the safest option to me w/o providing any way to disable it through the MCO

Copy link
Member

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change itself is 👍 though I still wonder if we should remove the flag (or make it 100% read only).

@cgwalters
Copy link
Member Author

Should we refuse to let them change FIPS at all since Day 2 wouldn't be valid?

Possibly but we currently rely on this code since we haven't landed an initramfs replacement yet.

@ashcrow
Copy link
Member

ashcrow commented Oct 31, 2019

/retest

@kikisdeliveryservice
Copy link
Contributor

/skip

@mrunalp
Copy link
Member

mrunalp commented Nov 1, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 1, 2019
@ashcrow
Copy link
Member

ashcrow commented Nov 1, 2019

/override ci/prow/e2e-gcp-op

I've been told there are currently some issues being looked at for the above job. Skipping.

@openshift-ci-robot
Copy link
Contributor

@ashcrow: Overrode contexts on behalf of ashcrow: ci/prow/e2e-gcp-op

Details

In response to this:

/override ci/prow/e2e-gcp-op

I've been told there are currently some issues being looked at for the above job. Skipping.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ashcrow
Copy link
Member

ashcrow commented Nov 1, 2019

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 1, 2019
@cgwalters
Copy link
Member Author

I just noticed this in the logs:

I1031 20:38:14.869340       1 node_controller.go:433] Pool worker: node ci-op--bgjrr-w-d-lpplg.c.openshift-gce-devel-ci.internal is now reporting unready: node ci-op--bgjrr-w-d-lpplg.c.openshift-gce-devel-ci.internal is reporting OutOfDisk

@kikisdeliveryservice
Copy link
Contributor

@cgwalters i noticed in the fips e2e they were hitting OOD on 2 nodes in the failed runs as well. sent you the link to chat convo

@crawford
Copy link
Contributor

crawford commented Nov 1, 2019

Are we concerned about transitions in the opposite direction? Going from non-FIPS compliant to (technically still not) FIPS compliant seems more dangerous.

@cgwalters
Copy link
Member Author

@crawford Yes but see
#1233 (comment)
(We need this right now to test FIPS at all until we land the updated RHCOS)

@cgwalters
Copy link
Member Author

@ashcrow You added a hold - can you explain why?

@ashcrow
Copy link
Member

ashcrow commented Nov 1, 2019

@cgwalters I added it as @kikisdeliveryservice wanted to review why the GCE job failed to ensure it was a flake.

@kikisdeliveryservice
Copy link
Contributor

@ashcrow You added a hold - can you explain why?

held bc we saw the failed FIPs test and wanted to investigate it (ie the stuff we've been looking at)

@cgwalters If you are fine with merging it without it passing that job at all, I'm fine with removing hold. I dont know enough about FIPs to say so myself... just lmk

@cgwalters
Copy link
Member Author

Hmm. I think this PR is quite safe, and will actually help our issues with the e2e-gcp-op test suite since it'll shorten it a bit.

But, it can also probably wait until we've figured out that suite.

kikisdeliveryservice added a commit to kikisdeliveryservice/machine-config-operator that referenced this pull request Nov 5, 2019
Day 2 FIPS is broken and this test is consistently failing.  Day 2 FIPS
will be dropped in openshift#1233 so this test will be unneeded.

Related-to: openshift#1233
@kikisdeliveryservice
Copy link
Contributor

It's late for me, but I've figured out that our test is failing precisely bc FIPS day2 is broken. I have a PR here removing the test to unblock the PRs that have been stuck since Friday:
See #1244

Can someone in the morning just double check and then merge this (and then approve my PR)? We have features blocked so it would be great to get the test gone and things back to working.

@ashcrow
Copy link
Member

ashcrow commented Nov 5, 2019

/lgtm

@ashcrow
Copy link
Member

ashcrow commented Nov 5, 2019

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2019
@ashcrow
Copy link
Member

ashcrow commented Nov 5, 2019

Needs a quick rebase

Our new thought around this is that really FIPS should be a "day 1"
operation, and we don't want to make it really easy to undo.
See also openshift/installer#2594
 Anyone who wants to force this can change the MC flag, then
`oc debug node` and run the disable command by hand, then reboot.

Our MachineConfig merge semantics should make it hard for this
to happen unless the admin explicitly deletes the installer-generated MC,
but still.

Since we don't support it and don't want customers
to do it by accident, let's disable it and also stop wasting compute
hours testing it.

Further, a pending RHCOS change will delete the FIPS command entirely
and move it into the initramfs.  Cleanly handle that case by also refusing
to enable FIPS "day 2" - what we expect to be the future.

But we still support enabling day 2 for testing until that RHCOS change
lands.
@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 5, 2019
@cgwalters
Copy link
Member Author

OK rebased 🏄‍♂️ and I also added some code to more cleanly handle the pending RHCOS change to drop rhcos-tools.

@ashcrow
Copy link
Member

ashcrow commented Nov 5, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 5, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ashcrow
Copy link
Member

ashcrow commented Nov 5, 2019

/override ci/prow/e2e-aws-scaleup-rhel7

My understanding is the above failing test is unrelated

@openshift-ci-robot
Copy link
Contributor

@ashcrow: Overrode contexts on behalf of ashcrow: ci/prow/e2e-aws-scaleup-rhel7

Details

In response to this:

/override ci/prow/e2e-aws-scaleup-rhel7

My understanding is the above failing test is unrelated

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 6, 2019

@cgwalters: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 a42cb0c link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kikisdeliveryservice
Copy link
Contributor

GCP errors

/test e2e-gcp-upgrade

@openshift-merge-robot openshift-merge-robot merged commit 3e1d675 into openshift:master Nov 6, 2019
cgwalters added a commit to cgwalters/release that referenced this pull request Jul 28, 2020
This effectively reverts
fb1c4b4

e2e-fips is currently failing with
`/bin/bash: line 15: nodes[i]: unbound variable`

Looking at this...we already have code to validate
the state of FIPS in the MCO, see:

https://github.com/openshift/machine-config-operator/blob/091afde36ac117ef8b782a85b38ae8783ddf4b70/pkg/daemon/update.go#L571

openshift/machine-config-operator#1252
openshift/machine-config-operator#1233

I think these types of checks should be the MCO's role,
or if we choose not to do that, let's at least implement
them in Go in the existing e2e suite and avoid
nontrivial shell-in-YAML.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants