Skip to content

Conversation

@haircommander
Copy link
Member

@haircommander haircommander commented Feb 6, 2020

What I did

ctrcfgs have been broken since this PR, possibly before. There are a couple of reasons behind this:

  • Update and reduce templates for CRI-O 1.16 #1216 and ignition apparently don't play well together. When there are empty values (i.e. commented out ones) in the template, they are rendered as the empty value for whatever type it is (0, "", []). In a lot of cases, CRI-O can't distinguish between a value that is legitimately empty (stream_address is specifically set as so in openshift), and one that is empty due to an error. Thus, we need to populate the template until a better approach to the ContainerRuntimeConfig controller is reached. (this is done here)
  • Next, CRI-O 1.16 dropped support for some config values. The config that was being rendered was from 1.9. This caused some values to be omitted (version_file_location) after the controller did an update. This commit updates from vendoring cri-o 1.9 to 1.16 to make sure we're rendering a config file that is compatible with the version of CRI-O. Until we move away from specifying the full template as aformentioned, we'll want to bump CRI-O each openshift version.
  • After that was fixed, it was found that comparing quantity.Resource{} against the values sent to the controller caused spurious failures, as an empty Resource structure was sent, yet the comparison between the empty struct didn't succeed. Now, we compare Resource.Value() == 0, which is a more deterministic way of finding whether the resource was specified (and prevent a fatally empty value from showing up in the resulting crio.conf). This and this commit make these changes.
  • This was cherry-picked to this branch. Without this patch, a ctrcfg would be rendered, but then fail to be discovered because the controller wasn't looking for the right pool.
  • Finally, and perhaps most importantly, an e2e test was added. Now, we will test changes to the MCO against the workflow of a ctrcfg: create, watch and check for an update, remove, watch and check for a reversion.

Risks

The risk of this PR is relatively low. The biggest worry I have is around cherry-picking this.
However, the existence of an e2e test, and the clear states of "before these changes, the e2e test failed" and "after these changes, the e2e test passes" makes me believe there is no risk to merging this PR mid-z-stream.

How to verify it

go test -v -run="TestContainerRuntimeConfigPidsLimit" ./test/e2e/

Description for the changelog

added simple e2e test for ContainerRuntimeConfigs
revendor cri-o to have an updated config structure as well as fix problems with template rendering

cherry-pick of: #1414
also includes: d8a6fa6

umohnani8 and others added 3 commits February 6, 2020 12:30
This reverts commit 69025e8.
as well as updates crio.yaml templates with correct values and fixes whitespace

Signed-off-by: Urvashi Mohnani <[email protected]>
Signed-off-by: Peter Hunt <[email protected]>
Signed-off-by: Peter Hunt <[email protected]>
before, we weren't properly checking LogSizeMax was an empty value. As such, we incorrectly merged the config changes, which bricked the runtime when LogSizeMax wasn't defined.

Fix this by using LogSizeMax uniformly by checking against its Value() method

Signed-off-by: Peter Hunt <[email protected]>
@openshift-ci-robot
Copy link
Contributor

@haircommander: This pull request references Bugzilla bug 1794495, which is invalid:

  • expected dependent Bugzilla bug 1794493 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1794495: [release-4.3] add ctrcfg e2e test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 6, 2020
@openshift-ci-robot
Copy link
Contributor

@haircommander: This pull request references Bugzilla bug 1794495, which is invalid:

  • expected dependent Bugzilla bug 1794493 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1794495: [release-4.3] add ctrcfg e2e test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haircommander
Copy link
Member Author

/test e2e-gcp-op

instead of comparing directly against Quantity{}, as I've seen weirdness with it

Signed-off-by: Peter Hunt <[email protected]>
@haircommander
Copy link
Member Author

/retest

@runcom
Copy link
Member

runcom commented Feb 7, 2020

@haircommander there seems to be some issue in op here instead :(

@haircommander
Copy link
Member Author

yeah the tests seem legit 😕

@runcom
Copy link
Member

runcom commented Feb 7, 2020

yeah the tests seem legit 😕

maybe something is different between 4.4/master and 4.3 🤔

@haircommander
Copy link
Member Author

yeah the tests seem legit confused

maybe something is different between 4.4/master and 4.3 thinking

yeah but it's a pretty confusing error. I get

Status:
  Conditions:
    Last Transition Time:  2020-02-06T21:19:35Z
    Message:               could not generate origin ContainerRuntime Configs: could not generate origin ContainerRuntime Configs: %v
    Status:                False
    Type:                  Failure
  Observed Generation:     1
Events:                    <none>

after deploying a ctrcfg, which tells me nothing about what went wrong

@haircommander
Copy link
Member Author

/bugzilla refresh

@openshift-ci-robot
Copy link
Contributor

@haircommander: This pull request references Bugzilla bug 1794495, which is invalid:

  • expected dependent Bugzilla bug 1794493 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

runcom and others added 2 commits February 7, 2020 10:33
…ools

Having kubelet config or runtime MCs for custom pools isn't possible today.
The reason for that was to avoid risking drift between workers when it comes
to kubelet and runtime configs.
This patches changes that behavior by allowing custom pools to use the worker
base templates in order to generate MCs for kubelet and runtime configs.

Signed-off-by: Antonio Murdaca <[email protected]>
this change includes:
add test/e2e/ctrcfg_test that includes three basic tests for deploying a ctrcfg and seeing it correctly changed the runtime configuration
move/refactor some code from test/e2e/mcd_test to test/e2e/utils_test to prevent duplication between the two files, while also being clear about what tests use what (utils currently houses functions used in both mcd and ctrcfg tests)

Signed-off-by: Peter Hunt <[email protected]>
@haircommander
Copy link
Member Author

level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"

/retest

@openshift-ci-robot
Copy link
Contributor

@haircommander: This pull request references Bugzilla bug 1794495, which is invalid:

  • expected dependent Bugzilla bug 1794493 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1794495: [release-4.3] add ctrcfg e2e test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haircommander
Copy link
Member Author

😎

@runcom
Copy link
Member

runcom commented Feb 8, 2020

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 8, 2020
@umohnani8
Copy link
Contributor

/bugzilla refresh

@openshift-ci-robot
Copy link
Contributor

@umohnani8: This pull request references Bugzilla bug 1794495, which is invalid:

  • expected dependent Bugzilla bug 1794493 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haircommander
Copy link
Member Author

/bugzilla refresh

@openshift-ci-robot
Copy link
Contributor

@haircommander: This pull request references Bugzilla bug 1794495, which is invalid:

  • expected dependent Bugzilla bug 1794493 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haircommander
Copy link
Member Author

(lol I only just saw you just did that 10 minutes ago @umohnani8)

@umohnani8
Copy link
Contributor

@runcom so until https://bugzilla.redhat.com/show_bug.cgi?id=1794493 is not verified (the 4.4 bz), we can't move ahead with this PR?

@kikisdeliveryservice
Copy link
Contributor

@runcom so until https://bugzilla.redhat.com/show_bug.cgi?id=1794493 is not verified (the 4.4 bz), we can't move ahead with this PR?

correct @umohnani8

@umohnani8
Copy link
Contributor

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Feb 12, 2020
@openshift-ci-robot
Copy link
Contributor

@umohnani8: This pull request references Bugzilla bug 1794495, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haircommander
Copy link
Member Author

@eparis @crawford @mfojtik PTAL

@haircommander haircommander changed the title Bug 1794495: [release-4.3] add ctrcfg e2e test Bug 1794495: [release-4.3] fix ctrcfg and add e2e test Feb 12, 2020
@openshift-ci-robot
Copy link
Contributor

@haircommander: This pull request references Bugzilla bug 1794495, which is valid.

Details

In response to this:

Bug 1794495: [release-4.3] fix ctrcfg and add e2e test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Copy link
Member

Overall...
/approve
from me. The thing that makes me most nervous here is that the relationship between the MCO and cri-o version is inherently fragile. Particularly as we're doing things like cherry-picking across streams.

What we really want again I think is to have /etc/crio.conf.d/mco.conf or so, so that the MCO only generates a config for values it changes, and otherwise the defaults live in the crio code.

If the possible fallout from this was only confined to clusters with an actually customized containerruntimeconfig, that'd be one thing but this will roll out a big crio.conf change to everyone.
The changes there look OK, I didn't exhaustively review it but did read through a good bit of it.

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, haircommander, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kikisdeliveryservice
Copy link
Contributor

Overall...
/approve
from me. The thing that makes me most nervous here is that the relationship between the MCO and cri-o version is inherently fragile. Particularly as we're doing things like cherry-picking across streams.

What we really want again I think is to have /etc/crio.conf.d/mco.conf or so, so that the MCO only generates a config for values it changes, and otherwise the defaults live in the crio code.

If the possible fallout from this was only confined to clusters with an actually customized containerruntimeconfig, that'd be one thing but this will roll out a big crio.conf change to everyone.
The changes there look OK, I didn't exhaustively review it but did read through a good bit of it.

@cgwalters I can move this to a new issue/jira to be done as future work you'd like?

@haircommander
Copy link
Member Author

What we really want again I think is to have /etc/crio.conf.d/mco.conf or so, so that the MCO only generates a config for values it changes, and otherwise the defaults live in the crio code.

we submitted an epic for 4.5 to do just that 😄

@bparees bparees added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Feb 19, 2020
@openshift-merge-robot openshift-merge-robot merged commit 8e56cd8 into openshift:release-4.3 Feb 20, 2020
@openshift-ci-robot
Copy link
Contributor

@haircommander: All pull requests linked via external trackers have merged. Bugzilla bug 1794495 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1794495: [release-4.3] fix ctrcfg and add e2e test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants