@cgwalters
Member

The way CI works, a release payload is generated in the CI namespace,
reusing the by-digest pull specs of existing components. This means
the pull specs for everything change format, but the actual
content stays the same.

Let's handle this for the OS image at least.

In practice...I feel like we're going to want higher level special
handling for this - maybe the CVO does pull-through to the registry
and rewrites to an internal canonical form, but this should help
us for now.
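
To illustrate the idea of treating two differently located pull specs as the same OS image, here is a minimal sketch in Go. It is not the code from this PR: `digestOf` and `sameImageContent` are hypothetical helper names, and the `quay.io/example/os-content` location is made up for the example. It assumes both pull specs are in by-digest form (`<repository>@sha256:...`).

```go
package main

import (
	"fmt"
	"strings"
)

// digestOf returns the digest part of a by-digest pull spec
// ("<repository>@<algorithm>:<hex>"), or "" if the spec has no digest.
func digestOf(pullSpec string) string {
	i := strings.LastIndex(pullSpec, "@")
	if i < 0 {
		return ""
	}
	digest := pullSpec[i+1:]
	if !strings.Contains(digest, ":") {
		return ""
	}
	return digest
}

// sameImageContent (hypothetical helper) reports whether two pull specs
// carry the same digest, ignoring the registry and repository they point at.
// Specs without a digest (for example tags) never compare equal.
func sameImageContent(a, b string) bool {
	da, db := digestOf(a), digestOf(b)
	return da != "" && da == db
}

func main() {
	const digest = "sha256:660061d6eae3ee6d93ca836cd52e6033f1d611c629c1ce47cf272c9e9bda2488"
	ci := "registry.svc.ci.openshift.org/ci-op-39zm8d3x/stable@" + digest
	other := "quay.io/example/os-content@" + digest // hypothetical second location

	fmt.Println(sameImageContent(ci, other))                               // same digest
	fmt.Println(sameImageContent(ci, "quay.io/example/os-content:latest")) // tag, no digest
}
```

Comparing only the digest means a node already running the right content would not be asked to update again just because the registry location changed.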

Closes: #462

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 20, 2019
@cgwalters
Member Author

Not tested locally though beyond new unit tests; looks like the UBI work broke make image-daemon for me with:

Loaded plugins: ovl, product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
http://base-4-0.ocp.svc/rhel-fast-datapath/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: base-4-0.ocp.svc; Unknown error"
Trying other mirror.

@runcom
Member

runcom commented Feb 20, 2019

/approve

Member

@jlebon jlebon left a comment

Some minor comments, otherwise LGTM!

@jlebon
Member

jlebon commented Feb 20, 2019

/approve

@jlebon
Member

jlebon commented Feb 20, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 20, 2019
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@vrutkovs
Contributor

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@cgwalters
Member Author

cgwalters commented Feb 20, 2019

Just for general interest, here's a workflow I have for testing PRs like this locally. As part of submitting a PR, a release image will be generated. You can log into the CI namespace (see HACKING.md) and from there run: oc logs -c release release-latest

In those logs you'll see a message like:
Pushed image sha256:3b3ab2bfd9301f11ae0130c47010828e25a5f26f9caf7322f8d9b14a2571a881 to registry.svc.ci.openshift.org/ci-op-6br3phj4/release:latest

Now, because the CI namespaces only last ~1h, if you want to do testing you'll need to mirror it, e.g.:

oc adm release mirror --from registry.svc.ci.openshift.org/ci-op-6br3phj4/release:latest --to quay.io/cgwalters/ostest

Now, you can pass it to the installer to create a fresh cluster with it:

env OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=quay.io/cgwalters/ostest:release openshift-install ...

Or you can use it as a target for oc adm upgrade --to-image.

@ashcrow
Member

ashcrow commented Feb 20, 2019

level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
2019/02/20 18:37:00 Container setup in pod e2e-aws failed, exit code 1, reason Error

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@cgwalters
Member Author

Hmm, that job failed with:

0221 02:06:34.537828    5053 update.go:647] Updating OS to registry.svc.ci.openshift.org/ci-op-39zm8d3x/stable@sha256:660061d6eae3ee6d93ca836cd52e6033f1d611c629c1ce47cf272c9e9bda2488
E0221 02:06:34.557315    5053 daemon.go:436] Fatal error checking initial state of node: Checking initial state: Failed to run pivot: error starting job: Transaction is destructive.
E0221 02:06:34.557411    5053 writer.go:90] Marking degraded due to: Checking initial state: Failed to run pivot: error starting job: Transaction is destructive.

Hmm. In non-ancient systemd, it looks like that error has a more useful message, which would have told us what was conflicting.

I feel like this is another case where we should have retried instead of going degraded maybe?

But we clearly need to figure out why this happened.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.


@cgwalters
Member Author

Not 100% certain but I think what happened here is since we don't have this pivot commit we ended up in a race where the system was shutting down for reboot, but the MCD had come up and tried to start pivot again.

/retest

@runcom
Member

runcom commented Feb 21, 2019

/retest

@runcom
Member

runcom commented Feb 21, 2019

/retest

@cgwalters
Member Author

(Any reason not to drop the lgtm?)

@runcom
Member

runcom commented Feb 21, 2019

/lgtm

@jlebon ptal

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 21, 2019
@ashcrow
Member

ashcrow commented Feb 21, 2019

/retest

@jlebon
Member

jlebon commented Feb 21, 2019

> @jlebon ptal

Yup, this is still
/lgtm
to me. :)

> Not 100% certain but I think what happened here is since we don't have this pivot commit we ended up in a race where the system was shutting down for reboot, but the MCD had come up and tried to start pivot again.

Yeah, that sounds plausible to me.

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,jlebon,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cgwalters
Member Author

/retest

@runcom
Member

runcom commented Feb 22, 2019

/retest

@kikisdeliveryservice
Contributor

kikisdeliveryservice commented Feb 22, 2019

level=error msg="\t* aws_route.to_nat_gw.5: Error creating route: timeout while waiting for state to become 'success' (timeout: 2m0s)"

/retest

@runcom
Member

runcom commented Feb 22, 2019

aws limit hit

/retest

@kikisdeliveryservice
Contributor

I'll retest in a bit, we keep hitting AWS rate limit errors ☹️

@runcom
Member

runcom commented Feb 22, 2019

/retest

@ashcrow
Member

ashcrow commented Feb 22, 2019

/retest

@cgwalters
Member Author

/retest

@openshift-merge-robot openshift-merge-robot merged commit b25616a into openshift:master Feb 23, 2019
@smarterclayton
Contributor

There is a reason we do this - we will almost certainly migrate our content to another registry location in the future. It’s good to be smart, but don’t assume that if a new url comes in that it will always be at that location in the future.

cgwalters added a commit to cgwalters/pivot that referenced this pull request Mar 8, 2019
This is a lowering of
openshift/machine-config-operator#463
to pivot.  We need it for the case of doing an early pivot
before the MCD comes up.
cgwalters added a commit to cgwalters/pivot that referenced this pull request Mar 11, 2019
This is a lowering of
openshift/machine-config-operator#463
to pivot.  We need it for the case of doing an early pivot
before the MCD comes up.
ashcrow pushed a commit to openshift/pivot that referenced this pull request Mar 11, 2019
This is a lowering of
openshift/machine-config-operator#463
to pivot.  We need it for the case of doing an early pivot
before the MCD comes up.
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

10 participants