@openshift-bot
Contributor

Please merge as soon as https://errata.devel.redhat.com/advisory/55222 is shipped live OR if a Cincinnati-first release is approved.

This PR will also enable upgrades from 4.2.36 to releases in fast-4.3

@openshift-bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Jul 1, 2020
@wking
Member

wking commented Jul 1, 2020

Not much traffic in 4.2 in the wild anymore, so close to end-of-life. Reviewing CI, there is a lot of green (hooray!), a fair number of setup-time errors (too bad, but not 4.2.36 issues), and three that got past setup and still failed:

  • 4.2.7 -> 4.2.36 nominally failed, but actually completed the update. Possibly a CI-cluster/ci-operator hiccup.
  • 4.2.12 -> 4.2.36 hung on Cluster did not complete upgrade: timed out waiting for the condition: Could not update deployment "openshift-console/downloads" (291 of 435).
  • 4.2.34 -> 4.2.36 died on Error from server (NotFound): the server could not find the requested resource (get infrastructures.config.openshift.io cluster).

@wking
Member

wking commented Jul 1, 2020

I think the secret/support created bit is from the pre-test insights-live.yaml push. openshift/release#8953 will make that easier to debug (for step-based tests) going forward, if/when it lands.

@wking
Member

wking commented Jul 1, 2020

Error from server (NotFound): the server could not find the requested resource (get infrastructures.config.openshift.io cluster) is probably a 4.2.34 flake from trying to configure TEST_PROVIDER, so also not directly a 4.2.36 issue. And while it is concerning, it also seems to have resolved itself by the time we gather:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1275698065650487296/artifacts/e2e-aws-upgrade/must-gather.tar | tar -xOz ./quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-29252db98914a7f695df58bd75afe5d0f1fc698c4463e00a9cc7dee0c0ed6ab9/cluster-scoped-resources/config.openshift.io/infrastructures/cluster.yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: 2020-06-24T08:11:17Z
  generation: 1
  name: cluster
  resourceVersion: "424"
  selfLink: /apis/config.openshift.io/v1/infrastructures/cluster
  uid: 47a9c95f-b5f2-11ea-a008-024ba001e9f6
spec:
  cloudConfig:
    name: ""
status:
  apiServerInternalURI: https://api-int.ci-op-99ryp3bw-77109.origin-ci-int-aws.dev.rhcloud.com:6443
  apiServerURL: https://api.ci-op-99ryp3bw-77109.origin-ci-int-aws.dev.rhcloud.com:6443
  etcdDiscoveryDomain: ci-op-99ryp3bw-77109.origin-ci-int-aws.dev.rhcloud.com
  infrastructureName: ci-op-99ryp3bw-77109-lfffr
  platform: AWS
  platformStatus:
    aws:
      region: us-west-2
    type: AWS

@wking
Member

wking commented Jul 1, 2020

For the update that hung on the downloads deployment:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1275888418018037760/artifacts/launch/pods/openshift-cluster-version_cluster-version-operator-5d8569978b-4t5rf_cluster-version-operator.log | grep 'Running sync.*in state\|Result of work' | tail -n6
I0624 22:39:33.613515       1 sync_worker.go:453] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:449b9b839d2cdf33ff2a494a5863d24dc1b436f995c286d8f8d58821ae293b82 (force=true) on generation 2 in state Updating at attempt 7
I0624 22:45:18.668491       1 task_graph.go:611] Result of work: [Cluster operator image-registry has not yet reported success Could not update deployment "openshift-console/downloads" (291 of 435)]
I0624 22:48:25.328312       1 sync_worker.go:453] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:449b9b839d2cdf33ff2a494a5863d24dc1b436f995c286d8f8d58821ae293b82 (force=true) on generation 2 in state Updating at attempt 8
I0624 22:54:10.384249       1 task_graph.go:611] Result of work: [Cluster operator image-registry has not yet reported success Could not update deployment "openshift-console/downloads" (291 of 435)]
I0624 22:57:11.449162       1 sync_worker.go:453] Running sync registry.svc.ci.openshift.org/ocp/release@sha256:449b9b839d2cdf33ff2a494a5863d24dc1b436f995c286d8f8d58821ae293b82 (force=true) on generation 2 in state Updating at attempt 9
I0624 23:02:56.504253       1 task_graph.go:611] Result of work: [Cluster operator image-registry has not yet reported success Could not update deployment "openshift-console/downloads" (291 of 435)]
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1275888418018037760/artifacts/launch/deployments.json | gunzip | jq -r '.items[] | select(.metadata.namespace == "openshift-console" and .metadata.name == "downloads").status.unavailableReplicas'
1
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1275888418018037760/artifacts/launch/pods.json | jq -r '.items[] | select(.metadata.namespace == "openshift-console" and (.metadata.name | startswith("downloads-")) and .status.containerStatuses[0].restartCount > 1).status.containerStatuses[0]'
{
  "containerID": "cri-o://9e56f9fa9d5907732603847613bbdd97f6d76d150f63cddf91c5f32cac115185",
  "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:33fb126ca8ca5ae64ee94fa2c570ac1ec1d62feba42830f46fb18287e7c432ed",
  "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:33fb126ca8ca5ae64ee94fa2c570ac1ec1d62feba42830f46fb18287e7c432ed",
  "lastState": {
    "terminated": {
      "containerID": "cri-o://9e56f9fa9d5907732603847613bbdd97f6d76d150f63cddf91c5f32cac115185",
      "exitCode": 137,
      "finishedAt": "2020-06-24T23:01:26Z",
      "reason": "Error",
      "startedAt": "2020-06-24T23:01:01Z"
    }
  },
  "name": "download-server",
  "ready": false,
  "restartCount": 25,
  "state": {
    "waiting": {
      "message": "Back-off 5m0s restarting failed container=download-server pod=downloads-695c445d46-7gcwl_openshift-console(f9e122b2-b664-11ea-b237-02458479416e)",
      "reason": "CrashLoopBackOff"
    }
  }
}

Dunno what's up with that yet...
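
The CVO log lines above carry enough timing information to see how long each sync attempt ran before giving up. A minimal sketch, assuming klog-style prefixes like the ones in that log (severity and date in the first field, wall-clock time in the second); sync_durations is a hypothetical helper name, not part of any OpenShift tooling, and the sample lines are shortened from the excerpt above:

```sh
sync_durations() {
  awk '
    # Convert HH:MM:SS.ffffff to whole seconds since midnight.
    function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + int(p[3]) }
    # Remember when an attempt started; the attempt number is the last field.
    /Running sync/ { start = secs($2); attempt = $NF }
    # On the matching result line, print the elapsed time for that attempt.
    /Result of work/ && start != "" {
      d = secs($2) - start
      if (d < 0) d += 86400   # the attempt crossed midnight
      printf "attempt %s: %ds\n", attempt, d
      start = ""
    }
  '
}

sync_durations <<'EOF'
I0624 22:39:33.613515       1 sync_worker.go:453] Running sync (release image elided) on generation 2 in state Updating at attempt 7
I0624 22:45:18.668491       1 task_graph.go:611] Result of work: [(errors elided)]
EOF
# prints: attempt 7: 345s
```

Fed the full excerpt, attempts 7, 8, and 9 each come out at 345 seconds, which suggests every attempt is running into the same limit rather than failing fast.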

@wking
Member

wking commented Jul 1, 2020

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1275888418018037760/artifacts/launch/events.json | jq -r '[.items[] | select(.metadata.namespace == "openshift-console" and .involvedObject.name == "downloads-695c445d46-7gcwl") | .timePrefix = if .firstTimestamp == null or .firstTimestamp == "null" then .eventTime else .firstTimestamp + " - " + .lastTimestamp + " (" + (.count | tostring) + ")" end] | sort_by(.timePrefix)[] | .timePrefix + " " + .metadata.namespace + " " + .message' 
2020-06-24T21:52:19Z - 2020-06-24T21:52:19Z (1) openshift-console Successfully assigned openshift-console/downloads-695c445d46-7gcwl to ip-10-0-146-41.us-west-2.compute.internal
2020-06-24T21:52:29Z - 2020-06-24T21:52:29Z (1) openshift-console Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:33fb126ca8ca5ae64ee94fa2c570ac1ec1d62feba42830f46fb18287e7c432ed"
2020-06-24T21:52:47Z - 2020-06-24T21:52:47Z (1) openshift-console Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:33fb126ca8ca5ae64ee94fa2c570ac1ec1d62feba42830f46fb18287e7c432ed"
2020-06-24T21:52:47Z - 2020-06-24T21:52:47Z (1) openshift-console Created container download-server
2020-06-24T21:52:47Z - 2020-06-24T21:52:47Z (1) openshift-console Started container download-server
2020-06-24T21:56:04Z - 2020-06-24T22:42:34Z (60) openshift-console Liveness probe failed: Get http://10.128.2.22:8080/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020-06-24T21:56:07Z - 2020-06-24T21:56:47Z (5) openshift-console Readiness probe failed: Get http://10.128.2.22:8080/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2020-06-24T21:56:24Z - 2020-06-24T21:56:49Z (13) openshift-console network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
2020-06-24T21:56:57Z - 2020-06-24T22:22:34Z (12) openshift-console Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:33fb126ca8ca5ae64ee94fa2c570ac1ec1d62feba42830f46fb18287e7c432ed" already present on machine
2020-06-24T21:57:02Z - 2020-06-24T21:57:02Z (1) openshift-console Cancelling deletion of Pod openshift-console/downloads-695c445d46-7gcwl
2020-06-24T21:59:17Z - 2020-06-24T23:02:40Z (237) openshift-console Back-off restarting failed container
2020-06-24T22:00:51Z - 2020-06-24T22:00:51Z (1) openshift-console Cancelling deletion of Pod openshift-console/downloads-695c445d46-7gcwl
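
The exitCode of 137 in the download-server container status can be decoded the usual way: values above 128 conventionally mean the process was terminated by a signal numbered (code - 128), so 137 is signal 9, SIGKILL. That would be consistent with the kubelet killing the container after the repeated liveness probe failures in the events above, though it does not prove it. A small sketch; describe_exit_code is a hypothetical helper:

```sh
describe_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    # Exit statuses above 128 encode "terminated by signal (status - 128)".
    sig=$((code - 128))
    echo "exit $code: killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit $code: process exited on its own"
  fi
}

describe_exit_code 137
# prints: exit 137: killed by signal 9 (SIGKILL)
```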

Addressing the opaque Missing CNI default network message was WONTFIXED for 4.3, so I guess not a release blocker. And errata is public, so:

/lgtm
/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label Jul 1, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: openshift-bot, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the lgtm (indicates that a PR is ready to be merged) and approved (indicates a PR has been approved by an approver from all required OWNERS files) labels Jul 1, 2020
@wking
Member

wking commented Jul 1, 2020

Oh, also for outgoing edges:

$ CHANNEL=candidate-4.3 ~/src/openshift/cincinnati/hack/available-updates.sh 4.2.36
4.3.27	quay.io/openshift-release-dev/ocp-release@sha256:a2bdd3b4516e05760d01e2589fc0866f7386c1c10c866b29fea137067e76f2ae	https://access.redhat.com/errata/RHBA-2020:2628

I launched AWS, GCP, and Azure jobs for 4.2.36 -> 4.3.27. Two passed, and GCP failed on availability interruptions, which are unfortunate, but not release blockers.
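
For readers without the Cincinnati checkout, the same kind of outgoing-edge lookup can be sketched with jq against a saved copy of the update graph. This assumes the Cincinnati graph JSON layout (a nodes array of releases and an edges array of [from, to] index pairs into nodes). outgoing_edges is a hypothetical helper, the payload and URL values in the sample are placeholders, and this is not necessarily how available-updates.sh itself is implemented:

```sh
outgoing_edges() {
  # $1 is the starting version; the graph JSON is read from stdin.
  jq -r --arg from "$1" '
    . as $g
    # Find the index of the node whose version matches the starting version.
    | ($g.nodes | to_entries[] | select(.value.version == $from) | .key) as $i
    # Keep only edges that leave that node, then look up each target node.
    | $g.edges[]
    | select(.[0] == $i)
    | $g.nodes[.[1]]
    | [.version, .payload, (.metadata.url // "")]
    | @tsv
  '
}

outgoing_edges 4.2.36 <<'EOF'
{
  "nodes": [
    {"version": "4.2.34", "payload": "registry.example/release@sha256:aaa", "metadata": {}},
    {"version": "4.2.36", "payload": "registry.example/release@sha256:bbb", "metadata": {}},
    {"version": "4.3.27", "payload": "registry.example/release@sha256:ccc", "metadata": {"url": "https://errata.example/4.3.27"}}
  ],
  "edges": [[0, 1], [1, 2]]
}
EOF
```

With the sample graph this prints one tab-separated line for 4.3.27, the same shape as the script output above.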

@openshift-merge-robot merged commit 4066a82 into master Jul 1, 2020
@wking deleted the pr-fast-4.2.36 branch July 2, 2020 02:00