Conversation

@wking commented Jan 15, 2019

From @dgoodwin:

The two use cases were (1) service delivery will start receiving telemetry for the cluster while it's installing, but they have no knowledge of the UUID, which is a problem for them, and (2) if Hive fails to upload that UUID after install, we have an orphaned cluster that can't be cleaned up automatically. Writing the metadata.json as an asset is a perfect solution: we can upload once ready, and if it fails, no harm done, we'll just keep retrying.
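
A minimal sketch of what such an asset can look like, assuming the installer's asset.Asset and asset.WritableAsset interfaces of this era; the type name, wiring, and placeholder values are illustrative, not the merged code:

package metadata

import (
    "encoding/json"

    "github.com/openshift/installer/pkg/asset"
)

// Metadata writes metadata.json during asset generation, so the cluster
// UUID is on disk before the cluster exists. Hive can upload it once
// ready; if the upload fails, nothing is lost, and it just retries.
type Metadata struct {
    file *asset.File
}

// Name implements asset.Asset.
func (m *Metadata) Name() string { return "Metadata" }

// Dependencies would name the cluster-ID and install-config assets.
func (m *Metadata) Dependencies() []asset.Asset { return nil }

// Generate marshals the metadata; a real asset pulls values from parents.
func (m *Metadata) Generate(parents asset.Parents) error {
    data, err := json.Marshal(map[string]string{
        "clusterName": "example-cluster", // illustrative placeholder
        "clusterID":   "example-uuid",    // the UUID Hive needs early
    })
    if err != nil {
        return err
    }
    m.file = &asset.File{Filename: "metadata.json", Data: data}
    return nil
}

// Files implements asset.WritableAsset.
func (m *Metadata) Files() []*asset.File {
    if m.file == nil {
        return nil
    }
    return []*asset.File{m.file}
}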

/hold

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 15, 2019
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 15, 2019
@dgoodwin (Contributor)

Looks like it will solve the problem nicely, thanks. If I can lend a hand in requesting the exception, let me know.

@abhinavdahiya (Contributor)

With metadata being a separate asset, we need this to be part of the create cluster target too, right?

@wking commented Jan 15, 2019

With metadata being a separate asset, we need this to be part of the create cluster target too, right?

Should be fixed with b09f8c8 -> 76e67a4.
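
For readers following along, the change amounts to wiring the new asset into the cluster target's asset list as well. A hedged sketch with hypothetical names (the real target definitions live in cmd/openshift-install/create.go and may be shaped differently):

// Hypothetical wiring, not the actual diff in b09f8c8 -> 76e67a4.
var clusterTarget = target{
    name: "cluster",
    assets: []asset.WritableAsset{
        &kubeconfig.Admin{},           // existing cluster-target assets...
        &cluster.TerraformVariables{},
        &cluster.Metadata{},           // new: metadata.json is written here too
        &cluster.Cluster{},
    },
}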

@wking wking added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 15, 2019
@staebler (Contributor) left a comment

Some changes are needed to support the None platform.

@dgoodwin commented Jan 16, 2019

Looks like this crashes when you move on to "create cluster":

level=debug msg="Reusing previously-fetched \"Metadata\""
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xf0e9e5]                                                                                                                            
goroutine 1 [running]:
github.com/openshift/installer/pkg/asset.PersistToFile(0x5151a40, 0x8556248, 0x7fffcef0d631, 0x7, 0x0, 0x0)                                                                                       
        /go/src/github.com/openshift/installer/pkg/asset/asset.go:50 +0xb5                                                                                                                        
main.runTargetCmd.func1(0x7fffcef0d631, 0x7, 0xc42092eb80, 0xc420891c00)                                                                                                                          
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:157 +0x194
main.runTargetCmd.func2(0x8533be0, 0xc4202699c0, 0x0, 0x4)
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:177 +0x81
github.com/openshift/installer/vendor/github.com/spf13/cobra.(*Command).execute(0x8533be0, 0xc420269980, 0x4, 0x4, 0x8533be0, 0xc420269980)                                                       
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:766 +0x2c1                                                                                                
github.com/openshift/installer/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc4206bd180, 0x0, 0xc4207d6500, 0xc4206bd2d0)                                                                   
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:852 +0x30a                                                                                                
github.com/openshift/installer/vendor/github.com/spf13/cobra.(*Command).Execute(0xc4206bd180, 0xc420891ec8, 0x1)                                                                                  
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:800 +0x2b
main.installerMain()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:50 +0x1ba
main.main()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:34 +0x39

From Devan Goodwin [1]:

  The two use cases were (1) service delivery will start receiving
  telemetry for the cluster while it's installing, but they have no
  knowledge of the UUID which is a problem for them, and (2) if Hive
  fails to upload that UUID after install we have an orphaned cluster
  that can't be cleaned up automatically.  Writing the metadata.json
  as an asset is a perfect solution, we can upload once ready and if
  it fails, no harm done, we'll just keep retrying.

Matthew recommended the no-op load [2]:

  My suggestion is that, for now, Load should return false always.
  The installer will ignore any changes to metadata.json.  In the
  future, perhaps we should introduce a read-only asset that would
  cause the installer to warn (or fail) in the face of changes.

[1]: openshift#1057 (comment)
[2]: openshift#1070 (comment)
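
Matthew's suggestion reduces to a one-line method. A minimal sketch against the WritableAsset interface, where FileFetcher is the installer's on-disk state reader:

// Load always reports "not found", so the installer regenerates the
// asset and ignores any user edits to metadata.json on disk. A future
// read-only asset kind could warn (or fail) on changes instead.
func (m *Metadata) Load(f asset.FileFetcher) (found bool, err error) {
    return false, nil
}
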
@wking commented Jan 16, 2019

panic: runtime error: invalid memory address or nil pointer dereference

I think I fixed this with 76e67a4 -> a9afdc9. At least, I can no longer reproduce your panic. Can you check to confirm?
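
The trace points at pkg/asset.PersistToFile dereferencing a nil pointer while persisting the metadata asset. Purely as an illustration of the class of fix (the real change is whatever landed in a9afdc9), a defensive guard there might look like:

package asset

import (
    "io/ioutil"
    "os"
    "path/filepath"
)

// Hypothetical guard, not the actual a9afdc9 diff: skipping nil file
// entries keeps a partially-populated asset from panicking the writer.
func PersistToFile(asset WritableAsset, directory string) error {
    for _, f := range asset.Files() {
        if f == nil {
            continue // nothing was generated for this slot
        }
        path := filepath.Join(directory, f.Filename)
        if err := os.MkdirAll(filepath.Dir(path), 0755); err != nil {
            return err
        }
        if err := ioutil.WriteFile(path, f.Data, 0644); err != nil {
            return err
        }
    }
    return nil
}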

@aaronlevy (Contributor)

Approved from the perspective of feature freeze (this causes a significant bug for Hive). I'll let others on the team do code approval.

@dgoodwin (Contributor)

Panic is fixed, thx!

@wking commented Jan 16, 2019

e2e-aws:

level=error msg="1 error occurred:"
level=error msg="\t* module.vpc.aws_lb.api_external: 1 error occurred:"
level=error msg="\t* aws_lb.api_external: timeout while waiting for state to become 'active' (last state: 'provisioning', timeout: 10m0s)"

There's suspicion that these failures are due to openshift/cluster-ingress-operator#105, leftovers from other clusters in the account.

/retest

@abhinavdahiya (Contributor)

/hold cancel

Approved from the perspective of feature freeze (this causes a significant bug for Hive). I'll let others on the team do code approval.

Deferring to @staebler for /lgtm.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 16, 2019
@wking commented Jan 17, 2019

e2e-aws:

level=error msg="Error: Error applying plan:"
level=error
level=error msg="1 error occurred:"
level=error msg="\t* module.vpc.aws_route_table_association.route_net[2]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.2: timeout while waiting for state to become 'success' (timeout: 5m0s)"

/retest

@wking commented Jan 17, 2019

e2e-aws:

    expected pod "pod-subpath-test-hostpath-cwdd" success: pod "pod-subpath-test-hostpath-cwdd" failed with status: {Phase:Failed Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-01-17 05:39:12 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-01-17 05:38:59 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [test-container-subpath-hostpath-cwdd test-container-volume-hostpath-cwdd]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:0001-01-01 00:00:00 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [test-container-subpath-hostpath-cwdd test-container-volume-hostpath-cwdd]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-01-17 05:38:59 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.0.151.63 PodIP:10.128.2.12 StartTime:2019-01-17 05:38:59 +0000 UTC InitContainerStatuses:[{Name:init-volume-hostpath-cwdd State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2019-01-17 05:39:10 +0000 UTC,FinishedAt:2019-01-17 05:39:10 +0000 UTC,ContainerID:cri-o://482e7aef17d5c14162207495faade7cb317817d62209d7e0ce5d4899a6efe9a6,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:true RestartCount:0 Image:docker.io/library/busybox:latest ImageID:docker.io/library/busybox@sha256:bbb143159af9eabdf45511fd5aab4fd2475d4c0e7fd4a5e154b98e838488e510 ContainerID:cri-o://482e7aef17d5c14162207495faade7cb317817d62209d7e0ce5d4899a6efe9a6}] ContainerStatuses:[{Name:test-container-subpath-hostpath-cwdd State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2019-01-17 05:39:16 +0000 UTC,FinishedAt:2019-01-17 05:39:26 +0000 UTC,ContainerID:cri-o://73c308be4d4c317f9ca4099c5e0ec65d39b2d821b4a75ce16cf6101b8a7b0e30,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:gcr.io/kubernetes-e2e-test-images/mounttest-amd64:1.0 ImageID:gcr.io/kubernetes-e2e-test-images/mounttest-amd64@sha256:e3e75014e6df02dc21e6fb95f93b989a2ff8a91f36ae88d74eccbabaa21fc211 ContainerID:cri-o://73c308be4d4c317f9ca4099c5e0ec65d39b2d821b4a75ce16cf6101b8a7b0e30} {Name:test-container-volume-hostpath-cwdd State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2019-01-17 05:39:24 +0000 UTC,FinishedAt:2019-01-17 05:39:24 +0000 UTC,ContainerID:cri-o://6899ca1973f793b17a373449cf3cec5c1c528e7cd7d825e881571138cc003254,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:gcr.io/kubernetes-e2e-test-images/mounttest-amd64:1.0 ImageID:gcr.io/kubernetes-e2e-test-images/mounttest-amd64@sha256:e3e75014e6df02dc21e6fb95f93b989a2ff8a91f36ae88d74eccbabaa21fc211 ContainerID:cri-o://6899ca1973f793b17a373449cf3cec5c1c528e7cd7d825e881571138cc003254}] QOSClass:BestEffort}
not to have occurred

Jan 17 05:38:40.287 I ns=openshift-operator-lifecycle-manager pod=packageserver-fdb989b6b-tc72b Successfully pulled image "registry.svc.ci.openshift.org/ci-op-q432vnpn/stable@sha256:aee3a3f6325e61597f98d63da4bfb047e248f777f8d87c780073e093dee04d18" count(1)
Jan 17 05:38:40.588 I ns=openshift-operator-lifecycle-manager pod=packageserver-fdb989b6b-tc72b Created container count(1)
Jan 17 05:38:40.588 I ns=openshift-operator-lifecycle-manager pod=packageserver-fdb989b6b-tc72b Started container count(1)

failed: (1m13s) 2019-01-17T05:39:51 "[sig-storage] Subpath [Volume type: hostPath] should support readOnly directory specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]"
...
Flaky tests:

[Feature:DeploymentConfig] deploymentconfigs with custom deployments [Conformance] should run the custom deployment steps [Suite:openshift/conformance/parallel/minimal]
[Feature:DeploymentConfig] deploymentconfigs with test deployments [Conformance] should run a deployment to completion and then scale to zero [Suite:openshift/conformance/parallel/minimal]
[sig-storage] Subpath [Volume type: hostPath] should support readOnly directory specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

[Feature:Builds] build with empty source  started build should build even with an empty source in build config [Suite:openshift/conformance/parallel]

/retest

installConfig := &installconfig.InstallConfig{}
parents.Get(clusterID, installConfig)

if installConfig.Config.Platform.None != nil {

@staebler (Contributor), on the hunk above:

With these changes, metadata.json has become information and not just a file used by the destroy command. I would rather see consistency, where the file is generated even when the platform is None. With that said, I can live with the changes as they are, and we can address what to do about the None platform in later releases.
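
The "information, not just a destroy input" point is easier to see from the file's shape. A hedged sketch, with field names that are assumptions rather than verified against this PR: a platform-agnostic core plus optional platform sections, under which None could still emit the name and UUID:

// Illustrative shape only. The platform-agnostic fields are what
// service delivery and Hive need; the platform sections are what the
// destroy command needs, and could legitimately be absent for None.
type ClusterMetadata struct {
    ClusterName string `json:"clusterName"`
    ClusterID   string `json:"clusterID"` // the UUID Hive uploads

    AWS     *AWSMetadata     `json:"aws,omitempty"`
    Libvirt *LibvirtMetadata `json:"libvirt,omitempty"`
}

type AWSMetadata struct {
    Region     string              `json:"region"`
    Identifier []map[string]string `json:"identifier"` // tags destroy filters on
}

type LibvirtMetadata struct {
    URI string `json:"uri"`
}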

@staebler (Contributor)

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 17, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: staebler, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking commented Jan 17, 2019

e2e-aws:

Failing tests:

The bootstrap user should successfully login with password decoded from kubeadmin secret [Suite:openshift/conformance/parallel]
[Area:Networking] NetworkPolicy when using a plugin that implements NetworkPolicy should enforce multiple, stacked policies with overlapping podSelectors [Feature:OSNetworkPolicy] [Suite:openshift/conformance/parallel]
[Area:Networking] NetworkPolicy when using a plugin that implements NetworkPolicy should enforce policy based on NamespaceSelector [Feature:OSNetworkPolicy] [Suite:openshift/conformance/parallel]
[Area:Networking] NetworkPolicy when using a plugin that implements NetworkPolicy should enforce policy based on NamespaceSelector and PodSelector [Feature:OSNetworkPolicy] [Suite:openshift/conformance/parallel]
[Area:Networking] NetworkPolicy when using a plugin that implements NetworkPolicy should enforce policy based on PodSelector [Feature:OSNetworkPolicy] [Suite:openshift/conformance/parallel]
...

and many, many more. I think something is busted in CI, so I'm not going to kick this again yet.

@wking commented Jan 17, 2019

Actually, I must just be misreading that summary, because:

passed: (53.9s) 2019-01-17T18:25:15 "[Area:Networking] NetworkPolicy when using a plugin that implements NetworkPolicy should enforce multiple, stacked policies with overlapping podSelectors [Feature:OSNetworkPolicy] [Suite:openshift/conformance/parallel]"

/retest

@wking commented Jan 17, 2019

e2e-aws:

fail [github.com/openshift/origin/test/extended/deployments/deployments.go:391]: Expected
    <string>: --> pre: Running hook pod ...
    test pre hook executed
    --> pre: Success
    --> Scaling up deployment-test-3 from 0 to 1, scaling down deployment-test-2 from 0 to 0 (keep 1 pods available, don't exceed 2 pods)
        Scaling deployment-test-3 up to 1
to contain substring
    <string>: --> Success

...
failed: (2m45s) 2019-01-17T21:06:54 "[Feature:DeploymentConfig] deploymentconfigs with test deployments [Conformance] should run a deployment to completion and then scale to zero [Suite:openshift/conformance/parallel/minimal]"

and more, although that one has been killing us in CI recently.

/retest

@wking commented Jan 18, 2019

e2e-aws:

fail [k8s.io/kubernetes/test/e2e/kubectl/portforward.go:515]: Jan 17 23:27:44.956: Missing "^Accepted client connection$" from log: 

...

failed: (41.1s) 2019-01-17T23:28:12 "[sig-cli] Kubectl Port forwarding [k8s.io] With a server listening on 0.0.0.0 should support forwarding over websockets [Suite:openshift/conformance/parallel] [Suite:k8s]"

and more.

/retest

@wking commented Jan 18, 2019

e2e-aws:

fail [k8s.io/kubernetes/test/e2e/storage/persistent_volumes-local.go:1257]: Expected error:
    <*errors.errorString | 0xc420a73bc0>: {
        s: "failed running \"mkdir -p /tmp/local-volume-test-ea3bbe98-1ae6-11e9-9337-0a58ac1064d6 && dd if=/dev/zero of=/tmp/local-volume-test-ea3bbe98-1ae6-11e9-9337-0a58ac1064d6/file bs=512 count=20480 && E2E_LOOP_DEV=$(sudo losetup -f) && echo ${E2E_LOOP_DEV} && sudo losetup ${E2E_LOOP_DEV} /tmp/local-volume-test-ea3bbe98-1ae6-11e9-9337-0a58ac1064d6/file\": <nil> (exit code 1)",
    }
    failed running "mkdir -p /tmp/local-volume-test-ea3bbe98-1ae6-11e9-9337-0a58ac1064d6 && dd if=/dev/zero of=/tmp/local-volume-test-ea3bbe98-1ae6-11e9-9337-0a58ac1064d6/file bs=512 count=20480 && E2E_LOOP_DEV=$(sudo losetup -f) && echo ${E2E_LOOP_DEV} && sudo losetup ${E2E_LOOP_DEV} /tmp/local-volume-test-ea3bbe98-1ae6-11e9-9337-0a58ac1064d6/file": <nil> (exit code 1)
not to have occurred

failed: (31.7s) 2019-01-18T06:04:40 "[sig-storage] PersistentVolumes-local  [Volume type: blockfs] Set fsGroup for local volume should set different fsGroup for second pod if first pod is deleted [Suite:openshift/conformance/parallel] [Suite:k8s]"

/retest

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

@wking commented Jan 19, 2019

e2e-aws:

level=error msg="\t* aws_route53_record.etcd_a_nodes[0]: 1 error occurred:"
level=error msg="\t* aws_route53_record.etcd_a_nodes.0: [ERR]: Error building changeset: timeout while waiting for state to become 'accepted' (timeout: 5m0s)"

/retest

@openshift-merge-robot openshift-merge-robot merged commit 3711aae into openshift:master Jan 19, 2019
@wking wking deleted the metadata branch January 19, 2019 07:26