pkg/asset/machines: Convert Master to a WriteableAsset #1211
Conversation
Is there anywhere in the code you could slip in a unit test that covers the install-config-on-disk use case without requiring an actual provision?

#552 has a framework for something like that, although the issue here is manifests on disk, not install-config on disk.
Force-pushed 7aaa902 -> 7f61d57.
I've pushed 7aaa902 -> 7f61d57 to fix the golint issue. I'm not sure how to handle the unit error. More on that issue here, but I'm currently distinguishing between "set to the empty list" and "unset" when consuming the data while we have both
I don't have a better solution for you. I had the same problem.

installer/pkg/asset/machines/master.go, lines 99 to 102 at ad65a3a
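The "set to the empty list" vs. "unset" distinction being discussed maps onto Go's nil-vs-empty-slice behavior when decoding. A minimal stdlib sketch (the `config` type and `decode` helper are illustrative, not the installer's code; JSON stands in for YAML):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config is a hypothetical stand-in for an asset that carries a machine list.
type config struct {
	Machines []string `json:"machines"`
}

// decode unmarshals a JSON document into config, panicking on bad input.
func decode(doc string) config {
	var c config
	if err := json.Unmarshal([]byte(doc), &c); err != nil {
		panic(err)
	}
	return c
}

func main() {
	unset := decode(`{}`)               // field absent entirely
	empty := decode(`{"machines": []}`) // field set to the empty list

	fmt.Println(unset.Machines == nil) // true: "unset"
	fmt.Println(empty.Machines == nil) // false: "set to the empty list"
}
```

A consumer can therefore tell the two cases apart with a nil check, which is exactly what makes the distinction fragile once a round-trip through a serializer normalizes one into the other.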
Force-pushed 7f61d57 -> d5c01c0.
Ok. Should be fixed using your approach with 7f61d57 -> d5c01c0.
staebler left a comment
I agree with this approach. I was planning on doing this same thing to close the loop on https://bugzilla.redhat.com/show_bug.cgi?id=1662119.
Force-pushed d5c01c0 -> 1c0796a.
Force-pushed 955cf95 -> 1e32e76.
Wondering why Hive was having problems with this panic led me to the realization that this PR does not completely solve the problem. The following will still panic, because loading from the state file will keep the provider config as raw JSON.

I am disheartened that this isn't caught by the unit tests that verify round-trip loading, and I am not sure why it isn't caught.
Do we have hooks to deserialize when we load from the state file? If not, maybe we just need to throw in the towel on structured asset properties (which will make @abhinavdahiya sad ;).

No hooks.

What if we remove the

I don't have strong opinions on whether the deserializing code lives in
Force-pushed 1e32e76 -> 3fd7e2b.
Ok, I've pushed 1e32e76 -> 3fd7e2b, shifting to unmarshalling
Force-pushed 3fd7e2b -> 6b0bd4b.
I've just pushed 3fd7e2b -> 6b0bd4b with the pivot to
Force-pushed 6b0bd4b -> e8cead9.
Force-pushed 3e19bea -> 2f16fa5.
/test e2e-aws
Will #1223 affect the Decode?
Sigh. Maybe? I guess for that I need to bump our vendor around openshift/cluster-api-provider-aws#152. |
Since 3326b6b (pkg/tfvars: Pull from []Machine instead of InstallConfig, 2018-11-28, openshift#792), we've been consuming structured master configurations to generate Terraform variables. But before this commit, the Openshift asset was the WriteableAsset responsible for the master configs, giving you the following issue:

1. openshift-install create manifests
   i. Master asset populated with structured data.
   ii. Openshift asset pulls in Master, flattens to YAML, and writes to disk.
2. openshift-install create cluster
   i. Openshift asset reads master YAML from disk.
   ii. TerraformVariables asset pulls in Master, but it's empty.
   iii. Panic [1].

With this commit, responsibility for reading and writing master manifests to disk gets shifted to the Master asset itself. I've dropped the structured Master properties (Machines and MachineDeprecated) in favor of new methods on the asset. There are three possible paths towards recovering the asset that we feed in to TerraformVariables:

a. After rendering it with Generate. This information had been structured before this commit, so no problems here.
b. After reading from the disk. This information could be unmarshalled in Master.Load(), but...
c. After loading it from the state file. There are no hooks for custom unmarshalling in this pathway, so while the generic YAML unmarshal would give us a structured machine object, it would not unmarshal the RawExtension that holds the provider spec.

The lack of custom-unmarshal hooks for (c) means there's no reliable way to use Master properties to feed TerraformVariables structured provider information. With this commit, we use Master methods instead. That's just as efficient for the (b) and (c) cases, and while it's an unnecessary (de)serialize cycle for (a), that's not too expensive.

Unpacking the YAML data into the appropriate structures is a bit hairy. I'm not familiar with this code though; maybe there's a better way.
It will help once we complete the shift to the OpenShift machine-API types started in cf60daa (Pivot aws from cluster.k8s.io into machine.openshift.io, 2019-02-01, openshift#1175).

I've also taken the opportunity to drop the list. It saves us a few lines of code for (de)serializing, and there's no upside to collecting the Machine configs together in a single file.

The "glob, but not the master files" logic in the Openshift loader is also a bit ugly. Moving forward, I expect us to push the remaining Openshift assets out into their own assets as well, which would allow us to tighten down on the wildcard there.

There's also a bit of dancing to ensure our Machine filenames are ordered by increasing index. I'm padding the filenames with %02d (for example) to get ...-01.yaml, etc., so they come back in the right order on load (filepath.Glob sorts its output [2]). To avoid hard-coding a pad size, I format the largest index, measure its length, and use that length to construct padFormat.

[1]: openshift#1205
[2]: https://github.com/golang/go/blob/go1.11.5/src/path/filepath/match.go#L325
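The padFormat trick described above, format the largest index, measure its length, and build the verb from that, can be sketched like this (`machineFilenames` and the filename prefix are illustrative, not the installer's actual code):

```go
package main

import "fmt"

// machineFilenames returns zero-padded manifest names so that
// filepath.Glob's sorted output yields machines in index order.
// The pad width is derived from the largest index rather than
// hard-coded.
func machineFilenames(count int) []string {
	// e.g. count=12 -> largest index 11 -> width 2 -> "%02d"
	padFormat := fmt.Sprintf("%%0%dd", len(fmt.Sprintf("%d", count-1)))
	names := make([]string, 0, count)
	for i := 0; i < count; i++ {
		names = append(names, fmt.Sprintf("master-machine-"+padFormat+".yaml", i))
	}
	return names
}

func main() {
	fmt.Println(machineFilenames(3)[0])  // master-machine-0.yaml
	fmt.Println(machineFilenames(12)[1]) // master-machine-01.yaml
}
```

Because the pad width grows with the machine count, lexicographic sort order (what filepath.Glob returns) always matches numeric index order.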
And shift the AWS provider's commit to land after they fixed registration [1] and also after their pivot to v1beta1 [2] (because 3a125a6, Change awsproviderconfig.k8s.io/v1alpha1 API version to awsproviderconfig.openshift.io/v1beta1, 2019-02-11, openshift#1223 already started us along that pivot).

Generated (after the Gopkg.toml bump) with:

$ dep ensure

using:

$ dep version
dep:
 version     : v0.5.0-31-g73b3afe
 build date  : 2019-02-08
 git hash    : 73b3afe
 go version  : go1.10.3
 go compiler : gc
 platform    : linux/amd64
 features    : ImportDuringSolve=false

[1]: openshift/cluster-api-provider-aws#149
[2]: openshift/cluster-api-provider-aws#152
Force-pushed 2f16fa5 -> d983eae.
API requests getting rate-limited:
level=error msg="Error: Error applying plan:"
level=error
level=error msg="5 errors occurred:"
level=error msg="\t* module.vpc.aws_lb.api_external: 1 error occurred:"
level=error msg="\t* aws_lb.api_external: timeout while waiting for state to become 'active' (last state: 'provisioning', timeout: 10m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_lb.api_internal: 1 error occurred:"
level=error msg="\t* aws_lb.api_internal: timeout while waiting for state to become 'active' (last state: 'provisioning', timeout: 10m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_route_table_association.route_net[2]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.2: timeout while waiting for state to become 'success' (timeout: 5m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_route_table_association.route_net[3]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.3: timeout while waiting for state to become 'success' (timeout: 5m0s)"
level=error
level=error
level=error msg="\t* module.vpc.aws_route.igw_route: 1 error occurred:"
level=error msg="\t* aws_route.igw_route: Error creating route: timeout while waiting for state to become 'success' (timeout: 2m0s)"

Will retest in some time.
/test e2e-aws

Hive CI still totally blocked on this. :(
/retest
The tests are green.

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Like the earlier 7a396d9 (pkg/asset/machines: Convert Master to a WriteableAsset, 2019-02-07, openshift#1211), but for Worker. This is groundwork for extracting configured availability zones when generating AWS Terraform variables, which will in turn allow us to avoid provisioning availability zones unless we are going to populate them with machines.
Since 3326b6b (#792), we've been consuming structured master configurations to generate Terraform variables. But before this commit, the Openshift asset was the WriteableAsset responsible for the master configs, giving you the following issue:

1. openshift-install create manifests
   i. Master asset populated with structured data.
   ii. Openshift asset pulls in Master, flattens to YAML, and writes to disk.
2. openshift-install create cluster
   i. Openshift asset reads master YAML from disk.
   ii. TerraformVariables asset pulls in Master, but it's empty.

With this commit, responsibility for reading and writing master manifests to disk gets shifted to the Master asset itself.

Unpacking the YAML data into the appropriate structures is a bit hairy. I'm not familiar with this code though; maybe there's a better way. It will help once we complete the shift to the OpenShift machine-API types started in cf60daa (#1175).

I've also taken the opportunity to drop the list. It saves us a few lines of code for (de)serializing, and there's no upside to collecting the Machine configs together in a single file.

The "glob, but not the master files" logic in the Openshift loader is also a bit ugly. Moving forward, I expect us to push the remaining Openshift assets out into their own assets as well, which would allow us to tighten down on the wildcard there.

@staebler, can you check the asset logic here? I think I have it right, but I'm doing more pattern-matching than understanding ;).
Fixes #1205