Skip to content

Add ValidateForAdmission function#581

Closed
honza wants to merge 2 commits intometal3-io:masterfrom
honza:validate-unique
Closed

Add ValidateForAdmission function#581
honza wants to merge 2 commits intometal3-io:masterfrom
honza:validate-unique

Conversation

@honza
Copy link
Copy Markdown
Member

@honza honza commented Jul 7, 2020

This function determines if a new BareMetalHost can be added to the
cluster. It's suitable to be run from a ValidateCreate admission
webhook.

Currently, it checks for BMC address uniquess, and BootMACAddress
uniqueness.

@metal3-io-bot metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 7, 2020
var allErrs field.ErrorList

for _, existingHost := range hostList.Items {
if host.Spec.BMC.Address == existingHost.Spec.BMC.Address {
Copy link
Copy Markdown
Member

@stbenjam stbenjam Jul 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to check for .Address == "" earlier? For unmanaged hosts, the BMC address will be optional. If this lands first perhaps we defer adding that check in the unmanaged PR #569

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how think about uniqueness in this context given that multiple fields are nullable. Do we have a list of hard requirements/rules?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the BMC.Address to be unique, if it is set, to avoid having 2 resources managing the same physical host.

We want the BMC.CredentialsName to be unique, if it is set, to avoid having 2 host resources taking ownership of the same Secret resource.

We want the BootMACAddress to be unique to avoid having 2 resources trying to claim the same data for a discovered node. The current draft of the code says the field can be empty, but I don't think we want to support that given that we've seen bugs recently.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a check for the credentials name field.

Namespace: host.Namespace,
}

err := c.List(ctx, &hostList, opts)
Copy link
Copy Markdown
Member

@stbenjam stbenjam Jul 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use field selectors to do a query for existing BMH with the same MAC or BMC address instead of listing all the hosts (of which there could be thousands) e.g. https://godoc.org/sigs.k8s.io/controller-runtime/pkg/client#ListOptions

Not sure if the API tools work that way to allow us to do such a query (or if it's even better), I haven't done anything like this before.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use field selectors to do a query for existing BMH with the same MAC or BMC address instead of listing all the hosts (of which there could be thousands)?

No. The field selector only operates on a small number of fields, think metadata.name.

There seem to be many fields we need to check, and doing a query for each one seems slow. This way we grab all of the resources at once, and can quickly run through them to validate them.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the client used the same type of caching client that a controller uses?

@dhellmann
Copy link
Copy Markdown
Member

Is it possible to do this as a standalone helper function in its own package or does the web hook framework assume it is a method of the type? Adding it this way is going to cause anything consuming the host resource to pull in all of the new dependencies as well, even if they are not implementing the web hook.

@honza honza force-pushed the validate-unique branch from fb899e0 to b01589d Compare July 8, 2020 12:04
@honza
Copy link
Copy Markdown
Member Author

honza commented Jul 8, 2020

Is it possible to do this as a standalone helper function in its own package or does the web hook framework assume it is a method of the type? Adding it this way is going to cause anything consuming the host resource to pull in all of the new dependencies as well, even if they are not implementing the web hook.

Yes. The pattern that I have seen is that the webook handler contains a one-liner which is a method call on the resource type. But it's easy to move it to a new package to avoid the dependency problem. I added a new package called validation.

@honza honza force-pushed the validate-unique branch from b01589d to 24785e9 Compare July 8, 2020 13:19
Copy link
Copy Markdown
Member

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good. I had one comment on test coverage.

In other cases where we queried the API for all resources of a type we assumed that the caching client would make that operation relatively quick. I wonder if that cache could introduce a race condition that would bypass the admission check, though. What do we know about the concurrency behavior of the admission web hooks?

Comment thread pkg/validation/admission_test.go
@honza honza force-pushed the validate-unique branch from 24785e9 to 3f36e00 Compare July 13, 2020 08:13
Comment thread pkg/validation/admission_test.go Outdated
@honza honza force-pushed the validate-unique branch from 3f36e00 to c3d817b Compare July 14, 2020 15:17
Copy link
Copy Markdown
Member

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 14, 2020
@metal3-io-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhellmann, honza

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 14, 2020
@dhellmann
Copy link
Copy Markdown
Member

/test-integration

Comment thread pkg/validation/admission.go Outdated
Namespace: host.Namespace,
}

err := c.List(ctx, &hostList, opts)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume we don't have a cache here, but even so this is still susceptible to race conditions. Are we saying we just don't care about that?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, the runtime creates a caching client.

Do we have any means of performing atomic operations on a set of resources?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only using the kubernetes API. So we'd have to have some sort of CRD in which MAC addresses and BMC addresses could be claimed. There's no way that I know of to make that atomic with creating the BareMetalHost though, so I'm not even sure how that would work.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the client code from the function, and changed it to accept a list of existing resources; we'll leave it up to the caller for now to deal with caching/atomic semantics.

Comment thread pkg/validation/admission.go Outdated
Comment thread pkg/validation/admission.go Outdated
Comment thread pkg/validation/admission.go Outdated
// ValidateHostForAdmission returns an error if the host fails uniqueness
// validation, and cannot be added to the cluster. This is ordinarily run by an
// admission webhook.
func ValidateHostForAdmission(c client.Client, host metal3v1alpha1.BareMetalHost) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have a validating webhook we'll almost certainly want to use it for other things that don't involve uniqueness across the whole cluster. It might pay to group this stuff that does in a function with a more specific name.

@honza honza force-pushed the validate-unique branch from c3d817b to f817829 Compare July 17, 2020 14:37
@metal3-io-bot metal3-io-bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 17, 2020
@metal3-io-bot
Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

Comment thread pkg/validation/admission.go Outdated
@honza honza force-pushed the validate-unique branch from f817829 to e873171 Compare July 18, 2020 10:16
Copy link
Copy Markdown
Member

@zaneb zaneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realised that I typed a comment on Friday and neglected to hit submit.

Comment thread pkg/validation/admission.go Outdated
}

// The BMC address is optional so only validate it if it's set
if host.Spec.BMC.Address != "" && host.Spec.BMC.Address == existingHost.Spec.BMC.Address {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want to dig down a bit further rather than just comparing strings. e.g. redfish://192.168.122.1 and redfish-ilo5://192.168.122.1 are the same BMC.
I'm not sure how the path components come into it. Is it possible with e.g. redfish that multiple hosts share a BMC with the same IP address but different paths?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, chassis can have multiple hosts with 1 controller (think blade servers). Each host would have a different path, with (for example) the /Systems/1 part changing based on which host is being used.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, so we should compare the hostname, path (for redfish), and presumably port (for IPMI/iDRAC/iRMC), but not the scheme.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much knowledge of that sort of thing do we want to embed here, vs. using the AccessDetails implementations to do the comparison?

This function determines if a new BareMetalHost can be added to the
cluster.  It's suitable to be run from a `ValidateCreate` admission
webhook.

Currently, it checks for the uniqueness of the following fields:

* BMC address
* BMC credentials name
* Boot MAC address

Uniqueness of a field is only checked if set.
This patch adds a UniqueAccessPath() function to the AccessDetails
interface which asks implementers to indicate how the driver implements
uniqueness.
@honza
Copy link
Copy Markdown
Member Author

honza commented Sep 8, 2020

I just reimplemented the logic that performs the uniqueness check. I added a new function to the AccessDetails interface which asks implementers to indicate how to compare nodes for uniqueness.

The unit tests could use some work but I wanted to see first if this was in the general direction we wanted to go in.

continue
}

// If the two hosts don't share drivers, we can assume that they don't conflict
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's a safe assumption. BMCs can support multiple protocols.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A safer assumption might be:

  • same IP but different driver -> reject
  • same IP and same driver -> compare unique access path
  • different IP -> accept

This has the advantage of blocking anyone trying to use different drivers for different blades in a blade server :D

We can resolve any hostnames to IP addresses to avoid aliasing at that level. Then the only thing we won't catch is things with multiple IP addresses (dual stack IPv4/IPv6 is probably the only likely one to happen in the real world, I'd expect).

@dhellmann
Copy link
Copy Markdown
Member

I would like to freeze go code changes for a few days to try to land #650 without having to rebase it, because rebasing will mean redoing the work from scratch.

/hold

@metal3-io-bot metal3-io-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 24, 2020
@metal3-io-bot
Copy link
Copy Markdown
Contributor

@honza: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
generate 6eb5aab link /test generate
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dhellmann
Copy link
Copy Markdown
Member

#655 has merged

/hold cancel

@metal3-io-bot metal3-io-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 1, 2020
@metal3-io-bot
Copy link
Copy Markdown
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2021
@honza honza closed this Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants