Skip to content

Change hfs controller to be level-triggered when handling bmh updates#995

Merged
metal3-io-bot merged 1 commit intometal3-io:masterfrom
bfournie:hfs-remove-edge-trigger
Dec 10, 2021
Merged

Change hfs controller to be level-triggered when handling bmh updates#995
metal3-io-bot merged 1 commit intometal3-io:masterfrom
bfournie:hfs-remove-edge-trigger

Conversation

@bfournie
Copy link
Copy Markdown
Member

@bfournie bfournie commented Oct 14, 2021

Change to use a poll cycle in the hfs reconciler to get data from Ironic instead of relying on state difference. In addition, the following changes were made to address follow-on comments in
#938.

  • changed name of condition UpdateRequested to FirmwareSettingsChangeDetected
  • add each hostFirmwareSettings as owner of its FirmwareSchema so that garbage collection will only be done when all removed
  • Change the SetOwnerReference/SetControllerReference calls to before Create so the Update can be removed
  • When building BIOS settings for cleaning make sure the hfs settings don't overwrite the vendor-agnostic settings
  • When generating the hash for the firmwareSchema include the schema values in addition to the names

@metal3-io-bot metal3-io-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 14, 2021
@bfournie
Copy link
Copy Markdown
Member Author

/test-integration

@elfosardo
Copy link
Copy Markdown
Member

/lgtm

@metal3-io-bot
Copy link
Copy Markdown
Contributor

@elfosardo: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bfournie
Copy link
Copy Markdown
Member Author

/cc @zaneb
/cc @andfasano

@metal3-io-bot
Copy link
Copy Markdown
Contributor

@bfournie: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test generate
  • /test gofmt
  • /test golint
  • /test gomod
  • /test gosec
  • /test manifestlint
  • /test markdownlint
  • /test shellcheck
  • /test unit

Use /test all to run the following jobs that were automatically triggered:

  • generate
  • gofmt
  • golint
  • gomod
  • gosec
  • manifestlint
  • unit
Details

In response to this:

/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-ipv6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andfasano
Copy link
Copy Markdown
Member

After reviewing all the changes, on a second thought I'm getting convinced that probably the relationships between BMH and HFS could be relaxed a little bit, possibly leading to a simplified approach that will honor all the points previously discussed. Following the main general points:

  1. BMH needs only to wait, in the Preparing state, for the related HFS resource to be ready (and here using the conditions looks fine). If the HFS is not ready, it will re-enqueue a reconciliation request before proceeding. If it is, the cleaning steps could be taken as usual.
  2. HFS controller will stop listening BMH events, and it should take care only of filling (and validating) the HFS resource by fetching the hw settings at every reconcile loop, ensuring that its Conditions properly reflect the current situation. The reconciliation will be by default reenqueued on a reasonable time basis (ie 5 minutes, to be decided), to ensure that any external changes to the hw settings will be captured (for example, if any setting will be changed after the cleaning step in the Preparing state).
  3. Given that, the responsibility to create an (empty) HFS resource if not present will come back under the BMH controller (something to be performed at every reconcile loop, if not under deletion)

I understand that this means a significant change to the current design, but the overall idea is to look for a simplified approach that maybe could be reused also in future for other similar situations.

Wdyt?

@bfournie bfournie force-pushed the hfs-remove-edge-trigger branch from e8c86d4 to ca9f989 Compare November 19, 2021 00:21
@bfournie
Copy link
Copy Markdown
Member Author

After reviewing all the changes, on a second thought I'm getting convinced that probably the relationships between BMH and HFS could be relaxed a little bit, possibly leading to a simplified approach that will honor all the points previously discussed. Following the main general points:

  1. BMH needs only to wait, in the Preparing state, for the related HFS resource to be ready (and here using the conditions looks fine). If the HFS is not ready, it will re-enqueue a reconciliation request before proceeding. If it is, the cleaning steps could be taken as usual.
  2. HFS controller will stop listening BMH events, and it should take care only of filling (and validating) the HFS resource by fetching the hw settings at every reconcile loop, ensuring that its Conditions properly reflect the current situation. The reconciliation will be by default reenqueued on a reasonable time basis (ie 5 minutes, to be decided), to ensure that any external changes to the hw settings will be captured (for example, if any setting will be changed after the cleaning step in the Preparing state).
  3. Given that, the responsibility to create an (empty) HFS resource if not present will come back under the BMH controller (something to be performed at every reconcile loop, if not under deletion)

I understand that this means a significant change to the current design, but the overall idea is to look for a simplified approach that maybe could be reused also in future for other similar situations.

Wdyt?

I think its a good idea and will simplify the hfs controller. I've updated the PR to:

  1. Remove the watch of bmh resource in hfs controller
  2. Remove the hfs EventUpdate handler filtering
  3. Create the hfs resource in bmh controller when registering
  4. Add a requeue to the hfs reconciler to poll the data from Ironic and pick up any changes as a result of cleaning (or other events)
  5. In bmh controller, before entering the preparing state check that the hfs is ready

@bfournie
Copy link
Copy Markdown
Member Author

/test-v1b1-centos-integration

Comment thread controllers/metal3.io/baremetalhost_controller.go Outdated
Comment thread controllers/metal3.io/hostfirmwaresettings_controller.go Outdated
@bfournie bfournie force-pushed the hfs-remove-edge-trigger branch from ca9f989 to 29c9d2c Compare November 19, 2021 15:54
@bfournie
Copy link
Copy Markdown
Member Author

/test-v1b1-integration

1 similar comment
@andfasano
Copy link
Copy Markdown
Member

/test-v1b1-integration

Copy link
Copy Markdown
Member

@andfasano andfasano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bfournie for the patch, the code looks simpler and the responsibilities across the controllers well assigned, addressing the points previously discussed.

/approve

Comment thread controllers/metal3.io/hostfirmwaresettings_controller.go Outdated
@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 22, 2021
@bfournie bfournie force-pushed the hfs-remove-edge-trigger branch from 29c9d2c to 24bb040 Compare November 22, 2021 17:31
@bfournie
Copy link
Copy Markdown
Member Author

/test-v1b1-centos-integration
/test-v1b1-integration

@bfournie
Copy link
Copy Markdown
Member Author

/test-v1b1-centos-integration

@bfournie
Copy link
Copy Markdown
Member Author

/retest

@bfournie
Copy link
Copy Markdown
Member Author

/test-v1b1-centos-integration
/test-v1b1-integration

Comment thread controllers/metal3.io/hostfirmwaresettings_controller.go Outdated
@bfournie bfournie force-pushed the hfs-remove-edge-trigger branch from 24bb040 to 8a144dd Compare December 2, 2021 14:25
@metal3-io-bot metal3-io-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 2, 2021
@bfournie
Copy link
Copy Markdown
Member Author

bfournie commented Dec 2, 2021

/test-v1b1-centos-integration
/test-v1b1-integration

Change to use the Preparing state end time as a trigger that cleaning
has just completed instead of relying on state difference. In addition,
the following changes were made to address follow-on comments in
metal3-io#938.
- added Preparing state to OperationalMetrics so its end time can be used for level-trigger
- split Status Settings into LocalData (ReadOnly or Unique fields) and writeable Settings
- changed name of condition `UpdateRequested` to `FirmwareSettingsChangeDetected`
- add each hostFirmwareSettings as owner of its FirmwareSchema so that garbage collection can be done
- Change the SetOwnerReference/SetControllerReference calls to before Create so the Update could be removed
- When building BIOS settings for cleaning make sure the hfs settings don't overwrite the vendor-agnostic settings
- When generating the hash for the firmwareSchema include the schema values in addition to the names in the hash
@bfournie bfournie force-pushed the hfs-remove-edge-trigger branch from 8a144dd to dab9035 Compare December 8, 2021 00:01
@metal3-io-bot metal3-io-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 8, 2021
@bfournie
Copy link
Copy Markdown
Member Author

bfournie commented Dec 8, 2021

/test-v1b1-centos-integration
/test-v1b1-integration

@metal3-io-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andfasano, elfosardo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dtantsur
Copy link
Copy Markdown
Member

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 10, 2021
@metal3-io-bot metal3-io-bot merged commit 3f19cf4 into metal3-io:master Dec 10, 2021
@bfournie bfournie deleted the hfs-remove-edge-trigger branch December 10, 2021 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants