Skip to content

Conversation

@dkhater-redhat
Copy link
Contributor

- What I did

- How to verify it

- Description for the changelog

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 20, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 20, 2024

@dkhater-redhat: This pull request references MCO-1092 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Details

In response to this:

- What I did

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dkhater-redhat dkhater-redhat marked this pull request as draft March 20, 2024 20:28
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2024
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2024
@dkhater-redhat dkhater-redhat marked this pull request as ready for review March 22, 2024 19:11
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2024
@openshift-ci openshift-ci bot requested review from cdoern and cheesesashimi March 22, 2024 19:13
@dkhater-redhat dkhater-redhat force-pushed the mco-1092 branch 2 times, most recently from 24ce75e to eeb92c5 Compare March 22, 2024 19:26
@dkhater-redhat
Copy link
Contributor Author

dkhater-redhat commented Mar 22, 2024

#4270

integrated api migration cleanup into this PR.

@dkhater-redhat
Copy link
Contributor Author

/retest-required

@rioliu-rh
Copy link

build a cluster based on this PR
check featuregate status

$ oc get featuregates/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
  creationTimestamp: "2024-03-27T07:10:40Z"
  generation: 1
  name: cluster
  resourceVersion: "1694"
  uid: 4bb3004e-f7fa-46c3-8e59-8418d6f5ce06
spec: {}
status:
  featureGates:
  - disabled:
    - name: AdminNetworkPolicy
    - name: AlertingRules
    - name: AutomatedEtcdBackup
    - name: CSIDriverSharedResource
    - name: ClusterAPIInstall
    - name: DNSNameResolver
    - name: DisableKubeletCloudCredentialProviders
    - name: DynamicResourceAllocation
    - name: EventedPLEG
    - name: Example
    - name: ExternalOIDC
    - name: ExternalRouteCertificate
    - name: GCPClusterHostedDNS
    - name: GCPLabelsTags
    - name: GatewayAPI
    - name: HardwareSpeed
    - name: ImagePolicy
    - name: InsightsConfig
    - name: InsightsConfigAPI
    - name: InsightsOnDemandDataGather
    - name: InstallAlternateInfrastructureAWS
    - name: MachineAPIOperatorDisableMachineHealthCheckController
    - name: MachineAPIProviderOpenStack
    - name: MachineConfigNodes
    - name: ManagedBootImages
    - name: MaxUnavailableStatefulSet
    - name: MetricsServer
    - name: MixedCPUsAllocation
    - name: NewOLM
    - name: NodeDisruptionPolicy
    - name: NodeSwap
    - name: OnClusterBuild
    - name: PinnedImages
    - name: PlatformOperators
    - name: RouteExternalCertificate
    - name: SignatureStores
    - name: SigstoreImageVerification
    - name: TranslateStreamCloseWebsocketRequests
    - name: UpgradeStatus
    - name: ValidatingAdmissionPolicy
    - name: VolumeGroupSnapshot
    enabled:
    - name: AlibabaPlatform
    - name: AzureWorkloadIdentity
    - name: BareMetalLoadBalancer
    - name: BuildCSIVolumes
    - name: CloudDualStackNodeIPs
    - name: ExternalCloudProvider
    - name: ExternalCloudProviderAzure
    - name: ExternalCloudProviderExternal
    - name: ExternalCloudProviderGCP
    - name: KMSv1
    - name: NetworkLiveMigration
    - name: OpenShiftPodSecurityAdmission
    - name: PrivateHostedZoneAWS
    - name: VSphereControlPlaneMachineSet
    - name: VSphereStaticIPs
    version: 4.16.0-0.test-2024-03-27-070003-ci-ln-3mw3sct-latest

update featuregate/cluster to disable featuregate AlibabaPlatform

 $ oc get featuregate cluster -o yaml | yq -y '.spec'
customNoUpgrade:
  disabled:
    - AlibabaPlatform
featureSet: CustomNoUpgrade

the featuregate can be disabled successfully

 $ oc get featuregate cluster -o yaml | yq  '.status.featureGates[0].disabled' | grep -i alibaba
    "name": "AlibabaPlatform"

@dkhater-redhat @yuqi-zhang is there any other testing needed for this PR?

Copy link
Contributor

@hexfusion hexfusion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good start can we bump to the current latest openshift/api? I am also missing the changes required to bring in the new crds. I believe you will need to update the tools.go and crds-sync.sh to do this. See an example in [1] cc @cdoern

[1] https://github.com/openshift/machine-config-operator/pull/4271/files#diff-fc3c5d029ebed75612ecdb5eabc69c5f5b3c091e7a0999510344547555171728

if len(mcs) > 0 {
t.Errorf("expected no machine config generated with the default feature gate, got %d configs", len(mcs))
if len(mcs) == 0 {
t.Errorf("expected machine configs to be generated, but none were produced")
Copy link
Contributor

@hexfusion hexfusion Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this it seems like the result should be the same? is there a reason why they are not the same? maybe add a note on what the later means

createNewDefaultFeatureGateAccess()
featuregates.NewHardcodedFeatureGateAccess(nil, nil)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the comparison-
If the function is expected to generate machine configs even with default or minimal configuration, then the original assertion is correct.

If the function should not generate any machine configs when default feature gate settings are applied, then the revised assertion would be the correct one.

running the test with > 0 fails the test, as it generates 2 machine configs

The original test checks if any machine configs were generated by evaluating the length of the returned slice of machine configs. The test expects no machine configs to be generated. If any are generated, the test fails, indicating that machine configs were unexpectedly produced under default feature gate settings.

But now, we do not have a concept of "default" feature gates, meaning that a machine config should be generated regardless.

It does not imply a "default" state in the sense of applying no configurations, it means that there's no specific guidance on which features should or should not be considered, leaving RunFeatureGateBootstrap to operate based on its internal logic or other inputs.

in reference to featuregates.NewHardcodedFeatureGateAccess(nil, nil), it creates a FeatureGateAccess object with no explicitly enabled or disabled features. the previous function created also did the same (output enabled and disabled featuregates), however its functionality depended on a predefined list of default features. this dependency broke with the new api bump, as our method of obtaining default feature was deprecated.

@dkhater-redhat dkhater-redhat force-pushed the mco-1092 branch 2 times, most recently from a86c601 to ec6598a Compare March 27, 2024 20:13
updateOriginalKubeConfigwithNodeConfig(nodeConfig, originalKubeConfig)
}

defaultFeatures, err := generateFeatureMap(createNewDefaultFeatureGateAccess(), openshiftOnlyFeatureGates...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic also exists in the kubeletconfig generation, so we should clean this up (but not as part of this PR)

We should also look to remove this whole sub-controller if that's possible with the new workflow

@yuqi-zhang
Copy link
Contributor

@dkhater-redhat @yuqi-zhang is there any other testing needed for this PR?

@rioliu-rh I think that looks correct from an API perspective. Would you be able to test whether the kubelet itself on-disk has the right featuregates? We are removing the hard-coded ones, I'm not sure how this shoud look like

@dkhater-redhat
Copy link
Contributor Author

/retest-required

@rioliu-rh
Copy link

compare the featuregates with kubelet.conf on cluster node
export featuregate from kubelet.conf

debug node/ip-10-0-60-48.us-west-2.compute.internal -- chroot /host cat /etc/kubernetes/kubelet.conf | yq '.featureGates' > /tmp/fg.json

$ cat /tmp/fg.json
{
  "AdminNetworkPolicy": false,
  "AlertingRules": false,
  "AlibabaPlatform": true,
  "AutomatedEtcdBackup": false,
  "AzureWorkloadIdentity": true,
  "BareMetalLoadBalancer": true,
  "BuildCSIVolumes": true,
  "CSIDriverSharedResource": false,
  "CloudDualStackNodeIPs": true,
  "ClusterAPIInstall": false,
  "DNSNameResolver": false,
  "DisableKubeletCloudCredentialProviders": false,
  "DynamicResourceAllocation": false,
  "EventedPLEG": false,
  "Example": false,
  "ExternalCloudProvider": true,
  "ExternalCloudProviderAzure": true,
  "ExternalCloudProviderExternal": true,
  "ExternalCloudProviderGCP": true,
  "ExternalOIDC": false,
  "ExternalRouteCertificate": false,
  "GCPClusterHostedDNS": false,
  "GCPLabelsTags": false,
  "GatewayAPI": false,
  "HardwareSpeed": false,
  "ImagePolicy": false,
  "InsightsConfig": false,
  "InsightsConfigAPI": false,
  "InsightsOnDemandDataGather": false,
  "InstallAlternateInfrastructureAWS": false,
  "KMSv1": true,
  "MachineAPIOperatorDisableMachineHealthCheckController": false,
  "MachineAPIProviderOpenStack": false,
  "MachineConfigNodes": false,
  "ManagedBootImages": false,
  "MaxUnavailableStatefulSet": false,
  "MetricsServer": false,
  "MixedCPUsAllocation": false,
  "NetworkLiveMigration": true,
  "NewOLM": false,
  "NodeDisruptionPolicy": false,
  "NodeSwap": false,
  "OnClusterBuild": false,
  "OpenShiftPodSecurityAdmission": true,
  "PinnedImages": false,
  "PlatformOperators": false,
  "PrivateHostedZoneAWS": true,
  "RouteExternalCertificate": false,
  "SignatureStores": false,
  "SigstoreImageVerification": false,
  "TranslateStreamCloseWebsocketRequests": false,
  "UpgradeStatus": false,
  "VSphereControlPlaneMachineSet": true,
  "VSphereStaticIPs": true,
  "ValidatingAdmissionPolicy": false,
  "VolumeGroupSnapshot": false
}

compare total featuregates number with exported file

$ oc get featuregate cluster -o yaml | yq -y '.status.featureGates' | egrep -v 'disabled|enabled|version' | wc -l
      56
$ cat /tmp/fg.json | egrep -v '{|}' | wc -l
      56

compare enabled featuregates with exported file

$ for fg in $( oc get featuregate cluster -o yaml | yq '.status.featureGates[].enabled[].name');do grep $fg /tmp/fg.json;done
  "AlibabaPlatform": true,
  "AzureWorkloadIdentity": true,
  "BareMetalLoadBalancer": true,
  "BuildCSIVolumes": true,
  "CloudDualStackNodeIPs": true,
  "ExternalCloudProvider": true,
  "ExternalCloudProviderAzure": true,
  "ExternalCloudProviderExternal": true,
  "ExternalCloudProviderGCP": true,
  "KMSv1": true,
  "NetworkLiveMigration": true,
  "OpenShiftPodSecurityAdmission": true,
  "PrivateHostedZoneAWS": true,
  "VSphereControlPlaneMachineSet": true,
  "VSphereStaticIPs": true,

compare disabled featuregates with exported file

$ for fg in $( oc get featuregate cluster -o yaml | yq '.status.featureGates[].disabled[].name');do grep $fg /tmp/fg.json;done
  "AdminNetworkPolicy": false,
  "AlertingRules": false,
  "AutomatedEtcdBackup": false,
  "CSIDriverSharedResource": false,
  "ClusterAPIInstall": false,
  "DNSNameResolver": false,
  "DisableKubeletCloudCredentialProviders": false,
  "DynamicResourceAllocation": false,
  "EventedPLEG": false,
  "Example": false,
  "ExternalOIDC": false,
  "ExternalRouteCertificate": false,
  "GCPClusterHostedDNS": false,
  "GCPLabelsTags": false,
  "GatewayAPI": false,
  "HardwareSpeed": false,
  "ImagePolicy": false,
  "InsightsConfig": false,
  "InsightsConfigAPI": false,
  "InsightsOnDemandDataGather": false,
  "InstallAlternateInfrastructureAWS": false,
  "MachineAPIOperatorDisableMachineHealthCheckController": false,
  "MachineAPIProviderOpenStack": false,
  "MachineConfigNodes": false,
  "ManagedBootImages": false,
  "MaxUnavailableStatefulSet": false,
  "MetricsServer": false,
  "MixedCPUsAllocation": false,
  "NewOLM": false,
  "NodeDisruptionPolicy": false,
  "NodeSwap": false,
  "OnClusterBuild": false,
  "PinnedImages": false,
  "PlatformOperators": false,
  "RouteExternalCertificate": false,
  "SignatureStores": false,
  "SigstoreImageVerification": false,
  "TranslateStreamCloseWebsocketRequests": false,
  "UpgradeStatus": false,
  "ValidatingAdmissionPolicy": false,
  "VolumeGroupSnapshot": false

@djoshy
Copy link
Contributor

djoshy commented Mar 28, 2024

/retest-required

1 similar comment
@hexfusion
Copy link
Contributor

/retest-required

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 28, 2024

@dkhater-redhat: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn a98c275 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Will create follow up cards for hard-coded tests + removal of kubeletconfigfeatures

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 28, 2024
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkhater-redhat, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [dkhater-redhat,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 7f0beb8 into openshift:master Mar 28, 2024
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-machine-config-operator-container-v4.16.0-202403281715.p0.g7f0beb8.assembly.stream.el8 for distgit ose-machine-config-operator.
All builds following this will include this PR.

@rioliu-rh
Copy link

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Mar 28, 2024
stbenjam added a commit to stbenjam/machine-config-operator that referenced this pull request Mar 29, 2024
openshift-merge-bot bot added a commit that referenced this pull request Mar 29, 2024
TRT-1587: Revert #4275 "MCO-1092: Adapt the MCO's featuregate usage to new API"
dkhater-redhat added a commit to dkhater-redhat/machine-config-operator that referenced this pull request Apr 3, 2024
cheesesashimi pushed a commit to cheesesashimi/machine-config-operator that referenced this pull request Apr 9, 2024
djoshy pushed a commit to djoshy/machine-config-operator that referenced this pull request Apr 10, 2024
djoshy pushed a commit to djoshy/machine-config-operator that referenced this pull request Apr 11, 2024
dkhater-redhat added a commit to dkhater-redhat/machine-config-operator that referenced this pull request Apr 11, 2024
dkhater-redhat added a commit to dkhater-redhat/machine-config-operator that referenced this pull request Apr 11, 2024
hexfusion pushed a commit to hexfusion/machine-config-operator that referenced this pull request Apr 11, 2024
djoshy added a commit to djoshy/machine-config-operator that referenced this pull request Apr 11, 2024
djoshy added a commit to djoshy/machine-config-operator that referenced this pull request Apr 12, 2024
djoshy pushed a commit to djoshy/machine-config-operator that referenced this pull request Apr 12, 2024
djoshy pushed a commit to djoshy/machine-config-operator that referenced this pull request Apr 13, 2024
inesqyx pushed a commit to inesqyx/machine-config-operator that referenced this pull request Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants