From d0c8c18196330e3cbffcb1c8f8f36d6a5bdfec40 Mon Sep 17 00:00:00 2001 From: Cody Hoag Date: Mon, 22 Feb 2021 18:17:45 -0500 Subject: [PATCH 1/2] RN 4.7 bug fixes --- release_notes/ocp-4-7-release-notes.adoc | 698 +++++++++++++++++++++-- 1 file changed, 657 insertions(+), 41 deletions(-) diff --git a/release_notes/ocp-4-7-release-notes.adoc b/release_notes/ocp-4-7-release-notes.adoc index d96ed68a93f1..bfdec82aadfe 100644 --- a/release_notes/ocp-4-7-release-notes.adoc +++ b/release_notes/ocp-4-7-release-notes.adoc @@ -83,6 +83,26 @@ The following Ignition updates are now available: * When executing in non-default AWS partitions, such as GovCloud or AWS China, Ignition now fetches `s3://` resources from the same partition. * Ignition now supports AWS EC2 Instance Metadata Service Version 2 (IMDSv2). +[id="ocp-4-7-rhcos-configure-timeout-when-acquiring-dhcp-lease"] +==== Configuring the timeout value used when trying to acquire a DHCP lease + +Previously, {op-system} DHCP kernel parameters were not working as expected because acquiring a DHCP lease would take longer than the default 45 seconds. With this fix, you now have the ability to configure the timeout value that is used when trying to acquire a DHCP lease. See link:https://bugzilla.redhat.com/show_bug.cgi?id=1879094[*BZ#1879094*] for more information. + +[id="ocp-4-7-rhcos-supports-multipath"] +==== {op-system} supports multipath + +{op-system} now supports multipath on the primary disk, allowing stronger resilience to hardware failure so that you can set up {op-system} on top of multipath to achieve higher host availability. See link:https://bugzilla.redhat.com/show_bug.cgi?id=1886229[*BZ#1886229*] for more information. + +[id="ocp-4-7-rhcos-fetching-aws-from-imdsv2"] +==== Fetching configs on AWS from Instance Metadata Service Version 2 (IMDSv2) + +Ignition now supports fetching configs on AWS from Instance Metadata Service Version 2 (IMDSv2). With this enhancement, AWS EC2 instances can be created with IMDSv1 disabled so that IMDSv2 is needed to read the Ignition config from instance userdata. As a result, Ignition successfully reads its config from instance userdata, regardless of whether IMDSv1 is enabled or not. See link:https://bugzilla.redhat.com/show_bug.cgi?id=1899220[*BZ#1899220*] for more information. + +[id="ocp-4-7-rhcos-qemu-now-included"] +==== Qemu guest agent is now included in {op-system} + +The Qemu guest agent is now included by default in {op-system}. With this enhancement, Red Hat Virtualization (RHV) administrators can see rich information about {op-system} nodes through the reporting of useful information about {op-system} back to the RHV management interface. See link:https://bugzilla.redhat.com/show_bug.cgi?id=1900759[*BZ#1900759*] for more information. + [id="ocp-4-7-installation-and-upgrade"] === Installation and upgrade @@ -148,6 +168,11 @@ You can now configure nodes to have more than 26 persistent Cinder volumes in cl The `computeFlavor` property that is used in the `install-config.yaml` file is deprecated. As an alternative, you can now configure machine pool flavors in the `platform.openstack.defaultMachinePlatform` property. +[id="ocp-4-7-using-static-ip-ipi"] +==== Using static DHCP reservations for the bootstrap host for clusters with installer-provisioned infrastructure + +In previous versions of {product-title}, you could not assign a static IP address to the bootstrap host of a bare metal installation that used installer-provisioned infrastructure. 
Now, you can specify the MAC address that is used by the bootstrap virtual machine, which means you can use static DHCP reservations for the bootstrap host. See link:https://bugzilla.redhat.com/show_bug.cgi?id=1867165[*BZ#1867165*] for more information. + [id="ocp-4-7-enhancements-to-installer-provisioned-installation"] ==== Enhancements to installer-provisioned installation @@ -462,6 +487,11 @@ For more information, see xref:../storage/container_storage_interface/persistent The vSphere Problem Detector Operator periodically checks functionality of {product-title} clusters installed in a vSphere environment. The vSphere Problem Detector Operator is installed by default by the Cluster Storage Operator, allowing you to quickly identify and troubleshoot common storage issues, such as configuration and permissions, on vSphere clusters. +[id="ocp-4-7-storage-operator-cr-collection"] +==== Local Storage Operator now collects custom resources + +The Local Storage Operator now includes a must-gather image, allowing you to collect custom resources specific to this Operator for diagnostic purposes. See link:https://bugzilla.redhat.com/show_bug.cgi?id=1756096[*BZ#1756096*] for more information. + [id="ocp-4-7-registry"] === Registry @@ -472,6 +502,11 @@ The {product-title} internal registry and image streams now support Open Contain //Add link +[id="ocp-4-7-registry-image-metrics"] +==== New image stream metrics + +The need to understand if clients are leveraging image stream imports using docker registry v1 protocol resulted in this enhancement, which exports Operator metrics to telemetry. Metrics related to protocol v1 usage are now visible in telemetry. See link:https://bugzilla.redhat.com/show_bug.cgi?id=1885856[*BZ#1885856*] for more information. + [id="ocp-4-7-olm"] === Operator lifecycle @@ -497,6 +532,35 @@ By referencing one or more secrets in a catalog source, some of these required i See xref:../operators/admin/olm-managing-custom-catalogs.adoc#olm-accessing-images-private-registries_olm-managing-custom-catalogs[Accessing images for Operators from private registries] for more details. +[id="ocp-4-7-olm-mirror-command"] +==== Mirroring the content of an Operator catalog into a container image registry + +Cluster administrators can use the `oc adm catalog mirror` command to mirror the content of an Operator catalog into a container image registry. This enhancement updates the `oc adm catalog mirror` command to also now mirror the index image being used for the operation into the registry, which was previously a separate step requiring the `oc image mirror` command. See link:https://bugzilla.redhat.com/show_bug.cgi?id=1832968[*BZ#1832968*] for more information. + +[id="ocp-4-7-olm-new-install-plan"] +==== Creating new install plan for better experience + +Deleting an `InstallPlan` object that is waiting for user approval causes the Operator to be stuck in an unrecoverable state as the Operator installation cannot be completed. This enhancement updates Operator Lifecycle Manager (OLM) to create a new install plan if the previously pending one is deleted. As a result, users can now approve the new install plan and proceed with the Operator installation. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1841175[*BZ#1841175*]) + +[id="ocp-4-7-olm-mirror-images-to-disconnected-registry"] +==== Mirroring images to a disconnected registry by first mirroring the images to local files + +This enhancement updates the `oc adm catalog mirror` command to support mirroring images to a disconnected registry by first mirroring the images to local files. For example: + +[source,terminal] +----- +$ oc adm catalog mirror //: file:///local/index +----- + +Then you can move the local `v2/local/index` directory to a location within the disconnected network and mirror the local files to the disconnected registry: + +[source,terminal] +----- +$ oc adm catalog mirror file:///v2/local/index / +----- + +See link:https://bugzilla.redhat.com/show_bug.cgi?id=1841885[*BZ#1841885*] for more information. + [id="ocp-4-7-osdk"] === Operator development @@ -640,6 +704,12 @@ You can now configure a priority class to be non-preempting by setting the `pree For more information, see xref:../nodes/pods/nodes-pods-priority.html#non-preempting-priority-class_nodes-pods-priority[Non-preempting priority classes]. +[id="ocp-4-7-nodes-crio-support-cpus-for-node-process"] +==== Specifying CPUs for node host processes with CRI-O + +CRI-O now supports specifying CPUs for node host processes (such as kubelet, CRI-O, and so forth). Using the `infra_ctr_cpuset` parameter in the `crio.conf` file allows you to reserve CPUs for the node host processes allowing {product-title} pods that require guaranteed CPUs to operate without any other processes running on those CPUs. Pods that request guaranteed CPUs do not have to compete for CPU time with the node host process. See +link:https://bugzilla.redhat.com/show_bug.cgi?id=1775444[*BZ#1775444*] for more information. + [id="ocp-4-7-logging"] === Red Hat OpenShift Logging @@ -663,7 +733,7 @@ Previously, the EO set the number of shards for an index to the number of data n // https://bugzilla.redhat.com/show_bug.cgi?id=1898920 ==== Updated Elasticsearch Operator name and maturity level -This release updates the display name of the Elasticsearch Operator and operator maturity level. The new display name and clarified specific use for the Elasticsearch Operator are updated in Operator Hub. +This release updates the display name of the Elasticsearch Operator and Operator maturity level. The new display name and clarified specific use for the Elasticsearch Operator are updated in Operator Hub. [discrete] [id="ocp-4-7-es-csv-success"] @@ -672,6 +742,12 @@ This release updates the display name of the Elasticsearch Operator and operator This release adds reporting metrics to indicate that installing or upgrading the Elasticsearch Operator ClusterServiceVersion (CSV) was successful. Previously, there was no way to determine, or generate an alert, if the CSV installation or upgrade for the Elasticsearch Operator failed. Now, an alert is provided as part of the Elasticsearch Operator. +[discrete] +[id="ocp-4-7-es-operator-template-update-changes"] +==== Elasticsearch Operator template update changes + +The Elasticsearch Operator now only updates its rollover index templates if they have different field values. Index templates have a higher priority than indices. When the template is updated, the cluster prioritizes distributing them over the index shards, impacting performance. To minimize Elasticsearch cluster operations, the Operator only updates the templates when the number of primary shards or replica shards changes from what is currently configured. 
See link:https://bugzilla.redhat.com/show_bug.cgi?id=1920215[*1920215*] for more information. + [discrete] [id="ocp-4-7-reduced-cert-warnings"] // https://bugzilla.redhat.com/show_bug.cgi?id=1884812 @@ -717,6 +793,17 @@ This dashboard provides API server metrics, such as: // TODO: Link to troubleshooting using this dashboard section once it is available +[id="ocp-4-7-monitoring-etcd-alerts"] +==== New etcd alerts + +New etcd alerts are now available: + +* A critical alert when the etcd database quota is 95% full +* A warning alert when there is a sudden surge in etcd writes, leading to an increase in the etcd database quota size +* A critical alert when the 99th percentile of the etcd members fsync duration is greater than 1 second + +See link:https://bugzilla.redhat.com/show_bug.cgi?id=1890808[*BZ#1890808*] for more information. + [id="ocp-4-7-scale"] === Scale @@ -901,7 +988,7 @@ The `--filter-by-os` flag is also now deprecated. [id="ocp-4-7-imagechangesinprogress-deprecated"] ==== ImageChangesInProgress condition for Cluster Samples Operator -Image stream image imports are no longer tracked in real time by conditions on the Cluster Samples Operator configuration resource. In-progress image streams no longer directly affect updates to the `ClusterOperator` instance `openshift-samples`. Prolonged errors with image streams` are now reported by Prometheus alerts. +Image stream image imports are no longer tracked in real time by conditions on the Cluster Samples Operator configuration resource. In-progress image streams no longer directly affect updates to the `ClusterOperator` instance `openshift-samples`. Prolonged errors with image streams are now reported by Prometheus alerts. [id="ocp-4-7-migrationinprogress-deprecated"] ==== MigrationInProgress condition for Cluster Samples Operator @@ -973,6 +1060,18 @@ If you are using the option `--keep-manifest-list=true`, the only valid value fo [id="ocp-4-7-bug-fixes"] == Bug fixes +*api-server-auth* + +* Previously, the `openshift-service-ca` namespace was labeled with `openshift.io/run-level: 1`, which caused the pods in this namespace to run with extra privileges. This label has been removed, and now the pods in this namespace run with the appropriate privileges. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1806915[*BZ#1806915*]) + +* Previously, the `openshift-service-ca-operator` namespace was labeled with `openshift.io/run-level: 1`, which caused the pods in this namespace to run with extra privileges. This label has been removed for new installations, and now the pods in this namespace run with the appropriate privileges. For upgraded clusters, you can remove this label manually and restart the affected pods. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1806917[*BZ#1806917*]) + +* Previously, the configuration to scrape the OAuth API server pods in the `openshift-oauth-apiserver` namespace was missing, and metrics for the OAuth API server pods could not be queried in Prometheus. The missing configuration has been added, and OAuth API server metrics are now available in Prometheus. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887428[*BZ#1887428*]) + +* Previously, a missed condition in the Cluster Authentication Operator code caused its log to be flooded with messages about updates to a deployment that did not occur. The logic for deciding whether to update the Operator status was updated and the Cluster Authentication Operator log no longer receives messages for a deployment update that did not occur. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1891758[*BZ#1891758*]) + +* Previously, the Cluster Authentication Operator only watched configuration resources named `cluster`, which caused the Operator to ignore changes in ingress configuration, which was named `default`. This led to incorrectly assuming that there were no schedulable worker nodes when ingress was configured with a custom node selector. The Cluster Authentication Operator now watches all resources regardless of their name, and the Operator now properly observes ingress configuration changes and reconciles worker node availability. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893386[*BZ#1893386*]) + *Bare Metal Hardware Provisioning* * Previously, when trying to enable `baremetal` on assisted installer the `baremetal-operator` errors with `no bmc details`. The Baseboard Management Controller (BMC) details can now be omitted for hosts in an unmanaged state. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1902653[*BZ#1902653*]) @@ -993,21 +1092,190 @@ If you are using the option `--keep-manifest-list=true`, the only valid value fo * Node auto-discovery is no longer enabled in `baremetal` IPI. It was not handled correctly and caused duplicate bare metal hosts registration. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1898517[*BZ#1898517*]) -*Scale* +* Previously, the syslinux-nonlinux package was not included with bare metal provisioning images. As a result, virtual media installations on machines that used BIOS boot mode failed. The package is now included in the image. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1862608[*BZ#1862608*]) -* The `nosmt` additional kernel argument which configures hyperthreading was previously undocumented for use with {product-title}. To disable hyperthreading, create a performance profile that is appropriate for your hardware and topology, and then set `nosmt` as an additional kernel argument. -+ -For more information, see xref:../scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc#about_hyperthreading_for_low_latency_and_real_time_applications_cnf-master[About hyperthreading for low latency and real-time applications]. +* Previously, certain Dell firmware versions reported the Redfish PowerState inaccurately. Updating Dell iDRAC firmware to version 4.22.00.53 resolves the issue. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1873305[*BZ#1873305*]) +* Previously, Redfish was not present in the list of interfaces that can get and set BIOS configuration values. As a result, Redfish could not be used in BIOS configuration. Redfish is now included in the list, and it can be used in BIOS configuration. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1877105[*BZ#1877105*]) -*Networking* +* Previously, the Redfish interface that is used to set BIOS configurations was not implemented properly. As a result, Dell iDRACs could not set BIOS configuration values. The implementation error was corrected. Now, the Redfish interface can set BIOS configurations.(link:https://bugzilla.redhat.com/show_bug.cgi?id=1877924[*BZ#1877924*]) -* The code in `ovn-kube` that detects the default gateway was not taking into consideration multipath environments. As a result, Kubernetes nodes failed to start because they could not find the default gateway. The logic has been modified to consider the first available gateway if multipath is present. OVN-Kubernetes now works in environments with multipath and multiple default gateways. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1914250[*BZ#1914250*]) +* Previously, differences in how Supermicro handles boot device settings through IPMI caused Supermicro nodes that use IPMI and UEFI to fail after an image was written to disk. Supermicro nodes are now passed an appropriate IPMI code to boot from disk. As a result, Supermicro nodes boot from disk correctly after deployment. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1885308[*BZ#1885308*]) -* When deploying a cluster in dual stack mode OVN-Kubernetes was using the wrong source of truth. -+ -The OVN-Kubernetes master node performs an initial synchronization to keep OVN and Kubernetes system databases in sync. This issue resulted in race conditions on OVN-Kubernetes startup leading to some of the Kubernetes services becoming unreachable. Bootstrap logic deleted these services as they were considered orphans. -+ -This bug fix ensures Kubernetes is used as the source of truth. OVN-Kubernetes now starts correctly and keeps both OVN and Kubernetes in sync on startup. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1915295[*BZ#1915295*]) +* Bare metal installations on installer-provisioned infrastructure no longer silently skip writing an image when invalid root device hints are provided. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886327[*BZ#1886327*]) +* Previously, incomplete boot mode information for Supermicro nodes caused deployment by using Redfish to fail. That boot mode information is now included. As a result, Supermicro nodes can be deployed using Redfish. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888072[*BZ#1888072*]) + +* The Ironic API service that is embedded in bare-metal installer-provisioned infrastructure now uses four workers instead of eight workers. As a result, RAM usage is reduced. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894146[*BZ#1894146*]) + +*Builds* + +* Previously, Dockerfile builds could not change permissions of the `/etc/pki/ca-trust` directory or create files inside it. This issue was caused by fixing link:https://bugzilla.redhat.com/show_bug.cgi?id=1826183[*BZ#1826183*] in version 4.6, which added support for HTTPS proxies with CAs for builds and always mounted `/etc/pki/ca-trust`, which prevented builds that included their own CAs or modified the system trust store from working correctly at runtime. The current release fixes this issue by reverting Bug 1826183. Now, builder images that include their own CAs work again. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891759[*BZ#1891759*]) + +* Previously, after upgrading from {product-title} version 4.5 to version 4.6, running `git clone` from a private repository failed because builds did not add proxy information to the Git configuration that was used to pull the source code. As a result, the source code could not be pulled if the cluster used a global proxy and the source was pulled from a private Git repository. Now, Git is configured correctly when the cluster uses a global proxy and the `git clone` command can pull source code from a private Git repository if the cluster uses a global proxy. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1896446[*BZ#1896446*]) + +* Previously, the node pull secret feature did not work. Node pull secrets were not used if `forcePull: true` was set in the Source and Docker strategy builds. As a result, builds failed to pull images that required the cluster-wide pull secret. Now, node pull secrets are always merged with user-provided pull secrets. 
As a result, builds can pull images when `forcePull: true` is set, and the source registry requires the cluster-wide pull secret. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1883803[*BZ#1883803*]) + +* Previously, {product-title} builds failed on `git clone` when SCP-style SSH locations were specified because of Golang URL parsing, which does not accommodate Git SCP-styled SSH locations. As a result, {product-title} builds and Source-to-Image (S2I) failed when those types of source URLs were supplied. Now, builds and S2I bypass Golang URL parsing and strip the `ssh://` prefix to accommodate Git SCP-styled SSH locations (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884270[*BZ#1884270*]) + +* Previously, build errors caused by invalid build pull secrets, whose auth keys were not base64-encoded, did not propagate through the build stack. As a result, determining the root cause of these errors was difficult. The current release fixes this issue, so these types of build errors propagate through the build stack. Now, determining the root cause of invalid build pull secret keys is easier for users. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1918879[*BZ#1918879*]) + +*Cloud Compute* + +* Previously, the Machine API did not provide feedback to users when their credentials secret was invalid, thus making it difficult to diagnose when there were issues with the cloud provider credentials. Users are now warned if there is an issue with their credentials when creating or updating machine sets, for example if the credential secret does not exist or is in the wrong format. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1805639[*BZ#1805639*]) + +* Previously, the bare metal actuator deleted the underlying host by also deleting the `Machine` object, which is not the intended operation of the machine controller. This update sets the `InsufficientResourcesMachineError` error reason on machines when the search for a host is unsuccessful, and thus ensures that machines without a host are scaled down first. Machines are moved into the `Failed` phase if the host is deprovisioned. Now, a machine health check deletes failed machines and the `Machine` object is no longer automatically deleted. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1868104[*BZ#1868104*]) + +* Previously, when a machine entered a `Failed` state, the state of the cloud provider no longer reconciled. Thus, the machine status reported the cloud VM state as `Running` after it was possible to remove the VM. The machine status now more accurately reflects the observed state of the cloud VM as `Unknown` if the machine is in a `Failed` state. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1875598[*BZ#1875598*]) + +* Previously, several Machine API custom resource definitions contained broken links in the template schema description to corresponding reference documents. The links were updated to the correct upstream locations and are no longer broken. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1876469[*BZ#1876469*]) + +* Previously, the command `oc explain Provisioning` did not return the custom resource definition (CRD) description because an older version of the CRD definition was in use. The CRD version was updated, thus `oc explain` for the `Provisioning` CRD now returns the expected information. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1880787[*BZ#1880787*])

* Previously, when a user created or updated machines with a disk size less than the recommended minimum size, the machines failed to boot without warning when the disk size was too low. The disk size must be greater than the initial image size. The user is now notified with a warning that the disk size is low and that this might cause their machine or machine set to not start. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1882723[*BZ#1882723*])

* Previously, the state of a machine did not persist across reconciliation, and thus the `Machine` object `instance-state` annotation and `providerStatus.instanceState` occasionally showed different values. Now, the machine state is replicated on the reconciled machine, and the `instance-state` annotation is consistent with the `providerStatus.instanceState` value. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886848[*BZ#1886848*])

* Previously, machine sets running on Microsoft Azure in a disconnected environment failed to boot and scale if the `publicIP` option was set to true in the `MachineSet` resource object. Now, to prevent machines from failing, users cannot create machine sets in disconnected environments with this invalid `publicIP` configuration. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1889620[*BZ#1889620*])

* Previously, when creating a machine, only certain errors caused the `mapi_instance_create_failed` failure metric to update. Now, any error that occurs during machine creation appropriately increments the `mapi_instance_create_failed` metric. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890456[*BZ#1890456*])

* Previously, the cluster autoscaler used a template node for node scaling decisions in certain circumstances. Occasionally, the template node failed the `nodeAffinity` predicate, so the cluster did not scale up as intended and pending pods could not be scheduled. With this update, the template node includes as many labels as possible to ensure that the cluster autoscaler can scale up and pass node affinity checks. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891551[*BZ#1891551*])

* Previously, the machine set default delete priority, which is `random`, did not prioritize nodes in the `Ready` state over nodes that were still building. As a result, especially when scaling a large number of machines, all nodes in the `Ready` state could potentially be deleted when scaling up a machine set and then immediately scaling down. This could also result in the cluster becoming unavailable. Now, a lower priority is assigned to machines that are not yet `Ready`. Thus, a large scale up of machines followed immediately by a scale down deletes machines that are still building before deleting machines that are running workloads. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1903733[*BZ#1903733*])

*Cluster Version Operator*

* Previously, a message in the installation and upgrade processes showed that the current process was 100% complete before it completed. This incorrect message was due to a rounding error. Now, the percentage is no longer rounded up, and the message shows both the number of finished subprocesses and an accurate percent complete value. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1768255[*BZ#1768255*])

* Previously, the Cluster Version Operator (CVO) compared the pullspecs with the exact `available-update` and `current-target` values when it merged Cincinnati metadata like channel membership and errata URI. 
As a result, if you installed from or updated to mirrored release images that used valid alternative pullspecs, you did not receive Cincinnati metadata. Now, the CVO compares releases by digest and correctly associates Cincinnati metadata such as channel membership, regardless of which registry hosts the image. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1879976[*BZ#1879976*])

* Previously, a race condition with the metrics-serving goroutine sometimes caused the CVO to become stuck on shutdown. As a result, CVO behavior like managed-object reconciliation and monitoring was not possible, and updates and installs might freeze. Now, the CVO times out after a few minutes, abandons any stuck metrics goroutines, and shuts down as intended. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891143[*BZ#1891143*])

* Previously, some CVO log error messages did not correctly render the variable for the type of changes that they were detecting. Now, the variable is rendered correctly, and the error messages display as intended. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1921277[*BZ#1921277*])

*CNF Platform Validation*

* Previously, performing the end-to-end tests for platform validation resulted in an error for the SCTP validation step when a machine config did not include a config specification. This bug fix skips the SCTP test when the config specification is not found. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1889275[*BZ#1889275*])

* Previously, when the Performance Addon Operator ran the `hugepages` test on a host with two or more NUMA nodes and the performance profile requested huge pages distributed across the nodes, the test failed. This bug fix corrects how the `hugepages` test determines the number of huge pages for a NUMA node. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1889633[*BZ#1889633*])

*config-operator*

* Previously, the deprecated `status.platformStatus` field was not populated during upgrades in clusters that had been upgraded since {product-title} 4.1. As a consequence, the upgrade could fail. This fix modified the Cluster Config Operator to populate this field. As a result, the upgrade no longer fails because this field is not populated.
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1890038[*BZ#1890038*])

*Console Kubevirt Plugin*

* Previously, the storage class did not propagate to the VM disk list from persistent volume claims for the `DataVolume` source. The storage class is now visible in the VM disk list of the web console. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1853352[*BZ#1853352*])

* Previously, imported SR-IOV networks could be set to different network interface types. With this fix, imported SR-IOV networks are now set only to the SR-IOV network interface type. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1862918[*BZ#1862918*])

* Previously, if a VM name was reused in the cluster, VM events displayed in the events screen were not correctly filtered and contained events mixed together from both VMs. Now, events are filtered properly and the events screen displays only the events belonging to the current VM. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1878701[*BZ#1878701*])

* Previously, the `V2VVMWare` and `OvirtProvider` objects created by the *VM Import* wizard were not cleaned up properly. Now, the `V2VVMWare` and `OvirtProvider` objects are removed as expected. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1881347[*BZ#1881347*])

* Previously, utilization data was not displayed for a Virtual Machine Interface (VMI) that did not have an associated VM. Now, if utilization data is available for a VMI, it is displayed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884654[*BZ#1884654*])

* Previously, when a PVC was cloned, its VM state was reported as *pending*, but additional information was not displayed. Now, when a PVC is cloned, the VM state is reported as *importing*, along with a progress bar and additional information that contains a link to the pod or PVC. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1885138[*BZ#1885138*])

* Previously, the VM import status displayed an incorrect VM import provider. Now, the VM import status displays the correct VM import provider. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886977[*BZ#1886977*])

* Previously, the default pod network interface type was set to the wrong value. Now, the default pod network interface type is set to masquerade. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887797[*BZ#1887797*])

*Console Storage Plugin*

* Previously, when the Local Storage Operator (LSO) was installed, the disks on a node were not displayed and there was no way to initiate a discovery of the disks on that node. Now, when the LSO is installed, the *Disk* tab is enabled and a *Discover Disks* option is available if a discovery is not already running. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1889724[*BZ#1889724*])

* With this update, the `Disk Mode` option has been renamed `Volume Mode`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1920367[*BZ#1920367*])

*Web console (Developer perspective)*

* Previously, the user was denied access to pull images from other projects due to insufficient user permissions. This bug fix removes all the user interface checks for role bindings and shows the `oc` command alert to help users use the command line. With this bug fix, the user is no longer blocked from creating images from different namespaces and is now able to deploy images from their other projects. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894020[*BZ#1894020*])

* The console used a prior version of the `KafkaSource` object that used the `resources` and `service account` fields in its specification. The latest `v1beta1` version of the `KafkaSource` object removed these fields, so the user was unable to create the `KafkaSource` object with the `v1beta1` version. This issue has now been fixed, and the user is able to create the `KafkaSource` object with the `v1beta1` version. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1892653[*BZ#1892653*])

* Previously, when you created an application using source code from Git repositories with the `.git` suffix, and then clicked the edit source code link, a `page not found` error was displayed. This fix removes the `.git` suffix from the repository URL and transforms the SSH URL to an HTTPS URL. The generated link now leads to the correct repository page. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1896296[*BZ#1896296*])

* Previously, the underlying `SinkBinding` resources were shown in the *Topology* view, along with the actual source created in the case of `Container Source` and `KameletBinding` resources, confusing users. This issue was fixed. 
Now, only the actual resource created for the event source is displayed in the *Topology* view, and the underlying `SinkBinding` resources, if created, are displayed in the sidebar. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906685[*BZ#1906685*]) + +* Previously, when you installed the Serverless Operator, without creating the eventing custom resource, a channel card was displayed. When you clicked the card, a confusing alert message was displayed. This issue has now been fixed. The channel card, with a proper alert message, is now displayed only if the channel custom resource definition is present. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1909092[*BZ#1909092*]) + +* Previously, when you closed the web terminal connection, all the terminal output from that session disappeared. This issue has been fixed. The terminal output is now retained even after the session is closed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1909067[*BZ#1909067*]) + +* Technology preview badges were displayed on the Eventing user interface although it had its GA release with {product-title} 4.6. The Technology preview badges are now removed and the changes were back-ported to the {product-title} 4.6.9 version. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894810[*BZ#1894810*]) + +* Previously, volume mounts for deployments were not preserved if the deployment was edited using the console edit flows. The modified deployment YAML overwrote or removed the volume mounts in the pod template specification. This issue has been fixed. The volume mounts are now preserved even when the deployment is edited using the console edit flows. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1867965[*BZ#1867965*]) + +* In case of multiple triggers, one subscribing to Knative service and another to In Memory Channel as subscriber, the Knative resources were not displayed on the *Topology* view. This issue has been fixed now, so that the Knative data model returns proper data, and the Knative resources are displayed on the *Topology* view. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906683[*BZ#1906683*]) + +* Previously, in a disconnected environment, the Helm charts were not displayed in the *Developer Catalog* due to an invalid configuration while fetching code. This issue has been fixed by ensuring that proxy environment variables are considered and the Helm charts are now displayed on the *Developer Catalog*. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1918748[*BZ#1918748*]) + +* While running a Pipeline, the log tab of the `TaskRun` resource displayed the string as `undefined` after the command in the output. This was caused due to some edge cases where some internal string operations printed `undefined` to the log output. This issue has been fixed now, and the pipeline log output does not drop empty lines from the log stream and does not print the string `undefined` any longer. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1915898[*BZ#1915898*]) + +* Previously, the *Port* list in the *Add* flow only provided options for exposed ports and did not allow you to specify a custom port. The list has now been replaced by a typeahead select menu, and now it is possible to specify a custom port while creating the application. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1881881[*BZ#1881881*]) + +* Previously, when conditional tasks failed, the completed pipeline runs showed a permanent pending task for each failed conditional task. 
This issue has been fixed by disabling the failed conditional tasks and by adding skipped icons to them. This gives a better picture of the state of the pipeline run. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1880389[*BZ#1880389*]) + +* Previously, the pod scale up or down buttons were available for a single pod resource, and the page crashed when the user pressed the scale button. This issue has been fixed by not showing the scale up or down buttons for a single pod resource. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1909678[*BZ#1909678*]) + +* Previously, the chart URL for downloading the chart to instantiate a helm release was unreachable. This happened because the `index.yaml` file from the remote repository, referenced in the Helm chart repository, was fetched and used as is. Some of these index files contained relative chart URLs. This issue has now been fixed by translating relative chart URLs to absolute URLs, which makes the chart URL reachable. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1912907[*BZ#1912907*]) + +* With Serverless 0.10, the latest supported versions were updated for `trigger`, `subscription`, `channel`, and `IMC`. Static models corresponding to each showed an API version of `beta`. The API version for eventing resources is now updated to `v1` and the UI now shows the latest supported version. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890104[*BZ#1890104*]) + +*DNS* + +* Previously, a cluster might experience intermittent DNS resolution errors because the `/etc/hosts` file on some nodes included invalid entries. With this release, DNS resolution no longer fails because of an `/etc/hosts` file with invalid entries. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1882485[*BZ#1882485*]) + +*etcd* + +* Previously, the etcd readiness probe used `lsof` and `grep` commands, which could leave defunct processes. The etcd readiness probe now uses a TCP port probe, which is less expensive and does not create defunct processes. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1844727[*BZ#1844727*]) + +* Previously, when an IP address was changed on a control plane node, which causes the certificates on disk to be invalid, the etcd error messages were not clear why etcd was failing to connect with peers. An IP address change on a control plane node is now detected, an event is reported, and `EtcdCertSignerController` is marked as `Degraded`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1882176[*BZ#1882176*]) + +* Previously, new static pod revisions could occur when the etcd cluster had less than three members, which caused temporary quorum loss. Static pod revisions are now avoided when all control plane nodes are not available, and these temporary quorum losses are avoided. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1892288[*BZ#1892288*]) + +* Previously, etcd backups included a recovery YAML file that was specific to the control plane node where the backup was taken from, so backups taken from one control plane node could not be restored on another control plane node. The recovery YAML file is now more generic so that the etcd backup can be restored on any control plane node. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1895509[*BZ#1895509*]) + +* Previously, the etcd backup script used the last modified timestamp to determine the latest revision, which caused the incorrect static pod resources to be stored in the etcd backup. 
The etcd backup script now uses the manifest file to determine the latest revision, and the correct static pod resources are now stored in the etcd backup. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1898954[*BZ#1898954*])

* Previously, the bootstrap rendering logic failed to detect a usable machine network CIDR when using IPv6 dual stack mode unless the IPv4 CIDR was the first element in the install-config machine network CIDR array. The parsing logic was fixed to loop through all machine network CIDRs, so the IPv4 address is now correctly loaded among the machine network CIDRs in dual stack mode. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1907872[*BZ#1907872*])

* Previously, if the `openshift-etcd` namespace was deleted, the `etcd-endpoints` config map was not recreated, and the cluster would not recover. The `etcd-endpoints` config map is now recreated if it is missing, allowing the cluster to recover. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1916853[*BZ#1916853*])

*Image Registry*

* The last Kubernetes update enforced a timeout on APIs, which caused every long-standing request to be dropped after 34 seconds. When importing large repositories, specifically ones with several tags, the timeout was reached and the import could not succeed as it did in previous versions. The `oc` client provides a flag to set a different timeout, but no usage example was provided, making it difficult to understand how to bypass the API timeout. An example of the flag usage has been added to the `oc` help output, so it is now easier to find this option. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1878022[*BZ#1878022*])

* Previously, using two distinct versions of the same logging package resulted in Operator logs being partially lost. This fix makes the logging package versions equal, which means the upgraded logging package used by the Operator matches the one used by client-go. Now, logs are not lost. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1883502[*BZ#1883502*])

* Previously, the pruner tried to detect the registry name using image streams, but when there were no image streams the pruner failed to detect the registry name. With this fix, the Image Registry Operator provides the pruner with the registry name. Now, the pruner does not depend on the existence of image streams to detect the registry name. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887010[*BZ#1887010*])

* Previously, the Operator pod did not have memory requests, so under memory pressure on the node the Operator could be killed for being out of memory before other `BestEffort` containers. This fix added memory requests. Now, under memory pressure, the Operator is not killed before other `BestEffort` containers on the node. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888118[*BZ#1888118*])

* Previously, the pruner tried to detect the registry name using image streams, but when there were no image streams the pruner failed to detect the registry name. With this fix, the Image Registry Operator provides the pruner with the registry name if the registry is configured, or disables registry pruning if the registry is not installed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888494[*BZ#1888494*])

* Previously, there was a lack of analysis on Operand deployment status when defining the Operator status. 
This meant that in some scenarios the Image Registry Operator presented two contradictory pieces of information: it reported that it was not Available and, at the same time, that it was not Degraded. These two conditions were still reported even after the deployment stopped trying to get the image registry up and running, which is a scenario in which the Operator should set the Degraded flag. By taking the image registry deployment into account, the Operator now sets itself to Degraded if the Operand deployment reaches its progress deadline when trying to get the application running. Now, when the deployment fails after the progress deadline has been reached, the Operator sets itself to Degraded. The Operator still reports itself as Progressing while the Operator deployment is progressing. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1889921[*BZ#1889921*])

* Previously, the Image Registry Operator did not use its image entrypoint because an explicit command was provided, so the cluster-wide `trusted-ca` was not used by the Operator, and the Operator could not connect to storage providers that do not work without a custom `trusted-ca`. This fix removed the explicit command from the pod spec. Now, the image entrypoint is used by the container that applies `trusted-ca`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1892799[*BZ#1892799*])

* Previously, the default log level for the pruner was `2`, so when an error happened, the pruner dumped a stack trace. This fix changed the default log level to `1`. Now, only the error message is printed, without stack traces. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894677[*BZ#1894677*])

* Previously, the `configs.imageregistry.operator.openshift.io` status field did not update during the Operator sync, which meant the status field was not presenting the most up-to-date applied Swift configuration. With this fix, the sync process updates the `configs.imageregistry.operator.openshift.io` status to the spec values. The spec and status fields are now in sync, with the status field presenting the applied configuration. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1907202[*BZ#1907202*])

* Previously, a retryable error related to the HTTP/2 protocol was not retried, which caused mirroring to be cancelled with an error message. This fix added a retry when the error message corresponds to the HTTP/2 protocol-related error. Now, for these errors, the mirror operation is cancelled only after multiple attempts. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1907421[*BZ#1907421*])

* Previously, the absence of explicit user and group IDs on the `node-ca` daemon set made it unclear which user and group were in use in the `node-ca` pods. This fix explicitly provides the `node-ca` daemon set with `runAsUser` and `runAsGroup` configuration. Now, there is a clear definition of the user and group when inspecting the `node-ca` DaemonSet YAML file. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1914407[*BZ#1914407*])

*ImageStreams*

* Previously, the image pruner did not account for images that were used by `StatefulSet`, `Job`, and `Cronjob` objects when it gathered lists of images that were in use. As a result, the wrong images could be pruned. The image pruner now accounts for images in use by these objects when it creates image lists. Images that are in use by these objects are no longer pruned. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1880068[*BZ#1880068*]) + +* Previously, newly created image streams were not decorated with `publicDockerImageRepository` values. Watchers did not receive `publicDockerImageRepository` values for new objects. Image streams are now decorated with the correct values. As a result, watchers get image streams with `publicDockerImageRepository` values. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1912590[*BZ#1912590*]) + +*Insights Operator* + +* Previously, due to incorrect error handling, the Operator would end its process ambiguously when a file that it observed changed. Error handling for the Operator is improved. Now, the Operator continues to run and no longer sends an ending process signal when an observed file changes. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884221[*BZ#1884221*]) + +* Previously, the Operator did not use the namespace of a resource while archiving reports. As a result, resources that had identical names in different namespaces were overwritten. The Operator now uses report paths in combination with namespaces while archiving data. As a result, all reports are collected for each namespace. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886462[*BZ#1886462*]) *Installer* @@ -1017,11 +1285,47 @@ This bug fix ensures Kubernetes is used as the source of truth. OVN-Kubernetes n * Bare metal provisioning now does not fail if there is a small, up to one hour, clock skew between the control plane and a host being deployed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906448[*BZ#1906448*]) +* When upper case letters were included in the vCenter host name, the {product-title} installation program for VMware vSphere waited a long time for the cluster to complete before finally failing. The installation program now validates that the vCenter host name does not contain upper case letters early in the installation process, avoiding long wait times. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1874248[*BZ#1874248*]) + +* Previously, the internal Terraform backend for the {product-title} installation program did not support large inputs from Terraform core to the Terraform provider, like Amazon Web Services (AWS). When the `bootstrap.ign` file was passed to the AWS provider as a string, the input limit could be exceeded, causing the installation program to fail when creating a bootstrap Ignition S3 bucket. This bug fix modifies the Terraform backend to pass the `bootstrap.ign` as a path on disk, allowing the AWS provider to read the large file by circumventing the input size limit. Now, the installation program succeeds when performing a Calico installation that creates the bootstrap Ignition file larger than the input limits. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1877116[*BZ#1877116*]) + +* Previously, pre-flight installer validation for {rh-openstack-first} was performed on the flavor metadata. This could prevent installations to flavors detected as `baremetal`, which might have the required capacity to complete the installation. This is usually caused by {rh-openstack} administrators not setting the appropriate metadata on their bare metal flavors. Validations are now skipped on flavors detected as `baremetal`, to prevent incorrect failures from being reported. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1878900[*BZ#1878900*]) + +* Previously, the installation program did not allow the `Manual` credentials mode for clusters being installed to GCP and Azure. 
Because of this, users could not install their clusters to GCP or Azure using manual credentials. The installation program can now validate manual credentials provided for GCP and Azure. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884691[*BZ#1884691*]) + +* Previously, the installation program could not verify that a resource group existed before destroying a cluster installed to Azure. This caused the installation program to continuously loop with errors. The installation program now verifies the resource group exists before destroying a cluster, allowing the cluster to be destroyed successfully. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888378[*BZ#1888378*]) + +* Previously, the installation program did not check to ensure AWS accounts had `UnTagResources` permissions when creating a cluster with shared resources. Because of this, when destroying a cluster, the installation program did not have permission to delete tags added to the pre-existing network. This bug fix adds a permission check for `UnTagResources` when creating cluster with shared network resources to make sure the account has proper permissions before finishing the installation process. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888464[*BZ#1888464*]) + +* For the `openshift-install destroy cluster` command to work properly, the cluster objects the installation program initially created must be removed. In some instances, the hosted zone object is already removed, causing the installation program to hang. The installation program now skips the removal of the object if the object has already been removed, allowing the cluster to successfully be destroyed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890228[*BZ#1890228*]) + +* Previously, the control plane ports on {rh-openstack-first} were not assigned the additional user-defined security groups. This caused the additional user-defined security group rules to not be applied properly to control plane nodes. The additional user-defined security groups are now assigned to the control plane ports, allowing the security group rules to correctly apply to the control plane nodes. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1899853[*BZ#1899853*]) + +* Previously, rules on the default AWS security group that sourced another security group prevented the installation program from deleting that other security group when destroying the cluster. This caused the cluster destroy process to never complete and left AWS resources remaining. The rules from the default security group are now deleted, unblocking the deletion of other security groups. This allows all AWS resources to be deleted from the cluster. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1903277[*BZ#1903277*]) + +* A missing guard in {rh-openstack-first} validations could fetch the list of subnets with an empty subnet ID, and cause some non-{rh-openstack} clouds to return unexpected values. The unexpected error code would fail validation and prevent {product-title} from installing on these non-{rh-openstack} clouds. This bug fix adds the missing guard against the empty subnet ID, allowing for proper validations. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906517[*BZ#1906517*]) + +* Previously, the reference load balancer for a user-provisioned infrastructure installation on VMware vSphere was configured for a simple TCP check, and the health checks did not consider the health of the api server. 
This configuration sometimes led to failed API requests whenever the API server restarted. Now, the health checks verify API server health against the `/readyz` endpoint, and the reference API load balancer handles requests gracefully during API server restarts. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1836017[*BZ#1836017*])

* Previously, when you pressed CTRL+C while using the installation program, the program was not always interrupted and did not always exit as expected. Now, when you press CTRL+C while using the installation program, the program always interrupts and exits. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1855351[*BZ#1855351*])

* Previously, if you attempted to delete a cluster in Azure while using invalid credentials, such as when your service principal expired, and did not display the debug logs, it appeared that the cluster was deleted when it was not. In addition to not deleting the cluster, the locally stored cluster metadata was deleted, which made it impossible to remove the cluster by running the `openshift-install destroy cluster` command again. Now, if you attempt to delete a cluster while using invalid Azure credentials, the installation program exits with an error, and you can update your credentials and try again. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1866925[*BZ#1866925*])

* Previously, the `install-config.yaml` file for the installer-provisioned infrastructure bare metal installation method incorrectly used the `provisioningHostIP` name instead of the `clusterProvisioningIP` name, which caused a disconnect between the documentation and the actual field name used in the YAML file. Now, the `provisioningHostIP` field is deprecated in favor of `clusterProvisioningIP`, which removes the disconnect. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1868748[*BZ#1868748*])

* Previously, the installation program did not check for expired certificates in the Ignition configuration files. The expired certificates caused installation to fail without explanation. Now, the installation program checks for expired certificates and prints a warning if certificates are expired. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1870728[*BZ#1870728*])

*kube-apiserver*

* Previously, the `preserveUnknownFields` field was set to `true` in `v1beta1` CRDs, and there was no error when `oc explain` did not explain CRD fields. A validation condition was added, and the status of `v1beta1` CRDs without the `preserveUnknownFields` field set to `false` now shows an error of `spec.preserveUnknownFields: Invalid value: true: must be false`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1848358[*BZ#1848358*])

* Previously, the `LocalStorageCapacityIsolation` feature gate was disabled by default in {product-title} on IBM Cloud clusters. When disabled, setting an ephemeral storage request or limit caused the pod to be unschedulable. This fix changed the code so that if the `LocalStorageCapacityIsolation` feature gate is disabled, ephemeral storage requests or limits are ignored and pods can be scheduled as expected. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886294[*BZ#1886294*])

*Red Hat OpenShift Logging*

* Previously, logs were not sent to managed storage when legacy log forwarding was enabled. This happened because the internal generation of the `logforwarding` configuration improperly made a decision for either `logforwarding` or legacy `logforwarding`. 
The current release fixes this issue: Logs are sent to managed storage when the logstore is defined in the `clusterlogging` instance. Additionally, logs are sent to legacy `logforwarding` when enabled regardless of whether a managed logstore is enabled or not. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1921263[*1921263*]) +* Previously, logs were not sent to managed storage when legacy log forwarding was enabled. This happened because the internal generation of the `logforwarding` configuration improperly made a decision for either `logforwarding` or legacy `logforwarding`. The current release fixes this issue: Logs are sent to managed storage when the logstore is defined in the `clusterlogging` instance. Additionally, logs are sent to legacy `logforwarding` when enabled regardless of whether a managed logstore is enabled or not. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1921263[*BZ#1921263*]) -* Previously, the Fluentd collector pod went into a crash loop when the `ClusterLogForwarder` had an incorrectly-configured secret. The current release fixes this issue. Now, the `ClusterLogForwarder` validates the secrets and reports any errors in its status field. As a result, it does not cause the Fluentd collector pod to crash. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888943[*1888943*]) +* Previously, the Fluentd collector pod went into a crash loop when the `ClusterLogForwarder` had an incorrectly-configured secret. The current release fixes this issue. Now, the `ClusterLogForwarder` validates the secrets and reports any errors in its status field. As a result, it does not cause the Fluentd collector pod to crash. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888943[*BZ#1888943*]) * Previously, nodes did not recover from `Pending` status because a software bug did not correctly update their statuses in the Elasticsearch custom resource (CR). The current release fixes this issue, so the nodes can recover when their status is `Pending.` (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887357[*BZ#1887357*]) @@ -1045,67 +1349,377 @@ The current release fixes this issue. Now, when a rollover occurs in the `indexm * Previously, Fluent stopped sending logs even though the logging stack seemed functional. Logs were not shipped to an endpoint for an extended period even when an endpoint came back up. This happened if the max backoff time was too long and the endpoint was down. The current release fixes this issue by lowering the max backoff time, so the logs are shipped sooner. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894634[*BZ#1894634*]) -* Previously, if you deleted the secret, it was not recreated. Even though the certificates were on a disk local to the operator, they weren't rewritten because they hadn't changed. That is, certificates were only written if they changed. The current release fixes this issue. It rewrites the secret if the certificate changes or is not found. Now, if you delete the master certificates, they are replaced. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1901869[*BZ#1901869*]) +* Previously, if you deleted the secret, it was not recreated. Even though the certificates were on a disk local to the Operator, they weren't rewritten because they hadn't changed. That is, certificates were only written if they changed. The current release fixes this issue. It rewrites the secret if the certificate changes or is not found. Now, if you delete the master certificates, they are replaced. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1901869[*BZ#1901869*]) -* Previously, because of a bug, the software did not find some certificates and regenerated them. This triggered the Elasticsearch operator to perform a rolling upgrade on the Elasticsearch cluster, which sometimes produced mismatched certificates. The current release fixes this issue. Now the operator consistently reads and writes certificates to the same working directory and only regenerates the certificates if needed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1905910[*BZ#1905910*]) +* Previously, because of a bug, the software did not find some certificates and regenerated them. This triggered the Elasticsearch Operator to perform a rolling upgrade on the Elasticsearch cluster, which sometimes produced mismatched certificates. The current release fixes this issue. Now, the Operator consistently reads and writes certificates to the same working directory and only regenerates the certificates if needed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1905910[*BZ#1905910*]) * Previously, queries to the root endpoint to retrieve the Elasticsearch version received a 403 response. The 403 response broke any services that used this endpoint in prior releases. This error happened because non-administrative users did not have the `monitor` permission required to query the root endpoint and retrieve the Elasticsearch version. Now, non-administrative users can query the root endpoint for the deployed version of Elasticsearch. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906765[*BZ#1906765*]) -*Builds* +* Previously, the Cluster Logging Operator (CLO) would attempt to reconcile the Elasticsearch resource, which depended upon the Red Hat-provided Elastic Custom Resource Definition (CRD). Attempts to list an unknown kind caused the CLO to exit its reconciliation loop. This happened because the CLO tried to reconcile all of its managed resources whether they were defined or not. The current release fixes this issue. The CLO only reconciles types provided by the Elasticsearch Operator if a user defines managed storage. As a result, users can create collector-only deployments of cluster logging by deploying the CLO. +(link:https://bugzilla.redhat.com/show_bug.cgi?id=1891738[*BZ#1891738*]) -* Previously, Dockerfile builds could not change permissions of the `/etc/pki/ca-trust` directory or create files inside it. This issue was caused by fixing link:https://bugzilla.redhat.com/show_bug.cgi?id=1826183[*BZ#1826183*] in version 4.6, which added support for HTTPS proxies with CAs for builds and *always* mounted `/etc/pki/ca-trust`, which prevented builds that included their own CAs or modified the system trust store from working correctly at runtime. The current release fixes this issue by reverting Bug 1826183. Now, builder images that include their own CAs work again. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891759[*BZ#1891759*]) +* Previously, when deploying Fluentd as a stand-alone, a Kibana pod was created even if the value of `replicas` was `0`. This happened because Kibana defaulted to `1` pod even when there were no Elasticsearch nodes. The current release fixes this. Now, a Kibana only defaults to `1` when there are one or more Elasticsearch nodes. 
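++
+For reference, a stand-alone Fluentd deployment of the kind described above is typically one whose `ClusterLogging` custom resource defines only the `collection` stanza, with no `logStore` or `visualization` section. The following is a minimal, illustrative sketch only; adjust it to your environment:
++
+[source,yaml]
+----
+apiVersion: logging.openshift.io/v1
+kind: ClusterLogging
+metadata:
+  name: instance
+  namespace: openshift-logging
+spec:
+  managementState: Managed
+  collection:
+    logs:
+      type: fluentd
+      fluentd: {}
+----
++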
+(link:https://bugzilla.redhat.com/show_bug.cgi?id=1901424[*BZ#1901424*])

-* Previously, after upgrading from {product-title} version 4.5 to version 4.6, running `git clone` from a private repository failed because builds did not add proxy information to the Git configuration that was used to pull the source code. As a result, the source code could not be pulled if the cluster used a global proxy and the source was pulled from a private Git repository. Now, Git is configured correctly when the cluster uses a global proxy and the `git clone` command can pull source code from a private Git repository if the cluster uses a global proxy. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1896446[*BZ#1896446*])
+* Previously, in some bulk insertion situations, the Elasticsearch proxy timed out connections between Fluentd and Elasticsearch. As a result, Fluentd failed to deliver messages and logged a `Server returned nothing (no headers, no data)` error. The current release fixes this issue: It increases the default HTTP read and write timeouts in the Elasticsearch proxy from five seconds to one minute. It also provides command-line options in the Elasticsearch proxy to control HTTP timeouts in the field. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1908707[*BZ#1908707*])

-* Previously, the node pull secret feature did not work: node pull secrets were not used if `forcePull: true` was set in the Source and Docker strategy builds. As a result, builds failed to pull images that required the cluster-wide pull secret. Now, node pull secrets are always merged with user-provided pull secrets. As a result, builds can pull images when `forcePull: true` is set, and the source registry requires the cluster-wide pull secret. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1883803[*BZ#1883803*])
+* Previously, the Kibana log level was increased so as not to suppress instructions to delete indices that failed to migrate, which also caused the display of GET requests at the INFO level that contained the Kibana user's email address and OAuth token. The current release fixes this issue by masking these fields, so the Kibana logs do not display them. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1925081[*BZ#1925081*])

-* Previously, {product-title} builds failed on `git clone` when SCP-style SSH locations were specified because of Golang URL parsing, which does not accommodate Git SCP-styled SSH locations. As a result, {product-title} builds and Source-to-Image (S2I) failed when those types of source URLs were supplied. Now, builds and S2I bypass Golang URL parsing and strip the `ssh://` prefix to accommodate Git SCP-styled SSH locations (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884270[*BZ#1884270*])
+* Previously, the Fluentd collector pod went into a crash loop when the `ClusterLogForwarder` had multiple outputs using the same secret. The current release fixes this issue. Now, multiple outputs can share a secret. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890072[*BZ#1890072*])
+
+*Machine Config Operator*
+
+* Previously, when deploying on {rh-openstack-first} and using an HTTP proxy with a host name, the installation process could sometimes fail to pull container images and reported the error message `unable to pull image`. This bug fix corrects how the proxy is set in environment variables so that nodes can pull container images from remote registries.
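++
+For context, the proxy host name referenced in this fix comes from the cluster-wide proxy configuration, which is set through the `proxy` stanza of the `install-config.yaml` file or the cluster `Proxy` object. The following minimal sketch uses placeholder values and is illustrative only:
++
+[source,yaml]
+----
+apiVersion: config.openshift.io/v1
+kind: Proxy
+metadata:
+  name: cluster
+spec:
+  httpProxy: http://proxy.example.com:3128
+  httpsProxy: http://proxy.example.com:3128
+  noProxy: .cluster.local,.svc
+----
++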
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1873556[*BZ#1873556*])
+
+* Previously, during an upgrade, the Machine Config Controller (MCC) for the previous release could react to a configuration change from the newer Machine Config Operator (MCO). The MCC then introduced another change that resulted in an unnecessary reboot during the upgrade process. This bug fix prevents the MCC from reacting to a configuration change from a newer MCO and avoids an unnecessary reboot. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1879099[*BZ#1879099*])
+
+* Previously, the forward plugin for CoreDNS distributed queries randomly to all the configured DNS servers. Name resolution failed intermittently because CoreDNS would query a DNS server that was not functional. This bug fix sets the forward plugin to use the sequential policy so that queries are sent to the first DNS server that is responsive. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1882209[*BZ#1882209*])
+
+* Previously, the Machine Config Operator was reading enabled systemd target units only from the `multi-user.target.wants` directory. As a consequence, any unit that did not target the `multi-user.target.wants` directory was changed to target that directory. This fix modified the MCO to use the systemd-preset file to create a preset file in the MCO. As a result, all systemd services are enabled and disabled as expected. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1885365[*BZ#1885365*])
+
+* Previously, when migrating a cluster to the OVN-Kubernetes default Container Network Interface (CNI), bond options on a pre-configured Linux bond interface were not preserved. As a consequence, bonds were configured using round-robin instead of the specified mode, and the bonds might not function. The ovs-configuration.service (configure-ovs.sh) was modified to copy all of the previous bond options on the Linux bond to the `ovs-if-phys0` NetworkManager connection. As a result, all bonds should work as originally configured. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1899350[*BZ#1899350*])
+
+* In {product-title} 4.6, a change was made to use the Budget Fair Queueing (BFQ) Linux I/O scheduler. As a consequence, there was an increased fsync I/O latency in etcd. This fix modified the I/O scheduler to use the mq-deadline scheduler, except for NVMe devices, which are configured to not use an I/O scheduler. For {op-system-first} updates, the BFQ scheduler is still used. As a result, latency times have been reduced to acceptable levels. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1899600[*BZ#1899600*])
+
+*Web console (Administrator perspective)*
+
+* Previously, an issue with a dependency resulted in the persistent unmounting and remounting of the *YAML Editor* in the {product-title} web console. As a consequence, the YAML editor jumped to the top of the YAML file every few seconds. This fix removed a default parameter value for the dependency. As a result, the *YAML Editor* behaves as expected. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1903164[*BZ#1903164*])
+
+* Previously, a link in the Operator description in the {product-title} web console was rendered in a sandboxed iframe, which disables JavaScript within that iframe. As a consequence, when a user clicked the link, the sandbox limitations were inherited by the new tab, so JavaScript did not run on the linked page. The links were fixed by adding an `allow-popups-to-escape-sandbox` parameter to the Operator description iframe sandbox attribute, which opens new tabs outside of the sandbox.
As a result, links from Operator descriptions now open and run normally. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1905416[*BZ#1905416*])
+
+* Previously, the scale pods function in the {product-title} web console was not using the `scale` subresource, so any custom role without the `patch` verb on the deployment config or deployment could not scale pods in the web console. The fix changed the code so that the scale pods function now uses the `scale` subresource. As a result, users can scale pods in the web console without adding the `patch` verb. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1911307[*BZ#1911307*])
+
+* Previously, when creating a custom resource in the {product-title} web console, if a `fieldDependency` description was applied to a schema property that used a control field with an identical name, the `getJSONSchemaPropertySortWeight` helper function would recurse infinitely. As a consequence, the `DynamicForm` component would throw an exception and the web browser could crash. This fix modified the `getJSONSchemaPropertySortWeight` helper function to keep track of the current path and use the entire path to determine the dependency relationship instead of just the field names. As a result, the `DynamicForm` component no longer throws an exception under the above condition. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1913969[*BZ#1913969*])
+
+* Previously, the `SamplesTBRInaccessibleOnBoot` alert description contained a misspelling of the word "bootstrapped". The alert description is now correct. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1914723[*BZ#1914723*])
+
+* Previously, the CPU and Memory `specDescriptor` fields added an empty string in the YAML editor. Now, these fields no longer add an empty string in the YAML editor. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1797766[*BZ#1797766*])
+
+* Previously, the `Subscription` and `CSV` objects were both displayed on the *Installed Operators* page during Operator installation. Now, this duplication has been fixed so that the `Subscription` object is not displayed on the *Installed Operators* page if a matching `CSV` object already exists. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1854567[*BZ#1854567*])
+
+* Previously, empty resource utilization charts were displayed on the *Build details* page when a build was started over an hour prior, but the default was set to display only the last hour. Now, the utilization charts on the *Build details* page show data for the time that the build ran. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1856351[*BZ#1856351*])
+
+* Previously, OpenAPI definitions were only updated on the initial page load. The OpenAPI definitions are now updated on a 5-minute interval and whenever the models are fetched from the API. OpenAPI definitions stay up to date without a page refresh. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1856354[*BZ#1856354*])
+
+* In this release, the broken link to the cluster monitoring documentation has been fixed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1856803[*BZ#1856803*])
+
+* Previously, the `utm_source` parameter was missing from Red Hat Marketplace URLs. In this release, the `utm_source` parameter was added to Red Hat Marketplace URLs for attribution. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1874901[*BZ#1874901*])
+
+* Previously, the project selection drop-down could not be closed by using the `Escape` key.
The handler for the `Escape` key is now updated, so the user can exit and close the project selection drop-down. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1874968[*BZ#1874968*])
+
+* Previously, the font colors used for Scheduling Status were not in compliance with accessibility requirements. The font and font colors were updated to be accessible. A node with scheduling disabled is now displayed with a yellow warning icon (exclamation icon). (link:https://bugzilla.redhat.com/show_bug.cgi?id=1875516[*BZ#1875516*])
+
+* Previously, the patch paths on some API calls were incorrect. This caused spec descriptor fields to not update resource properties. In this release, the logic for building a patch path from a descriptor was updated. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1876701[*BZ#1876701*])
+
+* Previously, the `Unschedulable` status field only appeared when it was set to `True`. In this release, a new UX design was implemented to display status information more clearly. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1878301[*BZ#1878301*])
+
+* Previously, subscriptions with an automatic approval strategy behaved as if they had a manual approval strategy if another subscription in the same namespace had a manual approval strategy. In this release, an update was made to notify the user that a subscription with a manual approval strategy causes all subscriptions in the namespace to behave as manual. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1882653[*BZ#1882653*])
+
+* Previously, a manual install plan could affect more than one Operator, but the UI requesting approval did not clearly indicate when that was the case. As a result, a user could be approving the install plan for multiple Operators without the UI clearly communicating that. In this release, the UI lists all Operators affected by the manual approval plan and clearly indicates which Operators will be installed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1882660[*BZ#1882660*])
+
+* Previously, creating a duplicate namespace from the create namespace modal would result in a rejection error. In this release, an error handler was added for project creation, so creating duplicate projects no longer results in a rejection error. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1883563[*BZ#1883563*])
+
+* Previously, the Prometheus swagger definition contained a `$ref` property that could not be resolved, so a runtime error occurred on the Prometheus operand creation form. Now, the `definitions` property is added to the schema that was returned by the `definitionFor` helper function, so the `$ref` resolves and no runtime error occurs. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884613[*BZ#1884613*])
+
+* Previously, users had to wait for the needed resources to load in the background before the install status page appeared. Now, the install status page has been updated so that it immediately appears once the user starts the Operator installation. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884664[*BZ#1884664*])
+
+* Previously, iOS did not support connecting through a secured WebSocket with a self-signed certificate, so a white screen was displayed in the console. Now, the connection falls back to HTTPS if the WebSocket connection with a self-signed certificate is not successful, so the console loads properly. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1885343[*BZ#1885343*])
+
+* Previously, system roles were not present when users created a new role binding in the web console.
Now, system roles appear in the Role name drop-down, so users can select a system role when creating a new role binding. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886154[*BZ#1886154*])
+
+* Previously, the terminal assumed all pods were Linux pods and did not account for Windows pods, so the terminal would not work with Windows pods because it defaulted to the `sh` command. Now, the terminal detects the pod type and changes the command as necessary. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886524[*BZ#1886524*])
+
+* Previously, new provisioner names did not contain the `kubernetes.io/` prefix, so users could select the RWX and RWO access modes when creating a PVC with the aws-ebs-csi-driver (gp2-csi) provisioner in the web console. Now, additional provisioners have been added to the AccessMode mapping, so the RWX and RWO access modes are not available when creating a PVC with the aws-ebs-csi-driver (gp2-csi) provisioner in the web console. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887380[*BZ#1887380*])
+
+* Previously, the logic for maintaining the active namespace did not account for deleting the currently active namespace, so a namespace that was recently deleted in the UI could remain set as the currently active namespace. Now, the active namespace logic has been updated so that, when a user deletes the currently active namespace, the active namespace in the same browser session automatically defaults to "All Namespaces". (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887465[*BZ#1887465*])
+
+* Previously, the console vendored version v0.1.1 of the 'runc' module, which contained a potential security issue, so JFrog Xray flagged the 'runc' dependency as a potential vulnerability. Now, the 'runc' module is pinned to the v1.0.0-rc8 version, which contains the fix, so the 'runc' dependency is no longer flagged as a potential vulnerability. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887864[*BZ#1887864*])
+
+* Previously, the CSV and PackageManifests listed every provided API version instead of just the latest version, so the CSV and PackageManifest pages could show duplicate APIs. Now, the logic for retrieving APIs has been updated so that only the latest version of each provided API is displayed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888150[*BZ#1888150*])
+
+* Previously, the Install Operand Form description was missing the 'SynchMarkdownView' component, so it was not formatted with markdown. Now, the Install Operand Form is formatted with markdown, so the Install Operand Form description is properly formatted. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888036[*BZ#1888036*])
+
+* Previously, the `fieldDependency specDescriptor` was not designed or tested with non-sibling dependencies. Consequently, non-sibling dependencies were not guaranteed to behave as expected. This update revises the logic to ensure that non-sibling dependencies behave as expected. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890180[*BZ#1890180*])
+
+* Previously, an exception was thrown if a local `ensureKind` function did not properly handle a null `data` argument. This update adds null coalescence when using the `data` argument to ensure that no exceptions are thrown, which allows graceful handling of null `data` arguments. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1892198[*BZ#1892198*])
+
+* Previously, TLS secrets were not editable in the console.
This update adds a `type` field so that TLS secrets can be updated in the console. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893351[*BZ#1893351*]) + +* This update fixes an issue where the web console displayed incorrect filesystem capacity and usage data. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893601[*BZ#1893601*]) + +* Previously, the web console was incorrectly granting permissions to the wrong service account, the Prometheus Operator, for scraping metrics for Operator Lifecycle Manager (OLM) Operators. The console now correctly grants permissions to the prometheus-k8s service account, allowing metrics to be scraped. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893724[*BZ#1893724*]) + +* Previously, the console pod's `TopologyKey` was set to `kubernetes.io/hostname`, which created availability problems during updates and zone outages. This update sets the `TopologyKey` to `topology.kubernetes.io/zone`, which improves availability during updates and zone outages. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894216[*BZ#1894216*]) + +* Previously, an OperatorGroup with a missing `status` block in any namespace could cause a runtime error in the web console when installing a new Operator from OperatorHub. The problem has been resolved. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1895372[*BZ#1895372*]) + +* Previously, the console filtered out Custom Resource Definitions (CRDs) from the Provided APIs list if the model for the CRD did not exist. Consequently, the Details tab did not present Provided API cards upon initial install, which gave the impression that the Operator offered no APIs. This update removes the filter from the API cards so that they appear even if the model has yet to exist. As a result, the Provided API cards and their corresponding tabs always match, and the UI will no longer present an empty state if the models are not yet available. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1897354[*BZ#1897354*]) + +* In some cases, the lodash `startCase` function was being applied to the operand form descriptor field. Consequently, the field label would be formatted as Start Case, which would override the `displayName` property of the descriptor. This update applies `startCase` only when a descriptor `displayName` is not provided, which properly shows `displayName` on the operand form. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1898532[*BZ#1898532*]) + +* Previously, the `react-jsonschema-form` did not properly handle array type schemas that were explicitly set to null. If the form data passed to the DynamicForm component contained an array type property set to null, a runtime exception would occur. This update adds a null check in the array fields, ensuring that exceptions are no longer thrown in this scenario. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1901531[*BZ#1901531*]) + +*Monitoring* + +* Previously, the `prometheus-adapter` did not implement an OpenAPI spec. As a result, the API server logged a message every 60 seconds that the OpenAPI did not exist while the Prometheus Adapter was deployed into the cluster. Additionally, the `KubeAPIErrorsHigh` alert might have fired due to the errors in the logs. This fix introduces the OpenAPI spec into `prometheus-adapter`, which complies with other core API resources within {product-title}. 
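++
+One way to confirm that the Prometheus Adapter and the metrics `APIService` it backs are available after this change is sketched below; the resource names assume the default monitoring stack in the `openshift-monitoring` namespace and are illustrative only:
++
+[source,terminal]
+----
+$ oc get apiservice v1beta1.metrics.k8s.io
+$ oc -n openshift-monitoring get deployment prometheus-adapter
+----
++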
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1819053[*BZ#1819053*]) + +* Previously, certain scenarios that elevated security context constraints (SCCs) caused Prometheus stateful set deployments to fail. Now, the `nonroot` SCC is used for stateful set deployments for monitoring. This fix requires the following configuration of Kubernetes security context settings for all monitoring stateful set deployments, which are Alertmanager, Prometheus, and Thanos Ruler: ++ +[source,yaml] +---- +securityContext: + fsGroup: 65534 <1> + runAsNonRoot: true + runAsUser: 65534 <2> +---- +<1> The filesystem group ID is set to the `nobody` user, ID `65534`. Kubelet recursively sets the group ID at pod startup. See the link:https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods[Kubernetes documentation] for more information on configuring volume permission and ownership change policy for pods. +<2> All stateful set monitoring deployments run as the `nobody` user, ID `65534`. ++ +(link:https://bugzilla.redhat.com/show_bug.cgi?id=1868976[*BZ#1868976*]) + +* Previously, CPU steal time, which is the time that a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor, impacted the metrics that reported CPU consumption. As a result, CPU usage could be reported as higher than the CPU count on a node. Now, the metrics that report CPU consumption do not take into account CPU steal time, and thus reported CPU usage accurately reflects the actual CPU usage. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1878766[*BZ#1878766*]) + +* Previously, authenticated requests without elevated permissions could access the `/api/v1/query` and `/api/v1/query_range` endpoints of Prometheus in user-defined projects. Thus, users with access to the token for a regular service account could read metrics from any monitored target. Now, `kube-rbac-proxy` is configured to allow requests to only the `/metrics` endpoint. Authenticated requests without cluster-wide permissions for the `/metrics` endpoint receive an HTTP 404 status code in response to a query to the `api/v1/query` and `/api/v1/query_range` endpoints. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1913386[*BZ#1913386*]) + +*Networking* + +* The code in `ovn-kube` that detects the default gateway was not taking into consideration multipath environments. As a result, Kubernetes nodes failed to start because they could not find the default gateway. The logic has been modified to consider the first available gateway if multipath is present. OVN-Kubernetes now works in environments with multipath and multiple default gateways. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1914250[*BZ#1914250*]) + +* When deploying a cluster in dual stack mode OVN-Kubernetes was using the wrong source of truth. ++ +The OVN-Kubernetes master node performs an initial synchronization to keep OVN and Kubernetes system databases in sync. This issue resulted in race conditions on OVN-Kubernetes startup leading to some of the Kubernetes services becoming unreachable. Bootstrap logic deleted these services as they were considered orphans. ++ +This bug fix ensures Kubernetes is used as the source of truth. OVN-Kubernetes now starts correctly and keeps both OVN and Kubernetes in sync on startup. 
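++
+For context, a dual-stack cluster of the kind affected by this fix is installed with both IPv4 and IPv6 cluster and service networks, for example through an `install-config.yaml` networking stanza similar to the following sketch; the CIDR values are illustrative only:
++
+[source,yaml]
+----
+networking:
+  networkType: OVNKubernetes
+  clusterNetwork:
+  - cidr: 10.128.0.0/14
+    hostPrefix: 23
+  - cidr: fd01::/48
+    hostPrefix: 64
+  serviceNetwork:
+  - 172.30.0.0/16
+  - fd02::/112
+----
++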
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1915295[*BZ#1915295*]) + +* When creating an additional network by specifying the `additionalNetworks` stanza in the Cluster Network Operator (CNO) configuration object, the CNO manages the lifecycle for the NetworkAttachmentDefinition object that is created. However, that object was never deleted if the CNO configuration was updated to exclude the additional network from the `additionalNetworks` stanza. In this release, the CNO now deletes all objects related to the additional network. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1755586[*BZ#1755586*]) + +* For the OVN-Kubernetes cluster network provider, if an egress IP address was configured and one of the nodes hosting the egress IP address became unreachable, any egress IP addresses assigned to the unreachable node were never reassigned to other nodes. In this release, if a node hosting an egress IP address is found to be unreachable, the egress IP address is assigned to another node. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1877273[*BZ#1877273*]) + +* For the OVN-Kubernetes cluster network provider, the route priority of the `br-ex` bridge could be superseded by the default route for a secondary network interface added after installing the cluster. When the default route for the secondary device supersedes the br-ex bridge on a node, the cluster network no longer functions. In this release, the default route for `br-ex` bridge cannot be superseded. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1880259[*BZ#1880259*]) + +* For clusters using the OVN-Kubernetes cluster network provider, when adding a {op-system-base-full} 7 worker node to the cluster, the new worker node was unable to connect to the cluster network. In this release, you can now add {op-system-base} worker nodes successfully. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1882667[*BZ#1882667*]) + +* For clusters using the OVN-Kubernetes cluster network provider, it was not possible to use a VLAN or bonded network device as the default gateway on a node. In this release, OVN-Kubernetes now works with these network devices. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884628[*BZ#1884628*]) + +* For clusters using the Kuryr cluster network provider, unnecessary Neutron ports were created for pods using on the host network. In this release, Neutron ports are no longer created for host network pods. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886871[*BZ#1886871*]) + +* For clusters using the OVN-Kubernetes cluster network provider, the `br-ex` bridge did not support the attachment of other interfaces, such as `veth` pairs, and any interface added to the bridge did not function correctly. In this release, new interfaces can be attached to the `br-ex` interface and function correctly. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887456[*BZ#1887456*]) + +* For clusters using the OVN-Kubernetes cluster network provider, if an ExternalIP address was configured, any node in the cluster not configured to use that IP address did not route traffic sent to the externalIP correctly. Now, every node in the cluster is configured with the necessary routes for an ExternalIP. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890270[*BZ#1890270*]) + +* For clusters using the OpenShift SDN cluster network provider, the order in which a namespace and a network namespace were deleted mattered. 
If the NetNamespace object associated with a Namespace object was deleted first, it was not possible to recreate that network namespace. In this release, a namespace and its associated network namespace may be deleted in any order. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1892376[*BZ#1892376*])
+
+* For clusters using the OpenShift SDN cluster network provider, previously the network provider logged the following message: `unable to allocate netid 1`. Because this message is harmless for any NETID less than `10`, in this release OpenShift SDN no longer emits the message for any NETID less than `10`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1897073[*BZ#1897073*])
+
+* If the cluster is using the OVN-Kubernetes cluster network provider, all inbound ICMPv6 was erroneously sent to both the node and OVN. In this release, only ICMPv6 Neighbor Advertisements and Route Advertisements are sent to both the host and OVN. As a result, a ping sent to a node in the cluster no longer results in duplicate responses. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1897641[*BZ#1897641*])
+
+* Previously, in a cluster with a large number of nodes, excessive multicast DNS (mDNS) traffic was generated. As a result, network switches might overflow. This release limits mDNS queries to once per second.
+
+* Previously, when creating an additional network attachment that used IPv6 and the Whereabouts CNI plug-in, any specified excluded subnet ranges were ignored. This bug fix corrects the plug-in so that subnet ranges can be excluded. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1900835[*BZ#1900835*])
+
+* Previously, under certain circumstances, pods did not terminate due to an error condition with Multus. Multus includes the message `failed to destroy network for pod sandbox` in logs when the problem occurs. This bug fix makes Multus tolerate a deleted cache file so that pods can terminate. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1900835[*BZ#1900835*])
+
+* Previously, when using the Kuryr SDN network provider with network policies, any update to a network policy caused the Kuryr controller to recreate security group rules. This bug fix corrects how the Kuryr controller compares security group rules after a network policy is updated. Rules are preserved when possible and added or removed when necessary. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1904131[*BZ#1904131*])
+
+* Previously, when using the OpenShift SDN network provider with network policies, it was possible for pods to experience network connectivity problems even in namespaces that did not use network policies. This bug fix ensures that the underlying Open vSwitch (OVS) flows that implement the network policy are valid. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1914284[*BZ#1914284*])
+
+* Previously, when using the OVN-Kubernetes network provider and using multiple pods to serve as external gateways, scaling down the pods prevented other pods in the namespace from routing traffic to the remaining external gateways. Instead, traffic was routed to the default gateway of the node. This bug fix enables the pods to continue routing traffic to the remaining external gateways. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1917605[*BZ#1917605*])
+
+*Node*
+
+* Previously, clusters under load could time out if pod or container creation requests took too long.
The kubelet would attempt to re-request that resource even though CRI-O was still working on creating it, causing the requests to fail with the _name is reserved_ error. After CRI-O finished the original request, it noticed that the request had timed out and cleaned up the failed pod or container, starting the process over. As a consequence, pod and container creation could stall, and multiple _name is reserved_ errors were reported by the kubelet. This also caused an already overloaded node to be further overloaded. This fix modified CRI-O to save the progress of any pod or container creation that times out due to system load. CRI-O also stalls new requests from the kubelet so there are fewer _name is reserved_ errors. As a result, when clusters are under load, CRI-O slows the kubelet and reduces the load on the cluster. The overall load on the node is reduced, and the kubelet and CRI-O should reconcile more quickly. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1785399[*BZ#1785399*])
+
+* Previously, deep directories in volumes caused long SELinux relabeling times. As a consequence, container creation requests could time out, and the kubelet would re-request that resource, causing the _error reserving ctr name_ or _Kubelet may be retrying requests that are timing out in CRI-O due to system load_ error. This fix modified CRI-O to save the progress of any pod or container creation that times out due to system load. As a result, container requests are fulfilled in a timely manner. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1806000[*BZ#1806000*])
+
+* Previously, CRI-O used only IPv4 iptables for managing the host port mapping. As a consequence, host ports did not work for IPv6. This fix modified CRI-O to support IPv6 host ports. As a result, host ports function with IPv6 as expected. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1872128[*BZ#1872128*])
+
+* Previously, HTTP/2 transports did not have the correct options attached to the connections that provide timeout logic, which caused VMware network interfaces (and other scenarios) to blip for a few seconds, causing connections to fail silently. As a consequence, connections lingered, which caused other related failures, such as nodes not being detected as down, API calls using stale connections and failing, and so forth. This fix added proper timeouts. As a result, HTTP/2 connections within the system are more reliable, and side effects are mitigated. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1873114[*BZ#1873114*])
+
+* Previously, the Topology Manager end-to-end test (`openshift-tests run-test`) required the Machine Config Daemon (MCD) to be running on each worker node, which is the case for nodes deployed on {op-system-first} but not for nodes deployed on {op-system-base-full}. As a consequence, the Topology Manager end-to-end test incorrectly failed with a false negative when running against clusters deployed on {op-system-base}. This fix modified the test to skip any nodes where it does not detect an MCD. As a result, the false-negative failures are no longer reported. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887509[*BZ#1887509*])
+
+* Previously, the kubelet did not handle transitions properly when statuses were missing. As a consequence, some terminated pods did not get restarted. This fix added a `failed` container status to allow the container to be restarted as needed. As a result, kubelet pod handling does not result in an invalid state transition.
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1888041[*BZ#1888041*])
+
+* Previously, machine metrics from *cAdvisor* were missing in Kubernetes 1.19 and later. This fix modified the code to properly collect the *cAdvisor* machine metrics. As a result, the machine metrics are present. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1913096[*BZ#1913096*])
+
+* Previously, the Horizontal Pod Autoscaler (HPA) ignored pods with incomplete metrics, such as pods that have init containers. As a consequence, any pod with an init container would not be scaled. This fix makes the Prometheus Adapter send complete metrics for init containers. As a result, HPA can scale pods with init containers. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1867477[*BZ#1867477*])
+
+* Previously, the Vertical Pod Autoscaler (VPA) did not have access to monitor deployment configs. As a consequence, the VPA was unable to scale deployment config workloads. This fix added the appropriate permissions to the VPA to monitor deployment configs. As a result, the VPA can scale deployment config workloads. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1885213[*BZ#1885213*])

*Node Tuning Operator*

-* When an invalid Tuned profile is created, the `openshift-tuned` supervisor process may ignore future profile updates and fail to apply the updated profile. This bug fix keeps state information about Tuned profile application success or failure. Now `openshift-tuned` recovers from profile application failures on receiving new valid profiles. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1919970[*BZ#1919970*])
+* When an invalid Tuned profile is created, the `openshift-tuned` supervisor process may ignore future profile updates and fail to apply the updated profile. This bug fix keeps state information about Tuned profile application success or failure. Now, `openshift-tuned` recovers from profile application failures on receiving new valid profiles. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1919970[*BZ#1919970*])

-*Web console (Developer perspective)*
+*oauth-proxy*

-* Previously, the user was denied access to pull images from other projects, due to insufficient user permissions. This bug fix removes all the user interface checks for role bindings and shows the `oc` command alert to help users use the command line. With this bug fix, the user is no longer blocked from creating images from different namespaces and is now able to deploy images from their other projects. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894020[*BZ#1894020*])
+* Previously, there was legacy logging of a failed authentication check. Requests to services behind the oauth-proxy could cause a line to be written to the proxy log, which could flood the log. This fix removed the uninformative log line from the proxy. Now, the proxy no longer experiences log spam. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1879878[*BZ#1879878*])

-* The console used a prior version of the `KafkaSource` object that used the `resources` and `service account` fields in their specification. The latest v1beta1 version of the `KafkaSource` object removed these fields, due to which the user was unable to create the `KafkaSource` object with the v1beta1 version. This issue has been fixed now and the user is able to create the `KafkaSource` object with the v1beta1 version.
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1892653[*BZ#1892653*]) +* Previously, invalid option handling caused a nil dereference when incorrect option combinations were specified with the `oauth-proxy` command. This resulted in a segmentation fault stack trace being output at the end of the usage message. The option handling is now improved and nil dereferences do not occur when incorrect option combinations are specified. The usage message is output without a stack track when incorrect options are now specified. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884565[*BZ#1884565*]) -* Previously, when you created an application using source code from Git repositories with the `.git` suffix, and then clicked the edit source code link, a "page not found" error was displayed. This fix removes the `.git` suffix from the repository URL and transforms the SSH URL to an HTTPS URL. The generated link now leads to the correct repository page. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1896296[*BZ#1896296*]) +*oc* -* Previously, the underlying `SinkBinding` resources were shown in the *Topology* view, along with the actual source created in the case of `Container Source` and `KameletBinding` resources, confusing users. This issue was fixed. Now, only the actual resource created for the event source is displayed in the *Topology* view, and the underlying `SinkBinding` resources, if created, are displayed in the sidebar. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906685[*BZ#1906685*]) +* Previously, changes in logging libraries caused goroutine stack traces to be printed even at a low log level of 2, which made debugging more difficult. The log level for goroutine stack traces was increased, and now they will only be printed at log level 6 and above. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1867518[*BZ#1867518*]) -* Previously, when you installed the Serverless Operator, without creating the eventing custom resource, a channel card was displayed. When you clicked the card, a confusing alert message was displayed. This issue has now been fixed. The channel card, with a proper alert message, is now displayed only if the channel custom resource definition is present. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1909092[*BZ#1909092*]) +* Previously, users logging in with the OpenShift CLI (`oc`) to multiple clusters using the same user name had to log in to each cluster every time. The context name has been properly updated so that it is unique even when the user name is the same. Now, after logging in and switching context, it is not necessary to log in again. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1868384[*BZ#1868384*]) -* Previously, when you closed the web terminal connection, all the terminal output from that session, which can be useful to you, disappeared. This issue has been fixed. The terminal output is now retained even after the session is closed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1909067[*BZ#1909067*]) +* Previously, when a release was mirrored to disk using `oc adm release mirror`, the manifest file names did not contain the architecture extension, for example `-x86_64`. This did not allow for mirroring multiple architectures to the same repository without having tag name collisions. File names now contain the appropriate architecture extension, which prevents tag name collisions. 
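++
+A typical mirror-to-disk invocation of the kind affected by this change is sketched below; the release version, pull secret path, and output directory are placeholders:
++
+[source,terminal]
+----
+$ oc adm release mirror -a pull-secret.json \
+    --from=quay.io/openshift-release-dev/ocp-release:4.7.0-x86_64 \
+    --to-dir=/mnt/mirror
+----
++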
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1878972[*BZ#1878972*]) -* Technology preview badges were displayed on the Eventing user interface although it had its GA release with {product-title} 4.6. The Technology preview badges are now removed and the changes were back-ported to the {product-title} 4.6.9 version. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894810[*BZ#1894810*]) +* Previously, an image verifier object was not set properly which could cause the OpenShift CLI (`oc`) to fail with a nil pointer exception when verifying images. The image verifier object is now set properly and the OpenShift CLI (`oc`) no longer fails with a nil pointer exception when verifying images. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1885170[*BZ#1885170*]) -* Previously, volume mounts for deployments were not preserved if the deployment was edited using the console edit flows. The modified deployment YAML overwrote or removed the volume mounts in the pod template specification. This issue has been fixed. The volume mounts are now preserved even when the deployment is edited using the console edit flows. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1867965[*BZ#1867965*]) +* Previously, the wrong user name was used when verifying image signatures using `oc adm verify-image-signature`, and image signature verification failed. The proper user name is now used when verifying image signatures and image signature verification now works as expected. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890671[*BZ#1890671*]) -* In case of multiple triggers one subscribing to Knative service and another to In Memory Channel as subscriber, the Knative resources were not displayed on the *Topology* view. This issue has been fixed now, so that the Knative data model returns proper data, and the Knative resources are displayed on the *Topology* view. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906683[*BZ#1906683*]) +* Previously, metadata providing version information was not produced during the build process and was not present on Windows binaries of the OpenShift CLI (`oc`). Proper Windows version information is now generated and available on Windows binaries. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891555[*BZ#1891555*]) -* Previously, in a disconnected environment, the Helm charts were not displayed in the *Developer Catalog* due to an invalid configuration while fetching code. This issue has been fixed by ensuring that proxy environment variables are considered and the Helm charts are now displayed on the Developer Catalog. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1918748[*BZ#1918748*]) +* Previously, a missing nil check for a route condition could cause the OpenShift CLI (`oc`) to crash when describing a route. A nil check was added and describing a route now works properly. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893645[*BZ#1893645*]) -* While running a Pipeline, the log tab of the `TaskRun` resource displayed the string as `undefined` after the command in the output. This was caused due to some edge cases where some internal string operations printed `undefined` to the log output. This issue has been fixed now, and the pipeline log output does not drop empty lines from the log stream and does not print the string `undefined` any longer. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1915898[*BZ#1915898*]) +* Previously, the OpenShift CLI (`oc`) had a low limit for client throttling, and the requests reaching for API discovery were limited by the client code. The client throttling limit was increased and client-side throttling should now appear less frequently. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1899575[*BZ#1899575*]) -* Previously, the *Port* list in the *Add* flow only provided options for exposed ports and did not allow you to specify a custom port. The list has now been replaced by a typeahead select menu, and now it is possible to specify a custom port while creating the application. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1881881[*BZ#1881881*]) +* Previously, support for init containers was lost during changes to the `oc debug` command, and it was not possible to debug init containers. Support for init containers was added to the `oc debug` command, and it is now possible to debug init containers. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1909289[*BZ#1909289*]) -* Previously, when conditional tasks failed, the completed pipeline runs showed a permanent pending task for each failed conditional task. This issue has been fixed by disabling the failed conditional tasks and by adding skipped icons to them. This gives a better picture of the state of the pipeline run. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1880389[*BZ#1880389*]) +*OLM* -* Previously, the pod scale up or down buttons were available for a single pod resource, and the page crashed when the user pressed the scale button. This issue has been fixed by not showing the scale up or down buttons for a single pod resource. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1909678[*BZ#1909678*]) +* The Marketplace Operator was written to report that the services it offered were degraded whenever the `marketplace-operator` pod exited gracefully, which would happen during routine cluster upgrades. This caused the pod to report as degraded during normal upgrades, which caused confusion. The Marketplace Operator no longer reports that it is degraded when it exits gracefully and is no longer flagged by the Telemeter client as degraded. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1838352[*BZ#1838352*]) -* Previously, the chart URL for downloading the chart to instantiate a helm release was unreachable. This happened because the `index.yaml` file from the remote repository, referenced in the Helm chart repository, was fetched and used as is. Some of these index files contained relative chart URLs. This issue has now been fixed by translating relative chart URLs to absolute URLs, which makes the chart URL reachable. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1912907[*BZ#1912907*]) +* Previously during an Operator upgrade, Operator Lifecycle Manager (OLM) deleted existing cluster service versions (CSVs) before the upgrade was completed. This caused the new CSV to be stuck in a "Pending" state. This bug fix updates OLM to check the ownership of the service account to ensure the new service account is created for the new CSV. As a result, existing CSVs are no longer deleted until the new CSV reaches the "Succeeded" state correctly. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1857877[*BZ#1857877*]) + +* Previously, Operator Lifecycle Manager (OLM) would accept a `Subscription` object that specified a channel that did not exist. 
The subscription would appear to succeed, and no related error message was presented, which caused user confusion. This bug fix updates OLM to cause `Subscription` objects to fail in this scenario. Cluster administrators can review events in the `default` namespace for dependency resolution failure information, for example: ++ +[source,terminal] +----- +$ oc get event -n default +----- ++ +.Example output +[source,terminal] +---- +LAST SEEN TYPE REASON OBJECT MESSAGE +6m22s Warning ResolutionFailed namespace/my-namespace constraints not satisfiable: my-operator is mandatory, my-operator has a dependency without any candidates to satisfy it +---- ++ +(link:https://bugzilla.redhat.com/show_bug.cgi?id=1873030[*BZ#1873030*]) + +* Previously, support for admission webhook configurations in Operator Lifecycle Manager (OLM) reused the CA certificate generation code used when deploying API servers. The mounting directory used by this code placed the certificate information at the following locations: ++ +-- +** `/apiserver.local.config/certificates/apiserver.crt` +** `/apiserver.local.config/certificates/apiserver.key` +-- ++ +However, admission webhooks built using Kubebuilder or the Operator SDK expect the CA certificates to be mounted in the following locations: ++ +-- +** `/tmp/k8s-webhook-server/serving-certs/tls.cert` +** `/tmp/k8s-webhook-server/serving-certs/tls.key` +-- ++ +This mismatch caused the webhooks to fail to run. This bug fix updates OLM to now mount the webhook CA certificates at the default locations expected by webhooks built with Kubebuilder or the Operator SDK. As a result, webhooks built with Kubebuilder or the Operator SDK can now be deployed by OLM. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1879248[*BZ#1879248*]) + +* When deploying an Operator with an API service, conversion webhook, or an admission webhook, Operator Lifecycle Manager (OLM) should retrieve the CA from an existing resource to calculate a CA hash annotation. This annotation influences a deployment hash that OLM relies on to confirm that the deployment is installed correctly. OLM currently does not retrieve the CA from conversion webhooks, resulting in an invalid deployment hash, which causes OLM to attempt to reinstall the cluster service version (CSV). ++ +If a CSV defines a conversion webhook but does not include an API service or an admission webhook, the CSV cycles through the "Pending", "ReadyToInstall", and "Installing" phases indefinitely. This bug fix updates OLM to use the existing conversion webhook to retrieve the value of the CA and correctly calculate the deployment hash. As a result, OLM can now install CSVs that define a conversion webhook without an API service or admission webhook. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1885398[*BZ#1885398*]) + +* In the `opm` command, the `semver-skippatch` mode previously allowed only bundles with later patch versions as valid replacements, ignoring any pre-release versions. Bundles with the same patch versions but later pre-release versions were not accepted as replacements. This bug fix updates the `opm` command to base the `semver-skippatch` check on the semantic versioning as a whole instead of just the patch version. As a result, later pre-release versions are now valid for the `semver-skippatch` mode. 
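++
+For reference, the `semver-skippatch` mode is selected with the `--mode` flag when adding a bundle to an index, as in the following sketch; the bundle and index image names are placeholders:
++
+[source,terminal]
+----
+$ opm index add \
+    --bundles quay.io/example/my-operator-bundle:v1.2.1-rc1 \
+    --from-index quay.io/example/my-operator-index:1.0.0 \
+    --tag quay.io/example/my-operator-index:1.0.1 \
+    --mode semver-skippatch
+----
++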
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1889721[*BZ#1889721*]) + +* Previously, the Marketplace Operator did not clean up stale services during a cluster upgrade, and Operator Lifecycle Manager (OLM) accepted the stale service without validating the service. This caused the stale service to direct traffic to a catalog source pod that contained outdated content. This bug fix updates OLM to add spec hash information to the service and check to ensure the service has the correct spec by comparing the hash information. OLM then deletes and recreates the service if it is stale. As a result, the service spec now directs traffic to the correct catalog source pod. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891995[*BZ#1891995*]) + +* After mirroring an Operator to a disconnected registry, the Operator install could fail due to a missing related bundle image. This issue was due to the bundle image not being present in the `index.db` database. This bug fix updates the `opm` command to ensure the bundle image is present in the `related_images` table of the database. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1895367[*BZ#1895367*]) + +* Previously, Operator authors could create cluster service versions (CSVs) that defined webhooks with container ports set outside of the `1` to `65535` range. This prevented the `ValidatingWebhookConfiguration` or `MutatingWebhookConfiguration` objects from being created because of a validation failure; CSVs could be created that never successfully installed. The custom resource definition (CRD) validation for CSVs now includes the proper minimum and maximum values for the `webhookDescription` `ContainerPort` field. This now defaults to `443` if the container port is not defined. CSVs with invalid container ports now fail validation before the CSV is created. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891898[*BZ#1891898*]) + +* Stranded Operator image bundles that were not referenced by any channel entries remained after an `opm index prune` operation. This led to unexpected index images being mirrored. Stranded image bundles are now removed when an index is pruned and the unexpected images are not included when the Operator catalog is later mirrored. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1904297[*BZ#1904297*]) + +* Previously, Operator updates could result in Operator pods being deployed before a new service account was created. The pod could be deployed by using the existing service account and would fail to start with insufficient permissions. A check has been added to verify that a new service account exists before the cluster service version (CSV) is moved from a `Pending` to `Installing` state. If a new service account does not exist, the CSV remains in a `Pending` state, which prevents the deployment from being updated. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1905299[*BZ#1905299*]) + +* Previously, when Operator Lifecycle Manager (OLM) copied a `ClusterServiceVersion` (CSV) object to multiple target namespaces, the `.status.lastUpdateTime` field in the copied CSV was set to the current time. If the current time was later than the last update time of the original CSV, a synchronization race condition was triggered where the copied CSV never converged to match the original. This was more likely to occur when many namespaces were present in a cluster.
Now, the original `.status.lastUpdateTime` timestamp is preserved in the copied CSVs and the synchronization race condition is not triggered by a difference between the `.status.lastUpdateTime` values. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1905599[*BZ#1905599*]) + +* Previously, pod templates defined in the `StrategyDetailsDeployment` objects of a `ClusterServiceVersion` (CSV) object could include pod annotations that did not match those defined in the CSV. Operator Lifecycle Manager (OLM) would fail to install the Operator because the annotations in the CSV are expected to be present on the pods deployed as part of the CSV. The pod template annotations defined in the `StrategyDetailsDeployment` objects are now overwritten by those defined in the CSV. OLM no longer fails to deploy CSVs whose annotations conflict with those defined in the pod template. +(link:https://bugzilla.redhat.com/show_bug.cgi?id=1907381[*BZ#1907381*]) + +* When a default catalog source in the `openshift-marketplace` namespace is disabled through the OperatorHub API, you can create a custom catalog source with the same name as that default. Previously, custom catalog sources with the same name as a default catalog source were deleted by the Marketplace Operator when the marketplace was restarted. An annotation has been added to the default catalog sources that are created by the Marketplace Operator. Now, the Operator only deletes the catalog sources that contain the annotation when the marketplace is restarted. Custom catalog sources created with the same name as the default catalog sources are not deleted. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1908431[*BZ#1908431*]) + +* Previously, the `oc adm catalog mirror` command did not generate the proper mappings for Operator index images without namespaces. Additionally, the `--filter-by-os` option filtered the entire manifest list. This resulted in invalid references to the filtered images in the catalog. Index images without namespaces are now mapped correctly and an `--index-filter-by-os` option is added to filter only the index image that is pulled and unpacked. The `oc adm catalog mirror` command now generates valid mappings for index images without namespaces and the `--index-filter-by-os` option creates valid references to the filtered images. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1908565[*BZ#1908565*]) + +* Previously, Operators could specify a `skipRange` in the cluster service version (CSV) replacement chain that would cause Operator Lifecycle Manager (OLM) to attempt to update the Operator with itself. This infinite loop would cause an increase in CPU usage. The CSV replacement chain is now updated so that Operators do not become stuck in an infinite loop due to an invalid `skipRange`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1916021[*BZ#1916021*]) + +* Previously, the `csv.status.LastUpdateTime` time comparison in the cluster service version (CSV) reconciliation loop always returned a `false` result. This caused the Operator Lifecycle Manager (OLM) Operator to continuously update the CSV object and trigger another reconciliation event. The time comparison is now improved and the CSV is no longer updated when there are no status changes. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1917537[*BZ#1917537*]) + +* Catalog update pods with polling intervals that were multiples of 15 greater than the default 15-minute resynchronization period would be continuously reconciled by the Catalog Operator.
This would continue until the next poll time was reached, causing increased CPU load. The reconciliation requeuing logic is now improved so that the continuous reconciliation and the associated CPU load increases do not occur. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1920526[*BZ#1920526*]) + +* Previously, if no matching Operators were found during an attempt to create an Operator subscription, the constraints listed in the resolution failure event contained internal terminology. The subscription constraint strings did not describe the resolution failure reason from a user perspective. The constraint strings are now more meaningful. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1921954[*BZ#1921954*]) + +*openshift-apiserver* + +* Previously, requests targeting the `deploymentconfigs//instantiate` subresource failed with `no kind "DeploymentConfig" is registered for version apps.openshift.io/`. The correct version for the `DeploymentConfig` is now set and these requests no longer fail. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1867380[*BZ#1867380*]) + +*Operator SDK* + +* Previously, all `operator-sdk` subcommands attempted to read the `PROJECT` file, even if `PROJECT` was a directory. As a result, subcommands that did not require the `PROJECT` file failed. Now, subcommands that do not require the `PROJECT` file do not attempt to read it and succeed even if an invalid `PROJECT` file is present. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1873007[*BZ#1873007*]) + +* Previously, running the `operator-sdk cleanup` command did not clean up Operators that were deployed with the `operator-sdk run bundle` command. Instead, an error message was displayed and the Operator was not cleaned up. Now, the `operator-sdk cleanup` command has been updated, and Operators deployed with `run bundle` can be cleaned up by using the `cleanup` command. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1883422[*BZ#1883422*]) *Performance Addon Operator* -* Previously, incorrect wait in the must-gather logic resulted in too early termination of log gathering. This issue resulted in depending on timing the log gathering operation being interrupted prematurely, leading to partial log collection. -This is now fixed by adding the correct wait in the must-gather logic. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906355[*BZ#1906355*]) +* Previously, an incorrect wait in the must-gather logic caused log gathering to terminate too early. Depending on timing, the log gathering operation could be interrupted prematurely, leading to a partial log collection. This is now fixed by adding the correct wait in the must-gather logic. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906355[*BZ#1906355*]) * Previously, must-gather collected an unbounded amount of kubelet logs on all nodes. This issue resulted in an excessive amount of data being transferred and collected, with no clear benefit for the user. + This issue is fixed by collecting a bounded amount, the last eight hours, of kubelet logs only on worker nodes and not collecting kubelet logs on the control plane nodes. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1918691[*BZ#1918691*]) +* Previously, when the machine config pool was degraded, the performance profile was not updated to display an accurate machine config pool state.
Now, the performance profile node selector or machine config pool selector correctly watches the relevant machine config pools, and a degraded machine config pool reflects the correct status. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1903820[*BZ#1903820*]) + +*RHCOS* + +* Previously, configuring additional Azure disks during {op-system} installation caused a failure because the `udev` rules for Azure disks were missing from the {op-system} initramfs. The necessary `udev` rules have been added so that configuring additional disks during installation now works properly. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1756173[*BZ#1756173*]) + +* Previously, the `rhcos-growpart.service` was being used in a way that was not best practice. Now, the `rhcos-growpart.service` has been removed in favor of configuring disks via Ignition at installation time. To change disk configuration after initial {op-system} installation, you must reprovision your systems with the necessary disk configuration changes. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1851103[*BZ#1851103*]) + +* Previously, the Machine Config Operator would attempt to rollback rpm-ostree changes when running `rpm-ostree cleanup -p`, causing a "System transaction in progress" error to occur. This fix improves rpm-ostree code related to D-Bus handling so that the error no longer occurs. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1865839[*BZ#1865839*]) + +* Previously, there was no support in ppc64le or s390x for NVMe emulation in KVM in RHEL 8.2, which caused the `kola --basic-qemu-scenarios` tests that use NVMe emulation to fail. The tests for NVMe emulation on ppc64le and s390x have been disabled so that the tests now succeed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1866445[*BZ#1866445*]) + +* Previously, Ignition could not fetch a remote configuration over the network when the DHCP server took too long to respond to DHCP queries because NetworkManager would stop waiting for a DHCP answer and the network would not be configured in the initramfs. The new version of NetworkManager now understands the `rd.net.timeout.dhcp=xyz` and `rd.net.dhcp.retry=xyz` options when set as kernel parameters to increase the timeout and number of retries, allowing you to set those options to account for delayed DHCP answers. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1877740[*BZ#1877740*]) + +* Previously, an incorrect networking configuration was created because multiple `nameserver=` entries on the kernel command line could create multiple NetworkManager connection profiles. A newer version of NetworkManager in {op-system} now correctly handles multiple `nameserver=` entries so that networking configuration is properly generated when multiple `nameserver=` entries are provided. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1882781[*BZ#1882781*]) + +* Previously, a node process would fail with a segmentation fault due to a recursive call that was overflowing the stack. This logic error has been fixed so that segmentation faults no longer occur. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1884739[*BZ#1884739*]) + +* Previously, network-related service units were not strictly ordered, which sometimes meant that network configurations copied using `--copy-network` did not take effect on the first reboot into the installed system. The ordering of the relevant service units has been fixed so that they now always take effect on the first reboot.
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1895979[*BZ#1895979*]) + +* Previously, when the `coreos-installer` command invoked `fdasd` to check for a valid DASD label on s390x, udev would reprobe the DASD device, causing the DASD formatting to fail because udev was still accessing the device. Now, after checking for a DASD label, `coreos-installer` waits for udev to finish processing the DASD to ensure that the DASD formatting is successful. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1900699[*BZ#1900699*]) + +* Previously, it could be confusing to query and modify connection settings in NetworkManager when using DHCP because a single NetworkManager connection was created by default that matched all interfaces. The user experience has been improved so that when using DHCP, NetworkManager now creates a separate connection for each interface by default. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1901517[*BZ#1901517*]) + +* Previously, failure to properly tear down network interfaces in the initrd before switching to the real root could prevent a static IP assignment to a VLAN interface from being successfully activated in the real root. This fix changes how network interfaces are torn down in the initrd so that static IP assignments to VLAN interfaces are successfully activated in the real root. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1902584[*BZ#1902584*]) + +* Previously, if you had configured {op-system} to use dhclient for DHCP operations, you were left with systems that could not properly acquire a DHCP address because the dhclient binary was removed from {op-system} when the switch to using NetworkManager in the initramfs was made. The dhclient binary is now included in {op-system} so that {op-system} systems can successfully perform DHCP operations using dhclient. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1908462[*BZ#1908462*]) + +* Previously, upgraded nodes would not receive uniquely generated initiator names because the service unit that regenerates the iSCSI initiator name only worked on first boot. With this fix, the service unit now runs on every boot so that upgraded nodes receive generated initiator names if one does not already exist. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1908830[*BZ#1908830*]) + +* Previously, you could not create ext4 filesystems with Ignition because `mkfs.ext4` failed when `/etc/mke2fs.conf` did not exist. With this fix, `/etc/mke2fs.conf` has been added to the initramfs so that Ignition successfully creates ext4 filesystems. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1916382[*BZ#1916382*]) + +*Routing* + +* Previously, it was possible to set the `haproxy.router.openshift.io/timeout` annotation on a route with a value that exceeded 25 days. Values greater than 25 days caused the ingress controller to crash. This bug fix sets an upper limit of 25 days for the timeout. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1861383[*BZ#1861383*]) + +* Previously, an ingress controller would report a status of Available even if DNS was not provisioned or a required load balancer was not ready. This bug fix adds validation to the Ingress Operator to ensure that DNS is provisioned and the load balancer, if required, is ready before the ingress controller is reported as available. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1870373[*BZ#1870373*]) + +* Previously, it was possible to set the default certificate for an ingress controller to a secret that did not exist, for example, due to a typographical error.
This bug fix adds validation to ensure the secret exists before changing the default certificate. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887441[*BZ#1887441*]) + +* Previously, a route with a name that was longer than 63 characters could be created. However, after the route was created, it failed validation. This bug fix adds validation when the route is created. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1896977[*BZ#1896977*]) + +*Storage* + +* Previously, the admission plug-in would add failure domain and region labels, even when they were not configured properly, causing pods that used statically provisioned persistent volumes (PVs) to fail to start on OpenStack clusters with an empty region in the configuration. With this fix, the labels are now added to the PV only when they contain a valid region and failure domain so that pods using statically provisioned PVs behave the same as dynamically provisioned PVs on OpenStack clusters that have been configured with an empty region or failure domain. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1877681[*BZ#1877681*]) + +* Previously, the `LocalVolumeDiscoveryResult` object was displayed in the web console, implying that these could be manually defined. With this fix, the `LocalVolumeDiscoveryResult` type has been flagged as an internal object and is no longer displayed in the web console. To view local disks, navigate to *Compute -> Nodes -> Select Nodes -> Disks* instead. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886973[*BZ#1886973*]) + +* Previously when creating snapshots that require credentials, force deletion would not work for snapshots if the `VolumeSnapshotClass` CRD was already deleted. Now, instead of relying on the `VolumeSnapshotClass` CRD to exist, the credentials are fetched from the `VolumeSnapshotContent` CRD so that volume snapshots and volume snapshot contents that use credentials can be deleted as long as the secret containing these credentials continues to exist. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893739[*BZ#1893739*]) + +* Previously, the Kubernetes FibreChannel (FC) volume plug-in did not properly flush a multipath device before deleting it, and in rare cases, a filesystem on a multipath FC device was corrupted during pod destruction. Now, Kubernetes flushes data before deleting a FC multipath device to prevent filesystem corruption. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1903346[*BZ#1903346*]) + +*Scale* + +* The `nosmt` additional kernel argument that configures hyperthreading was previously undocumented for use with {product-title}. To disable hyperthreading, create a performance profile that is appropriate for your hardware and topology, and then set `nosmt` as an additional kernel argument. ++ +For more information, see xref:../scalability_and_performance/cnf-performance-addon-operator-for-low-latency-nodes.adoc#about_hyperthreading_for_low_latency_and_real_time_applications_cnf-master[About hyperthreading for low latency and real-time applications]. [id="ocp-4-7-technology-preview"] == Technology Preview features @@ -1380,6 +1994,8 @@ The root cause of this issue is unknown and no workaround currently exists. + (link:https://bugzilla.redhat.com/show_bug.cgi?id=1913279[*BZ#1913279*]) +* Previously, a bug in the OpenStack SDK caused a failure when requesting server group `OSP16`. Consequently, the UPI playbook `control-plane.yaml` fails during the task to create the control plane server.
As a temporary workaround, you can request a hotfix to update the OpenStack SDK, which updates the OpenStack SDK on the bastion host to execute UPI Ansible tasks to at least `python-openstacksdk-0.36.4-1.20201113235938.el8ost`. With this hotfix, the playbook successfully runs. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891816[*BZ#1891816*]) + * When attempting an IPI installation on bare metal using the latest Dell firmware (04.40.00.00) nodes will not be deployed and an error will show in their status. This is due to Dell Firmware (4.40.00.00) using eHTML5 as the Virtual Console Plug-in. + To work around this issue, change the Virtual Console Plugin to HTML5 and run the deployment again. The nodes should now be successfully deployed. For more information, see xref:../installing/installing_bare_metal_ipi/ipi-install-prerequisites.adoc#ipi-install-firmware-requirements-for-installing-with-virtual-media_ipi-install-prerequisites[Firmware requirements for installing with virtual media]. From 8d4b7db87dd6e8bc01aeae5a65b1eaf347abb218 Mon Sep 17 00:00:00 2001 From: Vikram Goyal Date: Tue, 23 Feb 2021 12:58:38 +1000 Subject: [PATCH 2/2] Update ocp-4-7-release-notes.adoc Updated notes from Lisa. --- release_notes/ocp-4-7-release-notes.adoc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/release_notes/ocp-4-7-release-notes.adoc b/release_notes/ocp-4-7-release-notes.adoc index bfdec82aadfe..38b2d893485c 100644 --- a/release_notes/ocp-4-7-release-notes.adoc +++ b/release_notes/ocp-4-7-release-notes.adoc @@ -1669,7 +1669,7 @@ This issue is fixed by collecting a bounded amount, the last eight hours, of kub * Previously, configuring additional Azure disks during {op-system} installation caused a failure because the `udev` rules for Azure disks were missing from the {op-system} initramfs. The necessary `udev` rules have been added so that configuring additional disks during installation now works properly. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1756173[*BZ#1756173*]) -* Previously, the `rhcos-growpart.service` was being used in a way that was not best practice. Now, the `rhcos-growpart.service` has been removed in favor of configuring disks via Ignition at installation time. To change disk configuration after initial {op-system} installation, you must reprovision your systems with the necessary disk configuration changes. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1851103[*BZ#1851103*]) +* Previously, the `rhcos-growpart.service` was being used in a way that was not a best practice. Now, the `rhcos-growpart.service` has been removed in favor of configuring disks via Ignition at installation time. To change disk configuration after initial {op-system} installation, you must reprovision your systems with the necessary disk configuration changes. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1851103[*BZ#1851103*]) * Previously, the Machine Config Operator would attempt to rollback rpm-ostree changes when running `rpm-ostree cleanup -p`, causing a "System transaction in progress" error to occur. This fix improves rpm-ostree code related to D-Bus handling so that the error no longer occurs. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1865839[*BZ#1865839*]) @@ -1697,7 +1697,7 @@ This issue is fixed by collecting a bounded amount, the last eight hours, of kub *Routing* -* Previously, it was possible to set the `haproxy.router.openshift.io/timeout` annotation on a route with a value that exceeded 25 days. 
Values greater than 25 days caused the ingress controller to crash. This bug fix sets an upper limit of 25 days for the timeout. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1861383[*BZ#1861383*]) +* Previously, it was possible to set the `haproxy.router.openshift.io/timeout` annotation on a route with a value that exceeded 25 days. Values greater than 25 days caused the ingress controller to fail. This bug fix sets an upper limit of 25 days for the timeout. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1861383[*BZ#1861383*]) * Previously, an ingress controller would report a status of Available even if DNS was not provisioned or a required load balancer was not ready. This bug fix adds validation to the Ingress Operator to ensure that DNS is provisioned and the load balancer, if required, is ready before the ingress controller is reported as available. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1870373[*BZ#1870373*]) @@ -1711,7 +1711,7 @@ This issue is fixed by collecting a bounded amount, the last eight hours, of kub * Previously, the `LocalVolumeDiscoveryResult` object was displayed in the web console, implying that these could be manually defined. With this fix, the `LocalVolumeDiscoveryResult` type has been flagged as an internal object and is no longer displayed in the web console. To view local disks, navigate to *Compute -> Nodes -> Select Nodes -> Disks* instead. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1886973[*BZ#1886973*]) -* Previously when creating snapshots that require credentials, force deletion would not work for snapshots if the `VolumeSnapshotClass` CRD was already deleted. Now, instead of relying on the `VolumeSnapshotClass` CRD to exist, the credentials are fetched from the `VolumeSnapshotContent` CRD so that volume snapshots and volume snapshot contents that use credentials can be deleted as long as the secret containing these credentials continues to exist. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893739[*BZ#1893739*]) +* Previously when creating snapshots that require credentials, force deletion would not work for snapshots if the `VolumeSnapshotClass` CRD was already deleted. Now, instead of relying on the `VolumeSnapshotClass` CRD to exist, the credentials are fetched from the `VolumeSnapshotContent` CRD so that volume snapshots and volume snapshot contents that use credentials can be deleted provided the secret containing these credentials continues to exist. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893739[*BZ#1893739*]) * Previously, the Kubernetes FibreChannel (FC) volume plug-in did not properly flush a multipath device before deleting it, and in rare cases, a filesystem on a multipath FC device was corrupted during pod destruction. Now, Kubernetes flushes data before deleting a FC multipath device to prevent filesystem corruption. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1903346[*BZ#1903346*]) @@ -1996,7 +1996,7 @@ The root cause of this issue is unknown and no workaround currently exists. * Previously, a bug in the OpenStack SDK caused a failure when requesting server group `OSP16`. Consequently, the UPI playbook `control-plane.yaml` fails during the task to create the control plane server. As a temporary workaround, you can request a hotfix to update the OpenStack SDK, which updates the OpenStack SDK on the bastion host to execute UPI Ansible tasks to at least `python-openstacksdk-0.36.4-1.20201113235938.el8ost`. With this hotfix, the playbook successfully runs. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1891816[*BZ#1891816*]) -* When attempting an IPI installation on bare metal using the latest Dell firmware (04.40.00.00) nodes will not be deployed and an error will show in their status. This is due to Dell Firmware (4.40.00.00) using eHTML5 as the Virtual Console Plug-in. +* When attempting an IPI installation on bare metal using the latest Dell firmware (04.40.00.00), nodes are not deployed and an error is displayed in their status. This is due to Dell firmware (04.40.00.00) using eHTML5 as the Virtual Console Plug-in. + To work around this issue, change the Virtual Console Plugin to HTML5 and run the deployment again. The nodes should now be successfully deployed. For more information, see xref:../installing/installing_bare_metal_ipi/ipi-install-prerequisites.adoc#ipi-install-firmware-requirements-for-installing-with-virtual-media_ipi-install-prerequisites[Firmware requirements for installing with virtual media]. +
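For reference, the `haproxy.router.openshift.io/timeout` route annotation discussed in the Routing fixes above is typically set with `oc annotate`. The following is a minimal sketch only; the namespace and route name are placeholders, the `5s` value is illustrative, and with the fix described above the effective timeout is capped at 25 days:

[source,terminal]
----
$ oc -n <namespace> annotate route <route_name> --overwrite haproxy.router.openshift.io/timeout=5s
----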