Skip to content

STOR-2751: Rebase to upstream v1.24.1 for OCP 4.22#98

Merged
openshift-merge-bot[bot] merged 208 commits intoopenshift:masterfrom
dfajmon:rebase-v1.23.3
Mar 18, 2026
Merged

STOR-2751: Rebase to upstream v1.24.1 for OCP 4.22#98
openshift-merge-bot[bot] merged 208 commits intoopenshift:masterfrom
dfajmon:rebase-v1.23.3

Conversation

@dfajmon
Copy link
Copy Markdown

@dfajmon dfajmon commented Feb 19, 2026

Issue link
https://issues.redhat.com/browse/STOR-2751

Diff to upstream v1.24.1
kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.24.1...dfajmon:rebase-v1.24.1

Notes for reviewers

The rebase was initially based on v1.23.3. A new version was released recently, and because of merge conflicts introduced by cherry-picks, we have decided to rebase to a newer version.

Summary of changes

Major Features

  • feat: add PD node labeler (#2257)
  • Support Downward API node name override (#2277)
  • Retrieve node name from metadata service (#2279)
  • Add disk support label to NodeGetInfo topologies (#2199)
  • Move VAC api from v1beta1 to v1 (#2175)
  • Add support for NVMe disk types by using deviceutils (#2178)
  • Add disk type selection logic for generic volumes (#2235)
  • Replace kubelet volume mount with subdirectories (#2259)
  • Cache devices and their symlinks in node driver, periodically noting changes (#2141)
  • feat: Instrument unexpected device path changes (#2193)
  • [btrfs] support dynamic_reclaim (#2164)
  • [btrfs] add btrfs-specific bdi/read_ahead_kb (#2156)

Notable Bug Fixes

  • Fix partial cache tail latency by correcting the cache chunk size calculation (#2176)
  • Check for nil waitOp before examining error code (#2238)
  • Manually create udev links if udev trigger doesn't work (#2216)
  • [btrfs csi driver] fix blkid argument (#2248)
  • pkg/common/utils.go: update regionalLocationFmt for Europe with 10+ regions (#2117)

Cherry-picked commits

Commit Subject Author
db5f1ba7 UPSTREAM: <carry>: Add OpenShift files @jsafrane
19f4f8fc UPSTREAM: 2279: Retrieve node name from metadata service @noamasu

Upstream changelogs

Full changelog

kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.21.6...v1.24.1

Last rebase

#85

@openshift/storage

Summary by CodeRabbit

  • New Features

    • Added Dynamic Volumes support for flexible disk provisioning via StorageClass parameters
    • Added machine and disk type compatibility labels for node-aware scheduling
    • Extended Btrfs mount option support including dynamic reclaim and read-ahead settings
  • Documentation

    • Added Dynamic Volumes user guide with configuration examples
    • Enhanced security documentation for scoped service account impersonation
    • Expanded Btrfs mount options reference with sysfs settings
  • Improvements

    • Updated Go toolchain to 1.26.0
    • Enhanced service account role configuration flexibility

mattcary and others added 30 commits June 4, 2025 01:25
Change-Id: Id9befb7508b785ed1fe96efff48c5fa84a6f1d1a
…egions

GCP now offers more than 10 regions in Europe. Update the regex accordingly.
Added filtering for incorrect slo boosting logs
Bumps golang from 1.24.6 to 1.25.0.

---
updated-dependencies:
- dependency-name: golang
  dependency-version: 1.25.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the onsi group with 2 updates in the / directory: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.22.2 to 2.23.3
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.22.2...v2.23.3)

Updates `github.com/onsi/gomega` from 1.36.2 to 1.36.3
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.36.2...v1.36.3)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: onsi
- dependency-name: github.com/onsi/gomega
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: onsi
...

Signed-off-by: dependabot[bot] <support@github.com>
…ot/go_modules/onsi-f75b1e49c1

Bump the onsi group across 1 directory with 2 updates
…ot/docker/golang-1.25.0

Bump golang from 1.24.6 to 1.25.0
Bumps the k8s-dependencies group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.33.3` | `0.33.4` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.33.3` | `0.33.4` |
| [k8s.io/cloud-provider](https://github.com/kubernetes/cloud-provider) | `0.33.3` | `0.33.4` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.33.3` | `0.33.4` |
| [k8s.io/mount-utils](https://github.com/kubernetes/mount-utils) | `0.33.3` | `0.33.4` |


Updates `k8s.io/api` from 0.33.3 to 0.33.4
- [Commits](kubernetes/api@v0.33.3...v0.33.4)

Updates `k8s.io/apimachinery` from 0.33.3 to 0.33.4
- [Commits](kubernetes/apimachinery@v0.33.3...v0.33.4)

Updates `k8s.io/cloud-provider` from 0.33.3 to 0.33.4
- [Commits](kubernetes/cloud-provider@v0.33.3...v0.33.4)

Updates `k8s.io/component-base` from 0.33.3 to 0.33.4
- [Commits](kubernetes/component-base@v0.33.3...v0.33.4)

Updates `k8s.io/mount-utils` from 0.33.3 to 0.33.4
- [Commits](kubernetes/mount-utils@v0.33.3...v0.33.4)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.33.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s-dependencies
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.33.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s-dependencies
- dependency-name: k8s.io/cloud-provider
  dependency-version: 0.33.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s-dependencies
- dependency-name: k8s.io/component-base
  dependency-version: 0.33.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s-dependencies
- dependency-name: k8s.io/mount-utils
  dependency-version: 0.33.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the golang-x group with 2 updates: [golang.org/x/net](https://github.com/golang/net) and [golang.org/x/tools](https://github.com/golang/tools).


Updates `golang.org/x/net` from 0.42.0 to 0.43.0
- [Commits](golang/net@v0.42.0...v0.43.0)

Updates `golang.org/x/tools` from 0.35.0 to 0.36.0
- [Release notes](https://github.com/golang/tools/releases)
- [Commits](golang/tools@v0.35.0...v0.36.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.43.0
  dependency-type: indirect
  update-type: version-update:semver-minor
  dependency-group: golang-x
- dependency-name: golang.org/x/tools
  dependency-version: 0.36.0
  dependency-type: indirect
  update-type: version-update:semver-minor
  dependency-group: golang-x
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the onsi group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.23.4 to 2.25.1
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.23.4...v2.25.1)

Updates `github.com/onsi/gomega` from 1.36.3 to 1.37.0
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.36.3...v1.37.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.25.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: onsi
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.37.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: onsi
...

Signed-off-by: dependabot[bot] <support@github.com>
…ot/go_modules/k8s-dependencies-8e611a1eb0

Bump the k8s-dependencies group with 5 updates
…ot/go_modules/golang-x-bc3441d089

Bump the golang-x group with 2 updates
…ot/go_modules/onsi-26932303e1

Bump the onsi group with 2 updates
Add a knob for the btrfs-specific `bdi/read_ahead_kb`. If my
understanding is correct, this value is the upper limit of the btrfs
read-ahead of the _underlying file_.

Confusingly, this option is not related to
`/sys/block/sdX/queue/read_ahead_kb`, which is what controls the
read-ahead of the block device (regardless of what's in there). The
current `read_ahead_kb` "special" mount option configures the value of
the block device.

The default `bdi/read_ahead_kb` is 4MiB (on all the systems I've
checked). Ext4 does not have this option and relies on the block-device
specific readahead, which is 128KiB (again, on all the systems I've
checked). After migrating to btrfs we have experienced a notable write
amplification and tracked to this setting. Once we changed
`bdi/read_ahead_kb` to 128, our IO utilization (and the properties of
the underlying workload) became very similar to the one of ext4.

Now that we have 3 btrfs-specific "special" mount options, I refactored
the code and tests to make it less repetitive.
…d_kb

[btrfs] add btrfs-specific `bdi/read_ahead_kb`
Currently btrfs offers 3 methods to balance the filesystem:

1. beginning of time: `btrfs filesystem balance`.
2. kernel v5.19+: `/sys/fs/btrfs/UUID/allocation/{meta,}data/bg_reclaim_threshold`.
3. kernel v6.11+: `/sys/fs/btrfs/UUID/allocation/{meta,}data/dynamic_reclaim`.

(3) is most robust out of the 3, and best documented in the kernel commit
[f5ff64ccf7bb7274ed66b0d835b2f6ae10af5d7a](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f5ff64ccf7bb7274ed66b0d835b2f6ae10af5d7a).
Once GKE rolls up to the kernel v6.11, this will be the recommended
option to set.
Bumps golang from 1.25.0 to 1.25.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-version: 1.25.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the golang-x group with 6 updates:

| Package | From | To |
| --- | --- | --- |
| [golang.org/x/oauth2](https://github.com/golang/oauth2) | `0.30.0` | `0.31.0` |
| [golang.org/x/sys](https://github.com/golang/sys) | `0.35.0` | `0.36.0` |
| [golang.org/x/time](https://github.com/golang/time) | `0.12.0` | `0.13.0` |
| [golang.org/x/sync](https://github.com/golang/sync) | `0.16.0` | `0.17.0` |
| [golang.org/x/term](https://github.com/golang/term) | `0.34.0` | `0.35.0` |
| [golang.org/x/text](https://github.com/golang/text) | `0.28.0` | `0.29.0` |


Updates `golang.org/x/oauth2` from 0.30.0 to 0.31.0
- [Commits](golang/oauth2@v0.30.0...v0.31.0)

Updates `golang.org/x/sys` from 0.35.0 to 0.36.0
- [Commits](golang/sys@v0.35.0...v0.36.0)

Updates `golang.org/x/time` from 0.12.0 to 0.13.0
- [Commits](golang/time@v0.12.0...v0.13.0)

Updates `golang.org/x/sync` from 0.16.0 to 0.17.0
- [Commits](golang/sync@v0.16.0...v0.17.0)

Updates `golang.org/x/term` from 0.34.0 to 0.35.0
- [Commits](golang/term@v0.34.0...v0.35.0)

Updates `golang.org/x/text` from 0.28.0 to 0.29.0
- [Release notes](https://github.com/golang/text/releases)
- [Commits](golang/text@v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/oauth2
  dependency-version: 0.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: golang-x
- dependency-name: golang.org/x/sys
  dependency-version: 0.36.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: golang-x
- dependency-name: golang.org/x/time
  dependency-version: 0.13.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: golang-x
- dependency-name: golang.org/x/sync
  dependency-version: 0.17.0
  dependency-type: indirect
  update-type: version-update:semver-minor
  dependency-group: golang-x
- dependency-name: golang.org/x/term
  dependency-version: 0.35.0
  dependency-type: indirect
  update-type: version-update:semver-minor
  dependency-group: golang-x
- dependency-name: golang.org/x/text
  dependency-version: 0.29.0
  dependency-type: indirect
  update-type: version-update:semver-minor
  dependency-group: golang-x
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the onsi group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega).


Updates `github.com/onsi/ginkgo/v2` from 2.25.1 to 2.25.3
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](onsi/ginkgo@v2.25.1...v2.25.3)

Updates `github.com/onsi/gomega` from 1.37.0 to 1.38.2
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.37.0...v1.38.2)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.25.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: onsi
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.38.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: onsi
...

Signed-off-by: dependabot[bot] <support@github.com>
…ot/docker/golang-1.25.1

Bump golang from 1.25.0 to 1.25.1
…ot/go_modules/golang-x-cdf31bdd0d

Bump the golang-x group with 6 updates
…ot/go_modules/onsi-8f1ed76d56

Bump the onsi group with 2 updates
@chao007
Copy link
Copy Markdown

chao007 commented Mar 6, 2026

/verified by CI

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 6, 2026
@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 6, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@chao007: This PR has been marked as verified by CI.

Details

In response to this:

/verified by CI

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Mar 11, 2026
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 11, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Mar 11, 2026

@dfajmon: This pull request references STOR-2751 which is a valid jira issue.

Details

In response to this:

Issue link
https://issues.redhat.com/browse/STOR-2751

Diff to upstream v1.24.1
kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.24.1...dfajmon:rebase-v1.24.1

Notes for reviewers

Summary of changes

Major Features

  • feat: add PD node labeler (#2257)
  • Support Downward API node name override (#2277)
  • Retrieve node name from metadata service (#2279)
  • Add disk support label to NodeGetInfo topologies (#2199)
  • Move VAC api from v1beta1 to v1 (#2175)
  • Add support for NVMe disk types by using deviceutils (#2178)
  • Add disk type selection logic for generic volumes (#2235)
  • Replace kubelet volume mount with subdirectories (#2259)
  • Cache devices and their symlinks in node driver, periodically noting changes (#2141)
  • feat: Instrument unexpected device path changes (#2193)
  • [btrfs] support dynamic_reclaim (#2164)
  • [btrfs] add btrfs-specific bdi/read_ahead_kb (#2156)

Notable Bug Fixes

  • Fix partial cache tail latency by correcting the cache chunk size calculation (#2176)
  • Check for nil waitOp before examining error code (#2238)
  • Manually create udev links if udev trigger doesn't work (#2216)
  • [btrfs csi driver] fix blkid argument (#2248)
  • pkg/common/utils.go: update regionalLocationFmt for Europe with 10+ regions (#2117)

Cherry-picked commits

Commit Subject Author
db5f1ba7 UPSTREAM: <carry>: Add OpenShift files @jsafrane
19f4f8fc UPSTREAM: 2279: Retrieve node name from metadata service @noamasu

Upstream changelogs

Full changelog

kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.21.6...v1.24.1

Last rebase

#85

@openshift/storage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 11, 2026

Walkthrough

This pull request introduces a comprehensive refactoring to reorganize code packages and add support for dynamic volumes and machine-type-based disk labeling. Changes include moving common utilities to new constants and parameters packages, adding a convert package for type conversions, creating a node labeler component for machine-type disk compatibility, updating Go toolchain to 1.26.0, and documenting historical releases in versioned changelog files.

Changes

Cohort / File(s) Summary
Changelog Files
CHANGELOG/*, CHANGELOG/README.md
Historical release notes for versions 0.6.0 through 1.18.0 with categorized changes, dependencies, and bug fixes.
Go Toolchain Updates
Dockerfile, Dockerfile.Windows, Dockerfile.debug, Dockerfile.node-labeler
Updated builder image from golang:1.24.6 to golang:1.26.0; added new Dockerfile for node-labeler with multi-stage build.
Build System
Makefile, cloudbuild.yaml, release-tools/cloudbuild.yaml
Added labeler build targets, multi-arch image publishing for node-labeler, and substitution formatting updates.
Package Reorganization
pkg/constants/constants.go
New package extracting disk/machine type constants and elevated struct fields (MachineHyperdiskLimit.Max/Value); includes VolumeOperationAlreadyExistsFmt constant.
Conversion Utilities
pkg/convert/convert.go, pkg/convert/convert_test.go
New package providing string-to-type conversion functions (Int64, Bool, MiB, GiB) and label/tag string parsing with validation (501 lines total).
Parameter Handling
pkg/parameters/*
New package consolidating disk/snapshot parameters, storage pool utilities, sanitization logic, and validation rules; replaces previous common package utilities (types, constants, utils, sanitize with comprehensive tests).
Device Path Management
pkg/deviceutils/device-utils.go
Added manual device linking fallback when udevadm polling fails; introduced findDevice and manuallySetDevicePath helpers.
Node Labeler Component
pkg/nodelabeler/*
New reconciler-based labeler (reconciler.go, configmap.go, corresponding tests) that patches node labels with disk-type compatibility based on machine type from ConfigMap data.
Metrics Enhancement
pkg/metrics/metrics.go, pkg/metrics/metadata.go
Added unexpected_device_path_changes counter metric and registration methods; updated metadata parameter type to use parameters.DiskParameters.
Cache and Link Management
pkg/linkcache/*
Added metricsManager parameter to device cache constructors and integrated device path change metrics recording.
CSI Driver Core Updates
cmd/gce-pd-csi-driver/main.go, cmd/gce-pd-node-labeler/main.go
Updated driver main to use new constants/convert packages and initialize unexpected device path metrics; added node-labeler entrypoint with controller-runtime integration.
Controller Implementation
pkg/gce-pd-csi-driver/controller.go, pkg/gce-pd-csi-driver/controller_test.go
Migrated to parameters package for disk/snapshot types; added dynamic volumes support via enableDynamicVolumes flag; updated parameter processor pattern (207 line changes).
Disk Selection Logic
pkg/gce-pd-csi-driver/disk_selection.go, pkg/gce-pd-csi-driver/disk_selection_test.go
New module selecting disk type based on CreateVolumeRequest parameters, topologies, and source volumes (571 lines total).
Node Implementation
pkg/gce-pd-csi-driver/node.go, pkg/gce-pd-csi-driver/node_test.go
Refactored to use constants package; introduced btrfsFlags struct consolidating Btrfs mount option handling (162 line changes).
Cloud Provider Interfaces
pkg/gce-cloud-provider/compute/gce-compute.go, pkg/gce-cloud-provider/compute/fake-gce.go
Updated GCECompute interface signatures: RepairUnderspecifiedVolumeKey adds fallbackZone parameter; Disk/Snapshot/Image methods use parameters types instead of common types (103 line changes).
Utility Functions
pkg/gce-pd-csi-driver/utils.go, pkg/gce-pd-csi-driver/utils_test.go
Migrated storage pool validation and access mode mapping to parameters package; updates signature types (73 line changes).
Deployment Manifests
deploy/kubernetes/base/controller/controller.yaml, deploy/kubernetes/base/node_*/node.yaml
Removed enable-multitenancy flag; restructured kubelet volume mounts (separate kubelet-plugin-dir and kubelet-pods); added node-labeler overlays with RBAC and ConfigMap.
Node Labeler Deployment
deploy/kubernetes/overlays/node-labeler/*
New kustomize overlay: ConfigMap with machine-PD compatibility matrix (258 lines), RBAC for labeler operations, image transformer, and controller patch (350 lines total).
Deployment Scripts
deploy/common.sh, deploy/disk_type_labels.sh, deploy/kubernetes/deploy-driver.sh, deploy/setup-project.sh
Added NODE_SERVICE_ACCOUNTS support for scoped IAM roles; new disk_type_labels.sh script maps GCE disk families to supported types (388 lines); enhanced role validation (34 line changes).
Documentation
docs/kubernetes/user-guides/driver-install.md, docs/kubernetes/user-guides/dynamic-volumes.md, README.md
Updated installation guide with security notes on NODE_SERVICE_ACCOUNTS; new dynamic-volumes guide (65 lines); expanded btrfs mount options table in README.
Test Infrastructure
test/e2e/tests/*, test/k8s-integration/cluster.go, test/remote/*, test/sanity/sanity_test.go, test/run-k8s-integration.sh
Migrated test references from common to constants/parameters packages; enhanced cluster setup with maintenance windows and hash verification; added SSH udev control methods; updated Kubernetes build and version defaults.
Go Dependencies
go.mod
Bumped Go toolchain to 1.25.0; updated CSI spec to v1.12.0, Kubernetes to v0.35.1, controller-runtime added, and extensive indirect dependency updates.
Common Package Cleanup
pkg/common/utils.go, pkg/common/utils_test.go, pkg/common/volume_lock.go
Removed exported utilities (ConvertLabels/Tags/Int64 functions, StoragePool types) and constant (VolumeOperationAlreadyExistsFmt) migrated to new packages; retained core cache/lock logic (308 line reduction).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client Request
    participant Controller as Controller Server
    participant DiskSelection as Disk Selection
    participant CloudProvider as GCE Cloud Provider
    participant Node as Node Server

    Client->>Controller: CreateVolumeRequest (with parameters)
    activate Controller
    Controller->>DiskSelection: SelectDisk (params, topology, source)
    activate DiskSelection
    DiskSelection->>CloudProvider: GetDisk (source volume)
    CloudProvider-->>DiskSelection: disk type (if source)
    DiskSelection->>DiskSelection: evaluate topology preferences
    DiskSelection-->>Controller: selected disk type
    deactivate DiskSelection
    Controller->>Controller: extract DiskParameters
    Controller->>CloudProvider: InsertDisk (with disk type)
    activate CloudProvider
    CloudProvider-->>Controller: disk created
    deactivate CloudProvider
    Controller->>Node: volume available
    Node->>Node: stage volume (mount, format)
    Node-->>Client: volume ready
    deactivate Controller
Loading
sequenceDiagram
    participant K8sAPI as Kubernetes API
    participant NodeLabeler as Node Labeler
    participant ConfigMap as ConfigMap (machine-pd-compatibility)
    participant Node as Node Resource

    K8sAPI->>NodeLabeler: ConfigMap changed event
    activate NodeLabeler
    NodeLabeler->>ConfigMap: fetch machine-pd-compatibility.json
    activate ConfigMap
    ConfigMap-->>NodeLabeler: disk compatibility matrix
    deactivate ConfigMap
    NodeLabeler->>NodeLabeler: UpdateMachinePDCompatibility
    NodeLabeler->>K8sAPI: list all Nodes
    activate K8sAPI
    K8sAPI-->>NodeLabeler: node list
    deactivate K8sAPI
    NodeLabeler->>Node: Reconcile (per node)
    activate Node
    Node->>Node: extract instance-type label
    Node->>Node: compute compatible disks
    Node->>Node: patch labels (disk-type.gke.io/*)
    K8sAPI->>Node: update applied
    deactivate Node
    NodeLabeler-->>K8sAPI: reconciliation complete
    deactivate NodeLabeler
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

@dfajmon dfajmon changed the title STOR-2751: Rebase to upstream v1.23.3 for OCP 4.22 STOR-2751: Rebase to upstream v1.24.1 for OCP 4.22 Mar 11, 2026
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 11, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Mar 11, 2026

@dfajmon: This pull request references STOR-2751 which is a valid jira issue.

Details

In response to this:

Issue link
https://issues.redhat.com/browse/STOR-2751

Diff to upstream v1.24.1
kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.24.1...dfajmon:rebase-v1.24.1

Notes for reviewers

The rebase was initially based on v1.23.3. A new version was released recently, and because of merge conflicts introduced by cherry-picks, we have decided to rebase to a newer version.

Summary of changes

Major Features

  • feat: add PD node labeler (#2257)
  • Support Downward API node name override (#2277)
  • Retrieve node name from metadata service (#2279)
  • Add disk support label to NodeGetInfo topologies (#2199)
  • Move VAC api from v1beta1 to v1 (#2175)
  • Add support for NVMe disk types by using deviceutils (#2178)
  • Add disk type selection logic for generic volumes (#2235)
  • Replace kubelet volume mount with subdirectories (#2259)
  • Cache devices and their symlinks in node driver, periodically noting changes (#2141)
  • feat: Instrument unexpected device path changes (#2193)
  • [btrfs] support dynamic_reclaim (#2164)
  • [btrfs] add btrfs-specific bdi/read_ahead_kb (#2156)

Notable Bug Fixes

  • Fix partial cache tail latency by correcting the cache chunk size calculation (#2176)
  • Check for nil waitOp before examining error code (#2238)
  • Manually create udev links if udev trigger doesn't work (#2216)
  • [btrfs csi driver] fix blkid argument (#2248)
  • pkg/common/utils.go: update regionalLocationFmt for Europe with 10+ regions (#2117)

Cherry-picked commits

Commit Subject Author
db5f1ba7 UPSTREAM: <carry>: Add OpenShift files @jsafrane
19f4f8fc UPSTREAM: 2279: Retrieve node name from metadata service @noamasu

Upstream changelogs

Full changelog

kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.21.6...v1.24.1

Last rebase

#85

@openshift/storage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Mar 11, 2026

@dfajmon: This pull request references STOR-2751 which is a valid jira issue.

Details

In response to this:

Issue link
https://issues.redhat.com/browse/STOR-2751

Diff to upstream v1.24.1
kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.24.1...dfajmon:rebase-v1.24.1

Notes for reviewers

The rebase was initially based on v1.23.3. A new version was released recently, and because of merge conflicts introduced by cherry-picks, we have decided to rebase to a newer version.

Summary of changes

Major Features

  • feat: add PD node labeler (#2257)
  • Support Downward API node name override (#2277)
  • Retrieve node name from metadata service (#2279)
  • Add disk support label to NodeGetInfo topologies (#2199)
  • Move VAC api from v1beta1 to v1 (#2175)
  • Add support for NVMe disk types by using deviceutils (#2178)
  • Add disk type selection logic for generic volumes (#2235)
  • Replace kubelet volume mount with subdirectories (#2259)
  • Cache devices and their symlinks in node driver, periodically noting changes (#2141)
  • feat: Instrument unexpected device path changes (#2193)
  • [btrfs] support dynamic_reclaim (#2164)
  • [btrfs] add btrfs-specific bdi/read_ahead_kb (#2156)

Notable Bug Fixes

  • Fix partial cache tail latency by correcting the cache chunk size calculation (#2176)
  • Check for nil waitOp before examining error code (#2238)
  • Manually create udev links if udev trigger doesn't work (#2216)
  • [btrfs csi driver] fix blkid argument (#2248)
  • pkg/common/utils.go: update regionalLocationFmt for Europe with 10+ regions (#2117)

Cherry-picked commits

Commit Subject Author
db5f1ba7 UPSTREAM: <carry>: Add OpenShift files @jsafrane
19f4f8fc UPSTREAM: 2279: Retrieve node name from metadata service @noamasu

Upstream changelogs

Full changelog

kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.21.6...v1.24.1

Last rebase

#85

@openshift/storage

Summary by CodeRabbit

  • New Features

  • Added Dynamic Volumes support for flexible disk provisioning via StorageClass parameters

  • Added machine and disk type compatibility labels for node-aware scheduling

  • Extended Btrfs mount option support including dynamic reclaim and read-ahead settings

  • Documentation

  • Added Dynamic Volumes user guide with configuration examples

  • Enhanced security documentation for scoped service account impersonation

  • Expanded Btrfs mount options reference with sysfs settings

  • Improvements

  • Updated Go toolchain to 1.26.0

  • Enhanced service account role configuration flexibility

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
pkg/deviceutils/device-utils.go (1)

352-356: ⚠️ Potential issue | 🔴 Critical

Nil pointer dereference when err is nil but devFsSerial is empty.

When getDevFsSerial returns an empty serial with a nil error (e.g., for an unrecognized device prefix per line 315), the condition on line 352 evaluates to true, but line 355 unconditionally calls err.Error(), causing a panic.

🐛 Proposed fix to handle nil error safely
 		if err != nil || len(devFsSerial) == 0 {
 			// If we get an error, ignore. Either this isn't a block device, or it
 			// isn't something we can get a serial number from
-			klog.Errorf("failed to get serial num for disk %s at device path %s: %v", deviceName, devFsPath, err.Error())
+			klog.V(4).Infof("skipping device path %s for disk %s: serial=%q, err=%v", devFsPath, deviceName, devFsSerial, err)
 			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/deviceutils/device-utils.go` around lines 352 - 356, The klog.Errorf call
in the loop after getDevFsSerial can dereference a nil err because it
unconditionally uses err.Error() when devFsSerial == "" but err == nil; update
the error logging in the block that checks "if err != nil || len(devFsSerial) ==
0" (around getDevFsSerial usage) so it only calls err.Error() when err != nil,
e.g. build a message variable that includes the returned error when present and
otherwise logs a descriptive message that serial is empty for
deviceName/devFsPath (referencing getDevFsSerial, devFsSerial, err, deviceName,
devFsPath, and klog.Errorf).
test/run-k8s-integration.sh (1)

83-85: ⚠️ Potential issue | 🟠 Major

Inverted condition: flag added when variable is empty.

The condition [ -z "$gke_node_version" ] is true when gke_node_version is empty, but then it adds --gke-node-version=${gke_node_version} which would pass an empty value. This appears inverted — typically you'd add the flag when the variable is set (non-empty):

🐛 Proposed fix
-if [ -z "$gke_node_version" ]; then
+if [ -n "$gke_node_version" ]; then
   base_cmd="${base_cmd} --gke-node-version=${gke_node_version}"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/run-k8s-integration.sh` around lines 83 - 85, The condition is inverted:
it appends the --gke-node-version flag to base_cmd when gke_node_version is
empty; change the test to only append the flag when gke_node_version is
non-empty (i.e., test for non-zero length of gke_node_version) so base_cmd only
gets --gke-node-version=${gke_node_version} when the variable is set.
pkg/gce-cloud-provider/compute/fake-gce.go (1)

174-190: ⚠️ Potential issue | 🟡 Minor

Reject unresolved dynamic disk types in the fake provider.

InsertDisk currently turns any params.DiskType into a URI. If the controller accidentally passes parameters.DynamicVolumeType through instead of resolving it to a real PD/Hyperdisk type, tests will still succeed here even though GCE would reject that disk type.

🔧 Suggested guard
 func (cloud *FakeCloudProvider) InsertDisk(ctx context.Context, project string, volKey *meta.Key, params parameters.DiskParameters, capBytes int64, capacityRange *csi.CapacityRange, replicaZones []string, snapshotID string, volumeContentSourceVolumeID string, multiWriter bool, accessMode string) error {
+	if params.DiskType == parameters.DynamicVolumeType {
+		return fmt.Errorf("unresolved disk type %q", params.DiskType)
+	}
 	if disk, ok := cloud.disks[volKey.String()]; ok {
 		err := ValidateExistingDisk(ctx, disk, params,
 			int64(capacityRange.GetRequiredBytes()),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gce-cloud-provider/compute/fake-gce.go` around lines 174 - 190, In
InsertDisk of FakeCloudProvider, add a guard that rejects unresolved dynamic
disk types: check params.DiskType against parameters.DynamicVolumeType before
calling cloud.GetDiskTypeURI (or building computeDisk) and return a clear error
if it's still DynamicVolumeType, so tests cannot pass with an unresolved dynamic
type; reference the InsertDisk function, params.DiskType,
parameters.DynamicVolumeType, and cloud.GetDiskTypeURI when making the change.
pkg/gce-pd-csi-driver/controller.go (1)

1163-1169: ⚠️ Potential issue | 🟠 Major

Use instanceZone when repairing publish keys.

The new repair helper now errors when the same zonal disk name exists in multiple zones and no fallback is provided. In this path the target node’s zone is already known, so passing "" regresses attach for legacy underspecified volume IDs that could still be resolved safely.

Suggested fix
-project, volKey, err = gceCS.CloudProvider.RepairUnderspecifiedVolumeKey(ctx, project, volKey, "")
+project, volKey, err = gceCS.CloudProvider.RepairUnderspecifiedVolumeKey(ctx, project, volKey, instanceZone)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gce-pd-csi-driver/controller.go` around lines 1163 - 1169, The repair
call to gceCS.CloudProvider.RepairUnderspecifiedVolumeKey passes an empty
fallback zone, which breaks resolution when the same disk name exists in
multiple zones; update the call in the ControllerPublishVolume path to pass the
known target node zone (instanceZone) instead of "" so
RepairUnderspecifiedVolumeKey can disambiguate the zonal disk; locate the call
using the symbols gceCS.CloudProvider.RepairUnderspecifiedVolumeKey, project,
volKey and instanceZone and replace the empty-string fallback with instanceZone,
keeping the existing error handling logic unchanged.
🟡 Minor comments (32)
pkg/deviceutils/device-utils.go-367-373 (1)

367-373: ⚠️ Potential issue | 🟡 Minor

os.Symlink fails if a stale symlink already exists at the target path.

If a broken or stale symlink already exists at /dev/disk/by-id/google-<serial>, os.Symlink returns an error. Consider removing any existing file before creating the symlink.

🛠️ Proposed fix to handle existing symlinks
 func manuallySetDevicePath(deviceName string) error {
 	devFsPath, devFsSerial, err := findDevice(deviceName)
 	if err != nil {
 		return err
 	}
-	return os.Symlink(devFsPath, path.Join(diskByIdPath, diskGooglePrefix+devFsSerial))
+	symlinkPath := path.Join(diskByIdPath, diskGooglePrefix+devFsSerial)
+	// Remove any existing stale symlink before creating a new one
+	if err := os.Remove(symlinkPath); err != nil && !os.IsNotExist(err) {
+		return fmt.Errorf("failed to remove existing symlink %s: %w", symlinkPath, err)
+	}
+	return os.Symlink(devFsPath, symlinkPath)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/deviceutils/device-utils.go` around lines 367 - 373, In
manuallySetDevicePath, os.Symlink can fail if a stale/broken symlink already
exists at the target (path.Join(diskByIdPath, diskGooglePrefix+devFsSerial));
before calling os.Symlink, compute target := path.Join(...), use
os.Lstat(target) to see if something exists, if it is an existing symlink remove
it (os.Remove(target)) and if a non-symlink file/dir exists return a clear
error; then call os.Symlink(devFsPath, target) and propagate any error from that
call. Ensure you reference devFsPath, devFsSerial, diskByIdPath and
diskGooglePrefix inside manuallySetDevicePath.
test/k8s-integration/cluster.go-259-265 (1)

259-265: ⚠️ Potential issue | 🟡 Minor

Make the exclusion duration consistent.

Line 264 adds only 2 hours, but this block says it is creating a 4-hour exclusion. Either update the comment or extend the end time to 4 hours; otherwise longer runs can still be upgraded mid-test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/k8s-integration/cluster.go` around lines 259 - 265, The comment says a
4-hour maintenance exclusion is created but the code uses
startExclusionTime.Add(2*time.Hour); update the duration to match the comment by
changing the end time expression to startExclusionTime.Add(4*time.Hour) (the
exec.Command call that builds the gcloud args) or alternately change the comment
to state 2 hours—prefer updating the end to 4 hours so startExclusionTime and
the "--add-maintenance-exclusion-end" argument reflect a 4-hour window.
README.md-112-112 (1)

112-112: ⚠️ Potential issue | 🟡 Minor

Fix the duplicated word in the link text.

See more in the [in btrfs docs] reads like a typo in the rendered README.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 112, The README link text contains a duplicated word:
change the link text "[in btrfs docs]" to remove the extra "in" (e.g., use
"[btrfs docs]" or "[the btrfs docs]") so the sentence reads correctly while
preserving the URL; locate the instance of the link text "[in btrfs docs]" in
README.md and update it accordingly.
docs/kubernetes/user-guides/dynamic-volumes.md-40-40 (1)

40-40: ⚠️ Potential issue | 🟡 Minor

Fix the internal anchor in the table entry.

The fragment #applying-disk-support-labels does not match this file’s numbered heading, so the “more information” link is broken in the rendered doc.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/kubernetes/user-guides/dynamic-volumes.md` at line 40, The table row for
`use-allowed-disk-topology` contains a broken internal link fragment
`#applying-disk-support-labels`; open this file, find the actual heading ID for
the section about applying support labels (the heading used for "Apply Support
Labels" or its numbered variant) and replace `#applying-disk-support-labels`
with the correct fragment so the "(See ...)" link points to the right anchor;
update only the fragment in the table entry referencing
`use-allowed-disk-topology` to match the exact heading slug used in the
document.
CHANGELOG/CHANGELOG-1.18.md-1-2 (1)

1-2: ⚠️ Potential issue | 🟡 Minor

Don't check in an empty changelog placeholder.

This lands as docs, but it currently contains only a TODO. Either populate the notes now or omit the file until they're ready.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.18.md` around lines 1 - 2, The changelog file currently
contains only a TODO placeholder under the heading "# v1.18.0 - Changelog since
v1.17.12"; either populate that section with the actual release notes (replace
the TODO line with the generated release notes content) or remove the
file/commit so no empty placeholder is checked in — update the content under the
"# v1.18.0 - Changelog since v1.17.12" heading accordingly.
pkg/convert/convert_test.go-289-295 (1)

289-295: ⚠️ Potential issue | 🟡 Minor

The >50 tags case is failing for the wrong reason.

p1p51 are invalid parent_id values for this parser, so this subtest never reaches the limit guard. Use valid org/project IDs here so the count check is what actually gets exercised.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/convert/convert_test.go` around lines 289 - 295, The "50 tags at most"
subtest is using invalid parent_id values (p1…p51) so the parser bails before
hitting the 50-tag guard; update the test case's tags field (the test with name
"50 tags at most" in convert_test.go) to use valid org/project-style parent IDs
(i.e., real-looking org and project identifiers) for each tag so the parser
accepts them and the count limit check is exercised; keep the same number of
tags (51) and expectedError=true so the test fails due to exceeding the 50-tag
limit rather than invalid parent IDs.
pkg/convert/convert.go-72-75 (1)

72-75: ⚠️ Potential issue | 🟡 Minor

Fix the field name in the invalid-label-key error.

This branch validates key, but the message says label value, so bad keys point operators at the wrong side of the input.

Proposed fix
 	checkLabelKeyFn := func(key string) error {
 		if !regexKey.MatchString(key) {
-			return fmt.Errorf("label value %q is invalid (should start with lowercase letter / lowercase letter, digit, _ and - chars are allowed / 1-63 characters", key)
+			return fmt.Errorf("label key %q is invalid (should start with lowercase letter / lowercase letter, digit, _ and - chars are allowed / 1-63 characters", key)
 		}
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/convert/convert.go` around lines 72 - 75, The error message in
checkLabelKeyFn incorrectly says "label value" while validating the key; update
the fmt.Errorf call (the one using regexKey and key) to say "label key %q is
invalid ..." instead of "label value %q is invalid ..." so invalid keys produce
the correct field name in the error; keep the rest of the message and formatting
unchanged.
pkg/convert/convert_test.go-143-160 (1)

143-160: ⚠️ Potential issue | 🟡 Minor

These cases never hit the label-value length check.

Both fixtures put the long string on the key side of =, so they only re-test key validation and leave value-length behavior uncovered.

Proposed fix
 			{
 				name:          "label value may not have over 63 characters",
-				labels:        "abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij1234=v",
+				labels:        "k=abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij1234",
 				expectedError: true,
 			},
@@
 			{
 				name:          "label value can have up to 63 characters",
-				labels:        "abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij123=v",
+				labels:        "k=abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij123",
 				expectedError: false,
 			},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/convert/convert_test.go` around lines 143 - 160, The two failing test
cases ("label value may not have over 63 characters" and "label value can have
up to 63 characters") currently place the long string on the key side of the '='
so they exercise key validation instead of value-length validation; update the
test fixtures in convert_test.go so the long 63+/63-length string is on the
value side (e.g., use "k=<long-string>" rather than "<long-string>=v") for those
two cases to ensure the label-value length checks in the code paths exercised by
the test are actually covered.
CHANGELOG/CHANGELOG-1.10.md-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor

Fix malformed markdown link.

The markdown link syntax is incorrect - missing the opening parenthesis before the URL.

📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1602`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1602), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
+- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1602`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1602), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.10.md` at line 7, Fix the malformed markdown link in
the CHANGELOG entry containing "Change GetDisk error reporting to temporary in
CreateVolume codepath
([`#1602`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1602),
[`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))"
by inserting the missing opening parenthesis before the PR URL so the PR
reference becomes a proper markdown link (e.g.,
"([`#1602`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1602))")
and ensure any extra closing parentheses are corrected around the PR and author
references.
CHANGELOG/CHANGELOG-1.11.md-94-220 (1)

94-220: ⚠️ Potential issue | 🟡 Minor

Remove the stray dependency block under v1.11.5.

Starting at Line 94, a large dependency list is pasted after ### Removed / _Nothing has changed._, begins mid-token (pigateway), and runs until the next real section header. That makes the v1.11.5 changelog structurally inaccurate and very hard to read.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.11.md` around lines 94 - 220, Remove the stray
dependency block that was pasted into the v1.11.5 section: locate the "v1.11.5"
/ "### Removed" section containing the line starting with "pigateway: v1.4.0 →
v1.6.1" (directly after "_Nothing has changed._") and delete the entire
dependency list up to the next real section header so the v1.11.5 changelog only
contains its intended content.
CHANGELOG/CHANGELOG-1.11.md-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor

Fix the broken PR link in the v1.11.9 entry.

Line 7 has malformed Markdown, so the PR reference will not render correctly in the changelog.

📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1601`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1601), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
+- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1601`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1601), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.11.md` at line 7, The changelog entry "Change GetDisk
error reporting to temporary in CreateVolume codepath" has malformed Markdown
for the PR link; fix it by making the PR number a proper link like
[`#1601`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1601)
and ensure the trailing comma/parentheses are balanced so the username link
[`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)
renders correctly (remove the extra closing parenthesis and place
commas/parentheses appropriately).
CHANGELOG/CHANGELOG-1.13.md-29-30 (1)

29-30: ⚠️ Potential issue | 🟡 Minor

Keep the cherry-pick entry in a single bullet.

Line 30 starts with #1667: on a new line, so the second cherry-picked PR falls out of the list and renders as malformed Markdown.

📝 Proposed fix
-- Automated cherry pick of `#1666`: migrate hyperdisk/chd/storagepools to GCE v1 disk API
-#1667: remove support for GCE Alpha Disks by `@amacaskill` in `#1669`
+- Automated cherry pick of `#1666`: migrate hyperdisk/chd/storagepools to GCE v1 disk API, `#1667`: remove support for GCE Alpha Disks by `@amacaskill` in `#1669`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.13.md` around lines 29 - 30, The cherry-pick entry is
split across two lines causing malformed Markdown; merge the two lines into a
single bullet so the entire entry remains one list item — combine the text so
the bullet reads something like the original first line followed by the second
(include the PR refs `#1666`, `#1667` and the mention of `@amacaskill` and `#1669`
together) ensuring the line begins with a single "-" to keep it a proper list
item.
CHANGELOG/CHANGELOG-1.13.md-69-72 (1)

69-72: ⚠️ Potential issue | 🟡 Minor

Rejoin the wrapped links in the v1.13.3 section.

The PR and author URLs are split across Lines 69-72, which breaks both markdown links in the rendered changelog.

📝 Proposed fix
-- Flag --use-instance-api-to-poll-attachment-disk-types uses instances.get API when polling for disk attachment in ControllerPublish for passed in disk types ([`#1630`](https://github.com/kube
-rnetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1630), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
-- Support for read_ahead_kb mount flag to change block device readahead ([`#1631`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1631), [`@k8s-infra-cherrypick-`
-robot](https://github.com/k8s-infra-cherrypick-robot))
+- Flag --use-instance-api-to-poll-attachment-disk-types uses instances.get API when polling for disk attachment in ControllerPublish for passed in disk types ([`#1630`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1630), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
+- Support for read_ahead_kb mount flag to change block device readahead ([`#1631`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1631), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.13.md` around lines 69 - 72, The changelog has
wrapped/broken markdown links in the v1.13.3 bullets: the PR and author URLs for
the entries starting with "Flag
--use-instance-api-to-poll-attachment-disk-types" and "Support for read_ahead_kb
mount flag" are split across lines (the "[`#1630`](" and
"[`@k8s-infra-cherrypick-robot`](" fragments and the "[`#1631`](" and
"[`@k8s-infra-cherrypick-robot`](" fragments). Fix this by rejoining each broken
link so each PR link and each author link is a single continuous markdown link
(e.g., [`#1630`](https://github.com/...) and
[`@k8s-infra-cherrypick-robot`](https://github.com/...) and similarly for `#1631`),
preserving the existing text and punctuation.
CHANGELOG/CHANGELOG-1.9.md-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor

Fix the broken PR link in the v1.9.17 entry.

Line 7 has malformed Markdown, so the PR reference will not render as a clickable link in the changelog.

📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1603`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1603), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
+- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1603`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1603), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.9.md` at line 7, The changelog entry has malformed
Markdown for the PR link in the v1.9.17 entry; update the text so the PR
reference is a proper Markdown link (use
[`#1603`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1603)
form) and remove the stray characters so the contributor mention
([`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
remains a valid link; edit the line containing "Change GetDisk error
reporting..." to replace the broken fragment with the correctly formatted PR and
author links.
CHANGELOG/CHANGELOG-1.14.md-47-55 (1)

47-55: ⚠️ Potential issue | 🟡 Minor

Remove the stray fenced block in the 1.14.0 entry.

Line 50 opens a code fence that never closes, so the rest of the changelog renders incorrectly. That also explains the downstream heading warning.

📝 Suggested fix
 - The --instances-list-filters command line flag allows the driver to filter on response properties when listing VMs from the GCE API in ListVolumes RPC.
 - The --use-instance-api-to-list-volumes-published-nodes command line flag allows the driver to use the instances.list API instead of disks.list API when determining PublishedNodeIs in ListVolumes RPC.
-    ```
-  
   **Testing**:
-  
   Validated existing behavior with: ([`#1696`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1696), [`@pwschuurman`](https://github.com/pwschuurman))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.14.md` around lines 47 - 55, Remove the stray opening
fenced code block in the 1.14.0 changelog entry: locate the lone backtick fence
that begins before the "Testing" section (the unmatched "```") and delete it so
the "Testing" paragraph and following bullets render normally; ensure the bullet
list under "Testing" is properly formatted (no stray empty list markers) after
removing the fence.
CHANGELOG/CHANGELOG-1.12.md-162-162 (1)

162-162: ⚠️ Potential issue | 🟡 Minor

Orphaned dependency content appears truncated.

Line 162 starts mid-text with e.com/go/area120: v0.6.0 → v0.8.1 which appears to be a continuation from a previous changelog entry that was improperly merged. This content block (lines 162-289) seems to belong under the v1.12.3 "Changed" section but the version header and package name prefix are missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.12.md` at line 162, The changelog contains an orphaned
line starting with "e.com/go/area120: v0.6.0 → v0.8.1" that was truncated and is
missing its package prefix and the v1.12.3 "Changed" header; restore this block
by moving that line (and the following lines in the block) under the v1.12.3
"Changed" section and prepend the correct package prefix so the entry reads as a
complete package line (e.g., restore the missing leading characters so it
becomes "github.com/go/area120: v0.6.0 → v0.8.1" or the actual original package
name), and ensure the v1.12.3 "Changed" header is present above the block so the
entry is properly grouped.
pkg/parameters/sanitize.go-5-6 (1)

5-6: ⚠️ Potential issue | 🟡 Minor

Typo in comment: "sanitizeer" should be "sanitizer".

📝 Proposed fix
-// Map of sanitizers for parameters that apply to only a subset of disk types. Each sanitizeer
+// Map of sanitizers for parameters that apply to only a subset of disk types. Each sanitizer
 // should only be responsible for a single parameter.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/parameters/sanitize.go` around lines 5 - 6, Update the comment on the map
of sanitizers to correct the typo: change "sanitizeer" to "sanitizer" so the
sentence reads that each sanitizer should only be responsible for a single
parameter; edit the comment near the map of sanitizers in
pkg/parameters/sanitize.go accordingly.
CHANGELOG/CHANGELOG-1.8.md-415-415 (1)

415-415: ⚠️ Potential issue | 🟡 Minor

Trailing space in link text.

The author link has a trailing space before the closing bracket.

📝 Proposed fix
-- Fix missing shared library libedit.so.2 caused from updating base image in [`#1162`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1162) ([`#1177`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1177), [`@sunnylovestiramisu` ](https://github.com/sunnylovestiramisu))
+- Fix missing shared library libedit.so.2 caused from updating base image in [`#1162`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1162) ([`#1177`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1177), [`@sunnylovestiramisu`](https://github.com/sunnylovestiramisu))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.8.md` at line 415, The changelog entry contains a
trailing space in the author link text "[`@sunnylovestiramisu` ]" which causes an
extra space before the closing bracket; edit CHANGELOG-1.8.md and remove the
trailing space so the link reads "[`@sunnylovestiramisu`]" (update the specific
line containing the entry shown in the diff).
CHANGELOG/CHANGELOG-1.12.md-49-49 (1)

49-49: ⚠️ Potential issue | 🟡 Minor

Duplicate broken markdown link syntax.

Same malformed link format as line 40.

📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1600`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
+- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1600`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.12.md` at line 49, The changelog line contains a
malformed/duplicated markdown link for the PR reference in the text "Change
GetDisk error reporting to temporary in CreateVolume codepath
([`#1600`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600),
[`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))";
fix it by replacing the broken combined link with a single properly formatted
markdown link for the PR (e.g., [`#1600`](https://github.com/.../pull/1600)) and
ensure the maintainer mention remains a valid link
([`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
with balanced parentheses and no extra punctuation.
CHANGELOG/CHANGELOG-1.12.md-40-40 (1)

40-40: ⚠️ Potential issue | 🟡 Minor

Broken markdown link syntax.

The link format is malformed - missing opening parenthesis before the URL.

📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1600`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
+- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1600`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.12.md` at line 40, Fix the malformed Markdown link in
the changelog line that currently reads "([`#1600`])https://github.com/...)" by
converting it to the proper inline link syntax; replace
"([`#1600`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600)"
with
"([`#1600`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600))"
and ensure the contributor reference
"[`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)"
has matching parentheses (remove any extra trailing parenthesis).
CHANGELOG/CHANGELOG-1.8.md-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor

Broken markdown link syntax.

Same malformed link format as seen in CHANGELOG-1.12.md.

📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1604`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1604), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
+- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1604`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1604), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.8.md` at line 7, The changelog line contains malformed
Markdown link syntax around the PR and author references in the text "Change
GetDisk error reporting to temporary in CreateVolume codepath
([`#1604`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1604),
[`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))";
fix it by converting the PR reference to a proper link like
[`#1604`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1604)
and ensure the author reference is a valid link
[`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)
with no extra parentheses or stray characters so the line reads correctly.
pkg/gce-pd-csi-driver/disk_selection.go-59-61 (1)

59-61: ⚠️ Potential issue | 🟡 Minor

Typo: "unspecfied" should be "unspecified".

📝 Suggested fix
 	// Determine default disk type based on preference parameter. If the parameter is
-	// unspecfied than default to hdType.
+	// unspecified then default to hdType.
 	defaultDiskType := hdType
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gce-pd-csi-driver/disk_selection.go` around lines 59 - 61, Fix the typo
in the comment above the defaultDiskType variable: change "unspecfied" to
"unspecified" in the comment that explains determining default disk type
(referencing defaultDiskType and hdType) so the comment reads "...If the
parameter is unspecified then default to hdType."
pkg/gce-pd-csi-driver/disk_selection_test.go-162-187 (1)

162-187: ⚠️ Potential issue | 🟡 Minor

Duplicate test cases with misleading names.

The test cases "pd default override" (lines 163-174) and "hd default override" (lines 176-187) are identical - both set ParameterDiskPreference to ParameterHDType and expect Default to be hdTestDiskType. One of them should test the PD preference scenario.

🐛 Suggested fix
 		{
 			desc: "pd default override",
 			parameters: map[string]string{
 				parameters.ParameterPDType:         pdTestDiskType,
 				parameters.ParameterHDType:         hdTestDiskType,
-				parameters.ParameterDiskPreference: parameters.ParameterHDType,
+				parameters.ParameterDiskPreference: parameters.ParameterPDType,
 			},
 			want: &dynamicDiskTypes{
 				PD:      pdTestDiskType,
 				HD:      hdTestDiskType,
-				Default: hdTestDiskType,
+				Default: pdTestDiskType,
 			},
 		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gce-pd-csi-driver/disk_selection_test.go` around lines 162 - 187, The two
test cases in disk_selection_test.go named "pd default override" and "hd default
override" are identical; update the one labeled "pd default override" (or the
second duplicate) so it actually tests PD preference: set
parameters.ParameterDiskPreference to parameters.ParameterPDType (instead of
ParameterHDType), update the desc to "pd default override" if needed, and change
the expected want.Default in the dynamicDiskTypes to pdTestDiskType to assert PD
becomes the default.
CHANGELOG/CHANGELOG-1.1.md-17-18 (1)

17-18: ⚠️ Potential issue | 🟡 Minor

Minor grammar fix: hyphenate "cherry-picked".

The compound adjective should be hyphenated.

📝 Suggested fix
-- Update GCE PD CSI Driver Docker base image to `k8s.gcr.io/build-image/debian-base-amd64:v2.1.3` (previously `gcr.io/google-containers/debian-base-amd64:v2.0.0`) to address CVEs. ([`#596`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/596), [`@saad-ali`](https://github.com/saad-ali))
-  - Also cherry picked to 1.0.1.
+- Update GCE PD CSI Driver Docker base image to `k8s.gcr.io/build-image/debian-base-amd64:v2.1.3` (previously `gcr.io/google-containers/debian-base-amd64:v2.0.0`) to address CVEs. ([`#596`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/596), [`@saad-ali`](https://github.com/saad-ali))
+  - Also cherry-picked to 1.0.1.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.1.md` around lines 17 - 18, Fix the minor grammar by
hyphenating the compound adjective: update the phrase "Also cherry picked to
1.0.1." to "Also cherry-picked to 1.0.1." (look for the sentence containing
"cherry picked" in the CHANGELOG entry mentioning the GCE PD CSI Driver Docker
base image).
pkg/gce-pd-csi-driver/node_test.go-673-694 (1)

673-694: ⚠️ Potential issue | 🟡 Minor

Recreate the fake Btrfs sysfs tree per subtest.

These fixtures are initialized once but mutated by NodeStageVolume. Later cases then inherit values written by earlier ones, so some Btrfs assertions can pass without actually exercising the write path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gce-pd-csi-driver/node_test.go` around lines 673 - 694, The test
currently creates the btrfsFixtures tree once (btrfsFixtures, btrfsPrefix,
tempDir) and reuses it across multiple subtests, allowing state written by
NodeStageVolume to leak into later cases; change the test to recreate (or clear
and re-create) the fake Btrfs sysfs tree before each subtest iteration so each
case starts with fresh files. Concretely, move the loop that creates dirs and
writes files (using btrfsFixtures -> fullPath, path.Dir, os.MkdirAll,
os.WriteFile) into the per-subtest setup (or add code to remove btrfsPrefix and
re-run creation) so NodeStageVolume tests cannot inherit prior mutations.
CHANGELOG/CHANGELOG-1.3.md-75-75 (1)

75-75: ⚠️ Potential issue | 🟡 Minor

Normalize the heading hierarchy in these release sections.

Both blocks jump from # straight to ###, which trips markdownlint MD001 and makes the document structure inconsistent. Either add a ## Changes by Kind wrapper or promote these headings to ##.

Also applies to: 92-92

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-1.3.md` at line 75, The release sections jump from
top-level headings to "###" which violates MD001; update the release blocks by
either adding a parent "## Changes by Kind" wrapper above the existing "###"
subsections or promoting each "### ..." heading (for example the "### Issues"
heading and the other similar "###" heading referenced) to "##" so the hierarchy
is consistent and markdownlint MD001 is satisfied; apply the same change to the
second occurrence noted in the review.
CHANGELOG/CHANGELOG-0.7.md-5-5 (1)

5-5: ⚠️ Potential issue | 🟡 Minor

Fix the PodSecurityPolicy typo.

PodSecurityPoliciy is misspelled here, which makes the release note harder to search and reuse downstream.

📝 Suggested fix
-- Adding `PodSecurityPoliciy` to allow `csi-gce-pd-node` in clusters with policies enabled.
+- Adding `PodSecurityPolicy` to allow `csi-gce-pd-node` in clusters with policies enabled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG/CHANGELOG-0.7.md` at line 5, Typo in the changelog: replace the
misspelled token "PodSecurityPoliciy" with the correct "PodSecurityPolicy" in
the release note sentence referring to allowing `csi-gce-pd-node`; update the
string exactly so downstream searches and reuse work (search for the literal
"PodSecurityPoliciy" and correct it to "PodSecurityPolicy").
pkg/gce-pd-csi-driver/node_test.go-1412-1415 (1)

1412-1415: ⚠️ Potential issue | 🟡 Minor

This blkidCalled sanity check never triggers.

blkidCalled is never updated while building the fake command list, so this guard cannot catch a test case that accidentally expects the UUID lookup without any Btrfs settings.

🛠️ Suggested fix
 				case "blkid":
 					if strings.Contains(cmd.args, "TYPE") {
 						cmd.stdout = fmt.Sprintf(cmd.stdout, fsType)
+					} else if strings.Contains(cmd.args, "--match-tag UUID") {
+						blkidCalled = true
 					}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gce-pd-csi-driver/node_test.go` around lines 1412 - 1415, The test's
blkidCalled guard never fires because blkidCalled is never set when the fake
blkid command is appended; update the fake command-list construction to set
blkidCalled = true at the moment you add the blkid command to the fake
exec/commands (the same place where other fake commands are pushed when building
the fakeExec), so the final check that inspects blkidCalled can correctly detect
when a blkid invocation was expected; look for the blkidCalled variable and the
block that appends the blkid fake command in node_test.go and set it there.
pkg/parameters/utils_test.go-185-185 (1)

185-185: ⚠️ Potential issue | 🟡 Minor

Minor typo in test error message.

There's a typo "bu tgot" which should be "but got".

Proposed fix
-		t.Errorf("test failed: the provided key %s expected to be %v bu tgot %v", tc.diskEncryptionKmsKey, tc.expectedIsValid, isValid)
+		t.Errorf("test failed: the provided key %s expected to be %v but got %v", tc.diskEncryptionKmsKey, tc.expectedIsValid, isValid)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/parameters/utils_test.go` at line 185, Fix the typo in the test error
message: update the t.Errorf call that currently prints "the provided key %s
expected to be %v bu tgot %v" (using tc.diskEncryptionKmsKey,
tc.expectedIsValid, isValid) to read "the provided key %s expected to be %v but
got %v" so the message correctly says "but got".
pkg/parameters/parameters.go-323-332 (1)

323-332: ⚠️ Potential issue | 🟡 Minor

Error message references "boolean" but function handles availability class.

The error message at line 331 says "unexpected boolean string" but this function converts availability class strings, not booleans.

Proposed fix
 func convertStringToAvailabilityClass(str string) (string, error) {
 	switch strings.ToLower(str) {
 	case ParameterNoAvailabilityClass:
 		return ParameterNoAvailabilityClass, nil
 	case ParameterRegionalHardFailoverClass:
 		return ParameterRegionalHardFailoverClass, nil
 	}
-	return "", fmt.Errorf("unexpected boolean string %s", str)
+	return "", fmt.Errorf("unexpected availability class string %s", str)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/parameters/parameters.go` around lines 323 - 332, The error message in
convertStringToAvailabilityClass incorrectly says "unexpected boolean string";
update the fmt.Errorf call in that function to reference availability class
(e.g., "unexpected availability class string %s") so the error accurately
reflects what the function handles (keep the input str included). Touch the
convertStringToAvailabilityClass function only and preserve existing constants
ParameterNoAvailabilityClass and ParameterRegionalHardFailoverClass.
pkg/gce-pd-csi-driver/controller_test.go-1372-1509 (1)

1372-1509: ⚠️ Potential issue | 🟡 Minor

These dynamic-volume cases still don't prove the dynamic path works.

The "enabled but standard type" case never sets enableDynamicVolumes, and the selection cases only compare VolumeId/topology. They will still pass if the feature gate is off or if pd-type / hyperdisk-type resolution is skipped. Please flip the flag on the first case and assert the created fake disk's Type in the selection cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gce-pd-csi-driver/controller_test.go` around lines 1372 - 1509, Enable
the dynamic-volumes feature for the test case named "success with dynamic
volumes enabled but standard type" by setting enableDynamicVolumes: true on that
case, and in the dynamic-selection cases (e.g., cases with
parameters.ParameterKeyType == parameters.DynamicVolumeType and names like
"success with dynamic volumes selecting disk" / "success with dynamic volumes
selecting disk override") add assertions against the fake disk created by the
test harness to verify its Type field matches the resolved disk type (e.g.,
"pd-balanced" or "hyperdisk-balanced") according to parameters.ParameterPDType,
parameters.ParameterHDType and parameters.ParameterDiskPreference; use the same
test fixtures/helpers used elsewhere in controller_test.go to access the created
fake disk and assert disk.Type equals the expected value.
pkg/gce-pd-csi-driver/controller.go-653-661 (1)

653-661: ⚠️ Potential issue | 🟡 Minor

Return a diagnosable user error from this guard.

If this fallback check ever trips, status.Errorf(codes.Internal, "") gives callers an empty Internal error for what is really an invalid access mode request.

Suggested fix
 		} else if disksWithUnsettableAccessMode[params.DiskType] {
 			// Disallow multi-attach for HdT and HdE. These checks were done in `createVolumeInternal`,
 			// but repeating them here future-proves us from possible refactors.
 			if am != constants.GCEReadWriteOnceAccessMode {
-				return "", status.Errorf(codes.Internal, "")
+				return "", status.Errorf(codes.InvalidArgument, "invalid access mode %q for disk type %s", am, params.DiskType)
 			}
 		} else {
 			return am, nil
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gce-pd-csi-driver/controller.go` around lines 653 - 661, The guard in
controller.go using common.IsHyperdisk and
getHyperdiskAccessModeFromCapabilities currently returns an empty Internal error
(status.Errorf(codes.Internal, "")); change this to return a diagnosable
user-facing error (e.g., codes.InvalidArgument) with a clear message that
includes the disk type (params.DiskType) and the resolved access mode (am),
stating that multi-attach is not supported for that disk type; update the
status.Errorf call accordingly and keep the surrounding checks
(disksWithUnsettableAccessMode and constants.GCEReadWriteOnceAccessMode) intact.

Comment on lines +25 to +27
echo "Kubernetes Integration:"
echo " To apply generated labels directly to a node:"
echo " kubectl label nodes <node-name> \$($0 -m <machine-type>)"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the script to see the actual code at the mentioned lines
cat -n deploy/disk_type_labels.sh | sed -n '20,30p'

Repository: openshift/gcp-pd-csi-driver

Length of output: 471


🏁 Script executed:

# Also check lines 381-383
wc -l deploy/disk_type_labels.sh

Repository: openshift/gcp-pd-csi-driver

Length of output: 101


🏁 Script executed:

# Search for all echo statements that output label format
rg "echo.*disk-type" deploy/disk_type_labels.sh -n

Repository: openshift/gcp-pd-csi-driver

Length of output: 121


🏁 Script executed:

# Let's look at more context around line 383 to understand the output generation
cat -n deploy/disk_type_labels.sh | sed -n '375,390p'

Repository: openshift/gcp-pd-csi-driver

Length of output: 536


🏁 Script executed:

# Check if there are other locations emitting labels with quotes
rg "echo.*\\\"true\\\"" deploy/disk_type_labels.sh -n

Repository: openshift/gcp-pd-csi-driver

Length of output: 157


🏁 Script executed:

# Let's find all echo statements that generate label output
rg "echo.*disk-type" deploy/disk_type_labels.sh -n -A 1 -B 1

Repository: openshift/gcp-pd-csi-driver

Length of output: 166


🏁 Script executed:

# Let's also check if there's a help/usage function that might show more context
rg "usage\|help" deploy/disk_type_labels.sh -n -i

Repository: openshift/gcp-pd-csi-driver

Length of output: 53


🏁 Script executed:

# Let's see what the actual script outputs when run (check for any test data or examples)
head -100 deploy/disk_type_labels.sh | tail -50

Repository: openshift/gcp-pd-csi-driver

Length of output: 1602


🌐 Web query:

kubectl label nodes KEY=VALUE syntax requirements quotes

💡 Result:

kubectl label nodes basic syntax

kubectl label nodes <node-name> <key>=<value> [<key2>=<value2> ...]

--overwrite is required to change an existing label’s value. (kubernetes.io)

Label key/value requirements (Kubernetes rules)

  • Key: either name or prefix/name
    • name is required, max 63 chars, must start/end alphanumeric; may contain - _ . in between.
    • prefix (optional) must be a DNS subdomain (max 253 chars), followed by /. (kubernetes.io)
  • Value: max 63 chars; may be empty; if non-empty must start/end alphanumeric; may contain - _ . in between. (kubernetes.io)

Quotes: when needed (and when they won’t help)

  • For labeling (key=value), you typically don’t need quotes.
  • Quotes cannot make invalid labels valid (e.g., spaces aren’t allowed in label keys/values per the rules above). (kubernetes.io)
  • You do commonly use quotes for label selectors (not labeling), e.g.:
    kubectl get pods -l 'environment in (production),tier in (frontend)'
    (kubernetes.io)

Useful related forms

Citations:


🏁 Script executed:

# Let's also look at the usage function more completely to understand the context
cat -n deploy/disk_type_labels.sh | sed -n '1,40p'

Repository: openshift/gcp-pd-csi-driver

Length of output: 1542


Emit unquoted label values.

The usage block tells users to pipe this output into kubectl label, but the script generates disk-type.gke.io/$disk="true" with literal quote characters. Kubernetes label values cannot contain " characters and must conform to [a-zA-Z0-9]([a-zA-Z0-9._-]{0,61}[a-zA-Z0-9])?, making the generated arguments invalid.

Suggested fix
-        echo "disk-type.gke.io/$disk=\"true\""
+        echo "disk-type.gke.io/$disk=true"

Also applies to line 383.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/disk_type_labels.sh` around lines 25 - 27, The usage and
label-generation currently emit labels with literal double quotes (e.g.,
disk-type.gke.io/<disk>="true"), which produce invalid Kubernetes label values;
update the code that prints label arguments so it emits unquoted values
(disk-type.gke.io/<disk>=true) by removing the surrounding quotes in the
printed/echoed label string and ensuring variable expansion is performed safely
(apply this change to the usage example that prints "kubectl label nodes
<node-name> \$($0 -m <machine-type>)" and to the label-emission logic referenced
near line 383 so both produce unquoted label values).

Comment on lines +3 to +6
resources:
- configmap.yaml
- rbac.yaml
- image.yaml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

kustom=$(fd '^kustomization\.yaml$' deploy/kubernetes/overlays/node-labeler)
image_cfg=$(fd '^image\.yaml$' deploy/kubernetes/overlays/node-labeler)

echo "== ${kustom} =="
sed -n '1,80p' "$kustom"
echo

echo "== ${image_cfg} =="
sed -n '1,120p' "$image_cfg"
echo

echo "== related kustomize declarations =="
rg -n -C2 'kind:\s*Kustomization|transformers:|resources:|ImageTagTransformer' deploy/kubernetes

Repository: openshift/gcp-pd-csi-driver

Length of output: 27337


Move image.yaml from resources: to transformers:.

image.yaml contains an ImageTagTransformer, which is a transformer helper—not a rendered Kubernetes manifest. It must be configured under transformers: instead of resources: to be applied correctly.

Corrected kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- configmap.yaml
- rbac.yaml
transformers:
- image.yaml
patchesStrategicMerge:
- patch.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/kubernetes/overlays/node-labeler/kustomization.yaml` around lines 3 -
6, The kustomization lists image.yaml under resources but image.yaml is a
transformer (contains an ImageTagTransformer) and must be listed under
transformers; update the Kustomization (kustomization.yaml) by removing
image.yaml from the resources: block and adding a transformers: section that
includes - image.yaml (leave configmap.yaml and rbac.yaml under resources and
keep existing patchesStrategicMerge entries).

Comment on lines +7 to +34
spec:
containers:
- name: gce-pd-node-labeler
image: cloud-provider-gcp/gce-pd-node-labeler
args:
- "--v=5"
- "--configmap-name=machine-pd-compatibility"
- "--configmap-namespace=gce-pd-csi-driver"
- "--http-endpoint=:22015"
ports:
- containerPort: 22015
name: http-endpoint
protocol: TCP
livenessProbe:
failureThreshold: 1
httpGet:
path: /healthz/node-labeler
port: http-endpoint
initialDelaySeconds: 10
timeoutSeconds: 10
periodSeconds: 20
resources:
requests:
cpu: 20m
memory: 50Mi
limits:
cpu: 50m
memory: 100Mi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Harden the node-labeler container security context.

This container inherits the default pod/container security context, so it can run with a broader privilege set than necessary. For a long-lived controller process, please at least force non-root execution, disable privilege escalation, and make the root filesystem read-only.

🔐 Proposed hardening
 spec:
   template:
     spec:
+      securityContext:
+        seccompProfile:
+          type: RuntimeDefault
       containers:
       - name: gce-pd-node-labeler
         image: cloud-provider-gcp/gce-pd-node-labeler
+        securityContext:
+          allowPrivilegeEscalation: false
+          readOnlyRootFilesystem: true
+          runAsNonRoot: true
+          capabilities:
+            drop:
+              - ALL
         args:
           - "--v=5"

If the binary needs scratch space, add an explicit emptyDir mount instead of keeping the whole root filesystem writable.

🧰 Tools
🪛 Checkov (3.2.334)

[medium] 1-34: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-34: Minimize the admission of root containers

(CKV_K8S_23)

🪛 Trivy (0.69.3)

[error] 9-34: Root file system is not read-only

Container 'gce-pd-node-labeler' of Deployment 'csi-gce-pd-controller' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[error] 9-34: Default security context configured

container csi-gce-pd-controller in default namespace is using the default security context

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 7-34: Default security context configured

deployment csi-gce-pd-controller in default namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/kubernetes/overlays/node-labeler/patch.yaml` around lines 7 - 34, Add
a container-level securityContext to the gce-pd-node-labeler container to force
non-root and restrict privileges: set runAsNonRoot: true (and optionally
runAsUser to a non-zero uid), set allowPrivilegeEscalation: false and
privileged: false, and set readOnlyRootFilesystem: true; if the binary needs
writable scratch space, add an explicit emptyDir volume and mount it (e.g.,
volume name like node-labeler-tmp mounted at /tmp) instead of leaving the root
filesystem writable so the pod stays secure.

Comment on lines +47 to +49
```console
$ NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com" # Comma-separated list of node service accounts
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the file exists and get its structure
fd driver-install.md

Repository: openshift/gcp-pd-csi-driver

Length of output: 116


🏁 Script executed:

# Read the specific lines mentioned in the review
head -100 docs/kubernetes/user-guides/driver-install.md | tail -60

Repository: openshift/gcp-pd-csi-driver

Length of output: 2597


🏁 Script executed:

# Search for setup-project.sh and deploy-driver.sh to understand how they use NODE_SERVICE_ACCOUNTS
rg "setup-project.sh|deploy-driver.sh" --type md -B 3 -A 3

Repository: openshift/gcp-pd-csi-driver

Length of output: 6080


🏁 Script executed:

# Look for NODE_SERVICE_ACCOUNTS usage in the codebase
rg "NODE_SERVICE_ACCOUNTS" --type md -B 2 -A 2

Repository: openshift/gcp-pd-csi-driver

Length of output: 1579


🏁 Script executed:

# Search for any shell script files to understand variable usage
fd "setup-project.sh|deploy-driver.sh" -type f

Repository: openshift/gcp-pd-csi-driver

Length of output: 241


🏁 Script executed:

# Find and examine the setup-project.sh and deploy-driver.sh scripts
fd "setup-project.sh|deploy-driver.sh" -type f

Repository: openshift/gcp-pd-csi-driver

Length of output: 241


🏁 Script executed:

# Search for NODE_SERVICE_ACCOUNTS usage in script files
rg "NODE_SERVICE_ACCOUNTS" --type sh -B 3 -A 3

Repository: openshift/gcp-pd-csi-driver

Length of output: 2519


🏁 Script executed:

# Check if the scripts are in deploy directory
ls -la deploy/

Repository: openshift/gcp-pd-csi-driver

Length of output: 555


🏁 Script executed:

# Look at the deploy directory structure
find deploy -type f -name "*.sh" | head -20

Repository: openshift/gcp-pd-csi-driver

Length of output: 317


🏁 Script executed:

# Read the complete setup-project.sh script to understand how it reads NODE_SERVICE_ACCOUNTS
cat -n deploy/setup-project.sh | head -50

Repository: openshift/gcp-pd-csi-driver

Length of output: 2391


🏁 Script executed:

# Check how setup-project.sh uses NODE_SERVICE_ACCOUNTS and whether it expects an exported variable
cat -n deploy/setup-project.sh | grep -A 10 "NODE_SERVICE_ACCOUNTS"

Repository: openshift/gcp-pd-csi-driver

Length of output: 1019


🏁 Script executed:

# Examine the deploy-driver.sh script
cat -n deploy/kubernetes/deploy-driver.sh | grep -A 10 "NODE_SERVICE_ACCOUNTS"

Repository: openshift/gcp-pd-csi-driver

Length of output: 1204


🏁 Script executed:

# Check if there are examples in development.md showing how variables are set
cat -n docs/kubernetes/development.md | grep -B 5 -A 5 "export.*ENABLE"

Repository: openshift/gcp-pd-csi-driver

Length of output: 483


🏁 Script executed:

# Check if there are any inline assignment patterns used in the documentation
rg "NODE_SERVICE_ACCOUNTS\s*=.*\./(setup-project|deploy-driver)" docs/

Repository: openshift/gcp-pd-csi-driver

Length of output: 53


🏁 Script executed:

# Look at more context around the deploy examples to see the full command structure
sed -n '40,75p' docs/kubernetes/user-guides/driver-install.md

Repository: openshift/gcp-pd-csi-driver

Length of output: 2101


🏁 Script executed:

# Check the common.sh file to understand how the variable is checked
cat deploy/common.sh

Repository: openshift/gcp-pd-csi-driver

Length of output: 1077


Export NODE_SERVICE_ACCOUNTS to child shell processes.

These examples assign NODE_SERVICE_ACCOUNTS on a separate line from invoking the scripts. Without export, the scripts won't receive the variable and the scoped impersonation logic in use_scoped_sa_role() will silently fall back to the broader project-level serviceAccountUser role. The development guide already demonstrates this pattern with export ENABLE_KMS=true.

Suggested fix
-$ NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com"  # Comma-separated list of node service accounts
+$ export NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com"  # Comma-separated list of node service accounts
-$ NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com"  # Same as the setup-project.sh step
+$ export NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com"  # Same as the setup-project.sh step

Also applies to: 69-73

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/kubernetes/user-guides/driver-install.md` around lines 47 - 49, The
example sets NODE_SERVICE_ACCOUNTS but does not export it, so child scripts (and
the use_scoped_sa_role() logic) won't see it; update the examples that assign
NODE_SERVICE_ACCOUNTS (e.g., the occurrences around the shown block and the
other instance) to export the variable (use "export NODE_SERVICE_ACCOUNTS=...")
so the scripts receive the value and scoped impersonation in
use_scoped_sa_role() works as intended.

golang.org/x/oauth2 v0.30.0
golang.org/x/sys v0.35.0
golang.org/x/time v0.12.0
go.opentelemetry.io/otel/sdk v1.36.0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

GO-2026-4394 GHSA-9h8m-3fm2-qjrq otel/sdk vulnerability

💡 Result:

GO-2026-4394 / GHSA-9h8m-3fm2-qjrq (CVE-2026-24051) is a local PATH-hijacking arbitrary code execution issue in the OpenTelemetry Go SDK on macOS/Darwin. The SDK’s host ID resource detection runs the ioreg command via PATH lookup (sdk/resource/host_id.go), so an attacker who can influence PATH can get code executed in the application’s context. [2][3]

  • Affected: go.opentelemetry.io/otel/sdk (notably the .../sdk/resource module), versions 1.21.0–1.39.0 (NVD description notes 1.20.0–1.39.0). [1][2][3]
  • Severity: High (CVSS 3.1: 7.0) (per GitHub/CNA). [2][4]
  • Fixed: upgrade to v1.40.0 (patch commit d45961b...). [1][2][3]

Mitigation (if you can’t upgrade immediately): ensure your service doesn’t run with an attacker-controlled PATH on macOS (e.g., sanitize/lock down environment variables and execution context). [2][3]

Sources: [1] Go vuln db (pkg.go.dev) GO-2026-4394; [2] OpenTelemetry GitHub security advisory GHSA-9h8m-3fm2-qjrq; [3] NVD CVE-2026-24051; [4] NVD metrics section showing GitHub/CNA CVSS 7.0.


🏁 Script executed:

git ls-files | grep -E "go\.(mod|sum)$"

Repository: openshift/gcp-pd-csi-driver

Length of output: 84


🏁 Script executed:

cat -n go.mod | head -50

Repository: openshift/gcp-pd-csi-driver

Length of output: 2202


Bump otel/sdk to v1.40.0 or later.

go.opentelemetry.io/otel/sdk v1.36.0 is vulnerable to GO-2026-4394 / GHSA-9h8m-3fm2-qjrq (CVE-2026-24051)—a local PATH-hijacking arbitrary code execution issue on macOS. Affected versions range from 1.20.0 to 1.39.0, with v1.40.0 as the first fixed release. Upgrade before merging.

🧰 Tools
🪛 OSV Scanner (2.3.3)

[HIGH] 24-24: go.opentelemetry.io/otel/sdk 1.36.0: OpenTelemetry Go SDK Vulnerable to Arbitrary Code Execution via PATH Hijacking in go.opentelemetry.io/otel/sdk

(GO-2026-4394)


[HIGH] 24-24: go.opentelemetry.io/otel/sdk 1.36.0: OpenTelemetry Go SDK Vulnerable to Arbitrary Code Execution via PATH Hijacking

(GHSA-9h8m-3fm2-qjrq)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 24, Update the OpenTelemetry SDK dependency to a fixed
release by changing the module version for go.opentelemetry.io/otel/sdk in
go.mod to v1.40.0 (or later); after updating the version run the module tooling
(e.g., go get go.opentelemetry.io/otel/sdk@v1.40.0 and go mod tidy) to refresh
go.sum and ensure the lockfile reflects the upgraded, non‑vulnerable release.

Comment on lines +293 to 299
klog.Warningf("For device %s udevadmin failed: %v. Trying to manually link", deviceName, err)
if err := manuallySetDevicePath(deviceName); err != nil {
return "", fmt.Errorf("failed to manually set link for disk %s: %w", deviceName, err)
}
}

return devicePath, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

devicePath not updated after successful manual symlink creation.

When manuallySetDevicePath succeeds, it creates a symlink but devicePath remains empty or stale from the failed poll loop. The function then returns this incorrect value at line 299, causing callers to receive an invalid device path despite the symlink being successfully created.

🐛 Proposed fix to return the correct device path
 	if err != nil {
 		klog.Warningf("For device %s udevadmin failed: %v. Trying to manually link", deviceName, err)
 		if err := manuallySetDevicePath(deviceName); err != nil {
 			return "", fmt.Errorf("failed to manually set link for disk %s: %w", deviceName, err)
 		}
+		// Update devicePath to the newly created symlink
+		devicePath = path.Join(diskByIdPath, diskGooglePrefix+deviceName)
 	}
 
 	return devicePath, nil
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
klog.Warningf("For device %s udevadmin failed: %v. Trying to manually link", deviceName, err)
if err := manuallySetDevicePath(deviceName); err != nil {
return "", fmt.Errorf("failed to manually set link for disk %s: %w", deviceName, err)
}
}
return devicePath, nil
klog.Warningf("For device %s udevadmin failed: %v. Trying to manually link", deviceName, err)
if err := manuallySetDevicePath(deviceName); err != nil {
return "", fmt.Errorf("failed to manually set link for disk %s: %w", deviceName, err)
}
// Update devicePath to the newly created symlink
devicePath = path.Join(diskByIdPath, diskGooglePrefix+deviceName)
}
return devicePath, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/deviceutils/device-utils.go` around lines 293 - 299, The code returns a
stale/empty devicePath after manuallySetDevicePath succeeds; fix by updating
devicePath before returning: after manuallySetDevicePath(deviceName) returns
nil, resolve the newly created symlink for the deviceName (e.g., use os.Readlink
or filepath.EvalSymlinks on the symlink path derived from deviceName) and assign
that result to devicePath, handling any resolution error and returning it if
resolution fails so callers get the correct, current device path.

Comment on lines +327 to 337
params, dataCacheParams, err := gceCS.parameterProcessor().ExtractAndDefaultParameters(req.GetParameters())
if gceCS.enableDynamicVolumes && params.IsDiskDynamic() {
params.DiskType, err = SelectDisk(ctx, req, gceCS.CloudProvider)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "failed to select disk type: %v", err.Error())
}
parameters.SanitizeDiskParameters(&params)
}
metrics.UpdateRequestMetadataFromParams(ctx, params)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err.Error())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fail fast on parameter extraction errors.

ExtractAndDefaultParameters can already return a non-nil err here. Because the dynamic-volume block reuses err before it is checked, a successful SelectDisk can clear the original extraction failure and let invalid parameters/data-cache settings through.

Suggested fix
 params, dataCacheParams, err := gceCS.parameterProcessor().ExtractAndDefaultParameters(req.GetParameters())
+if err != nil {
+	return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err)
+}
 if gceCS.enableDynamicVolumes && params.IsDiskDynamic() {
 	params.DiskType, err = SelectDisk(ctx, req, gceCS.CloudProvider)
 	if err != nil {
 		return nil, status.Errorf(codes.InvalidArgument, "failed to select disk type: %v", err.Error())
 	}
 	parameters.SanitizeDiskParameters(&params)
 }
 metrics.UpdateRequestMetadataFromParams(ctx, params)
-if err != nil {
-	return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err.Error())
-}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
params, dataCacheParams, err := gceCS.parameterProcessor().ExtractAndDefaultParameters(req.GetParameters())
if gceCS.enableDynamicVolumes && params.IsDiskDynamic() {
params.DiskType, err = SelectDisk(ctx, req, gceCS.CloudProvider)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "failed to select disk type: %v", err.Error())
}
parameters.SanitizeDiskParameters(&params)
}
metrics.UpdateRequestMetadataFromParams(ctx, params)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err.Error())
params, dataCacheParams, err := gceCS.parameterProcessor().ExtractAndDefaultParameters(req.GetParameters())
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err)
}
if gceCS.enableDynamicVolumes && params.IsDiskDynamic() {
params.DiskType, err = SelectDisk(ctx, req, gceCS.CloudProvider)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "failed to select disk type: %v", err.Error())
}
parameters.SanitizeDiskParameters(&params)
}
metrics.UpdateRequestMetadataFromParams(ctx, params)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gce-pd-csi-driver/controller.go` around lines 327 - 337, The code calls
gceCS.parameterProcessor().ExtractAndDefaultParameters(...) and then reuses the
err variable inside the dynamic-volume branch, which can overwrite an earlier
extraction error; fix by immediately checking the returned err from
ExtractAndDefaultParameters and returning a status error if non-nil before any
further processing (before the enableDynamicVolumes/IsDiskDynamic block and
before calling metrics.UpdateRequestMetadataFromParams), then proceed to
SelectDisk and parameters.SanitizeDiskParameters only when extraction succeeded.

Comment on lines +65 to +67
// Some architectures don't have local ssd. Give way to opt out of tests like datacache.
skipLocalSsdTests = flag.Bool("skip-local-ssd-tests", false, "Skip local ssd tests like datacache")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -e
rg -n -C3 'skipLocalSsdTests|getLocalSsdCount|SetupDataCachingConfig' test/e2e/tests/setup_e2e_test.go
rg -n -C5 '\bfunc\s+SetupDataCachingConfig\s*\(' test/e2e/utils

Repository: openshift/gcp-pd-csi-driver

Length of output: 2036


🏁 Script executed:

cat -n test/e2e/utils/utils.go | sed -n '299,350p'

Repository: openshift/gcp-pd-csi-driver

Length of output: 2804


🏁 Script executed:

rg -n 'LocalSSDCount' test/e2e/ -A2 -B2

Repository: openshift/gcp-pd-csi-driver

Length of output: 1086


Incomplete opt-out: SetupDataCachingConfig runs regardless of the skip-local-ssd-tests flag.

The flag only changes LocalSSDCount to 0 but doesn't prevent SetupDataCachingConfig from executing unconditionally at line 238. This function always modifies /etc/lvm/lvm.conf via SSH without checking whether the instance actually has a local SSD or supports data caching, so the test suite will still fail on unsupported machine types.

This also affects the hardcoded case at line 211 where LocalSSDCount is set to 0 for hdMachineType—that path will hit the same error.

Either skip the SetupDataCachingConfig call when LocalSSDCount is 0, or make that function conditional on local SSD availability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/tests/setup_e2e_test.go` around lines 65 - 67, The test currently
calls SetupDataCachingConfig unconditionally which modifies /etc/lvm/lvm.conf
even when no local SSD is present; change the call sites so
SetupDataCachingConfig is only invoked when LocalSSDCount > 0 and the
skip-local-ssd-tests flag is false (i.e., check both *skipLocalSsdTests and the
test context's LocalSSDCount before calling SetupDataCachingConfig), and ensure
the hdMachineType branch that sets LocalSSDCount = 0 also avoids calling
SetupDataCachingConfig; alternatively make SetupDataCachingConfig early-return
if LocalSSDCount == 0 so it will be a no-op when no local SSDs exist.

Comment on lines +284 to +302
cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal)
out, err = cmd.CombinedOutput()
if err != nil {
return fmt.Errorf("failed to list cluster: %v %s", err, out)
}
scanner := bufio.NewScanner(strings.NewReader(string(out)))
printedHash := false
for scanner.Scan() {
line := scanner.Text()
if strings.HasPrefix(line, "id: ") {
klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal)
klog.Infof("GKE cluster hash %s", line)
printedHash = true
break
}
}
if !printedHash {
return fmt.Errorf("failed to find cluster hash in cluster describe: %s", out)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Google Cloud CLI docs: for gcloud container clusters describe, can the output be affected by the core/formatproperty, and is--format='value(id)' the supported way to return only the cluster ID?

💡 Result:

Yes—the default output of gcloud container clusters describe can be affected by the core/default_format property (there isn’t a core/format property; the documented one is core/default_format). The command also supports the global --format flag, which overrides the default output formatting. [1], [2]

And yes—using the global format flag like this is the supported way to return a single field:

gcloud container clusters describe CLUSTER_NAME --format="value(id)"

value(...) is a supported format expression in Google Cloud CLI output formatting. [1], [4]


🏁 Script executed:

# First, check the imports and bufio usage in the file
cat -n test/k8s-integration/cluster.go | head -30

Repository: openshift/gcp-pd-csi-driver

Length of output: 880


🏁 Script executed:

# Check if bufio is imported
rg "^import" test/k8s-integration/cluster.go -A 20

Repository: openshift/gcp-pd-csi-driver

Length of output: 459


🏁 Script executed:

# Verify the exact lines 284-302
sed -n '284,302p' test/k8s-integration/cluster.go

Repository: openshift/gcp-pd-csi-driver

Length of output: 722


🏁 Script executed:

# Check for any other uses of bufio.Scanner in the file to see if removal is safe
rg "bufio" test/k8s-integration/cluster.go

Repository: openshift/gcp-pd-csi-driver

Length of output: 140


Pin gcloud describe to a machine-readable format.

The current code relies on parsing a human-readable output line starting with id: , which breaks if gcloud output formatting changes (e.g., via core/default_format configuration). Use --format=value(id) to request the cluster ID directly instead of scanning unstructured output.

♻️ Proposed fix
-	cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal)
+	cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal, "--format=value(id)")
 	out, err = cmd.CombinedOutput()
 	if err != nil {
-		return fmt.Errorf("failed to list cluster: %v %s", err, out)
+		return fmt.Errorf("failed to describe cluster: %v %s", err, out)
 	}
-	scanner := bufio.NewScanner(strings.NewReader(string(out)))
-	printedHash := false
-	for scanner.Scan() {
-		line := scanner.Text()
-		if strings.HasPrefix(line, "id: ") {
-			klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal)
-			klog.Infof("GKE cluster hash %s", line)
-			printedHash = true
-			break
-		}
-	}
-	if !printedHash {
-		return fmt.Errorf("failed to find cluster hash in cluster describe: %s", out)
-	}
+	clusterID := strings.TrimSpace(string(out))
+	if clusterID == "" {
+		return fmt.Errorf("failed to read cluster id from gcloud output: %s", out)
+	}
+	klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal)
+	klog.Infof("GKE cluster hash id: %s", clusterID)

The bufio import can be removed after this change.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal)
out, err = cmd.CombinedOutput()
if err != nil {
return fmt.Errorf("failed to list cluster: %v %s", err, out)
}
scanner := bufio.NewScanner(strings.NewReader(string(out)))
printedHash := false
for scanner.Scan() {
line := scanner.Text()
if strings.HasPrefix(line, "id: ") {
klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal)
klog.Infof("GKE cluster hash %s", line)
printedHash = true
break
}
}
if !printedHash {
return fmt.Errorf("failed to find cluster hash in cluster describe: %s", out)
}
cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal, "--format=value(id)")
out, err = cmd.CombinedOutput()
if err != nil {
return fmt.Errorf("failed to describe cluster: %v %s", err, out)
}
clusterID := strings.TrimSpace(string(out))
if clusterID == "" {
return fmt.Errorf("failed to read cluster id from gcloud output: %s", out)
}
klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal)
klog.Infof("GKE cluster hash id: %s", clusterID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/k8s-integration/cluster.go` around lines 284 - 302, The code parses
human-readable gcloud output looking for "id: " using exec.Command("gcloud",
"container", "clusters", "describe", ...) and a bufio.Scanner (variables: cmd,
out, err, printedHash, locationArg, locationVal, gkeTestClusterName); change the
command to request a machine-readable value by adding the flag
"--format=value(id)" to the exec.Command args so the command returns only the
cluster ID, then remove the bufio.Scanner loop and simply check out/err and use
the returned string (trim whitespace) as the cluster hash; also remove the
now-unused bufio import.

Comment on lines +94 to +95
return wait.Poll(5*time.Second, time.Minute, func() (bool, error) {
out, err := runSSHCommand("ssh", []string{i.GetSSHTarget(), "-o", "ConnectTimeout=10", "--", "echo"}...)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and examine the test/remote/ssh.go file
fd -type f -name "ssh.go" | head -20

Repository: openshift/gcp-pd-csi-driver

Length of output: 241


🏁 Script executed:

# If found, read the file to see the SSHCheckAlive method around lines 94-95
cat -n test/remote/ssh.go | sed -n '85,105p'

Repository: openshift/gcp-pd-csi-driver

Length of output: 1130


🏁 Script executed:

# Find the runSSHCommand function definition
rg -n "func.*runSSHCommand" --type go

Repository: openshift/gcp-pd-csi-driver

Length of output: 158


🏁 Script executed:

# Read the runSSHCommand function implementation
cat -n test/remote/ssh.go | sed -n '146,170p'

Repository: openshift/gcp-pd-csi-driver

Length of output: 866


🏁 Script executed:

# Find the sshOption variable definition
rg -n "sshOption\s*=" test/remote/ssh.go | head -5

Repository: openshift/gcp-pd-csi-driver

Length of output: 236


🏁 Script executed:

# Also check the context around it
cat -n test/remote/ssh.go | sed -n '1,50p'

Repository: openshift/gcp-pd-csi-driver

Length of output: 1747


Move ConnectTimeout option before the SSH destination.

The -o ConnectTimeout=10 is positioned after i.GetSSHTarget(), which makes it part of the remote command arguments instead of SSH client options. OpenSSH expects options before the destination; options placed after are passed to the remote shell and have no effect on the SSH connection timeout. This prevents the timeout from being applied and can cause the liveness probe to wait up to the full time.Minute poll duration even on unreachable hosts.

-		out, err := runSSHCommand("ssh", []string{i.GetSSHTarget(), "-o", "ConnectTimeout=10", "--", "echo"}...)
+		out, err := runSSHCommand("ssh", []string{"-o", "ConnectTimeout=10", i.GetSSHTarget(), "--", "echo"}...)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return wait.Poll(5*time.Second, time.Minute, func() (bool, error) {
out, err := runSSHCommand("ssh", []string{i.GetSSHTarget(), "-o", "ConnectTimeout=10", "--", "echo"}...)
return wait.Poll(5*time.Second, time.Minute, func() (bool, error) {
out, err := runSSHCommand("ssh", []string{"-o", "ConnectTimeout=10", i.GetSSHTarget(), "--", "echo"}...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/remote/ssh.go` around lines 94 - 95, The ssh option "-o
ConnectTimeout=10" is currently passed after the SSH destination
(i.GetSSHTarget()) so it becomes part of the remote command instead of an SSH
client option; update the runSSHCommand call inside the wait.Poll lambda so the
options are placed before i.GetSSHTarget() (e.g., build the args slice with
"-o", "ConnectTimeout=10" prior to i.GetSSHTarget()), ensuring runSSHCommand and
i.GetSSHTarget() are used with the correct argument order so the ConnectTimeout
is applied.

@dfajmon
Copy link
Copy Markdown
Author

dfajmon commented Mar 12, 2026

/test e2e-gcp-csi-manual-oidc

@tsmetana
Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 12, 2026
@chao007
Copy link
Copy Markdown

chao007 commented Mar 17, 2026

/test e2e-gcp-csi-manual-oidc

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 17, 2026

@dfajmon: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@chao007
Copy link
Copy Markdown

chao007 commented Mar 18, 2026

/verified by CI

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 18, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@chao007: This PR has been marked as verified by CI.

Details

In response to this:

/verified by CI

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 2ba1174 into openshift:master Mar 18, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.