STOR-2751: Rebase to upstream v1.24.1 for OCP 4.22#98
STOR-2751: Rebase to upstream v1.24.1 for OCP 4.22#98openshift-merge-bot[bot] merged 208 commits intoopenshift:masterfrom
Conversation
Change-Id: Id9befb7508b785ed1fe96efff48c5fa84a6f1d1a
…egions GCP now offers more than 10 regions in Europe. Update the regex accordingly.
Added filtering for incorrect slo boosting logs
Bumps golang from 1.24.6 to 1.25.0. --- updated-dependencies: - dependency-name: golang dependency-version: 1.25.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps the onsi group with 2 updates in the / directory: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). Updates `github.com/onsi/ginkgo/v2` from 2.22.2 to 2.23.3 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.22.2...v2.23.3) Updates `github.com/onsi/gomega` from 1.36.2 to 1.36.3 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.36.2...v1.36.3) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: onsi - dependency-name: github.com/onsi/gomega dependency-type: direct:production update-type: version-update:semver-patch dependency-group: onsi ... Signed-off-by: dependabot[bot] <support@github.com>
…ot/go_modules/onsi-f75b1e49c1 Bump the onsi group across 1 directory with 2 updates
…ot/docker/golang-1.25.0 Bump golang from 1.24.6 to 1.25.0
Bumps the k8s-dependencies group with 5 updates: | Package | From | To | | --- | --- | --- | | [k8s.io/api](https://github.com/kubernetes/api) | `0.33.3` | `0.33.4` | | [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.33.3` | `0.33.4` | | [k8s.io/cloud-provider](https://github.com/kubernetes/cloud-provider) | `0.33.3` | `0.33.4` | | [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.33.3` | `0.33.4` | | [k8s.io/mount-utils](https://github.com/kubernetes/mount-utils) | `0.33.3` | `0.33.4` | Updates `k8s.io/api` from 0.33.3 to 0.33.4 - [Commits](kubernetes/api@v0.33.3...v0.33.4) Updates `k8s.io/apimachinery` from 0.33.3 to 0.33.4 - [Commits](kubernetes/apimachinery@v0.33.3...v0.33.4) Updates `k8s.io/cloud-provider` from 0.33.3 to 0.33.4 - [Commits](kubernetes/cloud-provider@v0.33.3...v0.33.4) Updates `k8s.io/component-base` from 0.33.3 to 0.33.4 - [Commits](kubernetes/component-base@v0.33.3...v0.33.4) Updates `k8s.io/mount-utils` from 0.33.3 to 0.33.4 - [Commits](kubernetes/mount-utils@v0.33.3...v0.33.4) --- updated-dependencies: - dependency-name: k8s.io/api dependency-version: 0.33.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: k8s-dependencies - dependency-name: k8s.io/apimachinery dependency-version: 0.33.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: k8s-dependencies - dependency-name: k8s.io/cloud-provider dependency-version: 0.33.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: k8s-dependencies - dependency-name: k8s.io/component-base dependency-version: 0.33.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: k8s-dependencies - dependency-name: k8s.io/mount-utils dependency-version: 0.33.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: k8s-dependencies ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps the golang-x group with 2 updates: [golang.org/x/net](https://github.com/golang/net) and [golang.org/x/tools](https://github.com/golang/tools). Updates `golang.org/x/net` from 0.42.0 to 0.43.0 - [Commits](golang/net@v0.42.0...v0.43.0) Updates `golang.org/x/tools` from 0.35.0 to 0.36.0 - [Release notes](https://github.com/golang/tools/releases) - [Commits](golang/tools@v0.35.0...v0.36.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-version: 0.43.0 dependency-type: indirect update-type: version-update:semver-minor dependency-group: golang-x - dependency-name: golang.org/x/tools dependency-version: 0.36.0 dependency-type: indirect update-type: version-update:semver-minor dependency-group: golang-x ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps the onsi group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). Updates `github.com/onsi/ginkgo/v2` from 2.23.4 to 2.25.1 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.23.4...v2.25.1) Updates `github.com/onsi/gomega` from 1.36.3 to 1.37.0 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.36.3...v1.37.0) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.25.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: onsi - dependency-name: github.com/onsi/gomega dependency-version: 1.37.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: onsi ... Signed-off-by: dependabot[bot] <support@github.com>
…ot/go_modules/k8s-dependencies-8e611a1eb0 Bump the k8s-dependencies group with 5 updates
…ot/go_modules/golang-x-bc3441d089 Bump the golang-x group with 2 updates
…ot/go_modules/onsi-26932303e1 Bump the onsi group with 2 updates
Add a knob for the btrfs-specific `bdi/read_ahead_kb`. If my understanding is correct, this value is the upper limit of the btrfs read-ahead of the _underlying file_. Confusingly, this option is not related to `/sys/block/sdX/queue/read_ahead_kb`, which is what controls the read-ahead of the block device (regardless of what's in there). The current `read_ahead_kb` "special" mount option configures the value of the block device. The default `bdi/read_ahead_kb` is 4MiB (on all the systems I've checked). Ext4 does not have this option and relies on the block-device specific readahead, which is 128KiB (again, on all the systems I've checked). After migrating to btrfs we have experienced a notable write amplification and tracked to this setting. Once we changed `bdi/read_ahead_kb` to 128, our IO utilization (and the properties of the underlying workload) became very similar to the one of ext4. Now that we have 3 btrfs-specific "special" mount options, I refactored the code and tests to make it less repetitive.
…d_kb [btrfs] add btrfs-specific `bdi/read_ahead_kb`
Currently btrfs offers 3 methods to balance the filesystem:
1. beginning of time: `btrfs filesystem balance`.
2. kernel v5.19+: `/sys/fs/btrfs/UUID/allocation/{meta,}data/bg_reclaim_threshold`.
3. kernel v6.11+: `/sys/fs/btrfs/UUID/allocation/{meta,}data/dynamic_reclaim`.
(3) is most robust out of the 3, and best documented in the kernel commit
[f5ff64ccf7bb7274ed66b0d835b2f6ae10af5d7a](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f5ff64ccf7bb7274ed66b0d835b2f6ae10af5d7a).
Once GKE rolls up to the kernel v6.11, this will be the recommended
option to set.
Bumps golang from 1.25.0 to 1.25.1. --- updated-dependencies: - dependency-name: golang dependency-version: 1.25.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps the golang-x group with 6 updates: | Package | From | To | | --- | --- | --- | | [golang.org/x/oauth2](https://github.com/golang/oauth2) | `0.30.0` | `0.31.0` | | [golang.org/x/sys](https://github.com/golang/sys) | `0.35.0` | `0.36.0` | | [golang.org/x/time](https://github.com/golang/time) | `0.12.0` | `0.13.0` | | [golang.org/x/sync](https://github.com/golang/sync) | `0.16.0` | `0.17.0` | | [golang.org/x/term](https://github.com/golang/term) | `0.34.0` | `0.35.0` | | [golang.org/x/text](https://github.com/golang/text) | `0.28.0` | `0.29.0` | Updates `golang.org/x/oauth2` from 0.30.0 to 0.31.0 - [Commits](golang/oauth2@v0.30.0...v0.31.0) Updates `golang.org/x/sys` from 0.35.0 to 0.36.0 - [Commits](golang/sys@v0.35.0...v0.36.0) Updates `golang.org/x/time` from 0.12.0 to 0.13.0 - [Commits](golang/time@v0.12.0...v0.13.0) Updates `golang.org/x/sync` from 0.16.0 to 0.17.0 - [Commits](golang/sync@v0.16.0...v0.17.0) Updates `golang.org/x/term` from 0.34.0 to 0.35.0 - [Commits](golang/term@v0.34.0...v0.35.0) Updates `golang.org/x/text` from 0.28.0 to 0.29.0 - [Release notes](https://github.com/golang/text/releases) - [Commits](golang/text@v0.28.0...v0.29.0) --- updated-dependencies: - dependency-name: golang.org/x/oauth2 dependency-version: 0.31.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: golang-x - dependency-name: golang.org/x/sys dependency-version: 0.36.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: golang-x - dependency-name: golang.org/x/time dependency-version: 0.13.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: golang-x - dependency-name: golang.org/x/sync dependency-version: 0.17.0 dependency-type: indirect update-type: version-update:semver-minor dependency-group: golang-x - dependency-name: golang.org/x/term dependency-version: 0.35.0 dependency-type: indirect update-type: version-update:semver-minor dependency-group: golang-x - dependency-name: golang.org/x/text dependency-version: 0.29.0 dependency-type: indirect update-type: version-update:semver-minor dependency-group: golang-x ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps the onsi group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). Updates `github.com/onsi/ginkgo/v2` from 2.25.1 to 2.25.3 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.25.1...v2.25.3) Updates `github.com/onsi/gomega` from 1.37.0 to 1.38.2 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.37.0...v1.38.2) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.25.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: onsi - dependency-name: github.com/onsi/gomega dependency-version: 1.38.2 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: onsi ... Signed-off-by: dependabot[bot] <support@github.com>
…ot/docker/golang-1.25.1 Bump golang from 1.25.0 to 1.25.1
…ot/go_modules/golang-x-cdf31bdd0d Bump the golang-x group with 6 updates
…ot/go_modules/onsi-8f1ed76d56 Bump the onsi group with 2 updates
[btrfs] support dynamic_reclaim
|
/verified by CI |
|
@chao007: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
and remove .github go mod vendor && go mod tidy
Made-with: Cursor
|
@dfajmon: This pull request references STOR-2751 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThis pull request introduces a comprehensive refactoring to reorganize code packages and add support for dynamic volumes and machine-type-based disk labeling. Changes include moving common utilities to new Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client Request
participant Controller as Controller Server
participant DiskSelection as Disk Selection
participant CloudProvider as GCE Cloud Provider
participant Node as Node Server
Client->>Controller: CreateVolumeRequest (with parameters)
activate Controller
Controller->>DiskSelection: SelectDisk (params, topology, source)
activate DiskSelection
DiskSelection->>CloudProvider: GetDisk (source volume)
CloudProvider-->>DiskSelection: disk type (if source)
DiskSelection->>DiskSelection: evaluate topology preferences
DiskSelection-->>Controller: selected disk type
deactivate DiskSelection
Controller->>Controller: extract DiskParameters
Controller->>CloudProvider: InsertDisk (with disk type)
activate CloudProvider
CloudProvider-->>Controller: disk created
deactivate CloudProvider
Controller->>Node: volume available
Node->>Node: stage volume (mount, format)
Node-->>Client: volume ready
deactivate Controller
sequenceDiagram
participant K8sAPI as Kubernetes API
participant NodeLabeler as Node Labeler
participant ConfigMap as ConfigMap (machine-pd-compatibility)
participant Node as Node Resource
K8sAPI->>NodeLabeler: ConfigMap changed event
activate NodeLabeler
NodeLabeler->>ConfigMap: fetch machine-pd-compatibility.json
activate ConfigMap
ConfigMap-->>NodeLabeler: disk compatibility matrix
deactivate ConfigMap
NodeLabeler->>NodeLabeler: UpdateMachinePDCompatibility
NodeLabeler->>K8sAPI: list all Nodes
activate K8sAPI
K8sAPI-->>NodeLabeler: node list
deactivate K8sAPI
NodeLabeler->>Node: Reconcile (per node)
activate Node
Node->>Node: extract instance-type label
Node->>Node: compute compatible disks
Node->>Node: patch labels (disk-type.gke.io/*)
K8sAPI->>Node: update applied
deactivate Node
NodeLabeler-->>K8sAPI: reconciliation complete
deactivate NodeLabeler
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
|
|
@dfajmon: This pull request references STOR-2751 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@dfajmon: This pull request references STOR-2751 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 10
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
pkg/deviceutils/device-utils.go (1)
352-356:⚠️ Potential issue | 🔴 CriticalNil pointer dereference when
erris nil butdevFsSerialis empty.When
getDevFsSerialreturns an empty serial with a nil error (e.g., for an unrecognized device prefix per line 315), the condition on line 352 evaluates to true, but line 355 unconditionally callserr.Error(), causing a panic.🐛 Proposed fix to handle nil error safely
if err != nil || len(devFsSerial) == 0 { // If we get an error, ignore. Either this isn't a block device, or it // isn't something we can get a serial number from - klog.Errorf("failed to get serial num for disk %s at device path %s: %v", deviceName, devFsPath, err.Error()) + klog.V(4).Infof("skipping device path %s for disk %s: serial=%q, err=%v", devFsPath, deviceName, devFsSerial, err) continue }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/deviceutils/device-utils.go` around lines 352 - 356, The klog.Errorf call in the loop after getDevFsSerial can dereference a nil err because it unconditionally uses err.Error() when devFsSerial == "" but err == nil; update the error logging in the block that checks "if err != nil || len(devFsSerial) == 0" (around getDevFsSerial usage) so it only calls err.Error() when err != nil, e.g. build a message variable that includes the returned error when present and otherwise logs a descriptive message that serial is empty for deviceName/devFsPath (referencing getDevFsSerial, devFsSerial, err, deviceName, devFsPath, and klog.Errorf).test/run-k8s-integration.sh (1)
83-85:⚠️ Potential issue | 🟠 MajorInverted condition: flag added when variable is empty.
The condition
[ -z "$gke_node_version" ]is true whengke_node_versionis empty, but then it adds--gke-node-version=${gke_node_version}which would pass an empty value. This appears inverted — typically you'd add the flag when the variable is set (non-empty):🐛 Proposed fix
-if [ -z "$gke_node_version" ]; then +if [ -n "$gke_node_version" ]; then base_cmd="${base_cmd} --gke-node-version=${gke_node_version}" fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/run-k8s-integration.sh` around lines 83 - 85, The condition is inverted: it appends the --gke-node-version flag to base_cmd when gke_node_version is empty; change the test to only append the flag when gke_node_version is non-empty (i.e., test for non-zero length of gke_node_version) so base_cmd only gets --gke-node-version=${gke_node_version} when the variable is set.pkg/gce-cloud-provider/compute/fake-gce.go (1)
174-190:⚠️ Potential issue | 🟡 MinorReject unresolved dynamic disk types in the fake provider.
InsertDiskcurrently turns anyparams.DiskTypeinto a URI. If the controller accidentally passesparameters.DynamicVolumeTypethrough instead of resolving it to a real PD/Hyperdisk type, tests will still succeed here even though GCE would reject that disk type.🔧 Suggested guard
func (cloud *FakeCloudProvider) InsertDisk(ctx context.Context, project string, volKey *meta.Key, params parameters.DiskParameters, capBytes int64, capacityRange *csi.CapacityRange, replicaZones []string, snapshotID string, volumeContentSourceVolumeID string, multiWriter bool, accessMode string) error { + if params.DiskType == parameters.DynamicVolumeType { + return fmt.Errorf("unresolved disk type %q", params.DiskType) + } if disk, ok := cloud.disks[volKey.String()]; ok { err := ValidateExistingDisk(ctx, disk, params, int64(capacityRange.GetRequiredBytes()),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gce-cloud-provider/compute/fake-gce.go` around lines 174 - 190, In InsertDisk of FakeCloudProvider, add a guard that rejects unresolved dynamic disk types: check params.DiskType against parameters.DynamicVolumeType before calling cloud.GetDiskTypeURI (or building computeDisk) and return a clear error if it's still DynamicVolumeType, so tests cannot pass with an unresolved dynamic type; reference the InsertDisk function, params.DiskType, parameters.DynamicVolumeType, and cloud.GetDiskTypeURI when making the change.pkg/gce-pd-csi-driver/controller.go (1)
1163-1169:⚠️ Potential issue | 🟠 MajorUse
instanceZonewhen repairing publish keys.The new repair helper now errors when the same zonal disk name exists in multiple zones and no fallback is provided. In this path the target node’s zone is already known, so passing
""regresses attach for legacy underspecified volume IDs that could still be resolved safely.Suggested fix
-project, volKey, err = gceCS.CloudProvider.RepairUnderspecifiedVolumeKey(ctx, project, volKey, "") +project, volKey, err = gceCS.CloudProvider.RepairUnderspecifiedVolumeKey(ctx, project, volKey, instanceZone)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gce-pd-csi-driver/controller.go` around lines 1163 - 1169, The repair call to gceCS.CloudProvider.RepairUnderspecifiedVolumeKey passes an empty fallback zone, which breaks resolution when the same disk name exists in multiple zones; update the call in the ControllerPublishVolume path to pass the known target node zone (instanceZone) instead of "" so RepairUnderspecifiedVolumeKey can disambiguate the zonal disk; locate the call using the symbols gceCS.CloudProvider.RepairUnderspecifiedVolumeKey, project, volKey and instanceZone and replace the empty-string fallback with instanceZone, keeping the existing error handling logic unchanged.
🟡 Minor comments (32)
pkg/deviceutils/device-utils.go-367-373 (1)
367-373:⚠️ Potential issue | 🟡 Minor
os.Symlinkfails if a stale symlink already exists at the target path.If a broken or stale symlink already exists at
/dev/disk/by-id/google-<serial>,os.Symlinkreturns an error. Consider removing any existing file before creating the symlink.🛠️ Proposed fix to handle existing symlinks
func manuallySetDevicePath(deviceName string) error { devFsPath, devFsSerial, err := findDevice(deviceName) if err != nil { return err } - return os.Symlink(devFsPath, path.Join(diskByIdPath, diskGooglePrefix+devFsSerial)) + symlinkPath := path.Join(diskByIdPath, diskGooglePrefix+devFsSerial) + // Remove any existing stale symlink before creating a new one + if err := os.Remove(symlinkPath); err != nil && !os.IsNotExist(err) { + return fmt.Errorf("failed to remove existing symlink %s: %w", symlinkPath, err) + } + return os.Symlink(devFsPath, symlinkPath) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/deviceutils/device-utils.go` around lines 367 - 373, In manuallySetDevicePath, os.Symlink can fail if a stale/broken symlink already exists at the target (path.Join(diskByIdPath, diskGooglePrefix+devFsSerial)); before calling os.Symlink, compute target := path.Join(...), use os.Lstat(target) to see if something exists, if it is an existing symlink remove it (os.Remove(target)) and if a non-symlink file/dir exists return a clear error; then call os.Symlink(devFsPath, target) and propagate any error from that call. Ensure you reference devFsPath, devFsSerial, diskByIdPath and diskGooglePrefix inside manuallySetDevicePath.test/k8s-integration/cluster.go-259-265 (1)
259-265:⚠️ Potential issue | 🟡 MinorMake the exclusion duration consistent.
Line 264 adds only 2 hours, but this block says it is creating a 4-hour exclusion. Either update the comment or extend the end time to 4 hours; otherwise longer runs can still be upgraded mid-test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/k8s-integration/cluster.go` around lines 259 - 265, The comment says a 4-hour maintenance exclusion is created but the code uses startExclusionTime.Add(2*time.Hour); update the duration to match the comment by changing the end time expression to startExclusionTime.Add(4*time.Hour) (the exec.Command call that builds the gcloud args) or alternately change the comment to state 2 hours—prefer updating the end to 4 hours so startExclusionTime and the "--add-maintenance-exclusion-end" argument reflect a 4-hour window.README.md-112-112 (1)
112-112:⚠️ Potential issue | 🟡 MinorFix the duplicated word in the link text.
See more in the [in btrfs docs]reads like a typo in the rendered README.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 112, The README link text contains a duplicated word: change the link text "[in btrfs docs]" to remove the extra "in" (e.g., use "[btrfs docs]" or "[the btrfs docs]") so the sentence reads correctly while preserving the URL; locate the instance of the link text "[in btrfs docs]" in README.md and update it accordingly.docs/kubernetes/user-guides/dynamic-volumes.md-40-40 (1)
40-40:⚠️ Potential issue | 🟡 MinorFix the internal anchor in the table entry.
The fragment
#applying-disk-support-labelsdoes not match this file’s numbered heading, so the “more information” link is broken in the rendered doc.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/kubernetes/user-guides/dynamic-volumes.md` at line 40, The table row for `use-allowed-disk-topology` contains a broken internal link fragment `#applying-disk-support-labels`; open this file, find the actual heading ID for the section about applying support labels (the heading used for "Apply Support Labels" or its numbered variant) and replace `#applying-disk-support-labels` with the correct fragment so the "(See ...)" link points to the right anchor; update only the fragment in the table entry referencing `use-allowed-disk-topology` to match the exact heading slug used in the document.CHANGELOG/CHANGELOG-1.18.md-1-2 (1)
1-2:⚠️ Potential issue | 🟡 MinorDon't check in an empty changelog placeholder.
This lands as docs, but it currently contains only a TODO. Either populate the notes now or omit the file until they're ready.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.18.md` around lines 1 - 2, The changelog file currently contains only a TODO placeholder under the heading "# v1.18.0 - Changelog since v1.17.12"; either populate that section with the actual release notes (replace the TODO line with the generated release notes content) or remove the file/commit so no empty placeholder is checked in — update the content under the "# v1.18.0 - Changelog since v1.17.12" heading accordingly.pkg/convert/convert_test.go-289-295 (1)
289-295:⚠️ Potential issue | 🟡 MinorThe
>50 tagscase is failing for the wrong reason.
p1…p51are invalidparent_idvalues for this parser, so this subtest never reaches the limit guard. Use valid org/project IDs here so the count check is what actually gets exercised.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/convert/convert_test.go` around lines 289 - 295, The "50 tags at most" subtest is using invalid parent_id values (p1…p51) so the parser bails before hitting the 50-tag guard; update the test case's tags field (the test with name "50 tags at most" in convert_test.go) to use valid org/project-style parent IDs (i.e., real-looking org and project identifiers) for each tag so the parser accepts them and the count limit check is exercised; keep the same number of tags (51) and expectedError=true so the test fails due to exceeding the 50-tag limit rather than invalid parent IDs.pkg/convert/convert.go-72-75 (1)
72-75:⚠️ Potential issue | 🟡 MinorFix the field name in the invalid-label-key error.
This branch validates
key, but the message sayslabel value, so bad keys point operators at the wrong side of the input.Proposed fix
checkLabelKeyFn := func(key string) error { if !regexKey.MatchString(key) { - return fmt.Errorf("label value %q is invalid (should start with lowercase letter / lowercase letter, digit, _ and - chars are allowed / 1-63 characters", key) + return fmt.Errorf("label key %q is invalid (should start with lowercase letter / lowercase letter, digit, _ and - chars are allowed / 1-63 characters", key) } return nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/convert/convert.go` around lines 72 - 75, The error message in checkLabelKeyFn incorrectly says "label value" while validating the key; update the fmt.Errorf call (the one using regexKey and key) to say "label key %q is invalid ..." instead of "label value %q is invalid ..." so invalid keys produce the correct field name in the error; keep the rest of the message and formatting unchanged.pkg/convert/convert_test.go-143-160 (1)
143-160:⚠️ Potential issue | 🟡 MinorThese cases never hit the label-value length check.
Both fixtures put the long string on the key side of
=, so they only re-test key validation and leave value-length behavior uncovered.Proposed fix
{ name: "label value may not have over 63 characters", - labels: "abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij1234=v", + labels: "k=abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij1234", expectedError: true, }, @@ { name: "label value can have up to 63 characters", - labels: "abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij123=v", + labels: "k=abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij123", expectedError: false, },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/convert/convert_test.go` around lines 143 - 160, The two failing test cases ("label value may not have over 63 characters" and "label value can have up to 63 characters") currently place the long string on the key side of the '=' so they exercise key validation instead of value-length validation; update the test fixtures in convert_test.go so the long 63+/63-length string is on the value side (e.g., use "k=<long-string>" rather than "<long-string>=v") for those two cases to ensure the label-value length checks in the code paths exercised by the test are actually covered.CHANGELOG/CHANGELOG-1.10.md-7-7 (1)
7-7:⚠️ Potential issue | 🟡 MinorFix malformed markdown link.
The markdown link syntax is incorrect - missing the opening parenthesis before the URL.
📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1602`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1602), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) +- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1602`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1602), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.10.md` at line 7, Fix the malformed markdown link in the CHANGELOG entry containing "Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1602`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1602), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))" by inserting the missing opening parenthesis before the PR URL so the PR reference becomes a proper markdown link (e.g., "([`#1602`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1602))") and ensure any extra closing parentheses are corrected around the PR and author references.CHANGELOG/CHANGELOG-1.11.md-94-220 (1)
94-220:⚠️ Potential issue | 🟡 MinorRemove the stray dependency block under v1.11.5.
Starting at Line 94, a large dependency list is pasted after
### Removed/_Nothing has changed._, begins mid-token (pigateway), and runs until the next real section header. That makes the v1.11.5 changelog structurally inaccurate and very hard to read.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.11.md` around lines 94 - 220, Remove the stray dependency block that was pasted into the v1.11.5 section: locate the "v1.11.5" / "### Removed" section containing the line starting with "pigateway: v1.4.0 → v1.6.1" (directly after "_Nothing has changed._") and delete the entire dependency list up to the next real section header so the v1.11.5 changelog only contains its intended content.CHANGELOG/CHANGELOG-1.11.md-7-7 (1)
7-7:⚠️ Potential issue | 🟡 MinorFix the broken PR link in the v1.11.9 entry.
Line 7 has malformed Markdown, so the PR reference will not render correctly in the changelog.
📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1601`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1601), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) +- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1601`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1601), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.11.md` at line 7, The changelog entry "Change GetDisk error reporting to temporary in CreateVolume codepath" has malformed Markdown for the PR link; fix it by making the PR number a proper link like [`#1601`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1601) and ensure the trailing comma/parentheses are balanced so the username link [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot) renders correctly (remove the extra closing parenthesis and place commas/parentheses appropriately).CHANGELOG/CHANGELOG-1.13.md-29-30 (1)
29-30:⚠️ Potential issue | 🟡 MinorKeep the cherry-pick entry in a single bullet.
Line 30 starts with
#1667:on a new line, so the second cherry-picked PR falls out of the list and renders as malformed Markdown.📝 Proposed fix
-- Automated cherry pick of `#1666`: migrate hyperdisk/chd/storagepools to GCE v1 disk API -#1667: remove support for GCE Alpha Disks by `@amacaskill` in `#1669` +- Automated cherry pick of `#1666`: migrate hyperdisk/chd/storagepools to GCE v1 disk API, `#1667`: remove support for GCE Alpha Disks by `@amacaskill` in `#1669`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.13.md` around lines 29 - 30, The cherry-pick entry is split across two lines causing malformed Markdown; merge the two lines into a single bullet so the entire entry remains one list item — combine the text so the bullet reads something like the original first line followed by the second (include the PR refs `#1666`, `#1667` and the mention of `@amacaskill` and `#1669` together) ensuring the line begins with a single "-" to keep it a proper list item.CHANGELOG/CHANGELOG-1.13.md-69-72 (1)
69-72:⚠️ Potential issue | 🟡 MinorRejoin the wrapped links in the v1.13.3 section.
The PR and author URLs are split across Lines 69-72, which breaks both markdown links in the rendered changelog.
📝 Proposed fix
-- Flag --use-instance-api-to-poll-attachment-disk-types uses instances.get API when polling for disk attachment in ControllerPublish for passed in disk types ([`#1630`](https://github.com/kube -rnetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1630), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) -- Support for read_ahead_kb mount flag to change block device readahead ([`#1631`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1631), [`@k8s-infra-cherrypick-` -robot](https://github.com/k8s-infra-cherrypick-robot)) +- Flag --use-instance-api-to-poll-attachment-disk-types uses instances.get API when polling for disk attachment in ControllerPublish for passed in disk types ([`#1630`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1630), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) +- Support for read_ahead_kb mount flag to change block device readahead ([`#1631`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1631), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.13.md` around lines 69 - 72, The changelog has wrapped/broken markdown links in the v1.13.3 bullets: the PR and author URLs for the entries starting with "Flag --use-instance-api-to-poll-attachment-disk-types" and "Support for read_ahead_kb mount flag" are split across lines (the "[`#1630`](" and "[`@k8s-infra-cherrypick-robot`](" fragments and the "[`#1631`](" and "[`@k8s-infra-cherrypick-robot`](" fragments). Fix this by rejoining each broken link so each PR link and each author link is a single continuous markdown link (e.g., [`#1630`](https://github.com/...) and [`@k8s-infra-cherrypick-robot`](https://github.com/...) and similarly for `#1631`), preserving the existing text and punctuation.CHANGELOG/CHANGELOG-1.9.md-7-7 (1)
7-7:⚠️ Potential issue | 🟡 MinorFix the broken PR link in the v1.9.17 entry.
Line 7 has malformed Markdown, so the PR reference will not render as a clickable link in the changelog.
📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1603`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1603), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) +- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1603`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1603), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.9.md` at line 7, The changelog entry has malformed Markdown for the PR link in the v1.9.17 entry; update the text so the PR reference is a proper Markdown link (use [`#1603`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1603) form) and remove the stray characters so the contributor mention ([`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) remains a valid link; edit the line containing "Change GetDisk error reporting..." to replace the broken fragment with the correctly formatted PR and author links.CHANGELOG/CHANGELOG-1.14.md-47-55 (1)
47-55:⚠️ Potential issue | 🟡 MinorRemove the stray fenced block in the 1.14.0 entry.
Line 50 opens a code fence that never closes, so the rest of the changelog renders incorrectly. That also explains the downstream heading warning.
📝 Suggested fix
- The --instances-list-filters command line flag allows the driver to filter on response properties when listing VMs from the GCE API in ListVolumes RPC. - The --use-instance-api-to-list-volumes-published-nodes command line flag allows the driver to use the instances.list API instead of disks.list API when determining PublishedNodeIs in ListVolumes RPC. - ``` - **Testing**: - Validated existing behavior with: ([`#1696`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1696), [`@pwschuurman`](https://github.com/pwschuurman))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.14.md` around lines 47 - 55, Remove the stray opening fenced code block in the 1.14.0 changelog entry: locate the lone backtick fence that begins before the "Testing" section (the unmatched "```") and delete it so the "Testing" paragraph and following bullets render normally; ensure the bullet list under "Testing" is properly formatted (no stray empty list markers) after removing the fence.CHANGELOG/CHANGELOG-1.12.md-162-162 (1)
162-162:⚠️ Potential issue | 🟡 MinorOrphaned dependency content appears truncated.
Line 162 starts mid-text with
e.com/go/area120: v0.6.0 → v0.8.1which appears to be a continuation from a previous changelog entry that was improperly merged. This content block (lines 162-289) seems to belong under the v1.12.3 "Changed" section but the version header and package name prefix are missing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.12.md` at line 162, The changelog contains an orphaned line starting with "e.com/go/area120: v0.6.0 → v0.8.1" that was truncated and is missing its package prefix and the v1.12.3 "Changed" header; restore this block by moving that line (and the following lines in the block) under the v1.12.3 "Changed" section and prepend the correct package prefix so the entry reads as a complete package line (e.g., restore the missing leading characters so it becomes "github.com/go/area120: v0.6.0 → v0.8.1" or the actual original package name), and ensure the v1.12.3 "Changed" header is present above the block so the entry is properly grouped.pkg/parameters/sanitize.go-5-6 (1)
5-6:⚠️ Potential issue | 🟡 MinorTypo in comment: "sanitizeer" should be "sanitizer".
📝 Proposed fix
-// Map of sanitizers for parameters that apply to only a subset of disk types. Each sanitizeer +// Map of sanitizers for parameters that apply to only a subset of disk types. Each sanitizer // should only be responsible for a single parameter.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/parameters/sanitize.go` around lines 5 - 6, Update the comment on the map of sanitizers to correct the typo: change "sanitizeer" to "sanitizer" so the sentence reads that each sanitizer should only be responsible for a single parameter; edit the comment near the map of sanitizers in pkg/parameters/sanitize.go accordingly.CHANGELOG/CHANGELOG-1.8.md-415-415 (1)
415-415:⚠️ Potential issue | 🟡 MinorTrailing space in link text.
The author link has a trailing space before the closing bracket.
📝 Proposed fix
-- Fix missing shared library libedit.so.2 caused from updating base image in [`#1162`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1162) ([`#1177`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1177), [`@sunnylovestiramisu` ](https://github.com/sunnylovestiramisu)) +- Fix missing shared library libedit.so.2 caused from updating base image in [`#1162`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1162) ([`#1177`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1177), [`@sunnylovestiramisu`](https://github.com/sunnylovestiramisu))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.8.md` at line 415, The changelog entry contains a trailing space in the author link text "[`@sunnylovestiramisu` ]" which causes an extra space before the closing bracket; edit CHANGELOG-1.8.md and remove the trailing space so the link reads "[`@sunnylovestiramisu`]" (update the specific line containing the entry shown in the diff).CHANGELOG/CHANGELOG-1.12.md-49-49 (1)
49-49:⚠️ Potential issue | 🟡 MinorDuplicate broken markdown link syntax.
Same malformed link format as line 40.
📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1600`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) +- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1600`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.12.md` at line 49, The changelog line contains a malformed/duplicated markdown link for the PR reference in the text "Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1600`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))"; fix it by replacing the broken combined link with a single properly formatted markdown link for the PR (e.g., [`#1600`](https://github.com/.../pull/1600)) and ensure the maintainer mention remains a valid link ([`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) with balanced parentheses and no extra punctuation.CHANGELOG/CHANGELOG-1.12.md-40-40 (1)
40-40:⚠️ Potential issue | 🟡 MinorBroken markdown link syntax.
The link format is malformed - missing opening parenthesis before the URL.
📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1600`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) +- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1600`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.12.md` at line 40, Fix the malformed Markdown link in the changelog line that currently reads "([`#1600`])https://github.com/...)" by converting it to the proper inline link syntax; replace "([`#1600`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600)" with "([`#1600`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1600))" and ensure the contributor reference "[`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)" has matching parentheses (remove any extra trailing parenthesis).CHANGELOG/CHANGELOG-1.8.md-7-7 (1)
7-7:⚠️ Potential issue | 🟡 MinorBroken markdown link syntax.
Same malformed link format as seen in CHANGELOG-1.12.md.
📝 Proposed fix
-- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1604`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1604), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot)) +- Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1604`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1604), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.8.md` at line 7, The changelog line contains malformed Markdown link syntax around the PR and author references in the text "Change GetDisk error reporting to temporary in CreateVolume codepath ([`#1604`])https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1604), [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot))"; fix it by converting the PR reference to a proper link like [`#1604`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/1604) and ensure the author reference is a valid link [`@k8s-infra-cherrypick-robot`](https://github.com/k8s-infra-cherrypick-robot) with no extra parentheses or stray characters so the line reads correctly.pkg/gce-pd-csi-driver/disk_selection.go-59-61 (1)
59-61:⚠️ Potential issue | 🟡 MinorTypo: "unspecfied" should be "unspecified".
📝 Suggested fix
// Determine default disk type based on preference parameter. If the parameter is - // unspecfied than default to hdType. + // unspecified then default to hdType. defaultDiskType := hdType🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gce-pd-csi-driver/disk_selection.go` around lines 59 - 61, Fix the typo in the comment above the defaultDiskType variable: change "unspecfied" to "unspecified" in the comment that explains determining default disk type (referencing defaultDiskType and hdType) so the comment reads "...If the parameter is unspecified then default to hdType."pkg/gce-pd-csi-driver/disk_selection_test.go-162-187 (1)
162-187:⚠️ Potential issue | 🟡 MinorDuplicate test cases with misleading names.
The test cases "pd default override" (lines 163-174) and "hd default override" (lines 176-187) are identical - both set
ParameterDiskPreferencetoParameterHDTypeand expectDefaultto behdTestDiskType. One of them should test the PD preference scenario.🐛 Suggested fix
{ desc: "pd default override", parameters: map[string]string{ parameters.ParameterPDType: pdTestDiskType, parameters.ParameterHDType: hdTestDiskType, - parameters.ParameterDiskPreference: parameters.ParameterHDType, + parameters.ParameterDiskPreference: parameters.ParameterPDType, }, want: &dynamicDiskTypes{ PD: pdTestDiskType, HD: hdTestDiskType, - Default: hdTestDiskType, + Default: pdTestDiskType, }, },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gce-pd-csi-driver/disk_selection_test.go` around lines 162 - 187, The two test cases in disk_selection_test.go named "pd default override" and "hd default override" are identical; update the one labeled "pd default override" (or the second duplicate) so it actually tests PD preference: set parameters.ParameterDiskPreference to parameters.ParameterPDType (instead of ParameterHDType), update the desc to "pd default override" if needed, and change the expected want.Default in the dynamicDiskTypes to pdTestDiskType to assert PD becomes the default.CHANGELOG/CHANGELOG-1.1.md-17-18 (1)
17-18:⚠️ Potential issue | 🟡 MinorMinor grammar fix: hyphenate "cherry-picked".
The compound adjective should be hyphenated.
📝 Suggested fix
-- Update GCE PD CSI Driver Docker base image to `k8s.gcr.io/build-image/debian-base-amd64:v2.1.3` (previously `gcr.io/google-containers/debian-base-amd64:v2.0.0`) to address CVEs. ([`#596`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/596), [`@saad-ali`](https://github.com/saad-ali)) - - Also cherry picked to 1.0.1. +- Update GCE PD CSI Driver Docker base image to `k8s.gcr.io/build-image/debian-base-amd64:v2.1.3` (previously `gcr.io/google-containers/debian-base-amd64:v2.0.0`) to address CVEs. ([`#596`](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/596), [`@saad-ali`](https://github.com/saad-ali)) + - Also cherry-picked to 1.0.1.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.1.md` around lines 17 - 18, Fix the minor grammar by hyphenating the compound adjective: update the phrase "Also cherry picked to 1.0.1." to "Also cherry-picked to 1.0.1." (look for the sentence containing "cherry picked" in the CHANGELOG entry mentioning the GCE PD CSI Driver Docker base image).pkg/gce-pd-csi-driver/node_test.go-673-694 (1)
673-694:⚠️ Potential issue | 🟡 MinorRecreate the fake Btrfs sysfs tree per subtest.
These fixtures are initialized once but mutated by
NodeStageVolume. Later cases then inherit values written by earlier ones, so some Btrfs assertions can pass without actually exercising the write path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gce-pd-csi-driver/node_test.go` around lines 673 - 694, The test currently creates the btrfsFixtures tree once (btrfsFixtures, btrfsPrefix, tempDir) and reuses it across multiple subtests, allowing state written by NodeStageVolume to leak into later cases; change the test to recreate (or clear and re-create) the fake Btrfs sysfs tree before each subtest iteration so each case starts with fresh files. Concretely, move the loop that creates dirs and writes files (using btrfsFixtures -> fullPath, path.Dir, os.MkdirAll, os.WriteFile) into the per-subtest setup (or add code to remove btrfsPrefix and re-run creation) so NodeStageVolume tests cannot inherit prior mutations.CHANGELOG/CHANGELOG-1.3.md-75-75 (1)
75-75:⚠️ Potential issue | 🟡 MinorNormalize the heading hierarchy in these release sections.
Both blocks jump from
#straight to###, which trips markdownlint MD001 and makes the document structure inconsistent. Either add a## Changes by Kindwrapper or promote these headings to##.Also applies to: 92-92
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-1.3.md` at line 75, The release sections jump from top-level headings to "###" which violates MD001; update the release blocks by either adding a parent "## Changes by Kind" wrapper above the existing "###" subsections or promoting each "### ..." heading (for example the "### Issues" heading and the other similar "###" heading referenced) to "##" so the hierarchy is consistent and markdownlint MD001 is satisfied; apply the same change to the second occurrence noted in the review.CHANGELOG/CHANGELOG-0.7.md-5-5 (1)
5-5:⚠️ Potential issue | 🟡 MinorFix the
PodSecurityPolicytypo.
PodSecurityPoliciyis misspelled here, which makes the release note harder to search and reuse downstream.📝 Suggested fix
-- Adding `PodSecurityPoliciy` to allow `csi-gce-pd-node` in clusters with policies enabled. +- Adding `PodSecurityPolicy` to allow `csi-gce-pd-node` in clusters with policies enabled.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG/CHANGELOG-0.7.md` at line 5, Typo in the changelog: replace the misspelled token "PodSecurityPoliciy" with the correct "PodSecurityPolicy" in the release note sentence referring to allowing `csi-gce-pd-node`; update the string exactly so downstream searches and reuse work (search for the literal "PodSecurityPoliciy" and correct it to "PodSecurityPolicy").pkg/gce-pd-csi-driver/node_test.go-1412-1415 (1)
1412-1415:⚠️ Potential issue | 🟡 MinorThis
blkidCalledsanity check never triggers.
blkidCalledis never updated while building the fake command list, so this guard cannot catch a test case that accidentally expects the UUID lookup without any Btrfs settings.🛠️ Suggested fix
case "blkid": if strings.Contains(cmd.args, "TYPE") { cmd.stdout = fmt.Sprintf(cmd.stdout, fsType) + } else if strings.Contains(cmd.args, "--match-tag UUID") { + blkidCalled = true }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gce-pd-csi-driver/node_test.go` around lines 1412 - 1415, The test's blkidCalled guard never fires because blkidCalled is never set when the fake blkid command is appended; update the fake command-list construction to set blkidCalled = true at the moment you add the blkid command to the fake exec/commands (the same place where other fake commands are pushed when building the fakeExec), so the final check that inspects blkidCalled can correctly detect when a blkid invocation was expected; look for the blkidCalled variable and the block that appends the blkid fake command in node_test.go and set it there.pkg/parameters/utils_test.go-185-185 (1)
185-185:⚠️ Potential issue | 🟡 MinorMinor typo in test error message.
There's a typo "bu tgot" which should be "but got".
Proposed fix
- t.Errorf("test failed: the provided key %s expected to be %v bu tgot %v", tc.diskEncryptionKmsKey, tc.expectedIsValid, isValid) + t.Errorf("test failed: the provided key %s expected to be %v but got %v", tc.diskEncryptionKmsKey, tc.expectedIsValid, isValid)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/parameters/utils_test.go` at line 185, Fix the typo in the test error message: update the t.Errorf call that currently prints "the provided key %s expected to be %v bu tgot %v" (using tc.diskEncryptionKmsKey, tc.expectedIsValid, isValid) to read "the provided key %s expected to be %v but got %v" so the message correctly says "but got".pkg/parameters/parameters.go-323-332 (1)
323-332:⚠️ Potential issue | 🟡 MinorError message references "boolean" but function handles availability class.
The error message at line 331 says "unexpected boolean string" but this function converts availability class strings, not booleans.
Proposed fix
func convertStringToAvailabilityClass(str string) (string, error) { switch strings.ToLower(str) { case ParameterNoAvailabilityClass: return ParameterNoAvailabilityClass, nil case ParameterRegionalHardFailoverClass: return ParameterRegionalHardFailoverClass, nil } - return "", fmt.Errorf("unexpected boolean string %s", str) + return "", fmt.Errorf("unexpected availability class string %s", str) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/parameters/parameters.go` around lines 323 - 332, The error message in convertStringToAvailabilityClass incorrectly says "unexpected boolean string"; update the fmt.Errorf call in that function to reference availability class (e.g., "unexpected availability class string %s") so the error accurately reflects what the function handles (keep the input str included). Touch the convertStringToAvailabilityClass function only and preserve existing constants ParameterNoAvailabilityClass and ParameterRegionalHardFailoverClass.pkg/gce-pd-csi-driver/controller_test.go-1372-1509 (1)
1372-1509:⚠️ Potential issue | 🟡 MinorThese dynamic-volume cases still don't prove the dynamic path works.
The
"enabled but standard type"case never setsenableDynamicVolumes, and the selection cases only compareVolumeId/topology. They will still pass if the feature gate is off or ifpd-type/hyperdisk-typeresolution is skipped. Please flip the flag on the first case and assert the created fake disk'sTypein the selection cases.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gce-pd-csi-driver/controller_test.go` around lines 1372 - 1509, Enable the dynamic-volumes feature for the test case named "success with dynamic volumes enabled but standard type" by setting enableDynamicVolumes: true on that case, and in the dynamic-selection cases (e.g., cases with parameters.ParameterKeyType == parameters.DynamicVolumeType and names like "success with dynamic volumes selecting disk" / "success with dynamic volumes selecting disk override") add assertions against the fake disk created by the test harness to verify its Type field matches the resolved disk type (e.g., "pd-balanced" or "hyperdisk-balanced") according to parameters.ParameterPDType, parameters.ParameterHDType and parameters.ParameterDiskPreference; use the same test fixtures/helpers used elsewhere in controller_test.go to access the created fake disk and assert disk.Type equals the expected value.pkg/gce-pd-csi-driver/controller.go-653-661 (1)
653-661:⚠️ Potential issue | 🟡 MinorReturn a diagnosable user error from this guard.
If this fallback check ever trips,
status.Errorf(codes.Internal, "")gives callers an emptyInternalerror for what is really an invalid access mode request.Suggested fix
} else if disksWithUnsettableAccessMode[params.DiskType] { // Disallow multi-attach for HdT and HdE. These checks were done in `createVolumeInternal`, // but repeating them here future-proves us from possible refactors. if am != constants.GCEReadWriteOnceAccessMode { - return "", status.Errorf(codes.Internal, "") + return "", status.Errorf(codes.InvalidArgument, "invalid access mode %q for disk type %s", am, params.DiskType) } } else { return am, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gce-pd-csi-driver/controller.go` around lines 653 - 661, The guard in controller.go using common.IsHyperdisk and getHyperdiskAccessModeFromCapabilities currently returns an empty Internal error (status.Errorf(codes.Internal, "")); change this to return a diagnosable user-facing error (e.g., codes.InvalidArgument) with a clear message that includes the disk type (params.DiskType) and the resolved access mode (am), stating that multi-attach is not supported for that disk type; update the status.Errorf call accordingly and keep the surrounding checks (disksWithUnsettableAccessMode and constants.GCEReadWriteOnceAccessMode) intact.
| echo "Kubernetes Integration:" | ||
| echo " To apply generated labels directly to a node:" | ||
| echo " kubectl label nodes <node-name> \$($0 -m <machine-type>)" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the script to see the actual code at the mentioned lines
cat -n deploy/disk_type_labels.sh | sed -n '20,30p'Repository: openshift/gcp-pd-csi-driver
Length of output: 471
🏁 Script executed:
# Also check lines 381-383
wc -l deploy/disk_type_labels.shRepository: openshift/gcp-pd-csi-driver
Length of output: 101
🏁 Script executed:
# Search for all echo statements that output label format
rg "echo.*disk-type" deploy/disk_type_labels.sh -nRepository: openshift/gcp-pd-csi-driver
Length of output: 121
🏁 Script executed:
# Let's look at more context around line 383 to understand the output generation
cat -n deploy/disk_type_labels.sh | sed -n '375,390p'Repository: openshift/gcp-pd-csi-driver
Length of output: 536
🏁 Script executed:
# Check if there are other locations emitting labels with quotes
rg "echo.*\\\"true\\\"" deploy/disk_type_labels.sh -nRepository: openshift/gcp-pd-csi-driver
Length of output: 157
🏁 Script executed:
# Let's find all echo statements that generate label output
rg "echo.*disk-type" deploy/disk_type_labels.sh -n -A 1 -B 1Repository: openshift/gcp-pd-csi-driver
Length of output: 166
🏁 Script executed:
# Let's also check if there's a help/usage function that might show more context
rg "usage\|help" deploy/disk_type_labels.sh -n -iRepository: openshift/gcp-pd-csi-driver
Length of output: 53
🏁 Script executed:
# Let's see what the actual script outputs when run (check for any test data or examples)
head -100 deploy/disk_type_labels.sh | tail -50Repository: openshift/gcp-pd-csi-driver
Length of output: 1602
🌐 Web query:
kubectl label nodes KEY=VALUE syntax requirements quotes
💡 Result:
kubectl label nodes basic syntax
kubectl label nodes <node-name> <key>=<value> [<key2>=<value2> ...]--overwrite is required to change an existing label’s value. (kubernetes.io)
Label key/value requirements (Kubernetes rules)
- Key: either
nameorprefix/namenameis required, max 63 chars, must start/end alphanumeric; may contain- _ .in between.prefix(optional) must be a DNS subdomain (max 253 chars), followed by/. (kubernetes.io)
- Value: max 63 chars; may be empty; if non-empty must start/end alphanumeric; may contain
- _ .in between. (kubernetes.io)
Quotes: when needed (and when they won’t help)
- For labeling (
key=value), you typically don’t need quotes. - Quotes cannot make invalid labels valid (e.g., spaces aren’t allowed in label keys/values per the rules above). (kubernetes.io)
- You do commonly use quotes for label selectors (not labeling), e.g.:
(kubernetes.io)
kubectl get pods -l 'environment in (production),tier in (frontend)'
Useful related forms
- Empty value (allowed):
mykey=(kubernetes.io) - Remove a label:
(kubernetes.io)
kubectl label nodes <node-name> mykey-
Citations:
- 1: https://kubernetes.io/docs/reference/kubectl/generated/kubectl_label/
- 2: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
- 3: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
- 4: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
- 5: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
- 6: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
- 7: https://kubernetes.io/docs/reference/kubectl/generated/kubectl_label/
🏁 Script executed:
# Let's also look at the usage function more completely to understand the context
cat -n deploy/disk_type_labels.sh | sed -n '1,40p'Repository: openshift/gcp-pd-csi-driver
Length of output: 1542
Emit unquoted label values.
The usage block tells users to pipe this output into kubectl label, but the script generates disk-type.gke.io/$disk="true" with literal quote characters. Kubernetes label values cannot contain " characters and must conform to [a-zA-Z0-9]([a-zA-Z0-9._-]{0,61}[a-zA-Z0-9])?, making the generated arguments invalid.
Suggested fix
- echo "disk-type.gke.io/$disk=\"true\""
+ echo "disk-type.gke.io/$disk=true"Also applies to line 383.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy/disk_type_labels.sh` around lines 25 - 27, The usage and
label-generation currently emit labels with literal double quotes (e.g.,
disk-type.gke.io/<disk>="true"), which produce invalid Kubernetes label values;
update the code that prints label arguments so it emits unquoted values
(disk-type.gke.io/<disk>=true) by removing the surrounding quotes in the
printed/echoed label string and ensuring variable expansion is performed safely
(apply this change to the usage example that prints "kubectl label nodes
<node-name> \$($0 -m <machine-type>)" and to the label-emission logic referenced
near line 383 so both produce unquoted label values).
| resources: | ||
| - configmap.yaml | ||
| - rbac.yaml | ||
| - image.yaml |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
kustom=$(fd '^kustomization\.yaml$' deploy/kubernetes/overlays/node-labeler)
image_cfg=$(fd '^image\.yaml$' deploy/kubernetes/overlays/node-labeler)
echo "== ${kustom} =="
sed -n '1,80p' "$kustom"
echo
echo "== ${image_cfg} =="
sed -n '1,120p' "$image_cfg"
echo
echo "== related kustomize declarations =="
rg -n -C2 'kind:\s*Kustomization|transformers:|resources:|ImageTagTransformer' deploy/kubernetesRepository: openshift/gcp-pd-csi-driver
Length of output: 27337
Move image.yaml from resources: to transformers:.
image.yaml contains an ImageTagTransformer, which is a transformer helper—not a rendered Kubernetes manifest. It must be configured under transformers: instead of resources: to be applied correctly.
Corrected kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- configmap.yaml
- rbac.yaml
transformers:
- image.yaml
patchesStrategicMerge:
- patch.yaml🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy/kubernetes/overlays/node-labeler/kustomization.yaml` around lines 3 -
6, The kustomization lists image.yaml under resources but image.yaml is a
transformer (contains an ImageTagTransformer) and must be listed under
transformers; update the Kustomization (kustomization.yaml) by removing
image.yaml from the resources: block and adding a transformers: section that
includes - image.yaml (leave configmap.yaml and rbac.yaml under resources and
keep existing patchesStrategicMerge entries).
| spec: | ||
| containers: | ||
| - name: gce-pd-node-labeler | ||
| image: cloud-provider-gcp/gce-pd-node-labeler | ||
| args: | ||
| - "--v=5" | ||
| - "--configmap-name=machine-pd-compatibility" | ||
| - "--configmap-namespace=gce-pd-csi-driver" | ||
| - "--http-endpoint=:22015" | ||
| ports: | ||
| - containerPort: 22015 | ||
| name: http-endpoint | ||
| protocol: TCP | ||
| livenessProbe: | ||
| failureThreshold: 1 | ||
| httpGet: | ||
| path: /healthz/node-labeler | ||
| port: http-endpoint | ||
| initialDelaySeconds: 10 | ||
| timeoutSeconds: 10 | ||
| periodSeconds: 20 | ||
| resources: | ||
| requests: | ||
| cpu: 20m | ||
| memory: 50Mi | ||
| limits: | ||
| cpu: 50m | ||
| memory: 100Mi |
There was a problem hiding this comment.
Harden the node-labeler container security context.
This container inherits the default pod/container security context, so it can run with a broader privilege set than necessary. For a long-lived controller process, please at least force non-root execution, disable privilege escalation, and make the root filesystem read-only.
🔐 Proposed hardening
spec:
template:
spec:
+ securityContext:
+ seccompProfile:
+ type: RuntimeDefault
containers:
- name: gce-pd-node-labeler
image: cloud-provider-gcp/gce-pd-node-labeler
+ securityContext:
+ allowPrivilegeEscalation: false
+ readOnlyRootFilesystem: true
+ runAsNonRoot: true
+ capabilities:
+ drop:
+ - ALL
args:
- "--v=5"If the binary needs scratch space, add an explicit emptyDir mount instead of keeping the whole root filesystem writable.
🧰 Tools
🪛 Checkov (3.2.334)
[medium] 1-34: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-34: Minimize the admission of root containers
(CKV_K8S_23)
🪛 Trivy (0.69.3)
[error] 9-34: Root file system is not read-only
Container 'gce-pd-node-labeler' of Deployment 'csi-gce-pd-controller' should set 'securityContext.readOnlyRootFilesystem' to true
Rule: KSV-0014
(IaC/Kubernetes)
[error] 9-34: Default security context configured
container csi-gce-pd-controller in default namespace is using the default security context
Rule: KSV-0118
(IaC/Kubernetes)
[error] 7-34: Default security context configured
deployment csi-gce-pd-controller in default namespace is using the default security context, which allows root privileges
Rule: KSV-0118
(IaC/Kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy/kubernetes/overlays/node-labeler/patch.yaml` around lines 7 - 34, Add
a container-level securityContext to the gce-pd-node-labeler container to force
non-root and restrict privileges: set runAsNonRoot: true (and optionally
runAsUser to a non-zero uid), set allowPrivilegeEscalation: false and
privileged: false, and set readOnlyRootFilesystem: true; if the binary needs
writable scratch space, add an explicit emptyDir volume and mount it (e.g.,
volume name like node-labeler-tmp mounted at /tmp) instead of leaving the root
filesystem writable so the pod stays secure.
| ```console | ||
| $ NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com" # Comma-separated list of node service accounts | ||
| ``` |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check the file exists and get its structure
fd driver-install.mdRepository: openshift/gcp-pd-csi-driver
Length of output: 116
🏁 Script executed:
# Read the specific lines mentioned in the review
head -100 docs/kubernetes/user-guides/driver-install.md | tail -60Repository: openshift/gcp-pd-csi-driver
Length of output: 2597
🏁 Script executed:
# Search for setup-project.sh and deploy-driver.sh to understand how they use NODE_SERVICE_ACCOUNTS
rg "setup-project.sh|deploy-driver.sh" --type md -B 3 -A 3Repository: openshift/gcp-pd-csi-driver
Length of output: 6080
🏁 Script executed:
# Look for NODE_SERVICE_ACCOUNTS usage in the codebase
rg "NODE_SERVICE_ACCOUNTS" --type md -B 2 -A 2Repository: openshift/gcp-pd-csi-driver
Length of output: 1579
🏁 Script executed:
# Search for any shell script files to understand variable usage
fd "setup-project.sh|deploy-driver.sh" -type fRepository: openshift/gcp-pd-csi-driver
Length of output: 241
🏁 Script executed:
# Find and examine the setup-project.sh and deploy-driver.sh scripts
fd "setup-project.sh|deploy-driver.sh" -type fRepository: openshift/gcp-pd-csi-driver
Length of output: 241
🏁 Script executed:
# Search for NODE_SERVICE_ACCOUNTS usage in script files
rg "NODE_SERVICE_ACCOUNTS" --type sh -B 3 -A 3Repository: openshift/gcp-pd-csi-driver
Length of output: 2519
🏁 Script executed:
# Check if the scripts are in deploy directory
ls -la deploy/Repository: openshift/gcp-pd-csi-driver
Length of output: 555
🏁 Script executed:
# Look at the deploy directory structure
find deploy -type f -name "*.sh" | head -20Repository: openshift/gcp-pd-csi-driver
Length of output: 317
🏁 Script executed:
# Read the complete setup-project.sh script to understand how it reads NODE_SERVICE_ACCOUNTS
cat -n deploy/setup-project.sh | head -50Repository: openshift/gcp-pd-csi-driver
Length of output: 2391
🏁 Script executed:
# Check how setup-project.sh uses NODE_SERVICE_ACCOUNTS and whether it expects an exported variable
cat -n deploy/setup-project.sh | grep -A 10 "NODE_SERVICE_ACCOUNTS"Repository: openshift/gcp-pd-csi-driver
Length of output: 1019
🏁 Script executed:
# Examine the deploy-driver.sh script
cat -n deploy/kubernetes/deploy-driver.sh | grep -A 10 "NODE_SERVICE_ACCOUNTS"Repository: openshift/gcp-pd-csi-driver
Length of output: 1204
🏁 Script executed:
# Check if there are examples in development.md showing how variables are set
cat -n docs/kubernetes/development.md | grep -B 5 -A 5 "export.*ENABLE"Repository: openshift/gcp-pd-csi-driver
Length of output: 483
🏁 Script executed:
# Check if there are any inline assignment patterns used in the documentation
rg "NODE_SERVICE_ACCOUNTS\s*=.*\./(setup-project|deploy-driver)" docs/Repository: openshift/gcp-pd-csi-driver
Length of output: 53
🏁 Script executed:
# Look at more context around the deploy examples to see the full command structure
sed -n '40,75p' docs/kubernetes/user-guides/driver-install.mdRepository: openshift/gcp-pd-csi-driver
Length of output: 2101
🏁 Script executed:
# Check the common.sh file to understand how the variable is checked
cat deploy/common.shRepository: openshift/gcp-pd-csi-driver
Length of output: 1077
Export NODE_SERVICE_ACCOUNTS to child shell processes.
These examples assign NODE_SERVICE_ACCOUNTS on a separate line from invoking the scripts. Without export, the scripts won't receive the variable and the scoped impersonation logic in use_scoped_sa_role() will silently fall back to the broader project-level serviceAccountUser role. The development guide already demonstrates this pattern with export ENABLE_KMS=true.
Suggested fix
-$ NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com" # Comma-separated list of node service accounts
+$ export NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com" # Comma-separated list of node service accounts-$ NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com" # Same as the setup-project.sh step
+$ export NODE_SERVICE_ACCOUNTS="master-sa@project.iam.gserviceaccount.com,worker-sa@project.iam.gserviceaccount.com" # Same as the setup-project.sh stepAlso applies to: 69-73
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/kubernetes/user-guides/driver-install.md` around lines 47 - 49, The
example sets NODE_SERVICE_ACCOUNTS but does not export it, so child scripts (and
the use_scoped_sa_role() logic) won't see it; update the examples that assign
NODE_SERVICE_ACCOUNTS (e.g., the occurrences around the shown block and the
other instance) to export the variable (use "export NODE_SERVICE_ACCOUNTS=...")
so the scripts receive the value and scoped impersonation in
use_scoped_sa_role() works as intended.
| golang.org/x/oauth2 v0.30.0 | ||
| golang.org/x/sys v0.35.0 | ||
| golang.org/x/time v0.12.0 | ||
| go.opentelemetry.io/otel/sdk v1.36.0 |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
GO-2026-4394 GHSA-9h8m-3fm2-qjrq otel/sdk vulnerability
💡 Result:
GO-2026-4394 / GHSA-9h8m-3fm2-qjrq (CVE-2026-24051) is a local PATH-hijacking arbitrary code execution issue in the OpenTelemetry Go SDK on macOS/Darwin. The SDK’s host ID resource detection runs the ioreg command via PATH lookup (sdk/resource/host_id.go), so an attacker who can influence PATH can get code executed in the application’s context. [2][3]
- Affected:
go.opentelemetry.io/otel/sdk(notably the.../sdk/resourcemodule), versions 1.21.0–1.39.0 (NVD description notes 1.20.0–1.39.0). [1][2][3] - Severity: High (CVSS 3.1: 7.0) (per GitHub/CNA). [2][4]
- Fixed: upgrade to v1.40.0 (patch commit
d45961b...). [1][2][3]
Mitigation (if you can’t upgrade immediately): ensure your service doesn’t run with an attacker-controlled PATH on macOS (e.g., sanitize/lock down environment variables and execution context). [2][3]
Sources: [1] Go vuln db (pkg.go.dev) GO-2026-4394; [2] OpenTelemetry GitHub security advisory GHSA-9h8m-3fm2-qjrq; [3] NVD CVE-2026-24051; [4] NVD metrics section showing GitHub/CNA CVSS 7.0.
🏁 Script executed:
git ls-files | grep -E "go\.(mod|sum)$"Repository: openshift/gcp-pd-csi-driver
Length of output: 84
🏁 Script executed:
cat -n go.mod | head -50Repository: openshift/gcp-pd-csi-driver
Length of output: 2202
Bump otel/sdk to v1.40.0 or later.
go.opentelemetry.io/otel/sdk v1.36.0 is vulnerable to GO-2026-4394 / GHSA-9h8m-3fm2-qjrq (CVE-2026-24051)—a local PATH-hijacking arbitrary code execution issue on macOS. Affected versions range from 1.20.0 to 1.39.0, with v1.40.0 as the first fixed release. Upgrade before merging.
🧰 Tools
🪛 OSV Scanner (2.3.3)
[HIGH] 24-24: go.opentelemetry.io/otel/sdk 1.36.0: OpenTelemetry Go SDK Vulnerable to Arbitrary Code Execution via PATH Hijacking in go.opentelemetry.io/otel/sdk
(GO-2026-4394)
[HIGH] 24-24: go.opentelemetry.io/otel/sdk 1.36.0: OpenTelemetry Go SDK Vulnerable to Arbitrary Code Execution via PATH Hijacking
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go.mod` at line 24, Update the OpenTelemetry SDK dependency to a fixed
release by changing the module version for go.opentelemetry.io/otel/sdk in
go.mod to v1.40.0 (or later); after updating the version run the module tooling
(e.g., go get go.opentelemetry.io/otel/sdk@v1.40.0 and go mod tidy) to refresh
go.sum and ensure the lockfile reflects the upgraded, non‑vulnerable release.
| klog.Warningf("For device %s udevadmin failed: %v. Trying to manually link", deviceName, err) | ||
| if err := manuallySetDevicePath(deviceName); err != nil { | ||
| return "", fmt.Errorf("failed to manually set link for disk %s: %w", deviceName, err) | ||
| } | ||
| } | ||
|
|
||
| return devicePath, nil |
There was a problem hiding this comment.
devicePath not updated after successful manual symlink creation.
When manuallySetDevicePath succeeds, it creates a symlink but devicePath remains empty or stale from the failed poll loop. The function then returns this incorrect value at line 299, causing callers to receive an invalid device path despite the symlink being successfully created.
🐛 Proposed fix to return the correct device path
if err != nil {
klog.Warningf("For device %s udevadmin failed: %v. Trying to manually link", deviceName, err)
if err := manuallySetDevicePath(deviceName); err != nil {
return "", fmt.Errorf("failed to manually set link for disk %s: %w", deviceName, err)
}
+ // Update devicePath to the newly created symlink
+ devicePath = path.Join(diskByIdPath, diskGooglePrefix+deviceName)
}
return devicePath, nil📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| klog.Warningf("For device %s udevadmin failed: %v. Trying to manually link", deviceName, err) | |
| if err := manuallySetDevicePath(deviceName); err != nil { | |
| return "", fmt.Errorf("failed to manually set link for disk %s: %w", deviceName, err) | |
| } | |
| } | |
| return devicePath, nil | |
| klog.Warningf("For device %s udevadmin failed: %v. Trying to manually link", deviceName, err) | |
| if err := manuallySetDevicePath(deviceName); err != nil { | |
| return "", fmt.Errorf("failed to manually set link for disk %s: %w", deviceName, err) | |
| } | |
| // Update devicePath to the newly created symlink | |
| devicePath = path.Join(diskByIdPath, diskGooglePrefix+deviceName) | |
| } | |
| return devicePath, nil |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/deviceutils/device-utils.go` around lines 293 - 299, The code returns a
stale/empty devicePath after manuallySetDevicePath succeeds; fix by updating
devicePath before returning: after manuallySetDevicePath(deviceName) returns
nil, resolve the newly created symlink for the deviceName (e.g., use os.Readlink
or filepath.EvalSymlinks on the symlink path derived from deviceName) and assign
that result to devicePath, handling any resolution error and returning it if
resolution fails so callers get the correct, current device path.
| params, dataCacheParams, err := gceCS.parameterProcessor().ExtractAndDefaultParameters(req.GetParameters()) | ||
| if gceCS.enableDynamicVolumes && params.IsDiskDynamic() { | ||
| params.DiskType, err = SelectDisk(ctx, req, gceCS.CloudProvider) | ||
| if err != nil { | ||
| return nil, status.Errorf(codes.InvalidArgument, "failed to select disk type: %v", err.Error()) | ||
| } | ||
| parameters.SanitizeDiskParameters(¶ms) | ||
| } | ||
| metrics.UpdateRequestMetadataFromParams(ctx, params) | ||
| if err != nil { | ||
| return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err.Error()) |
There was a problem hiding this comment.
Fail fast on parameter extraction errors.
ExtractAndDefaultParameters can already return a non-nil err here. Because the dynamic-volume block reuses err before it is checked, a successful SelectDisk can clear the original extraction failure and let invalid parameters/data-cache settings through.
Suggested fix
params, dataCacheParams, err := gceCS.parameterProcessor().ExtractAndDefaultParameters(req.GetParameters())
+if err != nil {
+ return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err)
+}
if gceCS.enableDynamicVolumes && params.IsDiskDynamic() {
params.DiskType, err = SelectDisk(ctx, req, gceCS.CloudProvider)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "failed to select disk type: %v", err.Error())
}
parameters.SanitizeDiskParameters(¶ms)
}
metrics.UpdateRequestMetadataFromParams(ctx, params)
-if err != nil {
- return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err.Error())
-}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| params, dataCacheParams, err := gceCS.parameterProcessor().ExtractAndDefaultParameters(req.GetParameters()) | |
| if gceCS.enableDynamicVolumes && params.IsDiskDynamic() { | |
| params.DiskType, err = SelectDisk(ctx, req, gceCS.CloudProvider) | |
| if err != nil { | |
| return nil, status.Errorf(codes.InvalidArgument, "failed to select disk type: %v", err.Error()) | |
| } | |
| parameters.SanitizeDiskParameters(¶ms) | |
| } | |
| metrics.UpdateRequestMetadataFromParams(ctx, params) | |
| if err != nil { | |
| return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err.Error()) | |
| params, dataCacheParams, err := gceCS.parameterProcessor().ExtractAndDefaultParameters(req.GetParameters()) | |
| if err != nil { | |
| return nil, status.Errorf(codes.InvalidArgument, "failed to extract parameters: %v", err) | |
| } | |
| if gceCS.enableDynamicVolumes && params.IsDiskDynamic() { | |
| params.DiskType, err = SelectDisk(ctx, req, gceCS.CloudProvider) | |
| if err != nil { | |
| return nil, status.Errorf(codes.InvalidArgument, "failed to select disk type: %v", err.Error()) | |
| } | |
| parameters.SanitizeDiskParameters(¶ms) | |
| } | |
| metrics.UpdateRequestMetadataFromParams(ctx, params) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/gce-pd-csi-driver/controller.go` around lines 327 - 337, The code calls
gceCS.parameterProcessor().ExtractAndDefaultParameters(...) and then reuses the
err variable inside the dynamic-volume branch, which can overwrite an earlier
extraction error; fix by immediately checking the returned err from
ExtractAndDefaultParameters and returning a status error if non-nil before any
further processing (before the enableDynamicVolumes/IsDiskDynamic block and
before calling metrics.UpdateRequestMetadataFromParams), then proceed to
SelectDisk and parameters.SanitizeDiskParameters only when extraction succeeded.
| // Some architectures don't have local ssd. Give way to opt out of tests like datacache. | ||
| skipLocalSsdTests = flag.Bool("skip-local-ssd-tests", false, "Skip local ssd tests like datacache") | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -e
rg -n -C3 'skipLocalSsdTests|getLocalSsdCount|SetupDataCachingConfig' test/e2e/tests/setup_e2e_test.go
rg -n -C5 '\bfunc\s+SetupDataCachingConfig\s*\(' test/e2e/utilsRepository: openshift/gcp-pd-csi-driver
Length of output: 2036
🏁 Script executed:
cat -n test/e2e/utils/utils.go | sed -n '299,350p'Repository: openshift/gcp-pd-csi-driver
Length of output: 2804
🏁 Script executed:
rg -n 'LocalSSDCount' test/e2e/ -A2 -B2Repository: openshift/gcp-pd-csi-driver
Length of output: 1086
Incomplete opt-out: SetupDataCachingConfig runs regardless of the skip-local-ssd-tests flag.
The flag only changes LocalSSDCount to 0 but doesn't prevent SetupDataCachingConfig from executing unconditionally at line 238. This function always modifies /etc/lvm/lvm.conf via SSH without checking whether the instance actually has a local SSD or supports data caching, so the test suite will still fail on unsupported machine types.
This also affects the hardcoded case at line 211 where LocalSSDCount is set to 0 for hdMachineType—that path will hit the same error.
Either skip the SetupDataCachingConfig call when LocalSSDCount is 0, or make that function conditional on local SSD availability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/tests/setup_e2e_test.go` around lines 65 - 67, The test currently
calls SetupDataCachingConfig unconditionally which modifies /etc/lvm/lvm.conf
even when no local SSD is present; change the call sites so
SetupDataCachingConfig is only invoked when LocalSSDCount > 0 and the
skip-local-ssd-tests flag is false (i.e., check both *skipLocalSsdTests and the
test context's LocalSSDCount before calling SetupDataCachingConfig), and ensure
the hdMachineType branch that sets LocalSSDCount = 0 also avoids calling
SetupDataCachingConfig; alternatively make SetupDataCachingConfig early-return
if LocalSSDCount == 0 so it will be a no-op when no local SSDs exist.
| cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal) | ||
| out, err = cmd.CombinedOutput() | ||
| if err != nil { | ||
| return fmt.Errorf("failed to list cluster: %v %s", err, out) | ||
| } | ||
| scanner := bufio.NewScanner(strings.NewReader(string(out))) | ||
| printedHash := false | ||
| for scanner.Scan() { | ||
| line := scanner.Text() | ||
| if strings.HasPrefix(line, "id: ") { | ||
| klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal) | ||
| klog.Infof("GKE cluster hash %s", line) | ||
| printedHash = true | ||
| break | ||
| } | ||
| } | ||
| if !printedHash { | ||
| return fmt.Errorf("failed to find cluster hash in cluster describe: %s", out) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Google Cloud CLI docs: for gcloud container clusters describe, can the output be affected by the core/formatproperty, and is--format='value(id)' the supported way to return only the cluster ID?
💡 Result:
Yes—the default output of gcloud container clusters describe can be affected by the core/default_format property (there isn’t a core/format property; the documented one is core/default_format). The command also supports the global --format flag, which overrides the default output formatting. [1], [2]
And yes—using the global format flag like this is the supported way to return a single field:
gcloud container clusters describe CLUSTER_NAME --format="value(id)"value(...) is a supported format expression in Google Cloud CLI output formatting. [1], [4]
🏁 Script executed:
# First, check the imports and bufio usage in the file
cat -n test/k8s-integration/cluster.go | head -30Repository: openshift/gcp-pd-csi-driver
Length of output: 880
🏁 Script executed:
# Check if bufio is imported
rg "^import" test/k8s-integration/cluster.go -A 20Repository: openshift/gcp-pd-csi-driver
Length of output: 459
🏁 Script executed:
# Verify the exact lines 284-302
sed -n '284,302p' test/k8s-integration/cluster.goRepository: openshift/gcp-pd-csi-driver
Length of output: 722
🏁 Script executed:
# Check for any other uses of bufio.Scanner in the file to see if removal is safe
rg "bufio" test/k8s-integration/cluster.goRepository: openshift/gcp-pd-csi-driver
Length of output: 140
Pin gcloud describe to a machine-readable format.
The current code relies on parsing a human-readable output line starting with id: , which breaks if gcloud output formatting changes (e.g., via core/default_format configuration). Use --format=value(id) to request the cluster ID directly instead of scanning unstructured output.
♻️ Proposed fix
- cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal)
+ cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal, "--format=value(id)")
out, err = cmd.CombinedOutput()
if err != nil {
- return fmt.Errorf("failed to list cluster: %v %s", err, out)
+ return fmt.Errorf("failed to describe cluster: %v %s", err, out)
}
- scanner := bufio.NewScanner(strings.NewReader(string(out)))
- printedHash := false
- for scanner.Scan() {
- line := scanner.Text()
- if strings.HasPrefix(line, "id: ") {
- klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal)
- klog.Infof("GKE cluster hash %s", line)
- printedHash = true
- break
- }
- }
- if !printedHash {
- return fmt.Errorf("failed to find cluster hash in cluster describe: %s", out)
- }
+ clusterID := strings.TrimSpace(string(out))
+ if clusterID == "" {
+ return fmt.Errorf("failed to read cluster id from gcloud output: %s", out)
+ }
+ klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal)
+ klog.Infof("GKE cluster hash id: %s", clusterID)The bufio import can be removed after this change.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal) | |
| out, err = cmd.CombinedOutput() | |
| if err != nil { | |
| return fmt.Errorf("failed to list cluster: %v %s", err, out) | |
| } | |
| scanner := bufio.NewScanner(strings.NewReader(string(out))) | |
| printedHash := false | |
| for scanner.Scan() { | |
| line := scanner.Text() | |
| if strings.HasPrefix(line, "id: ") { | |
| klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal) | |
| klog.Infof("GKE cluster hash %s", line) | |
| printedHash = true | |
| break | |
| } | |
| } | |
| if !printedHash { | |
| return fmt.Errorf("failed to find cluster hash in cluster describe: %s", out) | |
| } | |
| cmd = exec.Command("gcloud", "container", "clusters", "describe", *gkeTestClusterName, locationArg, locationVal, "--format=value(id)") | |
| out, err = cmd.CombinedOutput() | |
| if err != nil { | |
| return fmt.Errorf("failed to describe cluster: %v %s", err, out) | |
| } | |
| clusterID := strings.TrimSpace(string(out)) | |
| if clusterID == "" { | |
| return fmt.Errorf("failed to read cluster id from gcloud output: %s", out) | |
| } | |
| klog.Infof("GKE cluster: %s %s", *gkeTestClusterName, locationVal) | |
| klog.Infof("GKE cluster hash id: %s", clusterID) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/k8s-integration/cluster.go` around lines 284 - 302, The code parses
human-readable gcloud output looking for "id: " using exec.Command("gcloud",
"container", "clusters", "describe", ...) and a bufio.Scanner (variables: cmd,
out, err, printedHash, locationArg, locationVal, gkeTestClusterName); change the
command to request a machine-readable value by adding the flag
"--format=value(id)" to the exec.Command args so the command returns only the
cluster ID, then remove the bufio.Scanner loop and simply check out/err and use
the returned string (trim whitespace) as the cluster hash; also remove the
now-unused bufio import.
| return wait.Poll(5*time.Second, time.Minute, func() (bool, error) { | ||
| out, err := runSSHCommand("ssh", []string{i.GetSSHTarget(), "-o", "ConnectTimeout=10", "--", "echo"}...) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's locate and examine the test/remote/ssh.go file
fd -type f -name "ssh.go" | head -20Repository: openshift/gcp-pd-csi-driver
Length of output: 241
🏁 Script executed:
# If found, read the file to see the SSHCheckAlive method around lines 94-95
cat -n test/remote/ssh.go | sed -n '85,105p'Repository: openshift/gcp-pd-csi-driver
Length of output: 1130
🏁 Script executed:
# Find the runSSHCommand function definition
rg -n "func.*runSSHCommand" --type goRepository: openshift/gcp-pd-csi-driver
Length of output: 158
🏁 Script executed:
# Read the runSSHCommand function implementation
cat -n test/remote/ssh.go | sed -n '146,170p'Repository: openshift/gcp-pd-csi-driver
Length of output: 866
🏁 Script executed:
# Find the sshOption variable definition
rg -n "sshOption\s*=" test/remote/ssh.go | head -5Repository: openshift/gcp-pd-csi-driver
Length of output: 236
🏁 Script executed:
# Also check the context around it
cat -n test/remote/ssh.go | sed -n '1,50p'Repository: openshift/gcp-pd-csi-driver
Length of output: 1747
Move ConnectTimeout option before the SSH destination.
The -o ConnectTimeout=10 is positioned after i.GetSSHTarget(), which makes it part of the remote command arguments instead of SSH client options. OpenSSH expects options before the destination; options placed after are passed to the remote shell and have no effect on the SSH connection timeout. This prevents the timeout from being applied and can cause the liveness probe to wait up to the full time.Minute poll duration even on unreachable hosts.
- out, err := runSSHCommand("ssh", []string{i.GetSSHTarget(), "-o", "ConnectTimeout=10", "--", "echo"}...)
+ out, err := runSSHCommand("ssh", []string{"-o", "ConnectTimeout=10", i.GetSSHTarget(), "--", "echo"}...)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return wait.Poll(5*time.Second, time.Minute, func() (bool, error) { | |
| out, err := runSSHCommand("ssh", []string{i.GetSSHTarget(), "-o", "ConnectTimeout=10", "--", "echo"}...) | |
| return wait.Poll(5*time.Second, time.Minute, func() (bool, error) { | |
| out, err := runSSHCommand("ssh", []string{"-o", "ConnectTimeout=10", i.GetSSHTarget(), "--", "echo"}...) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/remote/ssh.go` around lines 94 - 95, The ssh option "-o
ConnectTimeout=10" is currently passed after the SSH destination
(i.GetSSHTarget()) so it becomes part of the remote command instead of an SSH
client option; update the runSSHCommand call inside the wait.Poll lambda so the
options are placed before i.GetSSHTarget() (e.g., build the args slice with
"-o", "ConnectTimeout=10" prior to i.GetSSHTarget()), ensuring runSSHCommand and
i.GetSSHTarget() are used with the correct argument order so the ConnectTimeout
is applied.
|
/test e2e-gcp-csi-manual-oidc |
|
/lgtm |
|
/test e2e-gcp-csi-manual-oidc |
|
@dfajmon: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/verified by CI |
|
@chao007: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Issue link
https://issues.redhat.com/browse/STOR-2751
Diff to upstream v1.24.1
kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.24.1...dfajmon:rebase-v1.24.1
Notes for reviewers
The rebase was initially based on v1.23.3. A new version was released recently, and because of merge conflicts introduced by cherry-picks, we have decided to rebase to a newer version.
Summary of changes
Major Features
bdi/read_ahead_kb(#2156)Notable Bug Fixes
Cherry-picked commits
Upstream changelogs
Full changelog
kubernetes-sigs/gcp-compute-persistent-disk-csi-driver@v1.21.6...v1.24.1
Last rebase
#85
@openshift/storage
Summary by CodeRabbit
New Features
Documentation
Improvements