-
Notifications
You must be signed in to change notification settings - Fork 524
STOR-2682: option to recreate LSO symlinks #1889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
dobsonj
wants to merge
2
commits into
openshift:master
Choose a base branch
from
dobsonj:STOR-2682
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 1 commit
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,156 @@ | ||
| --- | ||
| title: lso-recreate-symlinks | ||
| authors: | ||
| - "@dobsonj" | ||
| reviewers: | ||
| - "@openshift/storage" | ||
| approvers: | ||
| - "@jsafrane" | ||
| - "@gnufied" | ||
| api-approvers: | ||
| - "None" | ||
| creation-date: 2025-11-06 | ||
| last-updated: 2025-11-06 | ||
| tracking-link: | ||
| - "https://issues.redhat.com/browse/STOR-2682" | ||
| see-also: | ||
| - "https://issues.redhat.com/browse/OCPBUGS-61988" | ||
| replaces: | ||
| superseded-by: | ||
| --- | ||
|
|
||
| # LSO: Recreate Symlinks | ||
|
|
||
| ## Summary | ||
|
|
||
| Local Storage Operator (LSO) creates symlinks under `/mnt/local-storage` pointing to local disks by their `/dev/disk/by-id` path, and then creates a PersistentVolume (PV) pointing to each `/mnt/local-storage` symlink. This design presumed that the `/dev/disk/by-id` links were stable, and while they are more stable across reboots than `/dev/sdb` for example, it is still possible for the underlying by-id symlinks to change based on udev rule changes or even firmware updates from hardware vendors. LSO needs a documented and supported way to recover from these `/dev/disk/by-id` symlink changes on the node. | ||
|
|
||
| ## Motivation | ||
|
|
||
| [RHEL docs](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/managing_storage_devices/persistent-naming-attributes_managing-storage-devices#persistent-attributes-for-identifying-file-systems-and-block-devices_persistent-naming-attributes) state that "Device names managed by udev in /dev/disk/ can change between major releases, requiring link updates." and we have a specific case that became apparent in [OCPBUGS-61988](https://issues.redhat.com/browse/OCPBUGS-61988) that we need to mitigate for RHEL 10 (OCP 4.22). | ||
|
|
||
| sg3_utils 1.48 in RHEL 10 disables a udev rule that creates `/dev/disk/by-id/scsi-0NVME_*` symlinks on RHEL 9.x. This udev rule is already problematic for customers affected by OCPBUGS-61988 but it is also problematic for others who may already be using those symlinks successfully today because upgrading to RHEL 10 will cause those `/dev/disk/by-id` symlinks to disappear. LSO needs some way to recreate symlinks for existing PV's when this happens. | ||
|
|
||
| ### User Stories | ||
|
|
||
| * As an administrator, I want the option to switch to more stable symlinks for LSO PV's prior to upgrading OCP to a version which may cause existing by-id symlinks to change. | ||
| * As an administrator, I want a way to recover existing LSO PV's when existing by-id symlinks change unexpectedly. | ||
|
|
||
| ### Goals | ||
|
|
||
| * Preventative opt-in to recreate symlinks for existing LSO PV's based on the by-id symlink recommended by LSO as the most stable in relative terms. | ||
| * Recovery mechanism to recreate symlinks for existing LSO PV's after the by-id symlinks have already changed. | ||
| * Avoid manual changes on the node or other one-off recovery workarounds. | ||
|
|
||
| ### Non-Goals | ||
|
|
||
| * LSO will not format or partition the device to maintain its own on-disk identifier outside of consumer control because this fundamentally alters the contract LSO offers to consumers. | ||
| * LSO will not automatically recreate symlinks without explicit administrator opt-in because there may be other valid ways to resolve the issue and LSO does not have the context to make the right call in all cases. | ||
| * LSO will not (at this time) offer a more fine-grained filtering mechanism to allow administrators to influence which symlinks should be preferred by LSO. | ||
|
|
||
| ## Proposal | ||
|
|
||
| ### Workflow Description | ||
|
|
||
| LSO's diskmaker will detect all valid by-id symlinks for each PV and add them as an annotation on the PV. It will also annotate the by-id symlink currently in use. For filesystem volumes, it will annotate the UUID of the filesystem to help find the corresponding disk. This means each PV will have up to three new annotations added by diskmaker: | ||
|
|
||
| * `storage.openshift.com/current-link-target`: current by-id symlink in use for the device | ||
| * `storage.openshift.com/dev-disk-by-id-list`: list of valid by-id symlinks for the device | ||
dobsonj marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| * `storage.openshift.com/dev-disk-by-uuid`: by-uuid symlink corresponding to the device (when applicable) | ||
|
|
||
| LSO will throw an alert if `current-link-target` does not match the recommended symlink in `dev-disk-by-id-list`. `dev-disk-by-uuid` is not available for raw block volumes, but may still be used for filesystem volumes to ensure detected symlinks match the device with the correct on-disk identifier. | ||
|
|
||
| In response to the alert, the administrator can review the annotations on the PV, and then add a `storage.openshift.com/recreate-symlink` annotation to the PV to tell diskmaker to recreate the symlink. This can be done to proactively switch to the recommended symlink, or it can be done reactively when a symlink is no longer valid (assuming there is another known valid by-id symlink). Diskmaker will recreate the symlink pointing to the new by-id symlink and remove the `recreate-symlink` annotation when it is complete. | ||
|
|
||
dobsonj marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| ### API Extensions | ||
|
|
||
| No new API changes, just the diskmaker annotations for each PV as described above. | ||
|
|
||
| ### Topology Considerations | ||
|
|
||
| #### Hypershift / Hosted Control Planes | ||
|
|
||
| N/A, not topology specific. | ||
|
|
||
| #### Standalone Clusters | ||
|
|
||
| N/A, not topology specific. | ||
|
|
||
| #### Single-node Deployments or MicroShift | ||
|
|
||
| N/A, not topology specific. | ||
|
|
||
| ### Implementation Details/Notes/Constraints | ||
|
|
||
| Diskmaker will use the following selection criteria when choosing the recommended symlink for each PV: | ||
|
|
||
| 1. The link must be in the sorted by-id list, starting from highest priority based on the [prefix priority list](https://github.com/openshift/local-storage-operator/blob/397dcaa02032f52916aa3ed49e5b9ecd1b96cc01/pkg/internal/diskutil.go#L177-L199). | ||
| 2. If the current link target still exists, the new by-id target must point to the same device. | ||
| 3. If a UUID was previously discovered and recorded, the new by-id target must point to the same device. | ||
| 4. Return the first valid link from the by-id list that meets this criteria. Throw an error if no valid symlink can be found. | ||
|
|
||
| Diskmaker already watches for PV changes, but it will also need to watch for udev changes (?) to trigger reconcile to update the new annotations. It will need a reconcile trigger for detecting new UUID symlinks as well. | ||
dobsonj marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| Diskmaker will keep the link name and only change the link target. For example, if a PV has an existing symlink `/mnt/local-storage/localblock/scsi-0NVME_MODEL_abcde` pointing to `/dev/disk/by-id/scsi-0NVME_MODEL_abcde`, but there is a by-id link `/dev/disk/by-id/scsi-2ace42e0035eabcde`, setting `storage.openshift.com/recreate-symlink` will cause diskmaker to replace `/mnt/local-storage/localblock/scsi-0NVME_MODEL_abcde` with a new symlink pointing to `/dev/disk/by-id/scsi-2ace42e0035eabcde`. | ||
dobsonj marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ### Risks and Mitigations | ||
|
|
||
| What happens if a symlink is changed while a workload is still running? This actually works as long as the new symlink is created first and then moved to the existing `/mnt/local-storage` link name. An existing process can hold a reference to the old (replaced) symlink until the process exits, and a new process will read the new symlink after `mv`. | ||
|
|
||
| Why opt-in? There may be other valid ways to resolve the issue and LSO does not have the context to make the right call in all cases. Instead of letting diskmaker change symlinks automatically, we want to call administrator attention to the problem and let them make an informed decision about their environment. | ||
|
|
||
| ### Drawbacks | ||
|
|
||
| The biggest drawback is that without relying on some on-disk metadata LSO still relies on at least one symlink remaining stable across upgrades. This design is of limited help if _all_ of the by-id symlinks change between two releases. | ||
|
|
||
| ## Alternatives (Not Implemented) | ||
|
|
||
| * The `/mnt/local-storage` disk path stored in the PV is immutable, and we want the ability to fix existing PV's, which means we must keep the same `/mnt/local-storage` link name and update only the `/dev/disk/by-id` link target. | ||
| * We can't partition a device without breaking consumers or introducing separate API and does not help existing PV's recover. | ||
| * If diskmaker added a label to each device, it could easily be wiped by the disk consumer. It could label a device after kubelet creates a filesystem on it and save it in PV annotations but this only helps with filesystem volumes. | ||
| * LSO could store udev attributes instead of symlinks but we would need to have some logic to filter useful and useless attributes, not clear if it helps rebuilding symlinks in all cases. | ||
|
|
||
| ## Open Questions [optional] | ||
|
|
||
| We want to annotate the PV with the UUID if an on-disk identifier can be found, for informational purposes at the very least. There are some open questions on how LSO can use the UUID, but these may be addressed in future revisions: | ||
|
|
||
| 1. In addition to filesystem volumes, it would help to get the UUID of Ceph OSD volumes and annotate them. Ceph OSD volumes don't have by-uuid symlinks, can this be changed? | ||
dobsonj marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| 2. If diskmaker uses UUID to recreate symlinks, we need to figure out how to solve snapshot restore for LocalVolume object -- is it possible to have UUID conflicts that resolve to the wrong disk? | ||
dobsonj marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ## Test Plan | ||
|
|
||
| We'll extend existing LSO unit tests, e2e, extended test suite, and manual testing where needed. | ||
|
|
||
| We do not have easy access to many hardware configurations and much of our testing will be limited to simulating these symlink issues. | ||
|
|
||
| ## Graduation Criteria | ||
|
|
||
| This is an optional OLM operator, and since it does not introduce API changes or feature flags, it will be considered GA-ready when the feature is merged. | ||
|
|
||
| ### Dev Preview -> Tech Preview | ||
|
|
||
| ### Tech Preview -> GA | ||
|
|
||
| ### Removing a deprecated feature | ||
|
|
||
| ## Upgrade / Downgrade Strategy | ||
|
|
||
| N/A | ||
|
|
||
| ## Version Skew Strategy | ||
|
|
||
| LSO already sets [maxOpenShiftVersion](https://github.com/openshift/local-storage-operator/blob/397dcaa02032f52916aa3ed49e5b9ecd1b96cc01/config/manifests/stable/local-storage-operator.clusterserviceversion.yaml#L118) to N+1, meaning if the cluster is running 4.21 OCP with 4.20 LSO, upgrades to 4.22 OCP will be blocked until LSO is upgraded to 4.21. This is important because we want to deliver these changes to alert of potential symlink issues prior to upgrades to RHEL 10 (OCP 4.22). | ||
|
|
||
| ## Operational Aspects of API Extensions | ||
|
|
||
| N/A | ||
|
|
||
| ## Support Procedures | ||
|
|
||
| TBD | ||
|
|
||
| ## Infrastructure Needed [optional] | ||
|
|
||
| N/A | ||
|
|
||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.