---
title: automatic-detection-of-disks-in-Local-Storage-Operator
authors:
  - "@aranjan"
  - "@rohantmp"
reviewers:
  - "@jrivera"
  - "@jsafrane"
  - "@hekumar"
  - "@chuffman"
  - "@rojoseph"
  - "@sapillai"
  - "@leseb"
  - "@travisn"
approvers:
  - "@jrivera"
  - "@jsafrane"
  - "@hekumar"
  - "@chuffman"
creation-date: 2020-01-21
last-updated: 2020-02-24
status: implementable
---

# Automatic detection of Disks in Local-Storage-Operator

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Bare-metal OpenShift clusters lack the dynamic storage provisioners available in cloud environments, so consuming local storage from nodes involves a lot of manual work.
The idea is to automate the manual steps required for consuming local storage in an OpenShift cluster by extending the Local Storage Operator (LSO).

## Motivation

Automatically detect local devices and provision local volumes from them.

### Goals

- Automatically detect available disks on nodes so that they can be used as PVs in an OpenShift cluster.
- Respond to attach/detach events for disks/devices.
- Provide options for filtering disks based on properties such as name, size, and manufacturer.

### Non-Goals

- Inherently protect against conflicts with provisioners that own local devices on the same nodes where automatic detection is configured to run.

## Proposal

The `local-storage-operator` is already capable of consuming local disks from nodes and provisioning PVs out of them,
but the disk/device paths need to be explicitly specified in the `LocalVolume` CR.
The idea is to introduce a new Custom Resource, `AutoDetectVolume`, that enables automated discovery and provisioning of local-volume-based PVs on a set of nodes addressed by one or more label selectors.

### Risks and Mitigations

- LSO will avoid disks that contain data and/or are in use by ensuring that the device (see the sketch after this list):
  - can be opened exclusively.
  - is not read-only.
  - is not removable.
  - has no child partitions.
  - has no filesystem signature.
  - has a state (as reported by `lsblk`) that is not `suspended`.
- Disks must not be re-detected as new, or otherwise destroyed, when their device path changes.
  - This is already ensured by the current LSO approach of consuming disks by their `UUID`.

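A minimal sketch of how these safety checks could be implemented, assuming a Linux node with `lsblk` available; the helper name `deviceIsSafe` and the exact command invocation are illustrative, not the actual LSO code:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"

	"golang.org/x/sys/unix"
)

// deviceIsSafe applies the safety checks listed above to a block device such
// as /dev/sdb. All names and the exact checks are illustrative.
func deviceIsSafe(path string) bool {
	// O_EXCL on a block device fails with EBUSY if the device is mounted or
	// already held open exclusively (e.g. by another provisioner).
	fd, err := unix.Open(path, unix.O_RDONLY|unix.O_EXCL, 0)
	if err != nil {
		return false
	}
	unix.Close(fd)

	// lsblk reports the read-only and removable flags, filesystem signature,
	// and device state; child partitions show up as extra output lines.
	out, err := exec.Command("lsblk", "--pairs",
		"--output", "RO,RM,FSTYPE,STATE", path).Output()
	if err != nil {
		return false
	}
	lines := strings.Split(strings.TrimSpace(string(out)), "\n")
	if len(lines) > 1 {
		return false // extra rows mean the device has child partitions
	}
	row := lines[0]
	switch {
	case strings.Contains(row, `RO="1"`):
		return false // read-only
	case strings.Contains(row, `RM="1"`):
		return false // removable
	case !strings.Contains(row, `FSTYPE=""`):
		return false // an existing filesystem signature was found
	case strings.Contains(row, `STATE="suspended"`):
		return false
	}
	return true
}

func main() {
	fmt.Println(deviceIsSafe(os.Args[1]))
}
```

The exclusive-open check is what makes the in-use test meaningful: on Linux, opening a block device with `O_EXCL` fails while the device is mounted or held by another process.
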
## Design Details

API scheme for `AutoDetectVolume`:

```go
type AutoDetectVolumeList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata"`
	Items           []AutoDetectVolume `json:"items"`
}

type AutoDetectVolume struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata"`
	Spec              AutoDetectVolumeSpec   `json:"spec"`
	Status            AutoDetectVolumeStatus `json:"status,omitempty"`
}

type AutoDetectVolumeSpec struct {
	// NodeSelector selects the nodes on which the automatic detection policies must run.
	// +optional
	NodeSelector *corev1.NodeSelector `json:"nodeSelector,omitempty"`

	// StorageClassName to use for the set of matched devices.
	StorageClassName string `json:"storageClassName"`

	// MinDeviceCount is the minimum number of devices that need to be detected per node.
	// If fewer devices than minDeviceCount match, no PVs are created for that node.
	MinDeviceCount int `json:"minDeviceCount"`

	// MaxDeviceCount is the maximum number of devices to be detected per node.
	// +optional
	MaxDeviceCount int `json:"maxDeviceCount"`

	// VolumeMode determines whether the PV created is Block or Filesystem.
	// Defaults to Block.
	// +optional
	VolumeMode PersistentVolumeMode `json:"volumeMode,omitempty"`

	// FSType is the filesystem type to create when volumeMode is Filesystem.
	// +optional
	FSType string `json:"fsType,omitempty"`

	// Tolerations is a list of tolerations to pass to the discovery daemons.
	// +optional
	Tolerations []corev1.Toleration `json:"tolerations,omitempty"`

	// DeviceInclusionSpec is the filtering rule for including a device in device discovery.
	// +optional
	DeviceInclusionSpec *DeviceInclusionSpec `json:"deviceInclusionSpec"`
}

type AutoDetectVolumeStatus struct {
	// Status reflects the discovery progress.
	Status string `json:"status"`
}

// DeviceMechanicalProperty holds the device's mechanical spec. It can be
// Rotational, NonRotational, or RotationalAndNonRotational.
type DeviceMechanicalProperty string

const (
	// Rotational refers to magnetic disks.
	Rotational DeviceMechanicalProperty = "Rotational"
	// NonRotational refers to SSDs.
	NonRotational DeviceMechanicalProperty = "NonRotational"
	// RotationalAndNonRotational refers to both magnetic disks and SSDs.
	RotationalAndNonRotational DeviceMechanicalProperty = ""
)

// DeviceDiscoveryPolicyType is a discovery policy supported by the LSO,
// based on lsblk's TYPE output.
type DeviceDiscoveryPolicyType string

const (
	// Disk represents a device type of disk.
	Disk DeviceDiscoveryPolicyType = "disk"
	// Part represents a device type of partition.
	Part DeviceDiscoveryPolicyType = "part"
)

type DeviceInclusionSpec struct {
	// DeviceTypes to use for automatic detection. Each entry must be one of the
	// types supported by the local-storage operator; currently: disk, part.
	// If the list is empty, the default is `[disk]`.
	DeviceTypes []DeviceDiscoveryPolicyType `json:"deviceType"`

	// DeviceMechanicalProperty denotes whether Rotational or NonRotational disks
	// should be used. Defaults to RotationalAndNonRotational, which matches all disks.
	// +optional
	DeviceMechanicalProperty DeviceMechanicalProperty `json:"deviceMechanicalProperty"`

	// MinSize is the minimum size a device must have to be included.
	// +optional
	MinSize resource.Quantity `json:"minSize"`

	// MaxSize is the maximum size a device may have to be included.
	// +optional
	MaxSize resource.Quantity `json:"maxSize"`

	// Models is a list of device models. If not empty, the device's model as
	// reported by lsblk must contain at least one of these strings.
	// +optional
	Models []string `json:"models"`

	// Vendors is a list of device vendors. If not empty, the device's vendor as
	// reported by lsblk must contain at least one of these strings.
	// +optional
	Vendors []string `json:"vendors"`
}
```

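To make the filter semantics concrete, the following sketch shows how a controller might evaluate a discovered device against a `DeviceInclusionSpec`. The `DiscoveredDevice` type and `matchesInclusionSpec` helper are hypothetical illustrations, not part of the proposed API; the code assumes the types above plus `import "strings"`:

```go
// DiscoveredDevice models the lsblk fields a discovery daemon might parse.
// Illustrative only.
type DiscoveredDevice struct {
	Type       string // lsblk TYPE, e.g. "disk" or "part"
	Rotational bool   // lsblk ROTA
	SizeBytes  int64  // lsblk SIZE, in bytes
	Model      string // lsblk MODEL
	Vendor     string // lsblk VENDOR
}

func matchesInclusionSpec(d DiscoveredDevice, spec DeviceInclusionSpec) bool {
	// An empty DeviceTypes list defaults to [disk].
	types := spec.DeviceTypes
	if len(types) == 0 {
		types = []DeviceDiscoveryPolicyType{Disk}
	}
	typeOK := false
	for _, t := range types {
		if string(t) == d.Type {
			typeOK = true
			break
		}
	}
	if !typeOK {
		return false
	}

	// RotationalAndNonRotational is the empty string and matches everything.
	switch spec.DeviceMechanicalProperty {
	case Rotational:
		if !d.Rotational {
			return false
		}
	case NonRotational:
		if d.Rotational {
			return false
		}
	}

	if !spec.MinSize.IsZero() && d.SizeBytes < spec.MinSize.Value() {
		return false
	}
	if !spec.MaxSize.IsZero() && d.SizeBytes > spec.MaxSize.Value() {
		return false
	}

	if len(spec.Models) > 0 && !containsAny(d.Model, spec.Models) {
		return false
	}
	if len(spec.Vendors) > 0 && !containsAny(d.Vendor, spec.Vendors) {
		return false
	}
	return true
}

// containsAny reports whether s contains at least one of the given substrings.
func containsAny(s string, substrings []string) bool {
	for _, sub := range substrings {
		if strings.Contains(s, sub) {
			return true
		}
	}
	return false
}
```
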
- Once the `AutoDetectVolume` CR is created, the controller will:
  - configure the local-static-provisioner to create a new StorageClass backed by specific directories on the selected nodes.
  - assign diskmaker daemons to the selected nodes.
- The diskmaker daemon will find devices that match the discovery policy and symlink them into the directory that the local-static-provisioner is watching (see the sketches below).

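For illustration, the local-static-provisioner is conventionally configured through a ConfigMap that maps each StorageClass to the host directory it watches; the name and paths below are assumptions, not mandated by this proposal:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config   # hypothetical name
  namespace: openshift-local-storage
data:
  storageClassMap: |
    example-storageclass:
      hostDir: /mnt/local-storage/example-storageclass
      mountDir: /mnt/local-storage/example-storageclass
      volumeMode: Block
```

And a sketch of the diskmaker's symlinking step, using the stable `/dev/disk/by-id` path so the link survives device-name changes; the function name, directory layout, and device ID are illustrative:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkDeviceForProvisioner symlinks a matched device into the directory the
// local-static-provisioner watches for the given StorageClass. Using the
// stable /dev/disk/by-id path keeps the link valid even if the /dev/sdX name
// changes across reboots. The watch-directory layout is an assumption.
func linkDeviceForProvisioner(byIDPath, storageClassName string) error {
	watchDir := filepath.Join("/mnt/local-storage", storageClassName)
	if err := os.MkdirAll(watchDir, 0755); err != nil {
		return err
	}
	link := filepath.Join(watchDir, filepath.Base(byIDPath))
	if _, err := os.Lstat(link); err == nil {
		return nil // already linked; nothing to do
	}
	return os.Symlink(byIDPath, link)
}

func main() {
	err := linkDeviceForProvisioner(
		"/dev/disk/by-id/scsi-36000c29f5c3b2d7e8f9a1b2c3d4e5f60", // illustrative ID
		"example-storageclass")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```
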
### Test Plan

- Integration tests for the LSO already exist. These tests will need to be updated to cover this feature.
- The tests must ensure that device detection works and updates correctly.
- The tests must ensure that no data corruption occurs during automatic detection of devices.

### Graduation Criteria

- Documentation exists for the behaviour of each configuration item.
- Unit and end-to-end test coverage is sufficient.

#### Examples

An example requesting block-mode PVs from rotational disks:

```yaml
apiVersion: local.storage.openshift.io/v1alpha1
kind: AutoDetectVolume
metadata:
  name: example-autodetect
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - worker-0
              - worker-1
  storageClassName: example-storageclass
  volumeMode: Block
  minDeviceCount: 5
  maxDeviceCount: 10
  deviceInclusionSpec:
    deviceTypes:
      - disk
    deviceMechanicalProperty: Rotational
    minSize: 10G
    maxSize: 100G
```

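For each matched device, the static provisioner would then create a local PV along the following lines; the shape follows the upstream local PV format, and all values (generated name, capacity, reclaim policy) are illustrative assumptions:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-a1b2c3d4          # generated name; illustrative
spec:
  capacity:
    storage: 50Gi
  volumeMode: Block
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: example-storageclass
  local:
    # stable by-id path symlinked by the diskmaker; illustrative value
    path: /dev/disk/by-id/scsi-36000c29f5c3b2d7e8f9a1b2c3d4e5f60
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-0
```
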
A second example, selecting non-rotational devices and provisioning filesystem PVs:

```yaml
apiVersion: local.storage.openshift.io/v1alpha1
kind: AutoDetectVolume
metadata:
  name: example-autodetect
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - worker-0
              - worker-1
  storageClassName: example-storageclass
  volumeMode: Filesystem
  fsType: ext4
  minDeviceCount: 5
  maxDeviceCount: 10
  deviceInclusionSpec:
    deviceTypes:
      - disk
    deviceMechanicalProperty: NonRotational
    minSize: 10G
    maxSize: 100G
    models:
      - SAMSUNG
      - Crucial_CT525MX3
    vendors:
      - ATA
      - ST2000LM
```

##### Removing a deprecated feature

- No existing features are being deprecated.

### Upgrade / Downgrade Strategy

Since this is a new implementation, no new upgrade strategy is required.

### Version Skew Strategy

N/A

## Implementation History

N/A

## Drawbacks

N/A

## Alternatives

- Existing manual creation of `LocalVolume` CRs (see the illustrative example below). With the node selector on the `LocalVolume`, a single CR can apply to an entire class of nodes (i.e., a machineset or a physical rack of homogeneous hardware). When a machineset is defined, a corresponding `LocalVolume` can also be created.
- Directly enhancing the `LocalVolume` CR to allow for auto discovery.

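For reference, a manually created `LocalVolume` CR of the kind the first alternative relies on looks roughly like this; the device path is an illustrative placeholder:

```yaml
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: example-localvolume
  namespace: openshift-local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - worker-0
              - worker-1
  storageClassDevices:
    - storageClassName: example-storageclass
      volumeMode: Block
      devicePaths:
        - /dev/disk/by-id/scsi-36000c29f5c3b2d7e8f9a1b2c3d4e5f60  # illustrative
```
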
The first approach requires manual work and knowledge of the underlying nodes, which makes it inefficient for large clusters. The second approach could introduce breaking changes to the existing GA API.
Therefore, the new-CR approach proposed here makes sense.