---
title: automatic-detection-of-disks-in-Local-Storage-Operator
authors:
- "@aranjan"
- "@rohantmp"
reviewers:
- "@jrivera"
- "@jsafrane"
- "@hekumar"
- "@chuffman"
- "@rojoseph"
- "@sapillai"
- "@leseb"
- "@travisn"
approvers:
- "@jrivera"
- "@jsafrane"
- "@hekumar"
- "@chuffman"
creation-date: 2020-01-21
last-updated: 2020-02-24
status: implementable
---

# Automatic detection of Disks in Local-Storage-Operator

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

When working with bare-metal OpenShift clusters, there is no dynamic storage provisioner like those available in the cloud, so consuming local storage from nodes requires significant manual work.
The idea is to automate the manual steps required for consuming local storage in an OpenShift cluster by extending the Local Storage Operator (LSO).

## Motivation

Automatically detect local devices and provision local volumes from them.

### Goals

- Automatically detect available disks on nodes that can be used as PVs in an OpenShift cluster.
- Respond to attach/detach events for disks/devices.
- Provide options for filtering particular kinds of disks based on properties such as name, size, and manufacturer.

### Non-Goals

- Inherently protect against conflicts with provisioners that own local devices on the same nodes where automatic detection is configured to run.

## Proposal

The `local-storage-operator` is already capable of consuming local disks from nodes and provisioning PVs out of them,
but the disk/device paths need to be explicitly specified in the `LocalVolume` CR.
The idea is to introduce a new Custom Resource, `AutoDetectVolume`, that enables automated discovery and provisioning of local-volume-based PVs on a set of nodes addressed by one or more labelSelectors.

### Risks and Mitigations

- LSO will only consume disks that contain no data and are not in use, by ensuring that the device (a sketch of these checks follows this list):
  - can be opened exclusively.
  - is not read-only.
  - is not removable.
  - has no child partitions.
  - has no FS signature.
  - has a state (as reported by `lsblk`) that is not `suspended`.
- Disks must not be re-detected as new, or otherwise destroyed, if their device path changes.
  - This is already ensured by the current LSO approach of consuming disks by their `UUID`.
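
To make the in-use checks concrete, here is a minimal Go sketch of how a discovery daemon might apply them. The function name, the exclusive-open technique, and the `lsblk` parsing are illustrative of the approach only, not the actual LSO implementation:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"

	"golang.org/x/sys/unix"
)

// deviceAppearsSafe runs the checks from the list above against a device node
// such as /dev/sdb and returns an error describing the first failed check.
func deviceAppearsSafe(devPath string) error {
	// Check 1: the device can be opened exclusively. The kernel returns EBUSY
	// if the device is already held, e.g. mounted or claimed by device-mapper.
	fd, err := unix.Open(devPath, unix.O_RDONLY|unix.O_EXCL, 0)
	if err != nil {
		return fmt.Errorf("cannot open %s exclusively: %w", devPath, err)
	}
	unix.Close(fd)

	// Checks 2, 3, 5, 6: read-only flag, removable flag, FS signature and
	// suspended state, all read from a single lsblk invocation.
	out, err := exec.Command("lsblk", "--noheadings", "--pairs",
		"--output", "RO,RM,FSTYPE,STATE", devPath).Output()
	if err != nil {
		return fmt.Errorf("lsblk failed for %s: %w", devPath, err)
	}
	// lsblk prints one line per device; the first line is the device itself,
	// any further lines are children (partitions, holders).
	lines := strings.Split(strings.TrimSpace(string(out)), "\n")
	self := lines[0]
	switch {
	case strings.Contains(self, `RO="1"`):
		return fmt.Errorf("%s is read-only", devPath)
	case strings.Contains(self, `RM="1"`):
		return fmt.Errorf("%s is removable", devPath)
	case !strings.Contains(self, `FSTYPE=""`):
		return fmt.Errorf("%s has a filesystem signature", devPath)
	case strings.Contains(self, `STATE="suspended"`):
		return fmt.Errorf("%s is suspended", devPath)
	}

	// Check 4: any additional lines indicate child partitions.
	if len(lines) > 1 {
		return fmt.Errorf("%s has child partitions", devPath)
	}
	return nil
}
```

Opening a block device with `O_EXCL` fails with `EBUSY` while the kernel considers it in use, which makes the first check a cheap proxy for "not currently consumed by a filesystem or another storage stack".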

> **Member:** We need to document recovery from errors. What happens if a user has made a mistake and wants to undo creation of an `AutoDetectVolume` object?

> **@travisn (Feb 24, 2020):** If they want to remove a volume that was automatically created, they would need to:
>
> 1. Update the CR so it doesn't automatically pick up the device(s) that aren't desired.
> 2. The admin would delete/clean up the PV.
>
> @gnufied Is that along the lines of what you are asking?

> **@ashishranjan738 (Contributor, author, Feb 25, 2020):** And does this documentation need to come with this PR? @gnufied

> **Member:** That is fine, we just need to ensure that this is captured somewhere, as a limitation or something, so that it can be documented later.

## Design Details

API schema for the `AutoDetectVolume` CR:

```go
import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type AutoDetectVolumeList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata"`
	Items           []AutoDetectVolume `json:"items"`
}

type AutoDetectVolume struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata"`
	Spec              AutoDetectVolumeSpec   `json:"spec"`
	Status            AutoDetectVolumeStatus `json:"status,omitempty"`
}

type AutoDetectVolumeSpec struct {
	// NodeSelector selects the nodes on which the automatic detection policies must run.
	// +optional
	NodeSelector *corev1.NodeSelector `json:"nodeSelector,omitempty"`
	// StorageClassName to use for the set of matched devices.
	StorageClassName string `json:"storageClassName"`
	// MinDeviceCount is the minimum number of devices that must be detected per node.
	// If fewer devices than this match, no PVs will be created.
	MinDeviceCount int `json:"minDeviceCount"`
	// MaxDeviceCount is the maximum number of devices to be detected per node.
	// +optional
	MaxDeviceCount int `json:"maxDeviceCount"`
	// VolumeMode determines whether the PV created is Block or Filesystem.
	// Defaults to Block.
	// +optional
	VolumeMode corev1.PersistentVolumeMode `json:"volumeMode,omitempty"`
	// FSType is the filesystem type to create when volumeMode is Filesystem.
	// +optional
	FSType string `json:"fsType,omitempty"`
```

> **Member:** I thought we also discussed having a boolean "dry-run" value which just finds the devices and does not actually create them. We are also going to need to design a type that could hold the devices found on the node. We could use a ConfigMap or a new type...

> **Reviewer:** I disagree with the dry-run value approach. CRs already represent desired state; using them for dry-run state just doesn't seem right. Discovery should store the available device list somewhere the UI can load it from, present the devices to the user, and then create CRs when the user is ready to commit the devices.

> **Reviewer:** Okay, but then that still leaves us with an unsolved problem. We want the ability for the UI to trigger a filtering of the set of all discovered devices without (1) committing to whatever the result is, or (2) forcing the UI to reimplement the filtering logic of the backend. What's your counter-proposal here?

> **Reviewer:** The UI is given all the metadata it needs to present the available devices to the user, and the UI has to display a reasonable interface around those properties. The filtering itself seems like it would already need to be baked into the UI, such as knowing which nodes hold which devices and how much storage is on each node.
>
> Where is the filtering complicated enough that we wouldn't have reasonable confidence that the desired state generated in the CR would end up with the expected PVs?

> **Reviewer:** The problem for me lies in the duplication of effort that needs to be kept in sync by two separate teams that may not be in regular contact with each other. It's not that it's difficult; it's that it's inefficient and more of a long-term maintenance headache than the proposed solution.

> **Reviewer:** @jarrpa and I had a discussion on this... Most importantly, the discussion shouldn't block this design PR. Let's keep pushing to get this merged soon. Jose will start a new thread with this topic.

> **Contributor:**
>
> > Most importantly, the discussion shouldn't block this design PR
>
> This discussion is the root of this PR. If the filtering is done in the UI, why do we need complicated rules in LSO? The UI, knowing which devices the user selected/filtered, can create `LocalVolume` CRs directly, without any `DeviceDiscoveryPolicy`.
>
> Note that the output of LSO is Kubernetes PVs to be consumed by OCS (or anything else), not a hardware inventory of all disks on a node.
>
> From what I read above, some UI wants to present a hardware inventory, and once the user selects/filters the devices, the UI does *something* so that PVs are created. We are discussing that *something* here in this PR; however, I haven't read anything about how the HW inventory gets to the UI in the first place. Will there be another daemon running on all nodes collecting information about block devices, just like LSO, only presenting it to Kubernetes in another way than LSO does?

> **Reviewer:** @jsafrane The reason we discussed having a separate thread for this topic is that this design doc is independent of the UI. The UI will need to build on top of this new design, but fundamentally this `AutoDetectVolume` CR doesn't need any UI around it in order to configure the devices. Though if something in this design is blocking the UI design, then I would certainly want to resolve it here.
>
> > From what I read above, some UI wants to present a hardware inventory, and once the user selects/filters the devices, the UI does *something* so that PVs are created. We are discussing that *something* here in this PR
>
> That *something* for the UI to create PVs would be to create the `AutoDetectVolume` CR(s). This is the signal that the UI is ready to commit to the device selection properties that the user decided on. That *something* could also be for the UI to create individual CRs for each of the devices, and there would be no need for the `AutoDetectVolume` CR. IMHO, the `AutoDetectVolume` is a better approach for the UI, but if we settle on the other approach during UI design, then so be it.
>
> > however, I haven't read anything about how the HW inventory gets to the UI in the first place. Will there be another daemon running on all nodes collecting information about block devices, just like LSO, only presenting it to Kubernetes in another way than LSO does?
>
> The thought is that the LSO daemons that are already running on all the nodes would be able to discover the available devices and store their properties somewhere for the UI to discover. That somewhere could be a ConfigMap for each node, stored as JSON. Or maybe a new CRD type should be created. The CRD seems like overkill since there is nothing for an operator to reconcile; LSO just needs to communicate metadata to the UI.

> **Contributor:**
>
> > The thought is that the LSO daemons that are already running on all the nodes would be able to discover the available devices and store their properties somewhere for the UI to discover.
>
> This was not part of this enhancement. Can we get all requirements on LSO into this enhancement, with all API objects and a complete workflow from a "blank" OCP cluster to a cluster with OCS installed (or at least ready to be installed)? Otherwise we may end up with a half-designed solution that won't make anyone happy.
>
> To me it looks like this PR is trying to reinvent the OpenEBS disk manager.

> **Reviewer:** If I were to summarize all the meetings and design discussions...
>
> 1. OpenShift admins need a UI to select which devices will be exposed as local PVs.
>    - The UI needs to display available storage for devices that don't yet have PVs. The UI (pending UI design) would not show individual devices, but would show some representation of available storage and allow the user to filter devices out of the selection based on SSD/HDD, size, or other properties.
>    - The UI would drive LSO to create the local PVs.
>    - Different types of devices should be available through different storage classes (SSD vs NVMe vs HDD).
> 2. The OCS UI needs to detect what local PVs are available and allow the user to configure OCS to use some subset of the available storage.
>
> We don't see #1 as being under the OCS scope since the local PVs could also be used outside of OCS.
>
> @gnufied Since you've been in most of the discussions, does this match your expectation as well?
>
> @jsafrane There are at least several other solutions in the community for local PV configuration, and LSO overlaps with them. Our discussions have been to take a dependency on LSO. Are you suggesting LSO isn't the right place for this? Or simply pointing out a similar design?

```go

	// If specified, a list of tolerations to pass to the discovery daemons.
	// +optional
	Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
	// DeviceInclusionSpec is the filtration rule for including a device in the device discovery.
	// +optional
	DeviceInclusionSpec *DeviceInclusionSpec `json:"deviceInclusionSpec"`
}

type AutoDetectVolumeStatus struct {
	// Status reflects the discovery progress.
	Status string `json:"status"`
}
```

> **Member:** Please expand this type.

> **Reviewer:** How about adding pod conditions instead of a single status string?

> **Contributor (author):** Not sure adding pod conditions would be a good idea here. I think the status should reflect the discovery progress.

> **Reviewer:** I think expanding this, whatever you had in mind, would be overkill for an MVP. We should be able to get by with just a simple string for the time being, unless you have something else specific in mind.

> **Member:** What will be the support status of this new feature? I proposed keeping this alpha for now, but even then, keeping this field as a simple string is not going to be enough. In LSO we typically try to stick to specs provided to us by the API team: https://github.com/openshift/api/blob/master/operator/v1/types.go#L97
>
> I am not sure if all fields in that type will be necessary, but even for an MVP/alpha feature we should pick and propose the fields we are going to use, so that they can be discussed during the design phase.
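
For reference, here is a minimal sketch of what an expanded, condition-based status might look like, loosely following the openshift/api operator conventions linked above. These field names are illustrative only and not part of the proposed API:

```go
// Illustrative sketch only; the proposal currently keeps Status as a string.
type ExpandedAutoDetectVolumeStatus struct {
	// Conditions reports the observed state of discovery, in the style of
	// the openshift/api operator status types.
	Conditions []OperatorCondition `json:"conditions,omitempty"`
	// ObservedGeneration is the last spec generation the operator acted upon.
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`
}

// OperatorCondition mirrors the shape of the openshift/api operator condition.
type OperatorCondition struct {
	Type               string      `json:"type"`
	Status             string      `json:"status"`
	LastTransitionTime metav1.Time `json:"lastTransitionTime,omitempty"`
	Reason             string      `json:"reason,omitempty"`
	Message            string      `json:"message,omitempty"`
}
```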

```go
// DeviceMechanicalProperty holds the device's mechanical spec. It can be
// Rotational, NonRotational or RotationalAndNonRotational.
type DeviceMechanicalProperty string

// The mechanical properties of the devices.
const (
	// Rotational refers to magnetic disks.
	Rotational DeviceMechanicalProperty = "Rotational"
	// NonRotational refers to SSDs.
	NonRotational DeviceMechanicalProperty = "NonRotational"
	// RotationalAndNonRotational refers to both magnetic disks and SSDs.
	RotationalAndNonRotational DeviceMechanicalProperty = ""
)

type DeviceDiscoveryPolicyType string

// The DeviceDiscoveryPolicies that will be supported by the LSO.
// These discovery policies are based on lsblk's type output.
const (
	// Disk represents a device-type of disk.
	Disk DeviceDiscoveryPolicyType = "disk"
	// Part represents a device-type of partition.
	Part DeviceDiscoveryPolicyType = "part"
)

type DeviceInclusionSpec struct {
	// DeviceTypes that should be used for automatic detection. Each entry must be
	// one of the types supported by the local-storage operator; currently the
	// supported types are: disk, part.
	// If the list is empty, the default value is `[disk]`.
	DeviceTypes []DeviceDiscoveryPolicyType `json:"deviceTypes"`

	// DeviceMechanicalProperty denotes whether Rotational or NonRotational disks
	// should be used. By default, RotationalAndNonRotational, which matches all disks.
	// +optional
	DeviceMechanicalProperty DeviceMechanicalProperty `json:"deviceMechanicalProperty"`

	// MinSize is the minimum size a device must have to be included.
	// +optional
	MinSize resource.Quantity `json:"minSize"`

	// MaxSize is the maximum size a device may have to be included.
	// +optional
	MaxSize resource.Quantity `json:"maxSize"`

	// Models is a list of device models. If not empty, the device's model as
	// reported by lsblk must contain at least one of these strings.
	// +optional
	Models []string `json:"models"`

	// Vendors is a list of device vendors. If not empty, the device's vendor as
	// reported by lsblk must contain at least one of these strings.
	// +optional
	Vendors []string `json:"vendors"`
}
```
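
To illustrate how a diskmaker daemon might evaluate a `DeviceInclusionSpec` against a discovered device, here is a minimal Go sketch. The `BlockDevice` struct is a hypothetical stand-in for whatever the daemon parses out of `lsblk`; the filtering semantics follow the field comments above:

```go
import (
	"strings"

	"k8s.io/apimachinery/pkg/api/resource"
)

// BlockDevice is a hypothetical parsed lsblk row; the real daemon's struct may differ.
type BlockDevice struct {
	Type       string            // lsblk TYPE, e.g. "disk" or "part"
	Rotational bool              // lsblk ROTA
	Size       resource.Quantity // lsblk SIZE
	Model      string            // lsblk MODEL
	Vendor     string            // lsblk VENDOR
}

// containsAny reports whether value contains at least one needle; an empty
// needle list matches everything, mirroring the optional filters above.
func containsAny(value string, needles []string) bool {
	if len(needles) == 0 {
		return true
	}
	for _, n := range needles {
		if strings.Contains(value, n) {
			return true
		}
	}
	return false
}

// matches applies a DeviceInclusionSpec to a device; zero-valued optional
// fields impose no constraint.
func matches(dev BlockDevice, spec DeviceInclusionSpec) bool {
	deviceTypes := spec.DeviceTypes
	if len(deviceTypes) == 0 {
		deviceTypes = []DeviceDiscoveryPolicyType{Disk} // documented default
	}
	typeOK := false
	for _, t := range deviceTypes {
		if string(t) == dev.Type {
			typeOK = true
			break
		}
	}
	if !typeOK {
		return false
	}
	switch spec.DeviceMechanicalProperty {
	case Rotational:
		if !dev.Rotational {
			return false
		}
	case NonRotational:
		if dev.Rotational {
			return false
		}
	}
	if !spec.MinSize.IsZero() && dev.Size.Cmp(spec.MinSize) < 0 {
		return false
	}
	if !spec.MaxSize.IsZero() && dev.Size.Cmp(spec.MaxSize) > 0 {
		return false
	}
	return containsAny(dev.Model, spec.Models) && containsAny(dev.Vendor, spec.Vendors)
}
```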

- Once the `AutoDetectVolume` CR is created, the controller will:
  - configure the local-static-provisioner to create a new StorageClass backed by certain directories on the selected nodes.
  - assign diskmaker daemons to the selected nodes.
- The diskmaker daemon will find devices that match the discovery policy and symlink them into the directory that the local-static-provisioner is watching (a sketch of this step follows).
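
A minimal sketch of that symlinking step, assuming the device is addressed by a stable `/dev/disk/by-id` path and that the local-static-provisioner watches one directory per StorageClass; the directory layout and function name are illustrative:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// symlinkDevice links a matched device into the directory watched by the
// local-static-provisioner. Using the by-id path means the link keeps working
// even if the kernel device name changes (e.g. /dev/sdb becomes /dev/sdc
// after a reboot), which is the property the UUID-based consumption relies on.
func symlinkDevice(byIDPath, storageClassName string) error {
	symlinkDir := filepath.Join("/mnt/local-storage", storageClassName)
	if err := os.MkdirAll(symlinkDir, 0755); err != nil {
		return err
	}
	target := filepath.Join(symlinkDir, filepath.Base(byIDPath))
	if _, err := os.Lstat(target); err == nil {
		return nil // already linked; nothing to do
	}
	if err := os.Symlink(byIDPath, target); err != nil {
		return fmt.Errorf("failed to link %s: %w", byIDPath, err)
	}
	return nil
}
```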


### Test Plan

- Integration tests for the LSO already exist; they will need to be updated to cover this feature (one possible end-to-end assertion is sketched below).
- The tests must ensure that detection of devices works and updates correctly.
- The tests must ensure that no data corruption occurs during automatic detection of devices.
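
One possible shape for the end-to-end assertion, as a hedged sketch using client-go; the helper name and polling intervals are illustrative:

```go
package e2e

import (
	"context"
	"testing"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForLocalPVs polls until at least minDeviceCount PVs exist for the
// StorageClass that the AutoDetectVolume CR under test should have populated.
func waitForLocalPVs(t *testing.T, client kubernetes.Interface, storageClass string, minDeviceCount int) {
	err := wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
		pvs, err := client.CoreV1().PersistentVolumes().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			return false, err
		}
		count := 0
		for _, pv := range pvs.Items {
			if pv.Spec.StorageClassName == storageClass {
				count++
			}
		}
		return count >= minDeviceCount, nil
	})
	if err != nil {
		t.Fatalf("expected at least %d PVs for storage class %q: %v", minDeviceCount, storageClass, err)
	}
}
```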

### Graduation Criteria

- Documentation exists for the behaviour of each configuration item.
- Unit and end-to-end test coverage is sufficient.

#### Examples

```yaml
apiVersion: local.storage.openshift.io/v1alpha1
kind: AutoDetectVolume
metadata:
  name: example-autodetect
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - worker-0
              - worker-1
  storageClassName: example-storageclass
  volumeMode: Block
  minDeviceCount: 5
  maxDeviceCount: 10
  deviceInclusionSpec:
    deviceTypes:
      - disk
    deviceMechanicalProperty: Rotational
    minSize: 10G
    maxSize: 100G
```

```yaml
apiVersion: local.storage.openshift.io/v1alpha1
kind: AutoDetectVolume
metadata:
  name: example-autodetect-fs
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - worker-0
              - worker-1
  storageClassName: example-storageclass
  volumeMode: Filesystem
  fsType: ext4
  minDeviceCount: 5
  maxDeviceCount: 10
  deviceInclusionSpec:
    deviceTypes:
      - disk
    deviceMechanicalProperty: NonRotational
    minSize: 10G
    maxSize: 100G
    models:
      - SAMSUNG
      - Crucial_CT525MX3
    vendors:
      - ATA
      - ST2000LM
```

##### Removing a deprecated feature

- No existing features are being deprecated.

### Upgrade / Downgrade Strategy

Since this is a new implementation, no new upgrade strategy is required.

### Version Skew Strategy

N/A

## Implementation History

N/A

## Drawbacks

N/A

## Alternatives

- Existing manual creation of `LocalVolume` CRs. With the node selector on the `LocalVolume`, a single CR can apply to an entire class of nodes (i.e., a machineset or a physical rack of homogeneous hardware). When a machineset is defined, a corresponding `LocalVolume` can also be created.
- Directly enhancing the `LocalVolume` CR to allow for auto discovery.

The first approach requires manual work and knowledge of the underlying nodes, which makes it inefficient for large clusters. The second approach could introduce breaking changes to the existing GA API.
Therefore the proposed approach makes sense.