Term | Meaning |
---|---|
CR | Custom Resource |
CRD | Custom Resource Definition |
NFS | Network File System |
PV | Persistent Volume |
PVC | Persistent Volume Claim |
Term | Meaning |
---|---|
Controller | The process which drives a Kubernetes object from its current state to the desired state |
Source Type | The source type for the data being stored (e.g., S3, NFS) |
`volumemanager` | The CRD kind for VCK |
VCK provides basic volume and data management using volumes and volume sources in a Kubernetes cluster. It uses CRDs and controllers to create the volumes and volume sources and to perform the operations necessary to make the data available to users. Users interact only with CRs; VCK abstracts away the remaining details.
The end goals of this project are listed below:
- Data source support: VCK should support exposing data from different sources such as S3, NFS and local disk as volumes.
- Data distribution: VCK should support data replication for distributed job types.
- Data affinity: VCK should enable data affinity and gravity. It should use existing mechanisms such as volume scheduling and node affinity when possible.
- Data caching: VCK should enable the pre-population of data if required.
- Data streaming: VCK should provide abstraction for streaming data services. Jobs should be able to start as soon as the first stream or batch of data is available.
- Job output: VCK should allow output data to be gathered when required.
- Garbage collection: VCK should evict data in case of disk pressure.

The following are not goals of this project:
- VCK does not aim to solve every volume and data management problem.
- VCK does not address shortcomings or drawbacks of Kubernetes itself. If there is an issue in Kubernetes, the same issue exists with VCK.
Using VCK, we extend the Kubernetes API to include a new CRD called `volumemanager`. The schema for creating a `volumemanager` CR is described below:
Field Name | Type | Definition |
---|---|---|
`apiVersion` \* | string | API version of volume manager |
`kind` \* | enum: `VolumeManager` | Type. The only allowed value is `VolumeManager` |
`metadata.name` \* | string | Name of the volume manager instance |
`spec.volumes` \* | array of `volumeConfig` | Volumes and data information |
`volumeConfig.id` \* | string | An identifier for the volume |
`volumeConfig.replicas` \* | int | Number of replicas required on distinct compute nodes |
`volumeConfig.sourceType` \* | string | Source type of the dataset to be used by the volume (e.g., S3, NFS) |
`volumeConfig.accessMode` \* | string | Type of access mode |
`volumeConfig.capacity` \* | string | Size requested for the volume |
`volumeConfig.labels` \* | map[string]string | Any labels required for the volume |
`volumeConfig.options` | map[string]string | Any options required for the volume |
`volumeConfig.nodeAffinity` | NodeAffinity | The node affinity to restrict or prefer the data placement |
`volumeConfig.tolerations` | Tolerations | Any tolerations the CR should respect |
`spec.state` \* | enum: `Pending`, `Running`, `Failed`, `Completed` | The desired state for this volume manager instance |
`status.volumes` | array of `volume` | Volume information |
`volume.id` | string | An identifier for the volume. There is a one-to-one mapping between `volumeConfig.id` and `volume.id` |
`volume.volumeSource` | VolumeSource | A volume source associated with the volume |
`volume.message` | string | A message associated with the state of this volume |
`volume.nodeAffinity` | NodeAffinity | A node affinity to guide the pod scheduling for data gravity |
`status.state` | enum: `Pending`, `Running`, `Failed`, `Completed` | The current state of this volume manager instance |
`status.message` | string | A message associated with the current state of this volume manager instance |

Fields marked with \* are mandatory.
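
To make the schema more concrete, the sketch below models a `volumemanager` CR as a nested mapping and checks it for the mandatory fields. It is only an illustration under stated assumptions, not VCK's own validation: the `apiVersion` value, the instance and label names, and the helpers `lookup` and `validate_cr` are hypothetical. Only the field names are taken from the table.

```python
# Field names come from the schema table; every value below is a made-up example.
REQUIRED_TOP_LEVEL = ["apiVersion", "kind", "metadata.name", "spec.volumes", "spec.state"]
REQUIRED_VOLUME_CONFIG = ["id", "replicas", "sourceType", "accessMode", "capacity", "labels"]


def lookup(obj, dotted_path):
    """Return the value at a dotted path such as 'metadata.name', or None."""
    for key in dotted_path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def validate_cr(cr):
    """Return a list of problems: missing mandatory fields or a wrong kind."""
    problems = [path for path in REQUIRED_TOP_LEVEL if lookup(cr, path) is None]
    for index, volume in enumerate(lookup(cr, "spec.volumes") or []):
        for field in REQUIRED_VOLUME_CONFIG:
            if field not in volume:
                problems.append(f"spec.volumes[{index}].{field}")
    if cr.get("kind") not in (None, "VolumeManager"):
        problems.append("kind (must be VolumeManager)")
    return problems


example_cr = {
    "apiVersion": "example.com/v1",  # placeholder, not the real API group/version
    "kind": "VolumeManager",
    "metadata": {"name": "vck-example"},
    "spec": {
        "state": "Running",
        "volumes": [
            {
                "id": "vol1",
                "replicas": 1,
                "sourceType": "NFS",
                "accessMode": "ReadWriteMany",
                "capacity": "5Gi",
                "labels": {"team": "demo"},
            }
        ],
    },
}

print(validate_cr(example_cr))
```

An empty list means every field marked as mandatory is present. The optional fields (`options`, `nodeAffinity`, `tolerations`) and the `status` section are deliberately left out of the check.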
The VCK controller uses volumes, volume sources, and pods to manage volumes and the associated data in Kubernetes. The controller has the following responsibilities:
Data source support: The controller will transparently support different data sources. Some of the data sources such as NFS are natively supported by PVs.
Data distribution: In case of a shared file system, data distribution will be handled using access modes in volumes. There are three different types of access modes:
- ReadWriteOnce – the volume can be mounted as read-write by a single node.
- ReadOnlyMany – the volume can be mounted read-only by many nodes.
- ReadWriteMany – the volume can be mounted as read-write by many nodes.
These access modes can be used as long as the shared file system supports them.
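
The three access modes can be summarised by two properties: whether writes are allowed and whether more than one node may mount the volume. The following is a small Python sketch of that reading. The `AccessMode` class and the `can_mount` helper are hypothetical names used for illustration and are not part of VCK or Kubernetes.

```python
from enum import Enum


class AccessMode(Enum):
    """The three access modes listed above."""

    READ_WRITE_ONCE = "ReadWriteOnce"
    READ_ONLY_MANY = "ReadOnlyMany"
    READ_WRITE_MANY = "ReadWriteMany"

    @property
    def writable(self):
        # Only ReadOnlyMany forbids writes.
        return self is not AccessMode.READ_ONLY_MANY

    @property
    def multi_node(self):
        # Only ReadWriteOnce restricts the volume to a single node.
        return self is not AccessMode.READ_WRITE_ONCE


def can_mount(mode, nodes_already_mounted, want_write):
    """Hypothetical check: may one more node mount the volume in this mode?"""
    if want_write and not mode.writable:
        return False
    if nodes_already_mounted >= 1 and not mode.multi_node:
        return False
    return True


mode = AccessMode("ReadWriteOnce")
print(can_mount(mode, nodes_already_mounted=0, want_write=True))
print(can_mount(mode, nodes_already_mounted=1, want_write=False))
```

The check counts nodes rather than pods, matching the per-node wording of the list above.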
If the data is stored elsewhere (e.g., S3) and needs to be available in the source path, the controller is responsible for downloading the data and replicating it across N nodes, as specified by the `volumeConfig.replicas` field in the API schema. Depending on the source type, either PVs of the `local` volume source type or `hostPath` volumes are created.
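
As a rough illustration of replicating data across N distinct nodes, the sketch below plans one download per replica. It assumes the controller already has a list of candidate node names; how VCK actually selects nodes is not described here, and `pick_replica_nodes`, `plan_downloads`, and the node names are hypothetical.

```python
def pick_replica_nodes(candidate_nodes, replicas):
    """Choose `replicas` distinct nodes to hold a copy of the data."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    distinct = sorted(set(candidate_nodes))  # deterministic order for this sketch
    if replicas > len(distinct):
        raise ValueError(
            f"need {replicas} distinct nodes, only {len(distinct)} available"
        )
    return distinct[:replicas]


def plan_downloads(volume_config, candidate_nodes):
    """Return one (node, volume id) download task per requested replica."""
    nodes = pick_replica_nodes(candidate_nodes, volume_config["replicas"])
    return [(node, volume_config["id"]) for node in nodes]


tasks = plan_downloads(
    {"id": "vol1", "replicas": 2},
    ["node-b", "node-a", "node-c", "node-a"],
)
print(tasks)
```

Failing early when fewer distinct nodes exist than replicas were requested keeps the "distinct compute nodes" requirement of `volumeConfig.replicas` explicit.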
Data affinity: When required, data affinity will be transparently supported using either volume scheduling or node affinity features in Kubernetes.
Data caching: As long as the backing volume is available, it can be used in any pod. The controller is responsible for providing the volume source and the node affinity associated with a volume.
Data streaming: Data services such as Aeon use a caching mechanism to provide data streaming. For example, when Aeon is used as the data streaming service, it uses a cache to stream the data to a compute node. The location and size of the cache are determined by parameters provided by the job, and the cache resides on the local host. When possible, Aeon also uses this cache for data caching.
To support data streaming in these cases, a fixed PV backed by the `hostPath` or `local` volume source type can act as a cache for the data. In this case, the controller is responsible for making the PVC ready as soon as a `hostPath` or `local` backed PV is created.
Job output: When required, the job output can be gathered in a volume as long as the backing file system supports the `ReadWriteOnce` or `ReadWriteMany` access mode.
Garbage collection: The controller is responsible for evicting unused data from a node based on metrics such as disk pressure. Similarly, the controller will delete unused PVs and PVCs, taking the reclaim policy of each PV into consideration. A simple mechanism such as least recently used (LRU) will be used to determine the order in which datasets, PVs, and PVCs are evicted.
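
The LRU ordering mentioned above can be sketched as follows. This is a minimal illustration that assumes disk pressure is approximated by a fixed byte budget per node; the class `LRUDataCache` and its methods are hypothetical, and it does not model PVs, PVCs, or reclaim policies.

```python
from collections import OrderedDict


class LRUDataCache:
    """Tracks datasets on a node and evicts the least recently used first."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        # dataset id -> size in bytes, ordered from least to most recently used
        self._datasets = OrderedDict()

    def touch(self, dataset_id, size_bytes=None):
        """Record a use of a dataset; pass size_bytes when adding a new one."""
        if size_bytes is not None:
            self._datasets[dataset_id] = size_bytes
        # Raises KeyError if the dataset is unknown and no size was given.
        self._datasets.move_to_end(dataset_id)

    def used_bytes(self):
        return sum(self._datasets.values())

    def evict_until_fits(self):
        """Evict least recently used datasets while usage exceeds capacity."""
        evicted = []
        while self._datasets and self.used_bytes() > self.capacity_bytes:
            dataset_id, _size = self._datasets.popitem(last=False)
            evicted.append(dataset_id)
        return evicted


cache = LRUDataCache(capacity_bytes=100)
cache.touch("a", 60)
cache.touch("b", 30)
cache.touch("a")       # "a" becomes the most recently used dataset
cache.touch("c", 40)   # usage is now 130 bytes, above the budget
print(cache.evict_until_fits())
```

In this run only `b` is evicted: it is the least recently used dataset, and removing it brings usage back to the 100-byte budget.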
The relationship between a volume and data is established using `volumeConfig.sourceType` and a data handler for that source type. As the name implies, `volumeConfig.sourceType` provides the type of the data source (e.g., S3 or NFS). The data handler for each source type provides the callback functions for a `volumeConfig` of that particular source type. These callback functions provide the logic to be executed when the CR containing the `volumeConfig` is added, updated, or deleted. The data handler should implement the `DataHandler` interface in handlers.go.
For each `sourceType`, a new data handler must be implemented. For more information on adding a new data handler, read the developer manual.
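
The handler-per-source-type pattern can be sketched in Python as a registry keyed by `sourceType`. This is an illustrative stand-in only: the actual `DataHandler` interface is defined in Go in handlers.go and its method set may differ. The method names `on_add`, `on_update`, and `on_delete`, the `NFSHandler` class, and the `register` and `dispatch` helpers are assumptions made for this example.

```python
from abc import ABC, abstractmethod


class DataHandler(ABC):
    """Python stand-in for the callbacks a per-source-type handler provides."""

    source_type = ""

    @abstractmethod
    def on_add(self, volume_config):
        """Called when a CR containing this volumeConfig is added."""

    @abstractmethod
    def on_update(self, volume_config):
        """Called when a CR containing this volumeConfig is updated."""

    @abstractmethod
    def on_delete(self, volume_config):
        """Called when a CR containing this volumeConfig is deleted."""


class NFSHandler(DataHandler):
    """Toy handler that only reports what it would do."""

    source_type = "NFS"

    def on_add(self, volume_config):
        return f"NFS: added {volume_config['id']}"

    def on_update(self, volume_config):
        return f"NFS: updated {volume_config['id']}"

    def on_delete(self, volume_config):
        return f"NFS: deleted {volume_config['id']}"


HANDLERS = {}  # sourceType -> handler instance


def register(handler):
    HANDLERS[handler.source_type] = handler


def dispatch(event, volume_config):
    """Route an 'add', 'update' or 'delete' event to the matching handler."""
    handler = HANDLERS.get(volume_config["sourceType"])
    if handler is None:
        raise KeyError(f"no data handler for {volume_config['sourceType']!r}")
    callbacks = {
        "add": handler.on_add,
        "update": handler.on_update,
        "delete": handler.on_delete,
    }
    return callbacks[event](volume_config)


register(NFSHandler())
print(dispatch("add", {"id": "vol1", "sourceType": "NFS"}))
```

With this layout, supporting another source type means writing one more handler class and registering it; the dispatch logic stays unchanged.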
A brief description of the supported source types is provided below. For more information on usage, refer to the [user manual][user-doc].
Source Type | Phase | Description |
---|---|---|
S3-Dev | Deprecated | VCK will download the files from a specified S3 bucket and make them available for consumption on a node. This source type should only be used for development and testing purposes. |
S3 | Supported | VCK will download the files from a specified S3 bucket and provide nodes where hostPath volumes can be used. |
NFS | Supported | VCK will make the specified path from an NFS server available for consumption. |
Pachyderm | Supported | VCK will download the Pachyderm repo data and make it available for consumption on a specified number of nodes. |
Aeon | Design | - |