This repository has been archived by the owner on May 11, 2024. It is now read-only.

Architecture: Volume Controller for Kubernetes (VCK)

Acronyms

| Term | Meaning |
|------|---------|
| CR | Custom Resource |
| CRD | Custom Resource Definition |
| NFS | Network File System |
| PV | Persistent Volume |
| PVC | Persistent Volume Claim |

Concepts

| Term | Meaning |
|------|---------|
| Controller | The process which drives a Kubernetes object from its current state to the desired state |
| Source Type | The source type for the data being stored (e.g., S3, NFS) |
| volumemanager | The CRD Kind for VCK |

Overview

VCK provides basic volume and data management in a Kubernetes cluster using volumes and volume sources. It uses CRDs and controllers to create the volumes and volume sources and to perform the operations needed to make the data available to users. Users interact only with CRs; VCK abstracts away the remaining details.

Goals

The end goals of this project are listed below:

  • Data source support: VCK should support exposing data from different sources such as S3, NFS and local disk as volumes.
  • Data distribution: VCK should support data replication for distributed job types.
  • Data affinity: VCK should enable data affinity and gravity. It should use existing mechanisms such as volume scheduling and node affinity when possible.
  • Data caching: VCK should enable the pre-population of data if required.
  • Data streaming: VCK should provide abstraction for streaming data services. Jobs should be able to start as soon as the first stream or batch of data is available.
  • Job output: VCK should allow output data to be gathered when required.
  • Garbage collection: VCK should evict data in case of disk pressure.

Non-Goals

  • VCK does not aim to be a solution to all your volume and data management problems.
  • VCK does not solve any shortcomings or drawbacks of Kubernetes. Any issue that exists in Kubernetes also exists in VCK.

API Schema

VCK extends the Kubernetes API with a new CRD called volumemanager. The schema for creating a volumemanager CR is described below:

| Field Name | Type | Definition |
|------------|------|------------|
| apiVersion* | string | API version of volume manager |
| kind* | enum: VolumeManager | Resource type. The only allowed value is VolumeManager |
| metadata.name* | string | Name of the volume manager instance |
| spec.volumes* | array of volumeConfig | Volumes and data information |
| volumeConfig.id* | string | An identifier for the volume |
| volumeConfig.replicas* | int | Number of replicas required on distinct compute nodes |
| volumeConfig.sourceType* | string | Source type of the dataset to be used by the volume (e.g., S3, NFS) |
| volumeConfig.accessMode* | string | Type of access mode |
| volumeConfig.capacity* | string | Size requested for the volume |
| volumeConfig.labels* | map[string]string | Any labels required for the volume |
| volumeConfig.options | map[string]string | Any options required for the volume |
| volumeConfig.nodeAffinity | NodeAffinity | The node affinity to restrict or prefer the data placement |
| volumeConfig.tolerations | Tolerations | Any tolerations the CR should respect |
| spec.state* | enum: Pending, Running, Failed, Completed | The desired state for this volume manager instance |
| status.volumes | array of volume | Volume information |
| volume.id | string | An identifier for the volume. There is a one-to-one mapping between volumeConfig.id and volume.id |
| volume.volumeSource | VolumeSource | A volume source associated with the volume |
| volume.message | string | A message associated with the state of this volume |
| volume.nodeAffinity | NodeAffinity | A node affinity to guide the pod scheduling for data gravity |
| status.state | enum: Pending, Running, Failed, Completed | The current state of this volume manager instance |
| status.message | string | A message associated with the current state of this volume manager instance |

Fields marked with * are mandatory.
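As a rough illustration of the schema, the mandatory fields can be mirrored by plain Go structs with a small validation helper. This is a minimal sketch, not the project's actual type definitions: the type names, the `Validate` and `sampleCR` helpers, the `example.com/v1` API version, and the sample label are all hypothetical, and nodeAffinity and tolerations are omitted to avoid Kubernetes dependencies. Treating an empty volume list or `replicas < 1` as invalid is also an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// State mirrors the enum shared by spec.state and status.state.
type State string

const (
	Pending   State = "Pending"
	Running   State = "Running"
	Failed    State = "Failed"
	Completed State = "Completed"
)

// VolumeConfig mirrors the volumeConfig rows of the schema table.
// nodeAffinity and tolerations are left out of this sketch.
type VolumeConfig struct {
	ID         string            `json:"id"`
	Replicas   int               `json:"replicas"`
	SourceType string            `json:"sourceType"`
	AccessMode string            `json:"accessMode"`
	Capacity   string            `json:"capacity"`
	Labels     map[string]string `json:"labels"`
	Options    map[string]string `json:"options,omitempty"`
}

type Metadata struct {
	Name string `json:"name"`
}

type Spec struct {
	Volumes []VolumeConfig `json:"volumes"`
	State   State          `json:"state"`
}

type VolumeManager struct {
	APIVersion string   `json:"apiVersion"`
	Kind       string   `json:"kind"`
	Metadata   Metadata `json:"metadata"`
	Spec       Spec     `json:"spec"`
}

// Validate checks the fields that the table marks as mandatory.
// Rejecting an empty list or replicas < 1 is an assumption of this sketch.
func Validate(vm VolumeManager) error {
	if vm.APIVersion == "" || vm.Metadata.Name == "" {
		return errors.New("apiVersion and metadata.name are mandatory")
	}
	if vm.Kind != "VolumeManager" {
		return fmt.Errorf("kind must be VolumeManager, got %q", vm.Kind)
	}
	switch vm.Spec.State {
	case Pending, Running, Failed, Completed:
	default:
		return fmt.Errorf("invalid spec.state %q", vm.Spec.State)
	}
	if len(vm.Spec.Volumes) == 0 {
		return errors.New("spec.volumes is mandatory")
	}
	for _, vc := range vm.Spec.Volumes {
		if vc.ID == "" || vc.SourceType == "" || vc.AccessMode == "" || vc.Capacity == "" {
			return fmt.Errorf("volume %q: id, sourceType, accessMode and capacity are mandatory", vc.ID)
		}
		if vc.Replicas < 1 || vc.Labels == nil {
			return fmt.Errorf("volume %q: replicas and labels are mandatory", vc.ID)
		}
	}
	return nil
}

// sampleCR builds a CR filled with placeholder values.
func sampleCR() VolumeManager {
	return VolumeManager{
		APIVersion: "example.com/v1", // placeholder, not the real group/version
		Kind:       "VolumeManager",
		Metadata:   Metadata{Name: "vck-example"},
		Spec: Spec{
			State: Running,
			Volumes: []VolumeConfig{{
				ID: "vol1", Replicas: 1, SourceType: "NFS",
				AccessMode: "ReadWriteMany", Capacity: "5Gi",
				Labels: map[string]string{"team": "demo"},
			}},
		},
	}
}

func main() {
	vm := sampleCR()
	if err := Validate(vm); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	out, _ := json.MarshalIndent(vm, "", "  ")
	fmt.Println(string(out))
}
```

Running the program prints the sample CR as JSON, which shows how the dotted field names in the table nest inside `metadata` and `spec`.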

The VCK Controller

The VCK controller uses volumes, volume sources, and pods to manage volumes and their associated data in Kubernetes. The controller has the following responsibilities:

Data source support: The controller will transparently support different data sources. Some data sources, such as NFS, are natively supported by PVs.

Data distribution: In case of a shared file system, data distribution will be handled using access modes in volumes. There are three different types of access modes:

  • ReadWriteOnce – the volume can be mounted as read-write by a single node.
  • ReadOnlyMany – the volume can be mounted read-only by many nodes.
  • ReadWriteMany – the volume can be mounted as read-write by many nodes.

These access modes can be used as long as the shared file system supports them.
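The semantics of the three access modes listed above can be sketched as a small check. This is only an illustration: the `AccessMode` type and the `CanMount` helper are hypothetical names, not part of VCK or the Kubernetes API, and the check ignores whether the backing file system actually supports the mode.

```go
package main

import "fmt"

// AccessMode holds one of the three access mode names described above.
type AccessMode string

const (
	ReadWriteOnce AccessMode = "ReadWriteOnce"
	ReadOnlyMany  AccessMode = "ReadOnlyMany"
	ReadWriteMany AccessMode = "ReadWriteMany"
)

// CanMount reports whether the given number of nodes may mount a volume
// with this access mode, optionally asking for write access.
func CanMount(mode AccessMode, nodes int, write bool) (bool, error) {
	switch mode {
	case ReadWriteOnce:
		return nodes <= 1, nil // a single node only
	case ReadOnlyMany:
		return !write, nil // any number of nodes, read-only
	case ReadWriteMany:
		return true, nil // any number of nodes, read-write
	}
	return false, fmt.Errorf("unknown access mode %q", mode)
}

func main() {
	for _, m := range []AccessMode{ReadWriteOnce, ReadOnlyMany, ReadWriteMany} {
		ok, _ := CanMount(m, 3, true)
		fmt.Printf("%s: 3 nodes read-write allowed = %v\n", m, ok)
	}
}
```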

If the data is stored elsewhere (e.g., S3) and needs to be available in the source path, the controller is responsible for downloading the data and replicating it across N nodes, where N is given by the volumeConfig.replicas field in the API schema. Depending on the source type, either PVs of the local volume source type or hostPath volumes are created.
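Replicating data across N distinct nodes implies a node selection step. The following is a minimal sketch of one way to pick the nodes, assuming the candidate node names are already known. The `PickReplicaNodes` helper and the alphabetical tie-breaking are hypothetical; the source does not describe how the controller chooses nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// PickReplicaNodes returns `replicas` distinct node names from candidates.
// Duplicates are ignored, and the result is sorted so that the choice is
// deterministic (an assumption of this sketch).
func PickReplicaNodes(candidates []string, replicas int) ([]string, error) {
	if replicas < 1 {
		return nil, fmt.Errorf("replicas must be at least 1, got %d", replicas)
	}
	seen := make(map[string]bool)
	var distinct []string
	for _, n := range candidates {
		if !seen[n] {
			seen[n] = true
			distinct = append(distinct, n)
		}
	}
	if len(distinct) < replicas {
		return nil, fmt.Errorf("need %d distinct nodes, only %d available", replicas, len(distinct))
	}
	sort.Strings(distinct)
	return distinct[:replicas], nil
}

func main() {
	nodes, err := PickReplicaNodes([]string{"node-b", "node-a", "node-b", "node-c"}, 2)
	fmt.Println(nodes, err)
}
```

Failing early when fewer distinct nodes exist than requested replicas keeps the "distinct compute nodes" requirement of volumeConfig.replicas explicit.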

Data affinity: When required, data affinity will be transparently supported using either volume scheduling or node affinity features in Kubernetes.

Data caching: As long as the backing volume is available, it can be used in any pod. The controller is responsible for providing the volume source and the node affinity associated with a volume.

Data streaming: Data services such as Aeon use a caching mechanism to provide data streaming. For example, when Aeon provides the streaming service, it streams data to a compute node through a cache on the local host. The location and size of the cache are determined by parameters provided by the job. When possible, Aeon also uses this cache for data caching.

To support data streaming in such cases, a fixed hostPath volume or a PV backed by the local volume source type can act as the cache for the data. The controller is then responsible for making the PVC ready as soon as the hostPath or local-backed PV is created.

Job output: When required, the job output can be gathered in a volume as long as the backing file system supports ReadWriteOnce or ReadWriteMany access mode.

Garbage collection: The controller is responsible for evicting unused data from a node based on metrics such as disk pressure. The controller will also delete unused PVs and PVCs, taking the reclaim policy of each PV into consideration. A simple policy such as least recently used (LRU) will determine the order in which data sets, PVs, and PVCs are evicted.
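The LRU ordering mentioned above can be sketched as follows. This is an illustrative sketch only: the `Dataset` type, its fields, and the `Evict` helper are hypothetical, and the source does not specify how usage is tracked or how much space must be freed.

```go
package main

import (
	"fmt"
	"sort"
)

// Dataset is a hypothetical record of data held on a node.
type Dataset struct {
	Name         string
	SizeBytes    int64
	LastUsedUnix int64 // time of last use, in seconds since the epoch
	InUse        bool  // data that is still in use is never evicted
}

// Evict returns the names of unused data sets to remove, least recently
// used first, until at least bytesToFree bytes would be reclaimed or no
// unused data sets remain.
func Evict(sets []Dataset, bytesToFree int64) []string {
	var unused []Dataset
	for _, d := range sets {
		if !d.InUse {
			unused = append(unused, d)
		}
	}
	sort.Slice(unused, func(i, j int) bool {
		return unused[i].LastUsedUnix < unused[j].LastUsedUnix
	})
	var evicted []string
	var freed int64
	for _, d := range unused {
		if freed >= bytesToFree {
			break
		}
		evicted = append(evicted, d.Name)
		freed += d.SizeBytes
	}
	return evicted
}

func main() {
	sets := []Dataset{
		{Name: "a", SizeBytes: 100, LastUsedUnix: 30},
		{Name: "b", SizeBytes: 200, LastUsedUnix: 10},
		{Name: "c", SizeBytes: 300, LastUsedUnix: 20, InUse: true},
	}
	fmt.Println(Evict(sets, 250))
}
```

The same ordering could be applied to unused PVs and PVCs, with the reclaim policy checked before each deletion.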

Relationship Between Volume and Data

The relationship between a volume and data is established using volumeConfig.sourceType and a new data handler for that source type.

As the name implies, volumeConfig.sourceType provides the type of the data source (e.g., S3 or NFS). The data handler for each source type provides the callback functions for a volumeConfig of that particular source type. These callbacks contain the logic to be executed when the CR containing the volumeConfig is added, updated, or deleted. Each data handler should implement the DataHandler interface in handlers.go.

For each sourceType, a new data handler must be implemented. For more information on adding a new data handler, read the developer manual.
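The handler-per-source-type relationship described above can be sketched as an interface plus a registry keyed by source type. This is a hypothetical shape, not the actual DataHandler interface from handlers.go, whose method names and signatures may differ; `Registry`, `Dispatch`, and `logHandler` are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// VolumeConfig is reduced to the two fields this sketch needs.
type VolumeConfig struct {
	ID         string
	SourceType string
}

// DataHandler is a hypothetical stand-in for the interface in handlers.go.
type DataHandler interface {
	SourceType() string
	OnAdd(vc VolumeConfig) error
	OnUpdate(vc VolumeConfig) error
	OnDelete(vc VolumeConfig) error
}

// Registry maps a source type to the handler responsible for it.
type Registry map[string]DataHandler

func (r Registry) Register(h DataHandler) { r[h.SourceType()] = h }

// Dispatch routes an add, update, or delete event for a volumeConfig to
// the handler registered for its source type.
func (r Registry) Dispatch(event string, vc VolumeConfig) error {
	h, ok := r[vc.SourceType]
	if !ok {
		return fmt.Errorf("no data handler registered for source type %q", vc.SourceType)
	}
	switch event {
	case "add":
		return h.OnAdd(vc)
	case "update":
		return h.OnUpdate(vc)
	case "delete":
		return h.OnDelete(vc)
	}
	return errors.New("unknown event: " + event)
}

// logHandler records the callbacks it receives instead of doing real work.
type logHandler struct {
	sourceType string
	Calls      []string
}

func (h *logHandler) SourceType() string { return h.sourceType }

func (h *logHandler) OnAdd(vc VolumeConfig) error {
	h.Calls = append(h.Calls, "add:"+vc.ID)
	return nil
}

func (h *logHandler) OnUpdate(vc VolumeConfig) error {
	h.Calls = append(h.Calls, "update:"+vc.ID)
	return nil
}

func (h *logHandler) OnDelete(vc VolumeConfig) error {
	h.Calls = append(h.Calls, "delete:"+vc.ID)
	return nil
}

func main() {
	nfs := &logHandler{sourceType: "NFS"}
	reg := Registry{}
	reg.Register(nfs)
	fmt.Println(reg.Dispatch("add", VolumeConfig{ID: "vol1", SourceType: "NFS"}), nfs.Calls)
	fmt.Println(reg.Dispatch("add", VolumeConfig{ID: "vol2", SourceType: "S3"}))
}
```

With this shape, supporting a new source type amounts to writing one more type that satisfies the interface and registering it.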

Source Type Support Status

A brief description of source type support is provided below. For more information on usage, refer to the user manual.

| Source Type | Phase | Description |
|-------------|-------|-------------|
| S3-Dev | Deprecated | VCK will download the files from a specified S3 bucket and make them available for consumption in a node. This source type should only be used for development and testing purposes. |
| S3 | Supported | VCK will download the files from a specified S3 bucket and provide nodes where hostPath volumes can be used. |
| NFS | Supported | VCK will make the specified path from an NFS server available for consumption. |
| Pachyderm | Supported | VCK will download the Pachyderm repo data and make it available for consumption on a specified number of nodes. |
| Aeon | Design | - |