Existing data disks follow this path:
- Disks are attached to hosts, which perform all network calls to Azure Storage on behalf of VMs.
- Disks are then surfaced to VMs as SCSI devices via the hypervisor.
Dysks are block devices running inside your VMs; all network calls to Azure Storage originate from your kernel.
- Dysk LKM
- Char Device: Because no hardware is involved, there are no IRQs to notify the kernel when disks are plugged/unplugged. Instead, we rely on IOCTLs performed against this char device.
- Dysk bdd: Manages the current list of mounted dysks and integrates dysks with the kernel's block I/O interfaces.
- Dysk Block Device: Created dynamically in response to a mount IOCTL.
- Worker: Performs asynchronous execution.
- AZ: Manages Azure page blob REST API calls. All calls are non-blocking.
- Dysk Client
- A Go-based client-side package (which executes the IOCTLs) that can be wrapped in any executable (see the sketch after this list).
- A CLI that wraps the above package.
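
For illustration only, here is a minimal Go sketch of the kind of IOCTL call the client package performs against the char device. The device path `/dev/dysk_ctl` and the request number below are hypothetical placeholders, not the module's actual interface.

```go
// Sketch: issuing an IOCTL against a char device from userspace.
// Paths and request numbers are placeholders, not dysk's real values.
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

const (
	ctlDevice    = "/dev/dysk_ctl" // hypothetical char device path
	mountRequest = 0xDEAD0001      // hypothetical IOCTL request number
)

func main() {
	f, err := os.OpenFile(ctlDevice, os.O_RDWR, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open char device:", err)
		os.Exit(1)
	}
	defer f.Close()

	// In a real client the third argument would point to a struct describing
	// the page blob (account, key, blob path, lease, size). We pass 0 here
	// just to show the call shape.
	if _, _, errno := unix.Syscall(unix.SYS_IOCTL, f.Fd(), mountRequest, 0); errno != 0 {
		fmt.Fprintln(os.Stderr, "mount ioctl failed:", errno)
		os.Exit(1)
	}
	fmt.Println("mount IOCTL issued")
}
```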
Because the Linux kernel does not support TLS, all calls are executed against the HTTP endpoint. It is therefore highly advisable to use Azure VNET service endpoints, which keep your storage account (and its traffic) from being exposed outside your VNET. On-prem VMs can VPN into this VNET to access the storage accounts.
Disks can fail for many reasons, such as a non-transient network failure, page blob deletion, or a broken Azure Storage lease. Once any of these conditions occurs, the following is executed:
- The disk is marked as failed.
- Any new requests are canceled, with -EIO returned to userspace.
- Any pending requests are canceled, and -EIO is returned to userspace.
- The disk is removed gracefully.

At that point, processes attempting to read from or write to the disk handle the error using standard error handling, as sketched below.
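
Here is a minimal sketch of that standard error handling from a userspace process, assuming a dysk exposed at the hypothetical device path `/dev/dyskxxx`:

```go
// Sketch: a userspace process seeing EIO from a failed dysk.
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Hypothetical device node; a real dysk appears under /dev with
	// whatever name it was mounted as.
	f, err := os.OpenFile("/dev/dyskxxx", os.O_RDWR, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		os.Exit(1)
	}
	defer f.Close()

	buf := make([]byte, 4096)
	if _, err := f.Read(buf); err != nil {
		// Once the dysk has failed, reads and writes return EIO; callers
		// use ordinary error handling, exactly as with a dead physical disk.
		if errors.Is(err, syscall.EIO) {
			fmt.Fprintln(os.Stderr, "dysk failed (EIO); falling back / alerting")
			os.Exit(1)
		}
		fmt.Fprintln(os.Stderr, "read:", err)
		os.Exit(1)
	}
	fmt.Println("read ok")
}
```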
When Azure Storage throttles a disk, dysk handles this event gracefully and pauses new I/O requests for 3 seconds before retrying them.
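
Conceptually the behavior resembles the userspace sketch below. The 3-second pause comes from the description above; detecting throttling via HTTP 503 and the `doWithThrottleRetry` helper are illustrative assumptions, not the module's actual in-kernel code.

```go
// Sketch: pause-and-retry on throttling, in userspace Go for illustration.
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// retryAfter mirrors the 3-second pause described above.
const retryAfter = 3 * time.Second

// doWithThrottleRetry retries a request while the server reports it is busy.
func doWithThrottleRetry(req func() (*http.Response, error), maxAttempts int) (*http.Response, error) {
	for attempt := 1; ; attempt++ {
		resp, err := req()
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusServiceUnavailable {
			return resp, nil // not throttled
		}
		resp.Body.Close()
		if attempt >= maxAttempts {
			return nil, errors.New("still throttled after max retries")
		}
		time.Sleep(retryAfter) // pause before retrying, as dysk does for new I/O
	}
}

func main() {
	// Placeholder URL; a real call would target the page blob endpoint.
	resp, err := doWithThrottleRetry(func() (*http.Response, error) {
		return http.Get("http://example.com/")
	}, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```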
Dysk is designed to work in high-density, orchestrated compute environments, specifically containers orchestrated by Kubernetes. In this scenario, pods declare their storage requirements via [PV/PVC specs](https://kubernetes.io/docs/concepts/storage/persistent-volumes/). At any point in time, one or more nodes carrying a large number of containers and disks might be in a network split, where containers keep running but nodes fail to report a healthy state to the master. Because disks are not attached per se, a volume driver can break the existing lease, create a new one, and mount the dysks on healthy nodes. The existing dysks will fail gracefully as described above.