
Add Sharding Support #877

@jstriebel


Update 2023-11-19:

Sharding is now formalized as a codec for Zarr v3 via the Zarr Enhancement Proposal (ZEP) 2.

A new efficient implementation of sharding is being discussed as part of #1569.


TL;DR: We want to add sharding to zarr. There are some PRs that serve as starting points for further discussion, as well as issue zarr-developers/zarr-specs#127 to update the spec.

Please see #877 (comment) for a comparison of the implementation approaches.


Currently, zarr maps one array chunk to one storage key, e.g. one file in the DirectoryStore. It would be useful to decouple the concept of a chunk (one compressible unit of access) from the storage key, since a store may be optimized for larger entries and may have an upper limit on the number of entries, e.g. the file-system block size and maximum inode count for disk storage. These constraints do not necessarily match the access patterns of the data, so chunks may need to be smaller than one storage object.

For those reasons, we would like to add sharding support to zarr. One shard corresponds to one storage key, but can contain multiple chunks:
[figure: a shard, stored under a single storage key, containing multiple chunks]
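To make the chunk-to-shard mapping concrete, here is a minimal sketch in Python (the shard shape of 2×2 chunks and the key format are illustrative assumptions, not part of the proposal):

```python
def chunk_to_shard(chunk_coords, chunks_per_shard):
    """Map a chunk's grid coordinates to the shard that stores it.

    Returns the shard's grid coordinates (which determine the storage key)
    and the chunk's position within that shard.
    """
    shard_coords = tuple(c // s for c, s in zip(chunk_coords, chunks_per_shard))
    within_shard = tuple(c % s for c, s in zip(chunk_coords, chunks_per_shard))
    return shard_coords, within_shard

# Example: with shards of 2x2 chunks, chunk (3, 2) lives in shard (1, 1)
# at position (1, 0) within that shard, so only the key for shard (1, 1)
# is touched in the store (e.g. the file "1.1" in a DirectoryStore).
shard, within = chunk_to_shard((3, 2), (2, 2))
assert shard == (1, 1) and within == (1, 0)
```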

We (scalable minds) will provide a PR for initial sharding support in zarr-python. This work is funded by CZI through the EOSS program.

We see the following requirements to implement sharding support:

  1. Add shard abstraction (multiple chunks in a single file/storage object)
  2. Chunks should continue to be compressible (blosc, lz4, zstd, …)
  3. Chunks should be stored in a contiguous byte stream within the shard file
  4. Shard abstraction should be transparent when reading/writing
  5. Support arbitrary array sizes (not necessarily chunk/shard aligned)
  6. Support arbitrary number of dimensions
  7. Possibly store chunks in Morton order within a shard file (see the sketch after this list)
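Regarding requirement 7, the following is a minimal sketch of how a Morton (Z-order) index could be computed for a chunk's position within a shard. It assumes a power-of-two number of chunks per shard along each axis and only illustrates the ordering, not a concrete design:

```python
import itertools

def morton_index(within_shard_coords, bits_per_axis):
    """Interleave the bits of the per-axis coordinates (Z-order curve).

    Storing chunks in this order keeps spatially adjacent chunks close
    together in the shard's byte stream. Assumes the number of chunks
    per shard along each axis is a power of two (2**bits_per_axis).
    """
    index = 0
    out_bit = 0
    for bit in range(max(bits_per_axis)):
        for axis, coord in enumerate(within_shard_coords):
            if bit < bits_per_axis[axis]:
                index |= ((coord >> bit) & 1) << out_bit
                out_bit += 1
    return index

# Example: 2x2 chunks per shard (1 bit per axis) are laid out in the order
# (0, 0), (1, 0), (0, 1), (1, 1).
order = sorted(itertools.product(range(2), repeat=2),
               key=lambda c: morton_index(c, (1, 1)))
assert order == [(0, 0), (1, 0), (0, 1), (1, 1)]
```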

With this issue and an accompanying prototype PR we want to start a discussion about possible implementation approaches. Currently, we see four approaches, which have different pros and cons:

  1. Implement shard abstraction as a storage handler wrapper
    • Pro: Multiple logical chunks can be combined into a single shard key, which maps to one entry of the underlying storage handler. The translation from chunks to shards only needs to happen within the minimal API of the storage handler.
    • Pro: Partial reads and writes to the underlying store are encapsulated in a single place (see the shard-layout sketch after this list).
    • Con: Sharding should be configured and persisted per array rather than per store.
  2. Implement shard abstraction as a compressor wrapper (current chunks = shards which contain subchunks).
    • Pro: The current notion of one chunk per storage key stays unchanged in the storage layer; those chunks then simply correspond to what we normally call "shards".
    • Con: Partial reads and writes per storage-key need to be passed through all intermediate layers in the array implementation.
    • Con: Addressing a sub-chunk is not intended today. It would break the assumption that a chunk is read or written as a whole. Partial decompression already exists for blosc, but it still assumes that the compressor handles data of the size of a full chunk, even if not all of it is needed. This approach would break that assumption and requires significant changes to the data retrieval in the array implementation.
  3. Implement shard abstraction via a translation layer on the array
    • Pro: Fits the conceptual level where sharding is configured and persisted per array.
    • Con: Adding sharding logic to the array makes the array implementation even more complex. Chunk/shard translation and partial reads/writes would need to be added at multiple points.
  4. Integrate blosc2 into zarr (a special case of 2)
    • Pro: Sharding is already implemented in the compression library.
    • Con: Partial reads and writes still need to happen efficiently throughout the array implementation; see the cons of approach 2.
    • Con: One core feature of zarr is the decoupling of the manipulation API, the storage layer, and compression. This approach tightly couples sharding, which is really a storage concern, to a single compressor, so sharding could not be used with other compressors.
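All of the approaches above presuppose some binary layout of a shard that allows a single chunk to be read without touching the whole shard (requirements 3 and 4, and the partial-read points above). Below is a minimal sketch of one possible layout, purely as an illustration; the actual format would be defined as part of zarr-developers/zarr-specs#127. It stores a fixed-size index of (offset, length) pairs followed by the concatenated compressed chunks:

```python
import struct

# Illustrative shard layout (an assumption, not a spec): a fixed-size index
# of (offset, length) pairs, one per chunk slot, followed by the
# concatenated compressed chunks. Missing chunks are marked with an
# offset of 2**64 - 1.
EMPTY = 2**64 - 1

def pack_shard(compressed_chunks):
    """compressed_chunks: list of bytes (or None for missing chunks),
    in within-shard order."""
    index = []
    payload = bytearray()
    index_size = 16 * len(compressed_chunks)  # two uint64 per slot
    for chunk in compressed_chunks:
        if chunk is None:
            index.append((EMPTY, 0))
        else:
            index.append((index_size + len(payload), len(chunk)))
            payload += chunk
    packed_index = b"".join(struct.pack("<QQ", o, n) for o, n in index)
    return packed_index + bytes(payload)

def read_chunk(shard_bytes, slot):
    """Read a single chunk: only the 16-byte index entry and the chunk's
    own bytes are needed, which maps onto ranged reads of object stores."""
    offset, length = struct.unpack_from("<QQ", shard_bytes, 16 * slot)
    if offset == EMPTY:
        return None
    return shard_bytes[offset:offset + length]

# Example: a shard of 2x2 chunks where one chunk is missing.
shard = pack_shard([b"chunk-00", None, b"chunk-10", b"chunk-11"])
assert read_chunk(shard, 0) == b"chunk-00"
assert read_chunk(shard, 1) is None
```

With such a layout, writing a single chunk still requires a read-modify-write of its shard, which is one of the trade-offs any of the approaches has to deal with.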

Based on this assessment, we currently favor a combination of approaches 1 and 3. This keeps the advantages of implementing sharding as a wrapper around the storage handler, which is a nice abstraction level. However, sharding should be configured and persisted per array rather than in the store config, so we propose to use a ShardingStore only internally, while the user-facing configuration happens on the array, e.g. similar to the shape.
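To make that combination more tangible, here is a heavily simplified sketch of what such an internal store wrapper could look like. The class name, the key format, and the in-shard index layout (the same one as in the sketch above) are illustrative assumptions; the actual design is being worked out in the prototype PR:

```python
import struct

class ShardedStoreWrapper:
    """Hypothetical internal store wrapper that maps chunk keys to shard keys.

    Sharding itself would be configured and persisted in the array
    metadata; the wrapper is an implementation detail of the array.
    """

    def __init__(self, inner_store, chunks_per_shard):
        self._store = inner_store        # any MutableMapping-style zarr store
        self._cps = chunks_per_shard     # e.g. (2, 2) chunks per shard

    def _translate(self, chunk_key):
        # e.g. "data/3.2" -> shard key "data/1.1" plus a flat slot index
        prefix, _, coords = chunk_key.rpartition("/")
        chunk = tuple(int(c) for c in coords.split("."))
        shard = tuple(c // s for c, s in zip(chunk, self._cps))
        within = tuple(c % s for c, s in zip(chunk, self._cps))
        slot = 0
        for w, s in zip(within, self._cps):  # row-major flattening
            slot = slot * s + w
        shard_key = (prefix + "/" if prefix else "") + ".".join(map(str, shard))
        return shard_key, slot

    def __getitem__(self, chunk_key):
        shard_key, slot = self._translate(chunk_key)
        shard_bytes = self._store[shard_key]
        # Assumed shard layout as in the sketch above: 16-byte
        # (offset, length) index entries, then the compressed chunks.
        offset, length = struct.unpack_from("<QQ", shard_bytes, 16 * slot)
        return shard_bytes[offset:offset + length]
```

A real implementation would additionally cover writes (a read-modify-write of the affected shard), use ranged reads where the underlying store supports them instead of loading whole shards, and take its configuration from the array metadata rather than from constructor arguments.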

To make further discussion more concrete and tangible, I added an initial prototype for this implementation approach: #876. This prototype still lacks a number of features, which are noted in the PR itself, but it contains clear paths to tackle them.

We invite all interested parties to discuss our initial proposal and assessment, as well as the initial draft PR. Based on the discussion and design, we will implement sharding support and add automated tests and documentation for the new feature.
