Header of zarr3 array with bytes codec without configuration cannot be parsed #8233

Open
amotta opened this issue Nov 27, 2024 · 0 comments

amotta commented Nov 27, 2024

I used TensorStore and its zarr3 driver to create and write a sharded, three-dimensional array of unsigned 8-bit integers. This resulted in the following zarr.json metadata file (shown is the output of cat zarr.json | python -m json.tool):

{
    "chunk_grid": {
        "configuration": {
            "chunk_shape": [
                1024,
                1024,
                1024
            ]
        },
        "name": "regular"
    },
    "chunk_key_encoding": {
        "name": "default"
    },
    "codecs": [
        {
            "configuration": {
                "chunk_shape": [
                    32,
                    32,
                    32
                ],
                "codecs": [
                    {
                        "name": "bytes"
                    }
                ],
                "index_codecs": [
                    {
                        "configuration": {
                            "endian": "little"
                        },
                        "name": "bytes"
                    },
                    {
                        "name": "crc32c"
                    }
                ]
            },
            "name": "sharding_indexed"
        }
    ],
    "data_type": "uint8",
    "fill_value": 0,
    "node_type": "array",
    "shape": [
        5400,
        2000,
        4000
    ],
    "zarr_format": 3
}

Note that the inner "codecs" list contains a "bytes" codec without a "configuration" object. webKnossos does not currently handle this case. The dataset with the problematic layer can be imported, but no data can be loaded and errors are reported on the console.

According to the Zarr v3 specification, the "endian" configuration value is optional for byte-sized values. This is already handled by webKnossos. However, it's unclear to me whether this means that the entire "configuration" may be omitted. The specification does seem to imply that the "configuration" is optional:

The codec object may also contain a configuration object which consists of the parameter names and values as defined by the corresponding codec specification.
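
To make the question concrete, here is a minimal Python sketch of a lenient reading of the quoted sentence, in which a missing "configuration" is treated like an empty one. This is not webKnossos code. The function name parse_bytes_codec and its itemsize parameter are hypothetical, and rejecting a missing "endian" for multi-byte data types is my assumption about how the spec is meant to be read.

```python
def parse_bytes_codec(codec, itemsize):
    """Return the endianness of a "bytes" codec, or None if it is not specified.

    Hypothetical helper: `codec` is one entry of a "codecs" list from
    zarr.json, `itemsize` is the size of the array data type in bytes.
    """
    if codec.get("name") != "bytes":
        raise ValueError(f'expected a "bytes" codec, got {codec.get("name")!r}')

    # Treat a missing (or null) "configuration" like an empty object.
    configuration = codec.get("configuration") or {}
    endian = configuration.get("endian")

    if endian is None:
        # Assumption: endianness only matters for multi-byte data types.
        if itemsize > 1:
            raise ValueError('"endian" is required for multi-byte data types')
        return None

    if endian not in ("little", "big"):
        raise ValueError(f'invalid "endian" value: {endian!r}')
    return endian
```

With this reading, the inner {"name": "bytes"} codec of the uint8 array above would be accepted, while the same codec on a multi-byte data type would still be rejected.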

As a workaround, I added an empty "configuration" object to the problematic bytes codec. With that change, reading from the Zarr3 layer works.
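
For reference, the workaround can be scripted roughly as follows. This is a sketch only: the helper names add_missing_configuration and patch_metadata_file are made up for illustration, and it assumes nested codecs appear only under the "codecs" and "index_codecs" keys of a codec's configuration, as in the metadata above.

```python
import json


def add_missing_configuration(codecs):
    """Add an empty "configuration" to every "bytes" codec that lacks one.

    Walks nested codec lists recursively, modifies `codecs` in place, and
    returns the number of patched codecs.
    """
    patched = 0
    for codec in codecs:
        if codec.get("name") == "bytes" and "configuration" not in codec:
            codec["configuration"] = {}
            patched += 1
        # Descend into nested codec lists, e.g. those of "sharding_indexed".
        configuration = codec.get("configuration", {})
        for key in ("codecs", "index_codecs"):
            patched += add_missing_configuration(configuration.get(key, []))
    return patched


def patch_metadata_file(path):
    """Patch a zarr.json file in place; returns the number of patched codecs."""
    with open(path) as f:
        metadata = json.load(f)
    patched = add_missing_configuration(metadata.get("codecs", []))
    if patched:
        with open(path, "w") as f:
            json.dump(metadata, f, indent=4)
    return patched
```

Calling patch_metadata_file("zarr.json") on the file shown above should patch exactly one codec, the inner "bytes" codec of "sharding_indexed". Back up the file first, since it is rewritten in place.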

@amotta amotta added the bug label Nov 27, 2024
@fm3 fm3 added the backend label Nov 27, 2024