Storing quiet and signalling NaNs in Zarr #194
Comments
Sorry for the quiet here, @tomwhite. That should very much be interpreted as "nice question!" 😄 In discussing today during https://zarr.dev/zeps/meetings/, the decision was to transfer this to the zarr-spec repo and try to deal with it as a part of v3. That likely doesn't fix your immediate issue, but it has a lot more chance of actually getting solved. I'll defer to @jstriebel for detailing the proposed solution.
Hey @tomwhite, Thanks for bringing up that question! In the current v3 spec, the
This would also allow defining any NaN as the fill value (even specifying other bits in the NaN would then be possible). In this case, the implementation should probably compare the fill value with provided values at the binary level. Does this suffice for your use case?
Thanks for picking this up @joshmoore and @jstriebel.
This sounds perfect!
@jstriebel: what labels are appropriate here then?
An IEEE 754 NaN is not a single value, but a set of possible values. While it's possible to store different NaN values in Zarr, there are some subtleties, particularly with fill values.
In sgkit's VCF Zarr format we use a quiet NaN to indicate that a float value is missing, and a signalling NaN for padding to encode variable length (ragged) arrays. Similarly tskit uses a quiet NaN to indicate missing.
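To make the quiet/signalling distinction concrete, here is a minimal sketch that classifies a float32 bit pattern using plain integer arithmetic. It assumes the common IEEE 754-2008 convention in which the most significant mantissa bit marks a quiet NaN. The function name and the sample bit patterns are hypothetical, chosen only for illustration, and are not necessarily the values sgkit or tskit use.

```python
# Field masks for a 32-bit IEEE 754 float (assumed layout: 1 sign, 8 exponent, 23 mantissa bits).
EXPONENT_MASK = 0x7F800000
MANTISSA_MASK = 0x007FFFFF
QUIET_BIT = 0x00400000  # most significant mantissa bit


def classify_float32_bits(bits: int) -> str:
    """Classify a float32 bit pattern (hypothetical helper, not a Zarr API)."""
    # A NaN has all exponent bits set and a non-zero mantissa.
    is_nan = (bits & EXPONENT_MASK) == EXPONENT_MASK and (bits & MANTISSA_MASK) != 0
    if not is_nan:
        return "not NaN"
    # Under the assumed convention, a set quiet bit means a quiet NaN.
    return "quiet NaN" if bits & QUIET_BIT else "signalling NaN"


# Illustrative bit patterns only: two different NaNs, +infinity, and 1.0.
for pattern in (0x7FC00000, 0x7F800001, 0x7F800000, 0x3F800000):
    print(hex(pattern), classify_float32_bits(pattern))
```

Working on the integer bit pattern rather than on a float value avoids relying on the hardware to preserve a signalling NaN when it is loaded or converted.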
Since the Zarr fill value encoding only allows a single `NaN` value, we can't specify a fill value since it's not possible to make it a quiet or signalling NaN.
In Zarr 2.11.0 there was a change where chunks with data equal to the fill value are no longer written to disk by default. This doesn't work for applications using quiet or signalling NaNs, since Zarr doesn't distinguish NaN values when determining if all the elements of a chunk are equal. So a chunk that has a mixture of regular and quiet/signalling NaNs will not be stored. The workaround is to set `write_empty_chunks=True`. (However, this is not possible in xarray until the next release after v2022.03.0 pydata/xarray#6348. Also I'm not sure it's possible to set the fill value for floats in xarray.)
Are there changes that could be made to Zarr to make working with different NaN values easier?
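The chunk-skipping problem can be illustrated with a small stdlib-only sketch. It is not Zarr's actual implementation: the two helper functions and the float64 bit patterns are hypothetical, and the chunk is modelled as raw little-endian bytes. The value-level check treats every NaN as equal to a NaN fill value, while the binary-level check compares each element's bytes exactly.

```python
import math
import struct

# Hypothetical float64 NaN bit patterns, chosen only for illustration.
QUIET_NAN = struct.pack("<Q", 0x7FF8000000000000)
SIGNALLING_NAN = struct.pack("<Q", 0x7FF0000000000001)


def is_fill_by_value(chunk: bytes) -> bool:
    """Value-level check: any NaN counts as equal to a NaN fill value."""
    values = struct.unpack(f"<{len(chunk) // 8}d", chunk)
    return all(math.isnan(v) for v in values)


def is_fill_by_bits(chunk: bytes, fill: bytes) -> bool:
    """Binary-level check: every 8-byte element must match the fill bytes exactly."""
    return all(chunk[i:i + 8] == fill for i in range(0, len(chunk), 8))


# A chunk mixing two different NaN bit patterns.
mixed_chunk = QUIET_NAN + SIGNALLING_NAN + QUIET_NAN

# The value-level check considers the chunk "empty", so it would be skipped.
value_level = is_fill_by_value(mixed_chunk)
# The binary-level check sees that one element differs from the fill value.
bit_level = is_fill_by_bits(mixed_chunk, QUIET_NAN)
print(value_level, bit_level)
```

With a binary-level comparison, only a chunk whose elements all carry the same bit pattern as the fill value would be treated as empty.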