Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shards to valid_encodings to enable sharded Zarr writing #9948

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

jacobbieker
Copy link

@jacobbieker jacobbieker commented Jan 13, 2025

Adds shards to the list of valid_encodings in the zarr backend, so that sharded Zarr V3s can be written.

Copy link

welcome bot commented Jan 13, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@jhamman
Copy link
Member

jhamman commented Jan 17, 2025

Thanks @jacobbieker for opening a PR. Do you have time to add a roundtrip test for this? This test would be a good test to emulate:

def test_chunk_encoding(self) -> None:
# These datasets have no dask chunks. All chunking specified in
# encoding
data = create_test_data()
chunks = (5, 5)
data["var2"].encoding.update({"chunks": chunks})
with self.roundtrip(data) as actual:
assert chunks == actual["var2"].encoding["chunks"]

@jacobbieker
Copy link
Author

jacobbieker commented Jan 17, 2025

Thanks @jacobbieker for opening a PR. Do you have time to add a roundtrip test for this? This test would be a good test to emulate:

def test_chunk_encoding(self) -> None:
# These datasets have no dask chunks. All chunking specified in
# encoding
data = create_test_data()
chunks = (5, 5)
data["var2"].encoding.update({"chunks": chunks})
with self.roundtrip(data) as actual:
assert chunks == actual["var2"].encoding["chunks"]

Yep! I think I've added one that works for that, and updated the Zarr V3 loading to include the shards. It seems to pass the tests locally so far at least.

Edit: Ah it seems to cause some issues with threads on 3.12 potentially?

@jhamman
Copy link
Member

jhamman commented Jan 21, 2025

I seem to remember fixing this in the Dask test suite by manually cleaning up thread resources. This may help: https://github.com/dask/dask/blob/e04734b4d8959ba259801f2e2a490cb4ee8d891f/dask/tests/test_distributed.py#L338-L358

@kmuehlbauer
Copy link
Contributor

kmuehlbauer commented Jan 21, 2025

@jhamman The dask distributed zarr integration test is also failing on main. Would be great to have a fix applied.

@kmuehlbauer
Copy link
Contributor

I seem to remember fixing this in the Dask test suite by manually cleaning up thread resources. This may help: https://github.com/dask/dask/blob/e04734b4d8959ba259801f2e2a490cb4ee8d891f/dask/tests/test_distributed.py#L338-L358

See #9967

@jacobbieker
Copy link
Author

I seem to remember fixing this in the Dask test suite by manually cleaning up thread resources. This may help: https://github.com/dask/dask/blob/e04734b4d8959ba259801f2e2a490cb4ee8d891f/dask/tests/test_distributed.py#L338-L358

See #9967

@jhamman yeah, this change in #9967 fixes the tests and the failing ones pass.

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @jacobbieker !

@jacobbieker
Copy link
Author

Thanks for this @jacobbieker !

No problem! Excited to use this a lot! I am not able to merge this PR, is it possible you can? Its saying I'm not authorized.

@jhamman jhamman requested a review from dcherian January 24, 2025 14:21
@jhamman
Copy link
Member

jhamman commented Jan 24, 2025

We'll let @dcherian do a final review and/or merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to write to sharded Zarr V3 because shards not in valid_encodings
4 participants