-
-
Notifications
You must be signed in to change notification settings - Fork 317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow reading utf-8 encoded json files #1312
Conversation
@MSanKeys963 @jakirkham This PR is ready for review. Also, because I am a first-time contributor, I am not able to trigger Github Actions myself. |
Thanks for sending the PR @nhz2. |
Thanks for this, @nhz2! I've triggered the actions and will go through the code ASAP. |
Codecov Report
@@ Coverage Diff @@
## main #1312 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 36 36
Lines 14744 14748 +4
=========================================
+ Hits 14744 14748 +4
|
Relaunched the Windows built after:
|
Green 🎉 in case anyone else wants to now take a closer look. ( For my part, reading this makes complete sense, but I do wonder what the impact on downstream code will be and therefore what types of warnings/modifications might be needed. |
This PR should have no effect on the JSON files zarr-python writes, they will still be ASCII only with non-ASCII characters escaped. It only affects what kind of JSON files zarr-python can read. |
Thanks, @nhz2. Guess I was more concerned whether any calling code (and/or subclasses) would need modification. |
In terms of the effect on other code, the main change is def json_loads(s: str) -> Dict[str, Any]:
"""Read JSON in a consistent way."""
return json.loads(ensure_text(s, 'ascii')) to def json_loads(s: bytes) -> Dict[str, Any]:
"""Read JSON in a consistent way."""
return json.loads(ensure_text(s, 'utf-8')) Functions called by
|
key: json_loads(store[key]) |
parse_metadata
:
Line 108 in 0bf0b3b
meta = json_loads(s) |
It is also called a few times in n5.py
ConsolidatedMetadataStore.__init__
:
Line 2884 in 0bf0b3b
meta = json_loads(self.store[metadata_key]) |
ConsolidatedMetadataStoreV3.__init__
:
zarr-python/zarr/_storage/v3.py
Line 543 in 0bf0b3b
meta = json_loads(self.store[metadata_key]) |
All of these cases directly, or indirectly through parse_metadata
, call json_loads
on an object obtained from a store. Since store values are bytes, not strings, I changed the type hints to match the current usage.
Thanks for the analysis, @nhz2! Sounds like it should not impact external consumers. Any objections to rolling this into 2.14, anyone? |
Thanks for the PR Nathan! 🙏 This is a good improvement to include. Noting some changes in the typing to still allow Ideally we would capture any |
Co-authored-by: jakirkham <[email protected]>
I think this has failed on the zarr-feedstock: conda-forge/zarr-feedstock#72 |
def test_utf8_encoding(self, zarr_version): | ||
|
||
# fixture data | ||
fixture = group(store=DirectoryStore('fixture')) | ||
assert fixture['utf8attrs'].attrs.asdict() == dict(foo='た') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is failing there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is neither of these files were included in the Python sdist
@@ -0,0 +1 @@ | |||
{"foo": "た"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is missing from the sdist
on PyPI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jakirkham. Argh, I thought I had caught these issues but likely only if there's code to generate it them as well. Are you already working on a fix or shall I?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to go ahead. I might not get to this for a while
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget if we already discussed this before, but maybe we can make some fixes to CI to ensure we catch these issues in PRs. Wrote up some thoughts on this in issue ( #1347 )
{ | ||
"zarr_format": 2 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with this one
This is a temporary fix for the larger issue of out-of-tree testing described in zarr-developers#1347, but this should allow a release of 2.14.1 which passes on conda.
This is a temporary fix for the larger issue of out-of-tree testing described in #1347, but this should allow a release of 2.14.1 which passes on conda.
Fixes #1308
Currently, zarr-python errors when reading a json file with non-ascii characters encoded with utf-8, however, Zarr.jl writes json files that include non-ascii characters using utf-8 encoding.
This PR will enable zarr attributes written in Zarr.jl to be read by zarr-python.
TODO: