-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow override of dup. field error, and zlib compression level #65
Conversation
Codecov Report
@@ Coverage Diff @@
## master #65 +/- ##
==========================================
+ Coverage 86.64% 87.39% +0.74%
==========================================
Files 6 6
Lines 352 357 +5
Branches 113 114 +1
==========================================
+ Hits 305 312 +7
+ Misses 23 22 -1
+ Partials 24 23 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good contribution.
My major issue with it is the fact that it requires changing private members in the module.
I would make these parameters as apart of the read function instead.
Additionally, would be nice if you could add documentation regarding this (edit the docstring or look at the RST files in the docs directory).
Also, issue with Travis failing on Python 2.7, need to resolve this as well.
nrrd/tests/test_reading.py
Outdated
|
||
dup_message = "Duplicate header field: 'type'" | ||
try: | ||
header = nrrd.read_header(header_txt_tuple) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can make this one line with the assertRaisesRegex function. Check here for an example using https://github.com/mhe/pynrrd/blob/master/nrrd/tests/test_writing.py#L178
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
nrrd/tests/test_reading.py
Outdated
expected_header = {u'type': 'float', u'dimension': 3} | ||
header_txt_tuple = ('NRRD0005', 'type: float', 'dimension: 3', 'type: float') | ||
|
||
dup_message = "Duplicate header field: 'type'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Along with comment below, would remove dup_message and just put as string in assertRaisesRegex function
Addressed.
I made the compression level a parameter of |
3eb3c44
to
7a93bff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned that in practice that duplicate fields happen in NRRD files. If you have a duplicate field, how do you want it to handle it? Right now it overwrites and saves the last field given. Is this the best way to handle it?
Also, I was thinking maybe it would be worthwhile to write a test for testing the compression on gzip and bzip2. My thoughts are to set the compression level to something besides 9 for both of these and then test the file size is a certain size?
nrrd/writer.py
Outdated
@@ -263,7 +267,7 @@ def _write_data(data, fh, header): | |||
|
|||
# Construct the compressor object based on encoding | |||
if header['encoding'] in ['gzip', 'gz']: | |||
compressobj = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16) | |||
compressobj = zlib.compressobj(compression_level, zlib.DEFLATED, zlib.MAX_WBITS | 16) | |||
elif header['encoding'] in ['bzip2', 'bz2']: | |||
compressobj = bz2.BZ2Compressor() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the compression level to parameter here so that it applies for bzip2 as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
nrrd/writer.py
Outdated
@@ -124,6 +124,10 @@ def write(filename, data, header={}, detached_header=False, custom_field_map=Non | |||
custom_field_map : :class:`dict` (:class:`str`, :class:`str`), optional | |||
Dictionary used for parsing custom field types where the key is the custom field name and the value is a | |||
string identifying datatype for the custom field. | |||
compression_level : int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use :class:int
here just so Sphinx will link to it. Just to stay consistent with the other parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, for consistency. But linking docs for native types is excessive IMHO, and it makes the docstrings ugly outside of sphinx.
nrrd/writer.py
Outdated
@@ -124,6 +124,10 @@ def write(filename, data, header={}, detached_header=False, custom_field_map=Non | |||
custom_field_map : :class:`dict` (:class:`str`, :class:`str`), optional | |||
Dictionary used for parsing custom field types where the key is the custom field name and the value is a | |||
string identifying datatype for the custom field. | |||
compression_level : int | |||
Int specifying compression level, when applicable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of when applicable, maybe be more specific and say when encoding is set to compression for BZIP or GZIP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
nrrd/reader.py
Outdated
@@ -15,6 +16,10 @@ | |||
|
|||
_NRRD_REQUIRED_FIELDS = ['dimension', 'type', 'encoding', 'sizes'] | |||
|
|||
# Duplicated fields are prohibited by the spec, but do occur in the wild. | |||
# Set True to allow duplicate fields, with a warning. | |||
_NRRD_ALLOW_DUPLICATE_FIELD = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this a parameter to the read & read_header functions. As I said, I'm not a fan of having to edit a private member in a module, one that especially isn't documented.
I understand your apprehension not to document this because it is not supported by the NRRD specification, but I think whatever parameters we add, they should be documented.
What you can do though, in the docstring, you can put a warning saying it is not supported. Something like this (I'm not sure if it can go in the argument list, but you can put it in the body of the docstring for sure).
.. warning::
Allowing duplicate fields with :obj:`allow_duplicate_fields` is not explicitly supported in the NRRD specification. Use this parameter at your own risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. It is no longer a private field, but I will not be adding it to read
. This really is a special case, just like the existing chunk size parameter.
nrrd/reader.py
Outdated
if not _NRRD_ALLOW_DUPLICATE_FIELD: | ||
raise NRRDError(dup_message) | ||
|
||
warnings.warn(dup_message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with a duplicated field, the old field will be overwritten. Would it be better if we added an integer to the end of the field. Something like 2 so that it doesn't conflict but both of the values are saved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to overwrite. If the values were duplicated then there would probably need to be a hack in the save path to remove the second key while saving, which would just be confusing.
6fb7586
to
4183a59
Compare
Codecov Report
@@ Coverage Diff @@
## master #65 +/- ##
==========================================
+ Coverage 86.64% 87.39% +0.74%
==========================================
Files 6 6
Lines 352 357 +5
Branches 113 114 +1
==========================================
+ Hits 305 312 +7
+ Misses 23 22 -1
+ Partials 24 23 -1
Continue to review full report at Codecov.
|
Yes
Done. For now I commented out the assert about file size difference for bz2, because for the binary ball data, output is the same size at all levels. |
LGTM 👍 I guess I wasn't fully thinking of the application for the duplicate field section. I think I agree that it should not be a function parameter in The tests look good! |
Thanks! |
Allow override of duplicate field error, with a warning instead (occasionally occur with buggy writers)
Allow changing the compression level - can significantly improve write performance at minimal space cost. As an illustrative example:
timing test details
Output files below (test.nrrd is original, and zl_default is roughly level 6 per the docs):
So, level 9 is ~10x slower than level 1, but saves only 1 MB. This is brain MRI data.