
Allow override of dup. field error, and zlib compression level #65

Merged (7 commits) on Aug 25, 2018

Conversation

ihnorton (Contributor)

  • Allow overriding the duplicate field error with a warning instead (duplicates occasionally occur with buggy writers)

  • Allow changing the compression level: this can significantly improve write performance at minimal space cost. As an illustrative example:

timing test details:

    import nrrd
    nrrd.reader._NRRD_ALLOW_DUPLICATE_FIELD = True
    img, hdr = nrrd.read("/path/to/dwi.nhdr")

    %timeit -n1 -r1 nrrd.write("test.nrrd", img, hdr)
    # 193 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

    hdr['encoding'] = 'gz'
    %timeit -n1 -r1 nrrd.write("test_zl_9.nrrd", img, hdr)  # current default, level 9
    # 20.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

    nrrd.writer._ZLIB_LEVEL = -1  # zlib default (roughly level 6)
    %timeit -n1 -r1 nrrd.write("test_zl_default.nrrd", img, hdr)
    # 9.94 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

    nrrd.writer._ZLIB_LEVEL = 1
    %timeit -n1 -r1 nrrd.write("test_zl_1.nrrd", img, hdr)
    # 2.68 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


Output files below (test.nrrd is original, and zl_default is roughly level 6 per the docs):

!ls -lah *.nrrd

    -rw-r--r--  1 inorton  staff    61M Aug 24 16:38 test.nrrd
    -rw-r--r--  1 inorton  staff    27M Aug 24 16:39 test_zl_1.nrrd
    -rw-r--r--  1 inorton  staff    26M Aug 24 16:38 test_zl_9.nrrd
    -rw-r--r--  1 inorton  staff    26M Aug 24 16:38 test_zl_default.nrrd

So, level 9 is ~10x slower than level 1, but saves only 1 MB. This is brain MRI data.
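The trade-off can be reproduced with the standard-library `zlib` alone; a minimal sketch on synthetic data (not the MRI volume above, so the exact numbers will differ):

```python
import time
import zlib

# Compressible synthetic payload standing in for image data
data = bytes(range(256)) * 50000  # ~12.8 MB

for level in (1, -1, 9):  # -1 selects zlib's default (roughly level 6)
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print("level %2d: %8d bytes in %.3f s" % (level, len(compressed), elapsed))
```

On typical data the pattern is the same as in the measurements above: higher levels cost disproportionately more time for a small size gain.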

codecov-io commented Aug 24, 2018

Codecov Report

Merging #65 into master will increase coverage by 0.74%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #65      +/-   ##
==========================================
+ Coverage   86.64%   87.39%   +0.74%     
==========================================
  Files           6        6              
  Lines         352      357       +5     
  Branches      113      114       +1     
==========================================
+ Hits          305      312       +7     
+ Misses         23       22       -1     
+ Partials       24       23       -1
Impacted Files   Coverage Δ
nrrd/writer.py   87.17% <100%> (ø) ⬆️
nrrd/reader.py   81.98% <100%> (+1.85%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update caf8506...4183a59.

@addisonElliott (Collaborator) left a comment

This is a good contribution.

My main issue with it is that it requires changing private members in the module.

I would make these parameters a part of the read function instead.

Additionally, it would be nice if you could add documentation for this (edit the docstring or look at the RST files in the docs directory).

Also, there is an issue with Travis failing on Python 2.7; this needs to be resolved as well.


dup_message = "Duplicate header field: 'type'"
try:
    header = nrrd.read_header(header_txt_tuple)
Collaborator

This can be made one line with the assertRaisesRegex function. See here for an example: https://github.com/mhe/pynrrd/blob/master/nrrd/tests/test_writing.py#L178

Contributor Author

Done.
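For reference, the suggested pattern looks roughly like this, with a hypothetical read_header and ValueError standing in for the library's NRRDError:

```python
import unittest

def read_header(lines):
    # Hypothetical stand-in for nrrd.read_header's duplicate-field check
    header = {}
    for line in lines:
        field, _, value = line.partition(': ')
        if field in header:
            raise ValueError("Duplicate header field: '%s'" % field)
        header[field] = value
    return header

class DuplicateFieldTest(unittest.TestCase):
    def test_duplicate_field_raises(self):
        # One line replaces a try/except/fail block
        with self.assertRaisesRegex(ValueError, "Duplicate header field: 'type'"):
            read_header(['type: float', 'dimension: 3', 'type: float'])
```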

expected_header = {u'type': 'float', u'dimension': 3}
header_txt_tuple = ('NRRD0005', 'type: float', 'dimension: 3', 'type: float')

dup_message = "Duplicate header field: 'type'"
Collaborator
Along with the comment below: I would remove dup_message and just put the string directly in the assertRaisesRegex call.

@addisonElliott addisonElliott self-assigned this Aug 24, 2018
ihnorton (Contributor Author)

Also, there is an issue with Travis failing on Python 2.7; this needs to be resolved as well.

Addressed.

I would make these parameters a part of the read function instead.

I made the compression level a parameter of write and added the docstring.
But the field error override is kind of odd/special, and violates the NRRD spec, so it probably shouldn't be advertised.

@addisonElliott (Collaborator) left a comment

You mentioned that duplicate fields occur in practice in NRRD files. When there is a duplicate field, how do you want it handled? Right now it overwrites and saves the last field given. Is this the best way to handle it?

Also, I was thinking it might be worthwhile to write a test for the compression level on gzip and bzip2. My thought is to set the compression level to something besides 9 for both of these and then check that the resulting file size is as expected.
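A test along those lines could be sketched as follows, using the stdlib gzip module on synthetic data (the names and payload here are hypothetical, not the library's actual ball test data):

```python
import gzip
import os
import tempfile

def gzip_sizes_by_level(levels=(1, 9)):
    # Compressible stand-in for the test image payload
    data = bytes(range(256)) * 4000
    sizes = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        for level in levels:
            path = os.path.join(tmpdir, 'ball_%d.gz' % level)
            with gzip.open(path, 'wb', compresslevel=level) as fh:
                fh.write(data)
            # Data must survive the round trip regardless of level
            with gzip.open(path, 'rb') as fh:
                assert fh.read() == data
            sizes[level] = os.path.getsize(path)
    return sizes
```

A test would then assert, for example, that level 9 never produces a larger file than level 1.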

nrrd/writer.py Outdated
@@ -263,7 +267,7 @@ def _write_data(data, fh, header):

# Construct the compressor object based on encoding
if header['encoding'] in ['gzip', 'gz']:
-compressobj = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
+compressobj = zlib.compressobj(compression_level, zlib.DEFLATED, zlib.MAX_WBITS | 16)
elif header['encoding'] in ['bzip2', 'bz2']:
compressobj = bz2.BZ2Compressor()
Collaborator

Add the compression level parameter here too so that it applies to bzip2 as well.

Contributor Author

Done
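With the change applied to both branches, the construction might look roughly like this (a standalone sketch, not the exact library code):

```python
import bz2
import zlib

def make_compressor(encoding, compression_level=9):
    if encoding in ('gzip', 'gz'):
        # MAX_WBITS | 16 tells zlib to emit a gzip header and trailer
        return zlib.compressobj(compression_level, zlib.DEFLATED, zlib.MAX_WBITS | 16)
    elif encoding in ('bzip2', 'bz2'):
        # BZ2Compressor also accepts a 1-9 compression level
        return bz2.BZ2Compressor(compression_level)
    raise ValueError('Unsupported encoding: %s' % encoding)
```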

nrrd/writer.py Outdated
@@ -124,6 +124,10 @@ def write(filename, data, header={}, detached_header=False, custom_field_map=Non
custom_field_map : :class:`dict` (:class:`str`, :class:`str`), optional
Dictionary used for parsing custom field types where the key is the custom field name and the value is a
string identifying datatype for the custom field.
compression_level : int
Collaborator

Use :class:`int` here just so Sphinx will link to it, to stay consistent with the other parameters.

Contributor Author

Ok, for consistency. But linking docs for native types is excessive IMHO, and it makes the docstrings ugly outside of Sphinx.

nrrd/writer.py Outdated
@@ -124,6 +124,10 @@ def write(filename, data, header={}, detached_header=False, custom_field_map=Non
custom_field_map : :class:`dict` (:class:`str`, :class:`str`), optional
Dictionary used for parsing custom field types where the key is the custom field name and the value is a
string identifying datatype for the custom field.
compression_level : int
Int specifying compression level, when applicable.
Collaborator

Instead of "when applicable", maybe be more specific and say it applies when the encoding is set to a compressed format (gzip or bzip2).

Contributor Author

Ok

nrrd/reader.py Outdated
@@ -15,6 +16,10 @@

_NRRD_REQUIRED_FIELDS = ['dimension', 'type', 'encoding', 'sizes']

# Duplicated fields are prohibited by the spec, but do occur in the wild.
# Set True to allow duplicate fields, with a warning.
_NRRD_ALLOW_DUPLICATE_FIELD = False
Collaborator

I would make this a parameter to the read & read_header functions. As I said, I'm not a fan of having to edit a private member in a module, especially one that isn't documented.

I understand your apprehension not to document this because it is not supported by the NRRD specification, but I think whatever parameters we add, they should be documented.

What you can do though, in the docstring, you can put a warning saying it is not supported. Something like this (I'm not sure if it can go in the argument list, but you can put it in the body of the docstring for sure).

    .. warning::
        Allowing duplicate fields with :obj:`allow_duplicate_fields` is not explicitly supported in the NRRD specification. Use this parameter at your own risk.

Contributor Author

Ok. It is no longer a private field, but I will not be adding it to read. This really is a special case, just like the existing chunk size parameter.

nrrd/reader.py Outdated
if not _NRRD_ALLOW_DUPLICATE_FIELD:
    raise NRRDError(dup_message)

warnings.warn(dup_message)
Collaborator

So with a duplicated field, the old field will be overwritten. Would it be better if we appended an integer to the end of the field name, something like 2, so that it doesn't conflict and both values are saved?

Contributor Author

I think it's fine to overwrite. If the values were duplicated then there would probably need to be a hack in the save path to remove the second key while saving, which would just be confusing.

@codecov-io

Updated Codecov report: same coverage numbers as above. Last update caf8506...77d5d2c.

ihnorton (Contributor Author)

Right now it overwrites and saves the last field given. Is this the best way to handle it?

Yes

Also, I was thinking it might be worthwhile to write a test for the compression level on gzip and bzip2. My thought is to set the compression level to something besides 9 for both of these and then check that the resulting file size is as expected.

Done. For now I commented out the assertion about the file size difference for bz2, because for the binary ball data the output is the same size at all levels.

@addisonElliott addisonElliott merged commit b744a63 into mhe:master Aug 25, 2018
addisonElliott (Collaborator)

LGTM 👍

I guess I wasn't fully considering the use case for the duplicate field handling. I think I agree that it should not be a function parameter in read or read_header.

The tests look good!

ihnorton (Contributor Author)

Thanks!
