Byteskip minus1 fixing #74

tashrifbillah · 2018-11-10T17:48:10Z

Fixed byteskip -1 reading issue byteskip -1 fails #70
Added test for both nrrd and nhdr files with byte skip: -1
Added test data with nrrd gzip and nhdr raw files with byte skip: -1

codecov-io · 2018-11-10T17:49:23Z

Codecov Report

Merging #74 into master will increase coverage by 0.72%.
The diff coverage is 62.5%.

@@            Coverage Diff             @@
##           master      #74      +/-   ##
==========================================
+ Coverage   87.39%   88.12%   +0.72%     
==========================================
  Files           6        6              
  Lines         357      362       +5     
  Branches      114      117       +3     
==========================================
+ Hits          312      319       +7     
+ Misses         22       21       -1     
+ Partials       23       22       -1

Impacted Files	Coverage Δ
nrrd/reader.py	`83.73% <62.5%> (+1.74%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5310894...f82bfcb. Read the comment docs.

tashrifbillah · 2018-11-10T17:51:51Z

cc: @ihnorton

tashrifbillah · 2018-11-10T17:56:53Z

Let me know what do you think @addisonElliott

addisonElliott

Good start but needs some changes.

Also, try to keep the code coverage approximately the same which should indicate we are fully testing the code.

I think the reason why this dropped was because of your large if/else.

nrrd/reader.py

addisonElliott · 2018-11-10T18:27:03Z

nrrd/reader.py

+    for _ in range(line_skip):
+        fh.readline()
+
+    if byte_skip>=0:


Try to stick with the syntax used in the library. To stay consistent, you would want the if statement to be:

if byte_skip >= 0:

addisonElliott · 2018-11-10T18:30:50Z

nrrd/reader.py

+        fh.readline()
+
+    if byte_skip>=0:
+


Although this code functionally works correctly, I think it can be made much cleaner. You have a large if/else statement with a lot of repeated code. I would stick to the principle of DRY

In other words, a cleaner way is to just change what needs to be changed.

addisonElliott · 2018-11-10T18:35:03Z

nrrd/reader.py

+        # If a compression encoding is used, then byte skip AFTER decompressing
+        if header['encoding'] == 'raw':
+            # Skip the requested number of bytes and then parse the data using NumPy
+            fh.seek(byte_skip, os.SEEK_CUR)


For example, this could be done in one line like this:

Suggested change

fh.seek(byte_skip, os.SEEK_CUR)

if byte_skip == -1:

fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END)

else:

fh.seek(byte_skip if byte_skip != -1 else total_data_points, os.SEEK_CUR)

addisonElliott · 2018-11-10T18:36:46Z

nrrd/tests/test_reading.py

+
+        data, header = nrrd.read(RAW_BYTESKIP_NHDR_FILE_PATH)
+
+        print('Given')


No need for print's in tests.

Sorry, I forgot to remove that.

addisonElliott · 2018-11-10T18:39:26Z

nrrd/tests/test_reading.py

+        print(header)
+
+        np.testing.assert_equal(self.expected_header, header)
+        np.testing.assert_equal(data, self.expected_data)       


I'm not entirely sure how this tests the behavior of the byte_skip = -1. Since it is just seeking backwards to the beginning of the file.

I would change the data_file and add some garbage to the front of it. That way seeking from the front won't work but seeking backwards will! Then have one test using byte_skip = -1, and one test that should fail using no byte_skip.

With this in mind, don't be afraid to add a new NRRD file to test with. Just use some fake/synthesized data that works for your purpose.

I don't think it is seeking from the front of the file. Rather, it is seeking only from the byte where image data begins. The correct seek point is set by fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END)

addisonElliott · 2018-11-10T18:40:45Z

nrrd/tests/test_reading.py

@@ -187,9 +194,6 @@ def test_read_simple_4d_nrrd(self):
        np.testing.assert_equal(data.dtype, np.float64)
        np.testing.assert_equal(data, np.array([[[[0.76903426]]]]))

-        # Test that the data read is able to be edited
-        self.assertTrue(data.flags['WRITEABLE'])


Why did you remove this?

This assert has a purpose to ensure that the data is writeable. If the data is not writeable, then something is wrong with the code. I had an issue and that's why I added this.

Honestly I didn't ⭕️ ⭕️

Will fix this ASAP.

tashrifbillah · 2018-11-10T23:21:10Z

How does it look now? I understand for all compressed data, data.flags['WRITEABLE'] is False for some reason, but don't know why.

nrrd/tests/test_reading.py

tashrifbillah · 2018-11-10T23:36:31Z

I have taken care of the previous failures. Sorry, it took me some time to realize that the version installed in my computer was different from the cloned one. This mismatch attributed to all the mysterious missing lines so far.

We should be good now.

addisonElliott

Getting closer!

I would definitely think about adding some additional tests and thinking about how to test this.

Here are some tests I would add:

Test that it throws an exception when byte_skip is negative (but not -1)
Change the byte_skip dataset to add fake data to the front of it so that you have to do a byte_skip=-1 to get it to work. Then add a test with/without byte_skip being -1 to ensure it works.
Repeat the above test for raw, ASCII & GZIP

I haven't checked out codecov, but that's a good thing to reference to see what lines of code have/haven't been tested yet.

nrrd/reader.py

addisonElliott · 2018-11-10T23:25:27Z

nrrd/reader.py

+
+    # Skip the requested number of bytes or seek backward, and then parse the data using NumPy
+    if byte_skip < -1:
+        raise Exception('Invalid byteskip, allowed values are greater than or equal to -1')


Switch to NRRDError rather than Exception

Suggested change

raise Exception('Invalid byteskip, allowed values are greater than or equal to -1')

raise NRRDError('Invalid byteskip, allowed values are greater than or equal to -1')

I was thinking about that, but then I thought shouldn't it be just Exception? But yeah, we can do that.

I'm open to hearing your reasoning, but my thought process is to use NRRDError for any errors in pynrrd. I'm treating NRRDError as the exceptions for the module.

addisonElliott · 2018-11-10T23:27:39Z

nrrd/reader.py

    with open(filename, 'rb') as fh:
        header = read_header(fh, custom_field_map)
        data = read_data(header, fh, filename)

-        return data, header
+        return data, header 


Would remove extra space at the end of this line.

Any whitespace at the end of lines are not necessary

addisonElliott · 2018-11-10T23:30:03Z

nrrd/reader.py

        fh.seek(byte_skip, os.SEEK_CUR)
+    elif byte_skip == -1 and header['encoding'] not in ['gzip', 'gz', 'bzip2', 'bz2']:
+        fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END)
+    else: # The only case left should be: byte_skip == -1 and header['encoding'] == 'gzip'


This is a nitpicky thing, but just to stick with the syntax of the repository (and my personal preference), I would move this line comment to the next line to be on it's own line.

Okay, doing it.

addisonElliott · 2018-11-10T23:30:11Z

nrrd/reader.py

+    elif byte_skip == -1 and header['encoding'] not in ['gzip', 'gz', 'bzip2', 'bz2']:
+        fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END)
+    else: # The only case left should be: byte_skip == -1 and header['encoding'] == 'gzip'
+        byte_skip = -dtype.itemsize*total_data_points


Suggested change

byte_skip = -dtype.itemsize*total_data_points

byte_skip = -dtype.itemsize * total_data_points

addisonElliott · 2018-11-10T23:32:19Z

nrrd/reader.py

@@ -380,7 +382,7 @@ def read_data(header, fh=None, filename=None):

        # Byte skip is applied AFTER the decompression. Skip first x bytes of the decompressed data and parse it using
        # NumPy
-        data = np.fromstring(decompressed_data[byte_skip:], dtype)
+        data = np.frombuffer(decompressed_data[byte_skip:], dtype)           


Why was this changed to fromstring to frombuffer?

Remove whitespace at end of line.

Also, shouldn't this have an if/else statement for byte_skip == -1 to do something like the following if byte_skip == -1...

decompressed_data[-dtype.itemsize * total_data_points:]

I have already taken care of this condition in the if above.

The change was a mistake that resulted in failures:
when np.frombuffer is used:
data.flag['WRITEABLE'] = False

But when np.fromstring is used:
data.flag['WRITEABLE'] = True

The latter is desired in tests.

Totally missed that even though I commented on that line...yes, you're right. I like the way you're doing it...

Yeah, I think we should make the data writeable so we want to use np.fromstring. I had an issue where I loaded gzip NRRD data and tried to edit it but it didn't work because it wasn't writeable.

nrrd/tests/test_reading.py

addisonElliott · 2018-11-10T23:48:34Z

How does it look now? I understand for all compressed data, data.flags['WRITEABLE'] is False for some reason, but don't know why.

Take a look there:
#68

Here is the relevant information I discovered:

Use fromstring function when dealing with compressed data because it will copy the data from the string into a Numpy array and allow the data to be editable. If you use frombuffer, then it will simply reference the memory. For a binary string, this ends up with the numpy array not being editable.

tashrifbillah · 2018-11-10T23:52:06Z

Done, sorry about that.

addisonElliott · 2018-11-11T00:05:19Z

Code looks good, just add a few quick tests to test it and we should be good to go! 👍

tashrifbillah · 2018-11-11T00:10:43Z

Test that it throws an exception when byte_skip is negative (but not -1)
Change the byte_skip dataset to add fake data to the front of it so that you have to do a byte_skip=-1 to get it to work. Then add a test with/without byte_skip being -1 to ensure it works.
Repeat the above test for raw, ASCII & GZIP

Your test data addition suggestion should be straightforward. I'll try to complete them soon.

tashrifbillah · 2018-11-11T22:12:21Z

Test that it throws an exception when byte_skip is negative (but not -1)

This is done using numpy.testing.assert_raises.

I am working on the rest.

addisonElliott · 2018-11-11T22:58:49Z

Wasn't aware of numpy.testing.assert_raises so that's great, but the NRRD test util has it's own function, see here:
https://github.com/mhe/pynrrd/blob/master/nrrd/tests/test_reading.py#L122

I would use that just to be consistent.

You edited your comment but you make a good point about adding random data. Not sure...but it would be nice to verify that byte_skip=-1 is working correctly in a sense that it won't work without byte_skip = -1.

If it can't be done then that's okay, just stick with the normal tests.

tashrifbillah · 2018-11-11T23:00:37Z

Yeah, after putting my comment, I realized there might be a way of doing the way you suggested. So, let's see if I can come up with that design. Otherwise, you have my all ears.

tashrifbillah · 2018-11-11T23:22:59Z

I have the following questions:

Is this reassigning the file pointer to the beginning of the file after reading header?
If answer to 1 is yes, is this reading only numpy data from the entire file content?

addisonElliott · 2018-11-11T23:25:53Z

Not entirely sure but based on the comment, I think it is seeking to right where the header stops. It seems here so the next parts are where the data is located

Sounds like it was a special case added to fix something

tashrifbillah · 2018-11-12T01:32:17Z

Hey @addisonElliott , I have been able to form a test case that demonstrates the use of byte_skip: -1.
I am at the final stage of my editing. However, can you help a little but about https://github.com/mhe/pynrrd/blob/master/nrrd/tests/test_reading.py#L122 ?

I mean how do I apply it instead of

    np.testing.assert_raises(nrrd.errors.NRRDError, nrrd.read, GZ_NIFTI_NHDR_FILE_PATH)

that I have used?

tashrifbillah · 2018-11-12T01:34:19Z

In my opinion, since you have used np.testing all through the test_reading.py, I think you are better off using np.testing.assert_raises.

addisonElliott · 2018-11-12T01:50:09Z

Yeah, I understand. The benefit of the utility assertRaises function is that you can ensure that the error message is exactly what you expect. Not sure if np.testing has that capability.

Either way works, but for consistency, just stick with the utility function.

It works like this: take that line of code, change the message to what it should be, and then inside the context of the with statement, place your core that should cause an exception

addisonElliott · 2018-11-12T01:53:26Z

Should be something like:

with self.assertRaises(nrrd.NRRDError, 'Message here of exception'):
    nrrd.read(GZ_NIFTI_NHDR_FILE_PATH)

tashrifbillah · 2018-11-12T03:49:21Z

nrrd/reader.py

-        # Skip the requested number of bytes and then parse the data using NumPy
+    else:
+        raise NRRDError('Invalid lineskip, allowed values are greater than or equal to 0')
+


Also added a sanity check for line skip as noted here

tashrifbillah · 2018-11-12T03:54:49Z

nrrd/tests/data/BallBinary30x30x30_byteskip_minus_one_nifti.nhdr

+dimension: 3
+space: left-posterior-superior
+sizes: 30 30 30
+byte skip: -1


The nifti file has the following header at the beginning before accessing image data:

sizeof_hdr : 348 data_type : b'' db_name : b'' extents : 0 session_error : 0 regular : b'r' dim_info : 0 dim : [ 3 30 30 30 1 1 1 1] intent_p1 : 0.0 intent_p2 : 0.0 intent_p3 : 0.0 intent_code : none datatype : int16 bitpix : 16 slice_start : 0 pixdim : [1. 1. 1. 1. 1. 0. 0. 0.] vox_offset : 0.0 scl_slope : nan scl_inter : nan slice_end : 0 slice_code : unknown xyzt_units : 10 cal_max : 0.0 cal_min : 0.0 slice_duration : 0.0 toffset : 0.0 glmax : 0 glmin : 0 descrip : b'BYTESKIP -1 TEST' aux_file : b'' qform_code : scanner sform_code : scanner quatern_b : 0.0 quatern_c : 0.0 quatern_d : 1.0 qoffset_x : 0.0 qoffset_y : 0.0 qoffset_z : 0.0 srow_x : [-1. 0. 0. 0.] srow_y : [ 0. -1. 0. 0.] srow_z : [0. 0. 1. 0.] intent_name : b'' magic : b'n+1'

When we add byteskip: -1, the nifti data file as accessed from the end, only the required number of bytes. This way, we are introducing a great way of nifti to nrrd conversion. We should not need to use DWIConvert anymore. Just make a detached header, and point to existing nifti file as the data file and we are done. At the PNL, BWH we are developing our current nifti pipeline dealing with nrrd files this way.

tashrifbillah · 2018-11-12T03:56:04Z

nrrd/tests/data/BallBinary30x30x30_nifti.nhdr

+dimension: 3
+space: left-posterior-superior
+sizes: 30 30 30
+# byte skip: -1


The byte skip has been intentionally commented out here to point out the error when nifti file is attempted to read from the beginning.

tashrifbillah · 2018-11-12T03:58:25Z

I hope we have done what we wanted. Now Pynrrd supports byte skip: -1 even for compressed data.

addisonElliott · 2018-11-12T14:07:55Z

Great job @tashrifbillah!

The test cases are great, and I'm especially pleased by the byte_skip=-1 with NIFTI TAR file as the data. Great practical test.

I'm going to merge this into master and I'll wait a few weeks before publishing a new version in case any other PRs are submitted. If I forget in a few weeks, bug me somehow and I'll publish a new version.

👍

tashrifbillah · 2018-11-12T14:14:08Z

No problem, happy to work on it. @ihnorton may also put his two cents in when he gets back.

tashrifbillah added 2 commits November 10, 2018 12:36

byteskip -1 fixed

7339cff

adding test data for byteskip -1

a54994b

addisonElliott requested changes Nov 10, 2018

View reviewed changes

code path optimized, missing lines added

eef5b62

np.fromstring()

f0fa781

tashrifbillah commented Nov 10, 2018

View reviewed changes

nrrd/tests/test_reading.py Show resolved Hide resolved

minor correction

2bbe06a

addisonElliott requested changes Nov 10, 2018

View reviewed changes

tashrifbillah added 2 commits November 10, 2018 18:56

making statements consistent with repo

7cc511b

unnecessary w/space removal

dae4782

tashrifbillah added 2 commits November 11, 2018 22:39

byte skip -1 tests added

4a144ef

byte skip -1 tests added

f82bfcb

tashrifbillah commented Nov 12, 2018

View reviewed changes

addisonElliott merged commit 3b084ed into mhe:master Nov 12, 2018

neurolabusc added a commit to rordenlab/MRIcroGL that referenced this pull request Jan 20, 2019

Support corrupt NRRD with compressed byteskip -1 (mhe/pynrrd#74)

83acb46


		data, header = nrrd.read(RAW_BYTESKIP_NHDR_FILE_PATH)

		print('Given')

	raise Exception('Invalid byteskip, allowed values are greater than or equal to -1')
	raise NRRDError('Invalid byteskip, allowed values are greater than or equal to -1')

	byte_skip = -dtype.itemsize*total_data_points
	byte_skip = -dtype.itemsize * total_data_points

Byteskip minus1 fixing #74

Byteskip minus1 fixing #74

Conversation

tashrifbillah commented Nov 10, 2018

codecov-io commented Nov 10, 2018 • edited Loading

Codecov Report

tashrifbillah commented Nov 10, 2018

tashrifbillah commented Nov 10, 2018

addisonElliott left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tashrifbillah commented Nov 10, 2018

tashrifbillah commented Nov 10, 2018

addisonElliott left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tashrifbillah Nov 10, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

addisonElliott commented Nov 10, 2018

tashrifbillah commented Nov 10, 2018

addisonElliott commented Nov 11, 2018

tashrifbillah commented Nov 11, 2018

tashrifbillah commented Nov 11, 2018 • edited Loading

addisonElliott commented Nov 11, 2018 • edited Loading

tashrifbillah commented Nov 11, 2018

tashrifbillah commented Nov 11, 2018 • edited Loading

addisonElliott commented Nov 11, 2018 • edited Loading

tashrifbillah commented Nov 12, 2018

tashrifbillah commented Nov 12, 2018

addisonElliott commented Nov 12, 2018

addisonElliott commented Nov 12, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tashrifbillah commented Nov 12, 2018

addisonElliott commented Nov 12, 2018

tashrifbillah commented Nov 12, 2018

codecov-io commented Nov 10, 2018 •

edited

Loading

addisonElliott left a comment •

edited

Loading

addisonElliott left a comment •

edited

Loading

tashrifbillah Nov 10, 2018 •

edited

Loading

tashrifbillah commented Nov 11, 2018 •

edited

Loading

addisonElliott commented Nov 11, 2018 •

edited

Loading

tashrifbillah commented Nov 11, 2018 •

edited

Loading

addisonElliott commented Nov 11, 2018 •

edited

Loading