Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Byteskip minus1 fixing #74

Merged
merged 9 commits into from
Nov 12, 2018

Conversation

tashrifbillah
Copy link
Contributor

  1. Fixed byteskip -1 reading issue byteskip -1 fails #70

  2. Added test for both nrrd and nhdr files with byte skip: -1

  3. Added test data with nrrd gzip and nhdr raw files with byte skip: -1

@codecov-io
Copy link

codecov-io commented Nov 10, 2018

Codecov Report

Merging #74 into master will increase coverage by 0.72%.
The diff coverage is 62.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #74      +/-   ##
==========================================
+ Coverage   87.39%   88.12%   +0.72%     
==========================================
  Files           6        6              
  Lines         357      362       +5     
  Branches      114      117       +3     
==========================================
+ Hits          312      319       +7     
+ Misses         22       21       -1     
+ Partials       23       22       -1
Impacted Files Coverage Δ
nrrd/reader.py 83.73% <62.5%> (+1.74%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5310894...f82bfcb. Read the comment docs.

@tashrifbillah
Copy link
Contributor Author

cc: @ihnorton

@tashrifbillah
Copy link
Contributor Author

Let me know what do you think @addisonElliott

Copy link
Collaborator

@addisonElliott addisonElliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start but needs some changes.

Also, try to keep the code coverage approximately the same which should indicate we are fully testing the code.

I think the reason why this dropped was because of your large if/else.

nrrd/reader.py Outdated Show resolved Hide resolved
nrrd/reader.py Outdated
for _ in range(line_skip):
fh.readline()

if byte_skip>=0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to stick with the syntax used in the library. To stay consistent, you would want the if statement to be:

if byte_skip >= 0:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

nrrd/reader.py Outdated
fh.readline()

if byte_skip>=0:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this code functionally works correctly, I think it can be made much cleaner. You have a large if/else statement with a lot of repeated code. I would stick to the principle of DRY

In other words, a cleaner way is to just change what needs to be changed.

nrrd/reader.py Outdated
# If a compression encoding is used, then byte skip AFTER decompressing
if header['encoding'] == 'raw':
# Skip the requested number of bytes and then parse the data using NumPy
fh.seek(byte_skip, os.SEEK_CUR)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, this could be done in one line like this:

Suggested change
fh.seek(byte_skip, os.SEEK_CUR)
if byte_skip == -1:
fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END)
else:
fh.seek(byte_skip if byte_skip != -1 else total_data_points, os.SEEK_CUR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


data, header = nrrd.read(RAW_BYTESKIP_NHDR_FILE_PATH)

print('Given')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for print's in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot to remove that.

print(header)

np.testing.assert_equal(self.expected_header, header)
np.testing.assert_equal(data, self.expected_data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure how this tests the behavior of the byte_skip = -1. Since it is just seeking backwards to the beginning of the file.

I would change the data_file and add some garbage to the front of it. That way seeking from the front won't work but seeking backwards will! Then have one test using byte_skip = -1, and one test that should fail using no byte_skip.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this in mind, don't be afraid to add a new NRRD file to test with. Just use some fake/synthesized data that works for your purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is seeking from the front of the file. Rather, it is seeking only from the byte where image data begins. The correct seek point is set by fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END)

@@ -187,9 +194,6 @@ def test_read_simple_4d_nrrd(self):
np.testing.assert_equal(data.dtype, np.float64)
np.testing.assert_equal(data, np.array([[[[0.76903426]]]]))

# Test that the data read is able to be edited
self.assertTrue(data.flags['WRITEABLE'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove this?

This assert has a purpose to ensure that the data is writeable. If the data is not writeable, then something is wrong with the code. I had an issue and that's why I added this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I didn't ⭕️ ⭕️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix this ASAP.

@tashrifbillah
Copy link
Contributor Author

How does it look now? I understand for all compressed data, data.flags['WRITEABLE'] is False for some reason, but don't know why.

@tashrifbillah
Copy link
Contributor Author

I have taken care of the previous failures. Sorry, it took me some time to realize that the version installed in my computer was different from the cloned one. This mismatch attributed to all the mysterious missing lines so far.

We should be good now.

Copy link
Collaborator

@addisonElliott addisonElliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting closer!

I would definitely think about adding some additional tests and thinking about how to test this.

Here are some tests I would add:

  • Test that it throws an exception when byte_skip is negative (but not -1)
  • Change the byte_skip dataset to add fake data to the front of it so that you have to do a byte_skip=-1 to get it to work. Then add a test with/without byte_skip being -1 to ensure it works.
  • Repeat the above test for raw, ASCII & GZIP

I haven't checked out codecov, but that's a good thing to reference to see what lines of code have/haven't been tested yet.

nrrd/reader.py Outdated Show resolved Hide resolved
nrrd/reader.py Outdated

# Skip the requested number of bytes or seek backward, and then parse the data using NumPy
if byte_skip < -1:
raise Exception('Invalid byteskip, allowed values are greater than or equal to -1')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to NRRDError rather than Exception

Suggested change
raise Exception('Invalid byteskip, allowed values are greater than or equal to -1')
raise NRRDError('Invalid byteskip, allowed values are greater than or equal to -1')

Copy link
Contributor Author

@tashrifbillah tashrifbillah Nov 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that, but then I thought shouldn't it be just Exception? But yeah, we can do that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to hearing your reasoning, but my thought process is to use NRRDError for any errors in pynrrd. I'm treating NRRDError as the exceptions for the module.

nrrd/reader.py Outdated
with open(filename, 'rb') as fh:
header = read_header(fh, custom_field_map)
data = read_data(header, fh, filename)

return data, header
return data, header
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would remove extra space at the end of this line.

Any whitespace at the end of lines are not necessary

nrrd/reader.py Outdated
fh.seek(byte_skip, os.SEEK_CUR)
elif byte_skip == -1 and header['encoding'] not in ['gzip', 'gz', 'bzip2', 'bz2']:
fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END)
else: # The only case left should be: byte_skip == -1 and header['encoding'] == 'gzip'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nitpicky thing, but just to stick with the syntax of the repository (and my personal preference), I would move this line comment to the next line to be on it's own line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, doing it.

nrrd/reader.py Outdated
elif byte_skip == -1 and header['encoding'] not in ['gzip', 'gz', 'bzip2', 'bz2']:
fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END)
else: # The only case left should be: byte_skip == -1 and header['encoding'] == 'gzip'
byte_skip = -dtype.itemsize*total_data_points
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
byte_skip = -dtype.itemsize*total_data_points
byte_skip = -dtype.itemsize * total_data_points

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

nrrd/reader.py Outdated
@@ -380,7 +382,7 @@ def read_data(header, fh=None, filename=None):

# Byte skip is applied AFTER the decompression. Skip first x bytes of the decompressed data and parse it using
# NumPy
data = np.fromstring(decompressed_data[byte_skip:], dtype)
data = np.frombuffer(decompressed_data[byte_skip:], dtype)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed to fromstring to frombuffer?

Remove whitespace at end of line.

Also, shouldn't this have an if/else statement for byte_skip == -1 to do something like the following if byte_skip == -1...

decompressed_data[-dtype.itemsize * total_data_points:]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already taken care of this condition in the if above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change was a mistake that resulted in failures:
when np.frombuffer is used:
data.flag['WRITEABLE'] = False

But when np.fromstring is used:
data.flag['WRITEABLE'] = True

The latter is desired in tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally missed that even though I commented on that line...yes, you're right. I like the way you're doing it...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we should make the data writeable so we want to use np.fromstring. I had an issue where I loaded gzip NRRD data and tried to edit it but it didn't work because it wasn't writeable.

nrrd/tests/test_reading.py Show resolved Hide resolved
@addisonElliott
Copy link
Collaborator

How does it look now? I understand for all compressed data, data.flags['WRITEABLE'] is False for some reason, but don't know why.

Take a look there:
#68

Here is the relevant information I discovered:

Use fromstring function when dealing with compressed data because it will copy the data from the string into a Numpy array and allow the data to be editable. If you use frombuffer, then it will simply reference the memory. For a binary string, this ends up with the numpy array not being editable.

@tashrifbillah
Copy link
Contributor Author

Done, sorry about that.

@addisonElliott
Copy link
Collaborator

Code looks good, just add a few quick tests to test it and we should be good to go! 👍

@tashrifbillah
Copy link
Contributor Author

Test that it throws an exception when byte_skip is negative (but not -1)
Change the byte_skip dataset to add fake data to the front of it so that you have to do a byte_skip=-1 to get it to work. Then add a test with/without byte_skip being -1 to ensure it works.
Repeat the above test for raw, ASCII & GZIP

Your test data addition suggestion should be straightforward. I'll try to complete them soon.

@tashrifbillah
Copy link
Contributor Author

tashrifbillah commented Nov 11, 2018

Test that it throws an exception when byte_skip is negative (but not -1)

This is done using numpy.testing.assert_raises.

I am working on the rest.

@addisonElliott
Copy link
Collaborator

addisonElliott commented Nov 11, 2018

Wasn't aware of numpy.testing.assert_raises so that's great, but the NRRD test util has it's own function, see here:
https://github.com/mhe/pynrrd/blob/master/nrrd/tests/test_reading.py#L122

I would use that just to be consistent.

You edited your comment but you make a good point about adding random data. Not sure...but it would be nice to verify that byte_skip=-1 is working correctly in a sense that it won't work without byte_skip = -1.

If it can't be done then that's okay, just stick with the normal tests.

@tashrifbillah
Copy link
Contributor Author

Yeah, after putting my comment, I realized there might be a way of doing the way you suggested. So, let's see if I can come up with that design. Otherwise, you have my all ears.

@tashrifbillah
Copy link
Contributor Author

tashrifbillah commented Nov 11, 2018

I have the following questions:

  1. Is this reassigning the file pointer to the beginning of the file after reading header?

  2. If answer to 1 is yes, is this reading only numpy data from the entire file content?

@addisonElliott
Copy link
Collaborator

addisonElliott commented Nov 11, 2018

Not entirely sure but based on the comment, I think it is seeking to right where the header stops. It seems here so the next parts are where the data is located

Sounds like it was a special case added to fix something

@tashrifbillah
Copy link
Contributor Author

Hey @addisonElliott , I have been able to form a test case that demonstrates the use of byte_skip: -1.
I am at the final stage of my editing. However, can you help a little but about https://github.com/mhe/pynrrd/blob/master/nrrd/tests/test_reading.py#L122 ?

I mean how do I apply it instead of

    np.testing.assert_raises(nrrd.errors.NRRDError, nrrd.read, GZ_NIFTI_NHDR_FILE_PATH)

that I have used?

@tashrifbillah
Copy link
Contributor Author

In my opinion, since you have used np.testing all through the test_reading.py, I think you are better off using np.testing.assert_raises.

@addisonElliott
Copy link
Collaborator

Yeah, I understand. The benefit of the utility assertRaises function is that you can ensure that the error message is exactly what you expect. Not sure if np.testing has that capability.

Either way works, but for consistency, just stick with the utility function.

It works like this: take that line of code, change the message to what it should be, and then inside the context of the with statement, place your core that should cause an exception

@addisonElliott
Copy link
Collaborator

Should be something like:

with self.assertRaises(nrrd.NRRDError, 'Message here of exception'):
    nrrd.read(GZ_NIFTI_NHDR_FILE_PATH)

# Skip the requested number of bytes and then parse the data using NumPy
else:
raise NRRDError('Invalid lineskip, allowed values are greater than or equal to 0')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added a sanity check for line skip as noted here

dimension: 3
space: left-posterior-superior
sizes: 30 30 30
byte skip: -1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nifti file has the following header at the beginning before accessing image data:

sizeof_hdr      : 348
data_type       : b''
db_name         : b''
extents         : 0
session_error   : 0
regular         : b'r'
dim_info        : 0
dim             : [ 3 30 30 30  1  1  1  1]
intent_p1       : 0.0
intent_p2       : 0.0
intent_p3       : 0.0
intent_code     : none
datatype        : int16
bitpix          : 16
slice_start     : 0
pixdim          : [1. 1. 1. 1. 1. 0. 0. 0.]
vox_offset      : 0.0
scl_slope       : nan
scl_inter       : nan
slice_end       : 0
slice_code      : unknown
xyzt_units      : 10
cal_max         : 0.0
cal_min         : 0.0
slice_duration  : 0.0
toffset         : 0.0
glmax           : 0
glmin           : 0
descrip         : b'BYTESKIP -1 TEST'
aux_file        : b''
qform_code      : scanner
sform_code      : scanner
quatern_b       : 0.0
quatern_c       : 0.0
quatern_d       : 1.0
qoffset_x       : 0.0
qoffset_y       : 0.0
qoffset_z       : 0.0
srow_x          : [-1.  0.  0.  0.]
srow_y          : [ 0. -1.  0.  0.]
srow_z          : [0. 0. 1. 0.]
intent_name     : b''
magic           : b'n+1'

When we add byteskip: -1, the nifti data file as accessed from the end, only the required number of bytes. This way, we are introducing a great way of nifti to nrrd conversion. We should not need to use DWIConvert anymore. Just make a detached header, and point to existing nifti file as the data file and we are done. At the PNL, BWH we are developing our current nifti pipeline dealing with nrrd files this way.

dimension: 3
space: left-posterior-superior
sizes: 30 30 30
# byte skip: -1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The byte skip has been intentionally commented out here to point out the error when nifti file is attempted to read from the beginning.

@tashrifbillah
Copy link
Contributor Author

I hope we have done what we wanted. Now Pynrrd supports byte skip: -1 even for compressed data.

@addisonElliott
Copy link
Collaborator

Great job @tashrifbillah!

The test cases are great, and I'm especially pleased by the byte_skip=-1 with NIFTI TAR file as the data. Great practical test.

I'm going to merge this into master and I'll wait a few weeks before publishing a new version in case any other PRs are submitted. If I forget in a few weeks, bug me somehow and I'll publish a new version.

👍

@addisonElliott addisonElliott merged commit 3b084ed into mhe:master Nov 12, 2018
@tashrifbillah
Copy link
Contributor Author

No problem, happy to work on it. @ihnorton may also put his two cents in when he gets back.

neurolabusc added a commit to rordenlab/MRIcroGL that referenced this pull request Jan 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants