-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Byteskip minus1 fixing #74
Byteskip minus1 fixing #74
Conversation
Codecov Report
@@ Coverage Diff @@
## master #74 +/- ##
==========================================
+ Coverage 87.39% 88.12% +0.72%
==========================================
Files 6 6
Lines 357 362 +5
Branches 114 117 +3
==========================================
+ Hits 312 319 +7
+ Misses 22 21 -1
+ Partials 23 22 -1
Continue to review full report at Codecov.
|
cc: @ihnorton |
Let me know what do you think @addisonElliott |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start but needs some changes.
Also, try to keep the code coverage approximately the same which should indicate we are fully testing the code.
I think the reason why this dropped was because of your large if/else.
nrrd/reader.py
Outdated
for _ in range(line_skip): | ||
fh.readline() | ||
|
||
if byte_skip>=0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to stick with the syntax used in the library. To stay consistent, you would want the if statement to be:
if byte_skip >= 0:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
nrrd/reader.py
Outdated
fh.readline() | ||
|
||
if byte_skip>=0: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this code functionally works correctly, I think it can be made much cleaner. You have a large if/else statement with a lot of repeated code. I would stick to the principle of DRY
In other words, a cleaner way is to just change what needs to be changed.
nrrd/reader.py
Outdated
# If a compression encoding is used, then byte skip AFTER decompressing | ||
if header['encoding'] == 'raw': | ||
# Skip the requested number of bytes and then parse the data using NumPy | ||
fh.seek(byte_skip, os.SEEK_CUR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, this could be done in one line like this:
fh.seek(byte_skip, os.SEEK_CUR) | |
if byte_skip == -1: | |
fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END) | |
else: | |
fh.seek(byte_skip if byte_skip != -1 else total_data_points, os.SEEK_CUR) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
nrrd/tests/test_reading.py
Outdated
|
||
data, header = nrrd.read(RAW_BYTESKIP_NHDR_FILE_PATH) | ||
|
||
print('Given') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for print's in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot to remove that.
nrrd/tests/test_reading.py
Outdated
print(header) | ||
|
||
np.testing.assert_equal(self.expected_header, header) | ||
np.testing.assert_equal(data, self.expected_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure how this tests the behavior of the byte_skip = -1. Since it is just seeking backwards to the beginning of the file.
I would change the data_file and add some garbage to the front of it. That way seeking from the front won't work but seeking backwards will! Then have one test using byte_skip = -1, and one test that should fail using no byte_skip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this in mind, don't be afraid to add a new NRRD file to test with. Just use some fake/synthesized data that works for your purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is seeking from the front of the file. Rather, it is seeking only from the byte where image data begins. The correct seek point is set by fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END)
nrrd/tests/test_reading.py
Outdated
@@ -187,9 +194,6 @@ def test_read_simple_4d_nrrd(self): | |||
np.testing.assert_equal(data.dtype, np.float64) | |||
np.testing.assert_equal(data, np.array([[[[0.76903426]]]])) | |||
|
|||
# Test that the data read is able to be edited | |||
self.assertTrue(data.flags['WRITEABLE']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove this?
This assert has a purpose to ensure that the data is writeable. If the data is not writeable, then something is wrong with the code. I had an issue and that's why I added this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I didn't ⭕️ ⭕️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix this ASAP.
How does it look now? I understand for all compressed data, |
I have taken care of the previous failures. Sorry, it took me some time to realize that the version installed in my computer was different from the cloned one. This mismatch attributed to all the mysterious missing lines so far. We should be good now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting closer!
I would definitely think about adding some additional tests and thinking about how to test this.
Here are some tests I would add:
- Test that it throws an exception when byte_skip is negative (but not -1)
- Change the byte_skip dataset to add fake data to the front of it so that you have to do a byte_skip=-1 to get it to work. Then add a test with/without byte_skip being -1 to ensure it works.
- Repeat the above test for raw, ASCII & GZIP
I haven't checked out codecov, but that's a good thing to reference to see what lines of code have/haven't been tested yet.
nrrd/reader.py
Outdated
|
||
# Skip the requested number of bytes or seek backward, and then parse the data using NumPy | ||
if byte_skip < -1: | ||
raise Exception('Invalid byteskip, allowed values are greater than or equal to -1') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch to NRRDError rather than Exception
raise Exception('Invalid byteskip, allowed values are greater than or equal to -1') | |
raise NRRDError('Invalid byteskip, allowed values are greater than or equal to -1') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that, but then I thought shouldn't it be just Exception? But yeah, we can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to hearing your reasoning, but my thought process is to use NRRDError for any errors in pynrrd. I'm treating NRRDError as the exceptions for the module.
nrrd/reader.py
Outdated
with open(filename, 'rb') as fh: | ||
header = read_header(fh, custom_field_map) | ||
data = read_data(header, fh, filename) | ||
|
||
return data, header | ||
return data, header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would remove extra space at the end of this line.
Any whitespace at the end of lines are not necessary
nrrd/reader.py
Outdated
fh.seek(byte_skip, os.SEEK_CUR) | ||
elif byte_skip == -1 and header['encoding'] not in ['gzip', 'gz', 'bzip2', 'bz2']: | ||
fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END) | ||
else: # The only case left should be: byte_skip == -1 and header['encoding'] == 'gzip' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nitpicky thing, but just to stick with the syntax of the repository (and my personal preference), I would move this line comment to the next line to be on it's own line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, doing it.
nrrd/reader.py
Outdated
elif byte_skip == -1 and header['encoding'] not in ['gzip', 'gz', 'bzip2', 'bz2']: | ||
fh.seek(-dtype.itemsize * total_data_points, os.SEEK_END) | ||
else: # The only case left should be: byte_skip == -1 and header['encoding'] == 'gzip' | ||
byte_skip = -dtype.itemsize*total_data_points |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
byte_skip = -dtype.itemsize*total_data_points | |
byte_skip = -dtype.itemsize * total_data_points |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
nrrd/reader.py
Outdated
@@ -380,7 +382,7 @@ def read_data(header, fh=None, filename=None): | |||
|
|||
# Byte skip is applied AFTER the decompression. Skip first x bytes of the decompressed data and parse it using | |||
# NumPy | |||
data = np.fromstring(decompressed_data[byte_skip:], dtype) | |||
data = np.frombuffer(decompressed_data[byte_skip:], dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this changed to fromstring to frombuffer?
Remove whitespace at end of line.
Also, shouldn't this have an if/else statement for byte_skip == -1 to do something like the following if byte_skip == -1...
decompressed_data[-dtype.itemsize * total_data_points:]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have already taken care of this condition in the if
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change was a mistake that resulted in failures:
when np.frombuffer
is used:
data.flag['WRITEABLE'] = False
But when np.fromstring
is used:
data.flag['WRITEABLE'] = True
The latter is desired in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally missed that even though I commented on that line...yes, you're right. I like the way you're doing it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we should make the data writeable so we want to use np.fromstring. I had an issue where I loaded gzip NRRD data and tried to edit it but it didn't work because it wasn't writeable.
Take a look there: Here is the relevant information I discovered:
|
Done, sorry about that. |
Code looks good, just add a few quick tests to test it and we should be good to go! 👍 |
Your test data addition suggestion should be straightforward. I'll try to complete them soon. |
This is done using numpy.testing.assert_raises. I am working on the rest. |
Wasn't aware of numpy.testing.assert_raises so that's great, but the NRRD test util has it's own function, see here: I would use that just to be consistent. You edited your comment but you make a good point about adding random data. Not sure...but it would be nice to verify that byte_skip=-1 is working correctly in a sense that it won't work without byte_skip = -1. If it can't be done then that's okay, just stick with the normal tests. |
Yeah, after putting my comment, I realized there might be a way of doing the way you suggested. So, let's see if I can come up with that design. Otherwise, you have my all ears. |
Not entirely sure but based on the comment, I think it is seeking to right where the header stops. It seems here so the next parts are where the data is located Sounds like it was a special case added to fix something |
Hey @addisonElliott , I have been able to form a test case that demonstrates the use of I mean how do I apply it instead of
that I have used? |
In my opinion, since you have used |
Yeah, I understand. The benefit of the utility assertRaises function is that you can ensure that the error message is exactly what you expect. Not sure if np.testing has that capability. Either way works, but for consistency, just stick with the utility function. It works like this: take that line of code, change the message to what it should be, and then inside the context of the with statement, place your core that should cause an exception |
Should be something like:
|
# Skip the requested number of bytes and then parse the data using NumPy | ||
else: | ||
raise NRRDError('Invalid lineskip, allowed values are greater than or equal to 0') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added a sanity check for line skip as noted here
dimension: 3 | ||
space: left-posterior-superior | ||
sizes: 30 30 30 | ||
byte skip: -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nifti file has the following header at the beginning before accessing image data:
sizeof_hdr : 348
data_type : b''
db_name : b''
extents : 0
session_error : 0
regular : b'r'
dim_info : 0
dim : [ 3 30 30 30 1 1 1 1]
intent_p1 : 0.0
intent_p2 : 0.0
intent_p3 : 0.0
intent_code : none
datatype : int16
bitpix : 16
slice_start : 0
pixdim : [1. 1. 1. 1. 1. 0. 0. 0.]
vox_offset : 0.0
scl_slope : nan
scl_inter : nan
slice_end : 0
slice_code : unknown
xyzt_units : 10
cal_max : 0.0
cal_min : 0.0
slice_duration : 0.0
toffset : 0.0
glmax : 0
glmin : 0
descrip : b'BYTESKIP -1 TEST'
aux_file : b''
qform_code : scanner
sform_code : scanner
quatern_b : 0.0
quatern_c : 0.0
quatern_d : 1.0
qoffset_x : 0.0
qoffset_y : 0.0
qoffset_z : 0.0
srow_x : [-1. 0. 0. 0.]
srow_y : [ 0. -1. 0. 0.]
srow_z : [0. 0. 1. 0.]
intent_name : b''
magic : b'n+1'
When we add byteskip: -1
, the nifti data file as accessed from the end, only the required number of bytes. This way, we are introducing a great way of nifti
to nrrd
conversion. We should not need to use DWIConvert
anymore. Just make a detached header, and point to existing nifti file as the data file
and we are done. At the PNL, BWH we are developing our current nifti pipeline dealing with nrrd files this way.
dimension: 3 | ||
space: left-posterior-superior | ||
sizes: 30 30 30 | ||
# byte skip: -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The byte skip
has been intentionally commented out here to point out the error when nifti file is attempted to read from the beginning.
I hope we have done what we wanted. Now |
Great job @tashrifbillah! The test cases are great, and I'm especially pleased by the byte_skip=-1 with NIFTI TAR file as the data. Great practical test. I'm going to merge this into master and I'll wait a few weeks before publishing a new version in case any other PRs are submitted. If I forget in a few weeks, bug me somehow and I'll publish a new version. 👍 |
No problem, happy to work on it. @ihnorton may also put his two cents in when he gets back. |
Fixed byteskip -1 reading issue byteskip -1 fails #70
Added test for both
nrrd
andnhdr
files withbyte skip: -1
Added test data with
nrrd
gzip andnhdr
raw files withbyte skip: -1