-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parameter for changing index order #87
Conversation
nrrd/writer.py
Outdated
|
||
# Write the raw data directly to the file | ||
fh.write(raw_data) | ||
elif header['encoding'].lower() in ['ascii', 'text', 'txt']: | ||
# savetxt only works for 1D and 2D arrays, so reshape any > 2 dim arrays into one long 1D array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any reasons for using savetxt
for 2D? I mostly removed it to simplify my solution but I can put it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, and it has been awhile, it's so it will make the ASCII data look like:
2D:
1 2 3
4 5 6
7 8 9
Would be nice to have to be consistent with NRRD spec, but not crucial. Maybe there's a cleaner way to set this up so that it works well with the index order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. This should be an easy fix so I'll change it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense so I will revert this. Making it work with the ordering should be an easy fix.
I just have some general comments for now. I will need to sit down and vigorously test this later. However, I think the general implementation looks good and I think the general concepts seem right. Comments
Also, let me make sure I'm understanding this correctly. The Thus, for Sounds like |
Yeah, I don't really understand what went wrong there since it only seems to fail on Python 2.
Practically this should produce the same result as the current method. The
I did change
Yes, definitely. One thing that hit me is that the
Hm, I would hope so.
Yes, this is a bit weird and not really intuitive for me neither. The problem is that numpy is always in C index order so there's no real way to distinguish the index orders. However, this follows the way numpy seems to deal with order using the
Exactly, and the
In my eyes, definitely. Especially since numpy uses C order. I understand the importance of backward compatibility though. |
Try to run the tests locally on Python 2 if you're able. I can rerun the Travis CI if it was just a freak occurrence that it failed.
Yeah, you're probably right. Just to be safe, I might play around with this and ensure the two options give the exact same strides.
Hmm, this is where things seem a bit odd. The
I'll need to do a bit more research on this, from what I've found there are some cases where the The dangerous part here is that what if the user has a C-ordered array that they are treating as a Fortran array. A lot of users aren't even paying attention to what memory-order their data is in. For example, let's say I'm used to MATLAB so I want to created a 5 x 10 x 15 array (x, y, z). Here's how I would do it:
If we want to truly separate the index order concept from the memory order, then I'm not sure a Numpy array will contain any information on the index order because that is arbitrary. So, I don't think an automatic order parameter is possible. Rather, my vote would be to make the This is what users would have to do for C-order which is kinda a pain, but not that bad really.
|
Codecov Report
@@ Coverage Diff @@
## master #87 +/- ##
==========================================
+ Coverage 99.45% 99.46% +<.01%
==========================================
Files 6 6
Lines 368 374 +6
Branches 116 119 +3
==========================================
+ Hits 366 372 +6
Misses 1 1
Partials 1 1
Continue to review full report at Codecov.
|
Odd, I thought I solved the issue this time since the tests seem to pass when I run them on Python 2. I have to look into this a bit further. First, it seemed like same warnings won't trigger twice on Python 2 unless you tell them to. Edit: nevermind, the test actually passed. It was just Codecov that complained.
As far as I know, the
Yeah, as mentioned in my first post, I realized that you could quite easily break things by doing something as simple as So, I agree, forcing |
Well, I think we're in agreement for what to do now. I'll leave this PR up to you, but let me know if you have questions or need help. Also, may be useful to put By the way, I probably need to add a threshold to the codecov diff. Having a greater coverage does not mean that your code is tested the best. @ihnorton @tashrifbillah @AAAArcus Any comments or concerns? This PR will not break any of your code, but if you want to utilize the Python-default C-order index ordering for the data only, a new argument named A future version, potentially v1.0.0 whenever that will be, will set the default to use C-order which will break code but ultimately is easy to fix. |
@addisonElliott I think this should be all, I hope I'm not forgetting anything. I made sure to clarify this in the user manual as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. Documentation was a good start but I added a few more suggestions of content to add. Index ordering can be a confusing concept so I want to be as thorough as possible.
@tashrifbillah emailed me to let me know that he wants to review your PR just to make sure it doesn't break any of his existing code. Never hurts to have a second set of eyes.
From what I can tell, this code should be backwards compatible and I don't see any reason why it would break existing code.
docs/source/user-guide.rst
Outdated
@@ -182,6 +182,8 @@ Reading NRRD files | |||
------------------ | |||
There are three functions that are used to read NRRD files: :meth:`read`, :meth:`read_header` and :meth:`read_data`. :meth:`read` is a convenience function that opens the file located at :obj:`filename` and calls :meth:`read_header` and :meth:`read_data` in succession and returns the data and header. | |||
|
|||
Reading NRRD files using :meth:`read` or :meth:`read_data` will by default return arrays being indexed in Fortran order, i.e array elements are accessed by `data[x,y,z]`. This differs to the C-order, i.e. `data[z,y,x]`, which is more common in numpy. The :obj:`index_order` parameter can be used to specify which index ordering should be used on the returned array ('C' or 'F'). The :obj:`index_order` parameter needs to be consistent with the parameter of same name in :meth:`write`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading NRRD files using :meth:`read` or :meth:`read_data` will by default return arrays being indexed in Fortran order, i.e array elements are accessed by `data[x,y,z]`. This differs to the C-order, i.e. `data[z,y,x]`, which is more common in numpy. The :obj:`index_order` parameter can be used to specify which index ordering should be used on the returned array ('C' or 'F'). The :obj:`index_order` parameter needs to be consistent with the parameter of same name in :meth:`write`. | |
Reading NRRD files using :meth:`read` or :meth:`read_data` will by default return arrays being indexed in Fortran-order, i.e array elements are accessed by `data[x, y, z]`. This differs from C-order where array elements are accessed by `data[z, y, x]`, which is the more common order in Python and libraries (e.g. NumPy, scikit-image, PIL, OpenCV). The :obj:`index_order` parameter can be used to specify which index ordering should be used on the returned array ('C' or 'F'). |
Small comments on wording.
I think the last line mentioning that this should be consistent with nrrd.write
should be removed. The reason why is because the index_order
in nrrd.read
can be specified as whichever order preferred with no consequences. However, the important distinction is that the user must specify the correct index_order
in nrrd.write
.
Having that last sentence in the reading NRRD section could make others think "So I need to know what order the data was saved in, C-order or Fortran-order?", which is not the case. In terms of nrrd.read
, the index order is a preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it's worth reiterating on the fact that the user needs to be aware that the choices they make when reading will impact writing later on. This was the goal of the last sentence. Could probably do with some rephrasing though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm, I think this note that was added describes what you were trying to note. Do you agree?
The :obj:`index_order` parameter must be consistent with the index order specified in :meth:`read`. Reading an NRRD file in C-order and then writing as Fortran-order or vice versa will result in the data being transposed in the NRRD file.
If so, then we just need to think about where it should be placed. It is currently placed in nrrd.write
and I think that's the best spot personally. Maybe we could even place in the function docstring.
docs/source/user-guide.rst
Outdated
@@ -194,12 +196,16 @@ Writing NRRD files | |||
------------------ | |||
Writing to NRRD files can be done with the single function :meth:`write`. The :obj:`filename` parameter to the function specifies the absolute or relative filename to write the NRRD file. If the :obj:`filename` extension is .nhdr, then the :obj:`detached_header` parameter is set to true automatically. If the :obj:`detached_header` parameter is set to :obj:`True` and the :obj:`filename` ends in .nrrd, then the header file will have the same path and base name as the :obj:`filename` but with an extension of .nhdr. In all other cases, the header and data are saved in the same file. | |||
|
|||
The :obj:`data` parameter is a :class:`numpy.ndarray` of data to be saved. :obj:`header` is an optional parameter of type :class:`dict` containing the field/values to be saved to the NRRD file. | |||
The :obj:`data` parameter is a :class:`numpy.ndarray` of data to be saved. :obj:`header` is an optional parameter of type :class:`dict` containing the field/values to be saved to the NRRD file. :obj:`index_order` specifies the index order of the array :obj:`data`, this should be consistent with the argument of the same name in :meth:`read`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The :obj:`data` parameter is a :class:`numpy.ndarray` of data to be saved. :obj:`header` is an optional parameter of type :class:`dict` containing the field/values to be saved to the NRRD file. :obj:`index_order` specifies the index order of the array :obj:`data`, this should be consistent with the argument of the same name in :meth:`read`. | |
The :obj:`data` parameter is a :class:`numpy.ndarray` of data to be saved. :obj:`header` is an optional parameter of type :class:`dict` containing the field/values to be saved to the NRRD file. | |
Writing NRRD files will by default index the :obj:`data` array in Fortran-order where array elements are accessed by `data[x, y, z]`. This differs from C-order where array elements are accessed by `data[z, y, x]`, which is the more common order in Python and libraries (e.g. NumPy, scikit-image, PIL, OpenCV). The :obj:`index_order` parameter can be used to specify which index ordering should be used for the given :obj:`data` array ('C' or 'F'). |
My thoughts here are to keep the same wording and description from the reading NRRD files section just to be consistent. In addition, the note will draw to the user's attention that they need to be careful what they put here.
nrrd/writer.py
Outdated
header['dimension'] = data.ndim | ||
header['sizes'] = list(data.shape) | ||
|
||
if index_order == 'F': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would inline if it all:
header['sizes'] = list(data.shape) if index_order == 'F' else list(data.shape[::-1])
nrrd/reader.py
Outdated
Parameters | ||
---------- | ||
filename : :class:`str` | ||
Filename of the NRRD file | ||
custom_field_map : :class:`dict` (:class:`str`, :class:`str`), optional | ||
Dictionary used for parsing custom field types where the key is the custom field name and the value is a | ||
string identifying datatype for the custom field. | ||
index_order : :class: `str`, optional | ||
Specifies the index ordering of the outputted array, either 'F' (Fortran order) or 'C' (C order). Using 'F' (default) the image will be transposed, providing an index ordering in this manneri; `data[x][y][z]`. However, Numpy generally uses 'C' order, (e.g. `data[z][y][x]`) and 'C' order is therefore adviced to keep consistent with other libraries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifies the index ordering of the outputted array, either 'F' (Fortran order) or 'C' (C order). Using 'F' (default) the image will be transposed, providing an index ordering in this manneri; `data[x][y][z]`. However, Numpy generally uses 'C' order, (e.g. `data[z][y][x]`) and 'C' order is therefore adviced to keep consistent with other libraries. | |
Specifies the index order used for reading. Either 'C' (C-order) where the dimensions are ordered from | |
slowest-varying to fastest-varying (e.g. (z, y, x)), or 'F' (Fortran-order) where the dimensions are ordered | |
from fastest-varying to slowest-varying (e.g. (x, y, z)). |
I basically just copied the wording from the write function. Not sure if it captures the same message you had or not. Just an idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could cause some confusion in the same manner as you mentioned in a previous comment. "Specifies the index order used for reading" implies that you are reading in a certain order from the file, which is not the case. The order is simply the index order (whether to transpose the data or not before returning).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned that "slowest-varying to fastest-varying" is vague without a description of memory layout which I agree with. Whatever language we come up with to describe C-order & Fortran-order, it should be here.
Fair enough, how about the below wording? It's basically changing "outputted" to "resulting data"
"Specifies the index order of the resulting data array."
Co-Authored-By: simeks <[email protected]>
Co-Authored-By: simeks <[email protected]>
Co-Authored-By: simeks <[email protected]>
Co-Authored-By: simeks <[email protected]>
The Github UI is a mess, but I think I covered all the comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the new suggestions sounds like it may be buggy. I think there was one mess up with committing suggestions.
In addition, there was the inline if comment that wasn't addressed. Maybe that was overlooked or somehow your changes weren't committed?
docs/source/user-guide.rst
Outdated
@@ -182,6 +182,8 @@ Reading NRRD files | |||
------------------ | |||
There are three functions that are used to read NRRD files: :meth:`read`, :meth:`read_header` and :meth:`read_data`. :meth:`read` is a convenience function that opens the file located at :obj:`filename` and calls :meth:`read_header` and :meth:`read_data` in succession and returns the data and header. | |||
|
|||
Reading NRRD files using :meth:`read` or :meth:`read_data` will by default return arrays being indexed in Fortran order, i.e array elements are accessed by `data[x,y,z]`. This differs to the C-order, i.e. `data[z,y,x]`, which is more common in numpy. The :obj:`index_order` parameter can be used to specify which index ordering should be used on the returned array ('C' or 'F'). The :obj:`index_order` parameter needs to be consistent with the parameter of same name in :meth:`write`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm, I think this note that was added describes what you were trying to note. Do you agree?
The :obj:`index_order` parameter must be consistent with the index order specified in :meth:`read`. Reading an NRRD file in C-order and then writing as Fortran-order or vice versa will result in the data being transposed in the NRRD file.
If so, then we just need to think about where it should be placed. It is currently placed in nrrd.write
and I think that's the best spot personally. Maybe we could even place in the function docstring.
nrrd/reader.py
Outdated
Parameters | ||
---------- | ||
filename : :class:`str` | ||
Filename of the NRRD file | ||
custom_field_map : :class:`dict` (:class:`str`, :class:`str`), optional | ||
Dictionary used for parsing custom field types where the key is the custom field name and the value is a | ||
string identifying datatype for the custom field. | ||
index_order : :class: `str`, optional | ||
Specifies the index ordering of the outputted array, either 'F' (Fortran order) or 'C' (C order). Using 'F' (default) the image will be transposed, providing an index ordering in this manneri; `data[x][y][z]`. However, Numpy generally uses 'C' order, (e.g. `data[z][y][x]`) and 'C' order is therefore adviced to keep consistent with other libraries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned that "slowest-varying to fastest-varying" is vague without a description of memory layout which I agree with. Whatever language we come up with to describe C-order & Fortran-order, it should be here.
Fair enough, how about the below wording? It's basically changing "outputted" to "resulting data"
"Specifies the index order of the resulting data array."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this got really confusing due to me struggling with the interface. I think I managed to cover it all in the latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the documentation side of this now.
There's one minor fix to make and then I want to take another look at the unit testing to make sure we're thorough. I think we are, but I just want to make absolutely sure the tests are running as expected.
@tashrifbillah hopefully will be able to review this soon so we can get it merged in.
Co-Authored-By: simeks <[email protected]>
👍
Yes, of course. There are other approaches as well, like https://pypi.org/project/parameterized/, but I didn't want to add additional dependencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked over the tests closely. LGTM
Edit: One more minor change to make. I've found you enabled warnings for the code which is a good idea. However, I have a PR where it is at the top of each file and will fix some of these warnings.
nrrd/tests/test_reading.py
Outdated
@@ -176,6 +178,9 @@ def test_read_dup_field_error_and_warn(self): | |||
|
|||
import warnings | |||
with warnings.catch_warnings(record=True) as w: | |||
# Enable all warnings | |||
warnings.simplefilter("always") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go ahead and remove this line. When running tests on your branch, I noticed I received some warnings that I didn't get before. I finally figured out it's because of this line.
I have an upcoming PR where I enable filters for the entire tests (I put this code at the top of each file). I also will be fixing some of these warnings too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 . This makes the test fail though, but since you're enabling it in another PR I guess it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, that's the reason why the one test was failing. I'm curious as to why this started happening only in your PR. I'm sure there's some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is mostly likely caused by the fact that we want to trigger the same warning twice (for both 'C' and 'F'). I don't know if it is intended but it seems that Python 2 only triggers warnings once, omitting the warning in subsequent calls.
This is a proposal for the problem discussed in #75. This should probably be seen as a work-in-progress and a base for further discussion. I'm not sure whether this is the best approach so feedback would be appreciated.
The first addition is the
index_order
argument fornrrd.read
which allows the user to specify the desired index order. I set the default to 'F' so the behavior should stay the same as with previous versions.I also changed
nrrd.write
. This function did previously not work on C-contiguous arrays and this should now be fixed. Here I added anindex_order
flag, but my hope is that it should only be required in a few cases. Similar to most numpy functions there are three options;F
,C
, andA
. TheA
(default) option automatically detects the order of the array. So for example, when trying to write an array that was read usingnrrd.read(index_order='F')
, the function should correctly derive the shape and write the array in Fortran order.At first sight I see only one problem with the
A
option and that is that it breaks down if you do something like this:Since
np.ascontiguousarray
will keep the shape but change the underlying memory order, changing it toC
. This is maybe not something we should worry about though.Also, to by knowledge there is no way to do parameterized tests using the standard
unittest
module so I did some jury-rigging to test theindex_order
argument. This could probably use some clean-up.