always return a masked array by default #787
|
@shoyer, would this impact xarray? |
|
Xarray always sets |
|
@akrherz I would like to point out my issues with this pull request in this thread. Below is dummy code, but it is potentially similar to code I used to process raw radar measurements. I start with some raw data. This raw data also contains some noise, which has to be estimated and subtracted. Afterwards I convert everything to dB. Because I want to get rid of data that is below the noise level (including -inf and nan), I create a boolean mask. The code might look like this:

    import numpy as np

    def dummy_radar_processing(raw_signal):
        noise = np.percentile(raw_signal, 10)  # simplified noise estimation
        log_signal = 10 * np.log10(raw_signal - noise)  # subtract noise and convert to dB
        filter = np.isfinite(log_signal) & (log_signal > 10 * np.log10(noise))
        above_noise = log_signal[filter]
        print('log_signal:', log_signal)
        print('above_noise:', above_noise)
        print('I have %d values above noise level.' % above_noise.size)

(Later, I would use the

Now, let's apply this code to some test data. I use np.array and np.ma.masked_equal as replacement for nc.variables['raw_data'][:]. Up to netCDF4 1.3.1 it would be like:

    raw_signal = np.array((1., 5, 6, 8, 3, 4, 2, 3, 7, 0.5, 1), np.float32)  # read from netcdf, no value is missing
    dummy_radar_processing(raw_signal)

But with a MaskedArray (PR #787):

    import netCDF4
    raw_signal = np.ma.masked_equal(np.array((1., 5, 6, 8, 3, 4, 2, 3, 7, 0.5, 1), np.float32), netCDF4.default_fillvals['f4'])  # read from netcdf
    dummy_radar_processing(raw_signal)

Now it is obvious that there are only 7 valid values, but since I passed a masked array to code that was written for normal arrays, I somehow get another result. I hope this example makes clear how code might break with the change of the default. |
|
If you use then the result is the same for ndarrays and MaskedArrays. |
|
Yes, that's true. There are many ways to save me from this pitfall. (But to do so, I have to be aware of this change.) |
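One such way, as a minimal sketch and not necessarily the suggestion made above: fill masked entries with NaN before doing any arithmetic, so the rest of the function only ever sees a plain ndarray. The function name `above_noise_values`, the fill value `-9999.0`, and the use of `np.nanpercentile` are assumptions made for illustration. The sketch also assumes floating-point input, since NaN cannot be stored in integer arrays.

```python
import numpy as np


def above_noise_values(raw_signal):
    """Return the dB values above the noise level as a plain ndarray.

    Accepts either an ndarray or a MaskedArray (hypothetical helper).
    """
    # np.ma.filled returns an ndarray: masked entries become NaN,
    # and a plain ndarray is passed through unchanged.
    data = np.ma.filled(raw_signal, np.nan)

    # Ignore NaN (formerly masked) entries in the noise estimate.
    noise = np.nanpercentile(data, 10)

    # log10 of zero or negative numbers yields -inf / nan; silence the
    # warnings because those entries are filtered out right below.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_signal = 10 * np.log10(data - noise)
        threshold = 10 * np.log10(noise)

    keep = np.isfinite(log_signal) & (log_signal > threshold)
    return log_signal[keep]


plain = np.array((1., 5, 6, 8, 3, 4, 2, 3, 7, 0.5, 1), np.float32)
masked = np.ma.masked_equal(plain, -9999.0)  # hypothetical fill value, nothing is masked

print(above_noise_values(plain).size)   # 7
print(above_noise_values(masked).size)  # 7
```

With the test data from the example above, both calls give 7 values, regardless of whether the input arrives as an ndarray or as a MaskedArray.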
|
Agreed - this change can break existing code, and I regret that. Sometimes that can't be avoided though, and all in all I think it is a step in the right direction. |
|
One good way to handle backwards compatibility breaks is to increment major version numbers. |
|
Hi, just my 2 cents, I had to raise a little issue: when both attributes, _FillValue and missing_value, are present, the fill value reported by the masked array is wrong, since fill_value holds the value of the missing_value attribute. |
Issue #785 points out that by default, either a numpy.ma.core.MaskedArray or a numpy.ndarray can be returned, depending on whether the slice contains any missing values. This pull request ensures that a numpy.ma.core.MaskedArray is always returned, unless set_auto_mask(False) is invoked and then a numpy.ndarray is always returned.
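
The difference in return types can be illustrated with plain NumPy. The sketch below is only a stand-in and not the netCDF4 implementation: `read_slice`, its `always_mask` and `auto_mask` parameters, and the fill value `-9999.0` are hypothetical names chosen to mimic the behavior described in issue #785, in this pull request, and after `set_auto_mask(False)`.

```python
import numpy as np


def read_slice(raw, fill_value, always_mask=True, auto_mask=True):
    """Hypothetical stand-in for reading variable[:] (not the netCDF4 code)."""
    raw = np.asarray(raw)
    if not auto_mask:
        # Mimics set_auto_mask(False): always a plain ndarray.
        return raw
    masked = np.ma.masked_equal(raw, fill_value)
    if always_mask or np.ma.is_masked(masked):
        # Mimics this pull request (always_mask=True), or the old
        # behavior when the slice really contains missing values.
        return masked
    # Mimics the old behavior: no missing values, so a plain ndarray.
    return raw


FILL = -9999.0  # hypothetical fill value
complete = [1.0, 2.0, 3.0]
gappy = [1.0, FILL, 3.0]

# Old default: the type depends on the data.
print(type(read_slice(complete, FILL, always_mask=False)).__name__)
print(type(read_slice(gappy, FILL, always_mask=False)).__name__)

# New default: the type no longer depends on the data.
print(type(read_slice(complete, FILL)).__name__)
print(type(read_slice(gappy, FILL)).__name__)

# Automatic masking switched off: always ndarray.
print(type(read_slice(gappy, FILL, auto_mask=False)).__name__)
```

The point of the sketch is that with a data-dependent return type, downstream code has to handle both cases, whereas a fixed return type lets it handle exactly one.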