- 
                Notifications
    You must be signed in to change notification settings 
- Fork 0
Fix getting filenames out of netCDF datasets #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| if hasattr(bound_method, "__self__"): | ||
| file_handles.append(Path(bound_method.__self__._filename)) | ||
|  | ||
| return file_handles | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall structure looks good and I can follow what the function does, though I'm not familiar with the specific functions/classes/attributes used (can infer what they do from use, but not confirm that they do what they seem to)
| files.append(versioneer_file) | ||
| present = False | ||
| try: | ||
| with open(".gitattributes", "r") as fobj: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious why the removal of "r" is beneficial. Wouldn't it be more futureproof to specify read only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a versioneer artefact, not sure what's caused this to change tbh.
        
          
                tests/test_chunking.py
              
                Outdated
          
        
      | engine="netcdf4", | ||
| ) | ||
| else: | ||
| ds = xr.open_dataset(fpath, decode_timedelta=False, engine="netcdf4") | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
open_mfdataset seems to cope with a single path (as best I can tell, it uses it as a glob string that returns a single file); is there a reason to use open_dataset explicitly for only one filepath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you get both instances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope. Working on updating the related examples now anyway so will fix in there
| fhandles = _get_file_handles(ds) | ||
|  | ||
| if isinstance(fpath, list): | ||
| assert fhandles == [Path(f) for f in fpath] | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this work if fpath contains the same paths but in a different order to fhandles? Would you want this to pass or fail in that instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case it'd fail, you could replace it with a set intersection to make it pass. Whether you'd want it to pass/fail is more of a philosophical question I guess
| else: | ||
| ds = xr.open_dataset(fpath, decode_timedelta=False, engine="netcdf4") | ||
| with warnings.catch_warnings(): | ||
| chunk_dict = validate_chunkspec( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this is the right place to mention this, but does validate_chunkspec care if the chunks are not an integer divisor of file size in that dimension? Ie, if files have 3 timesteps, in disc chunks of 1, is it okay with chunks of two? in which case you'd end up with every second chunk being half as big as the other. Are we okay with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I actually don't know whether this would cause performance issues. I suspect not but can't say for certain..
Mostly closes #7
As of right now I'm not sure that we can get all the filenames out of a multifile dataarray, only the first. With that said, probably unimportant in nearly all situations. I've added an xfailing test for it, but in practice I think it'll give the right answer if a multifile datarray is passed to this function.
@jemmajeffree think this might be up your alley - no rush with the review though.