
Open mfdataset enhancement #9955

Open · wants to merge 7 commits into main
Conversation

pratiman-91

Added a new argument to open_mfdataset to better handle invalid files.

errors : {'ignore', 'raise', 'warn'}, default 'raise'
        - If 'raise', an invalid dataset will raise an exception.
        - If 'ignore', invalid datasets will be ignored.
        - If 'warn', a warning will be issued for each invalid dataset.
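A minimal usage sketch of the proposed keyword (file names are hypothetical, and `errors=` is the argument proposed in this PR, not part of a released xarray):

```python
import xarray as xr

# With the proposed errors="warn", an unreadable file would emit a warning
# and be skipped instead of aborting the whole open_mfdataset call.
ds = xr.open_mfdataset(
    ["good_0.nc", "corrupt.nc", "good_1.nc"],  # hypothetical paths
    combine="nested",
    concat_dim="time",
    errors="warn",
)
```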


welcome bot commented Jan 16, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@max-sixty
Collaborator

I'm not the expert, but this looks reasonable! Any other thoughts?

Assuming no one thinks it's a bad idea, we would need tests.

Collaborator

@headtr1ck left a comment


I think it is a good idea.

But the way it is implemented here seems overly complicated and repetitive.
I would suggest reversing the logic: first build up the list wrapped in a single try, then handle the three cases in the except block.
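A rough sketch of the suggested structure, not the PR's actual code (the helper name and the opener callable are illustrative):

```python
import warnings


def _open_valid(paths, opener, errors="raise", **open_kwargs):
    """Hypothetical helper illustrating the suggested structure."""
    datasets = []
    for path in paths:
        try:
            # Build up the list inside a single try ...
            datasets.append(opener(path, **open_kwargs))
        except Exception as err:
            # ... and handle the three cases in the except block.
            if errors == "raise":
                raise
            if errors == "warn":
                warnings.warn(f"Could not open {path!r}: {err}; skipping.")
            # errors == "ignore": skip silently
    return datasets
```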

xarray/backends/api.py (outdated review thread)
)
continue
else:
raise ValueError(
Collaborator


This is never raised because it is caught by the outer try-except.

Also, to keep the complexity lower, check this before entering the if/else block: if errors not in ("raise", "warn", "ignore"): raise ValueError(...)
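For illustration, the up-front validation could look like this (a sketch; the exact message is made up):

```python
# Validate once, before any files are opened, so the ValueError cannot be
# swallowed by the try/except around the open calls.
if errors not in ("raise", "warn", "ignore"):
    raise ValueError(
        f"'errors' must be 'raise', 'warn' or 'ignore', got {errors!r}"
    )
```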

Author


Thanks @headtr1ck for pointing this out. I have made changes to the logic based on your suggestions.

Collaborator

@headtr1ck left a comment


Almost there.

Also, we should add tests for this.

xarray/backends/api.py (two outdated review threads, resolved)
@pratiman-91
Author

@headtr1ck Thanks for the suggestions. I have added two tests (ignore and warn). While testing, I found that the new argument broke combine="nested" due to invalid ids. I have now modified it to produce the correct ids, and it passes the tests. Please review the tests and the latest version.
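For illustration only, a test in this spirit could look roughly like the following; it assumes the proposed errors keyword, uses plain pytest fixtures rather than xarray's own test helpers, and the warning category is an assumption:

```python
import pytest
import xarray as xr


def test_open_mfdataset_errors_warn(tmp_path):
    # One valid netCDF file and one deliberately invalid file.
    good = tmp_path / "good.nc"
    bad = tmp_path / "bad.nc"
    xr.Dataset({"a": ("x", [1, 2, 3])}).to_netcdf(good)
    bad.write_text("not a netCDF file")

    # Assuming errors="warn" emits a warning and skips the bad file.
    with pytest.warns(Warning):
        ds = xr.open_mfdataset(
            [good, bad], combine="nested", concat_dim="x", errors="warn"
        )
    assert ds.sizes["x"] == 3
```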

@@ -1626,6 +1654,7 @@ def open_mfdataset(
if combine == "nested":
# Combined nested list by successive concat and merge operations
# along each dimension, using structure given by "ids"
ids = [id_ for i, id_ in enumerate(ids) if i not in invalid_ids]
Collaborator


Does this work with nested datasets? E.g. a 2x2 set where one file is missing?
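For context, the 2x2 case referred to here is a nested list of paths (hypothetical names):

```python
# A 2x2 nested layout: the outer list runs along one concat dimension,
# the inner lists along another.
paths = [
    ["t1_x1.nc", "t1_x2.nc"],
    ["t2_x1.nc", "t2_x2.nc"],
]
# If t2_x2.nc is invalid and simply dropped, the second inner list has only
# one element and the grid no longer lines up with the inferred ids.
```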

Author


Thanks for pointing this out. I did not know about the 2x2 functionality. I have now modified the logic to remove the invalid ids and paths, and added tests for the same.

@pratiman-91
Author

Hi @headtr1ck, I have been thinking about the handling of ids. The current version looks like patchwork (I am not happy with it). I think we can create the ids after removing all the invalid datasets from path1d within the combine == "nested" block. Please let me know what you think.
Thanks!
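Conceptually, the suggestion amounts to something like this (illustrative names only, not xarray's internals):

```python
# Filter the flat path list first, then derive the nested ids from the
# surviving paths instead of patching precomputed ids afterwards.
# `path1d` and `invalid_indices` are placeholder names for this sketch.
valid_paths = [p for i, p in enumerate(path1d) if i not in invalid_indices]
```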

Development

Successfully merging this pull request may close these issues.

better handling of invalid files in open_mfdataset
3 participants