Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#44] Aggregation data processing #45

Merged
merged 25 commits into from
Jun 8, 2023

Conversation

pkdash
Copy link
Member

@pkdash pkdash commented Mar 2, 2023

No description provided.

raise Exception(f"Aggregation was not found at: {agg_path}")

if for_save_data:
if self.metadata.type == AggregationType.GeographicFeatureAggregation:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of handling all types of aggregations within this class, I think we should use class inheritance and and implementation aggregation type specific logic within the aggregations subclass.

Copy link
Collaborator

@devincowan devincowan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Pabitra, I was pretty lost before. Now I see where you had marked tests to skip

@pkdash pkdash marked this pull request as ready for review June 2, 2023 21:54
@pkdash
Copy link
Member Author

pkdash commented Jun 5, 2023

@sblack-usu Please review. We need a release soon as @horsburgh is planning to use the new functionalities in this PR in a workshop during the CUAHSI Biennial that starts on 11th June.

@@ -52,7 +52,10 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3njsiY73m7_V"
"id": "3njsiY73m7_V",
"pycharm": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the new pycharm blocks are. It probably doesn't make a difference for the output or functionality when running in jupyterhub.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume PyCharm somehow injected these blocks when @horsburgh was editing this file. I removed those blocks.


# when searching using 'file__path' or files__path' as the key, there can be only one matching aggregation
file_path_priority = kwargs.pop("file_path_priority", True)
file_path = kwargs.get("file__path", "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this new block is needed. It ignores any other search parameters by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the aggregation data object related operations retrieve aggregation by aggregation path name only. This block makes the search bit more efficient in that case. However, the search efficiency is noticeable only when there are many aggregations in a resource or few aggregations that contain a large number of files. I have removed this aggregation path priority based search since for most of the resources it won't matter in terms of faster search.

if agg_type == AggregationType.GeographicRasterAggregation and not path.endswith(".vrt") \
or agg_type == AggregationType.FileSetAggregation:
# search all files of the aggregation to find a matching aggregation
return self.aggregation(file__path=path, file_path_priority=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clue to why the block above is needed. I don't understand yet why we need the file_path_priority filter and why rasters and filesets are treated differently than other aggregations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed these changes as I am no more doing the priority aggregation path-based search.

@@ -0,0 +1,320 @@
import os
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough tests (as usual for you)

:return: A pandas.Series object
"""

class DataObjectSupportingAggregation(Aggregation):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach of different class implementations for each aggregation type.

Copy link
Collaborator

@sblack-usu sblack-usu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to hold this up any longer so I am approving. The only question I have is around the aggregation search with the new filter file_path_priority. Everything else looks great, thanks for the thorough tests.

@pkdash
Copy link
Member Author

pkdash commented Jun 8, 2023

@sblack-usu See if I have addressed your comments.

@sblack-usu
Copy link
Collaborator

@sblack-usu See if I have addressed your comments.

Thanks for addressing my comments. Approved, go ahead and merge when you are ready.

@pkdash pkdash merged commit fd11a10 into master Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants