-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#44] Aggregation data processing #45
Conversation
hsclient/hydroshare.py
Outdated
raise Exception(f"Aggregation was not found at: {agg_path}") | ||
|
||
if for_save_data: | ||
if self.metadata.type == AggregationType.GeographicFeatureAggregation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of handling all types of aggregations within this class, I think we should use class inheritance and and implementation aggregation type specific logic within the aggregations subclass.
Updated example Jupyter notebooks with changes based on testing. Also minor edits to text to change "hs rdf" to "hsclient".
…44-aggregation-data-processing-object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Pabitra, I was pretty lost before. Now I see where you had marked tests to skip
@sblack-usu Please review. We need a release soon as @horsburgh is planning to use the new functionalities in this PR in a workshop during the CUAHSI Biennial that starts on 11th June. |
docs/examples/Basic_Operations.ipynb
Outdated
@@ -52,7 +52,10 @@ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"metadata": { | |||
"id": "3njsiY73m7_V" | |||
"id": "3njsiY73m7_V", | |||
"pycharm": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the new pycharm blocks are. It probably doesn't make a difference for the output or functionality when running in jupyterhub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume PyCharm somehow injected these blocks when @horsburgh was editing this file. I removed those blocks.
hsclient/hydroshare.py
Outdated
|
||
# when searching using 'file__path' or files__path' as the key, there can be only one matching aggregation | ||
file_path_priority = kwargs.pop("file_path_priority", True) | ||
file_path = kwargs.get("file__path", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this new block is needed. It ignores any other search parameters by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the aggregation data object related operations retrieve aggregation by aggregation path name only. This block makes the search bit more efficient in that case. However, the search efficiency is noticeable only when there are many aggregations in a resource or few aggregations that contain a large number of files. I have removed this aggregation path priority based search since for most of the resources it won't matter in terms of faster search.
hsclient/hydroshare.py
Outdated
if agg_type == AggregationType.GeographicRasterAggregation and not path.endswith(".vrt") \ | ||
or agg_type == AggregationType.FileSetAggregation: | ||
# search all files of the aggregation to find a matching aggregation | ||
return self.aggregation(file__path=path, file_path_priority=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a clue to why the block above is needed. I don't understand yet why we need the file_path_priority filter and why rasters and filesets are treated differently than other aggregations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed these changes as I am no more doing the priority aggregation path-based search.
@@ -0,0 +1,320 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough tests (as usual for you)
:return: A pandas.Series object | ||
""" | ||
|
||
class DataObjectSupportingAggregation(Aggregation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach of different class implementations for each aggregation type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to hold this up any longer so I am approving. The only question I have is around the aggregation search with the new filter file_path_priority
. Everything else looks great, thanks for the thorough tests.
@sblack-usu See if I have addressed your comments. |
Thanks for addressing my comments. Approved, go ahead and merge when you are ready. |
No description provided.