[#44] Aggregation data processing #45

pkdash · 2023-03-02T15:37:00Z

No description provided.

sblack-usu · 2023-03-15T14:38:03Z

hsclient/hydroshare.py

+                raise Exception(f"Aggregation was not found at: {agg_path}")
+
+        if for_save_data:
+            if self.metadata.type == AggregationType.GeographicFeatureAggregation:


Instead of handling all aggregation types within this class, I think we should use class inheritance and implement aggregation-type-specific logic within the aggregation subclasses.
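
A minimal sketch of what that suggestion could look like, for illustration only. The names `BaseAggregation`, `GeoFeatureAggregationSketch`, `NetCDFAggregationSketch`, and `load_data` are hypothetical and are not hsclient's API; the returned strings stand in for real loading logic. The point is that each subclass owns its type-specific behavior, so the caller never branches on the aggregation type.

```python
from abc import ABC, abstractmethod


class BaseAggregation(ABC):
    """Hypothetical stand-in for a shared aggregation base class."""

    def __init__(self, main_file: str):
        self.main_file = main_file

    @abstractmethod
    def load_data(self, agg_path: str) -> str:
        """Return a description of how this aggregation type loads its data."""


class GeoFeatureAggregationSketch(BaseAggregation):
    def load_data(self, agg_path: str) -> str:
        # Type-specific logic lives in the subclass, not in a type check.
        return f"read shapefile {agg_path}/{self.main_file}"


class NetCDFAggregationSketch(BaseAggregation):
    def load_data(self, agg_path: str) -> str:
        return f"open netCDF dataset {agg_path}/{self.main_file}"


aggregations = [
    GeoFeatureAggregationSketch("watersheds.shp"),
    NetCDFAggregationSketch("temperature.nc"),
]
# The caller dispatches polymorphically; no branching on aggregation type.
descriptions = [agg.load_data("downloads") for agg in aggregations]
```

Adding support for another aggregation type would then mean adding one subclass, without touching the base class or the callers.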

devincowan

Thanks Pabitra, I was pretty lost before. Now I see where you had marked tests to skip.

pkdash · 2023-06-05T15:01:06Z

@sblack-usu Please review. We need a release soon, as @horsburgh is planning to use the new functionality in this PR in a workshop during the CUAHSI Biennial, which starts on 11th June.

sblack-usu · 2023-06-07T15:38:24Z

docs/examples/Basic_Operations.ipynb

@@ -52,7 +52,10 @@
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
-    "id": "3njsiY73m7_V"
+    "id": "3njsiY73m7_V",
+    "pycharm": {


I'm not sure what the new pycharm blocks are. It probably doesn't make a difference for the output or functionality when running in jupyterhub.

I assume PyCharm somehow injected these blocks when @horsburgh was editing this file. I removed those blocks.
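
For anyone who hits the same thing again, a rough sketch of removing such blocks programmatically, using only the standard library. It assumes the usual notebook JSON layout (a top-level `cells` list whose entries each carry a `metadata` dict). The function name `strip_pycharm_metadata` and the placeholder payload are made up for illustration; the actual contents of the injected block are not shown in the diff above.

```python
import json


def strip_pycharm_metadata(notebook: dict) -> dict:
    """Remove the "pycharm" key from every cell's metadata, in place."""
    for cell in notebook.get("cells", []):
        cell.get("metadata", {}).pop("pycharm", None)
    return notebook


# A tiny in-memory notebook; the "pycharm" payload is a placeholder,
# not the real content PyCharm wrote.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {"id": "3njsiY73m7_V", "pycharm": {"placeholder": True}},
            "source": [],
            "outputs": [],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
cleaned = strip_pycharm_metadata(nb)
print(json.dumps(cleaned["cells"][0]["metadata"]))
```

To apply this to a file, one would load it with `json.load`, call the function, and write it back with `json.dump`.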

sblack-usu · 2023-06-07T15:48:44Z

hsclient/hydroshare.py

+
+        # when searching using 'file__path' or 'files__path' as the key, there can be only one matching aggregation
+        file_path_priority = kwargs.pop("file_path_priority", True)
+        file_path = kwargs.get("file__path", "")


I don't understand why this new block is needed. It ignores any other search parameters by default.

Most aggregation data object operations retrieve an aggregation by its path name only, and this block makes the search a bit more efficient in that case. However, the gain is noticeable only when a resource has many aggregations, or a few aggregations that each contain a large number of files. I have removed this aggregation-path priority search, since for most resources it makes no practical difference in search speed.
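
To make the trade-off concrete, here is a rough sketch of a two-pass lookup of the kind discussed in this thread. Everything in it is hypothetical (`AggSketch`, `find_by_path`, the `path_priority` flag, and the sample paths); it is not the code that was in the PR, only an illustration of checking the aggregation path first and falling back to a scan of all files.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AggSketch:
    """Hypothetical aggregation: a main file path plus all member files."""
    main_file_path: str
    files: List[str] = field(default_factory=list)


def find_by_path(aggs: List[AggSketch], path: str,
                 path_priority: bool = True) -> Optional[AggSketch]:
    if path_priority:
        # Fast pass: compare only each aggregation's main file path.
        for agg in aggs:
            if agg.main_file_path == path:
                return agg
    # Slow pass: scan every file of every aggregation.
    for agg in aggs:
        if path in agg.files:
            return agg
    return None


aggs = [
    AggSketch("a/x.nc", ["a/x.nc", "a/x_header.txt"]),
    AggSketch("b/r.vrt", ["b/r.vrt", "b/r.tif"]),
]
by_main_file = find_by_path(aggs, "b/r.vrt")
by_member_file = find_by_path(aggs, "b/r.tif", path_priority=False)
```

The fast pass only pays off when the slow pass is expensive, which matches the observation above that the difference shows up only for resources with many aggregations or very large ones.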

sblack-usu · 2023-06-07T15:51:36Z

hsclient/hydroshare.py

+            if agg_type == AggregationType.GeographicRasterAggregation and not path.endswith(".vrt") \
+                    or agg_type == AggregationType.FileSetAggregation:
+                # search all files of the aggregation to find a matching aggregation
+                return self.aggregation(file__path=path, file_path_priority=False)


This is a clue to why the block above is needed. I don't understand yet why we need the file_path_priority filter and why rasters and filesets are treated differently than other aggregations.

I have removed these changes, as I am no longer doing the priority aggregation path-based search.
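
As a reading aid for the condition in the diff above: in Python, `and` binds more tightly than `or`, so `A and not B or C` is evaluated as `(A and not B) or C`. The sketch below mirrors only the shape of that condition. The helper `needs_full_file_search`, the use of plain strings instead of the `AggregationType` enum, the `"OtherAggregation"` value, and the sample paths are all assumptions made for illustration.

```python
def needs_full_file_search(agg_type: str, path: str) -> bool:
    """Hypothetical helper mirroring the shape of the condition in the diff."""
    is_raster = agg_type == "GeographicRasterAggregation"
    is_fileset = agg_type == "FileSetAggregation"
    # Same result as: is_raster and not path.endswith(".vrt") or is_fileset
    return (is_raster and not path.endswith(".vrt")) or is_fileset


checks = {
    "raster_tif": needs_full_file_search("GeographicRasterAggregation", "r/band.tif"),
    "raster_vrt": needs_full_file_search("GeographicRasterAggregation", "r/logan.vrt"),
    "fileset_vrt": needs_full_file_search("FileSetAggregation", "fs/logan.vrt"),
    "other": needs_full_file_search("OtherAggregation", "nc/data.nc"),
}
```

Under this reading, a file set always takes the full-file search regardless of the file extension, while a raster takes it only when the path is not a `.vrt` file.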

sblack-usu · 2023-06-07T15:53:15Z

tests/test_data_objects.py

@@ -0,0 +1,320 @@
+import os


Thanks for the thorough tests (as usual for you)

sblack-usu · 2023-06-07T15:55:29Z

hsclient/hydroshare.py

-        :return: A pandas.Series object
-        """
+
+class DataObjectSupportingAggregation(Aggregation):


I like the approach of different class implementations for each aggregation type.

sblack-usu

I don't want to hold this up any longer so I am approving. The only question I have is around the aggregation search with the new filter file_path_priority. Everything else looks great, thanks for the thorough tests.

pkdash · 2023-06-08T03:02:55Z

@sblack-usu See if I have addressed your comments.

sblack-usu · 2023-06-08T14:59:52Z

@sblack-usu See if I have addressed your comments.

Thanks for addressing my comments. Approved, go ahead and merge when you are ready.

pkdash added 6 commits February 27, 2023 15:13

[#44] initial work - loading aggregation data to data processing object

a0ce435

[#44] initial work for editing data for netcdf and timeseries aggregations

9673494

[#44] making the data processing package installation optional

61d9ba3

[#44] data processing objects to work only with downloaded aggregations

7315cad

[#44] fixing tests - marked some tests to skip for bugs in hydroshare

22ee683

[#44] initial work for editing data for geo-feature aggregation

ee1e009

sblack-usu requested changes Mar 17, 2023

horsburgh and others added 8 commits March 21, 2023 20:52

Update example notebooks

07d8a24

Updated example Jupyter notebooks with changes based on testing. Also minor edits to text to change "hs rdf" to "hsclient".

[#44] adding aggregation type classes for data object support

069631e

[#44] adding new method to move aggregation

6f036f1

[#44] using aggregation resmap filename to filter aggregation by file path search

2e52e0e

[#44] preventing aggregation delete as part of aggregation update via data object

469c4e2

[#44] adding tests for aggregation data object

72f36fe

Merge branch 'master' of https://github.com/hydroshare/hsclient into 44-aggregation-data-processing-object

657f599

[#44] enabling skipped tests

c0fd80f

devincowan approved these changes Apr 19, 2023

pkdash added 5 commits April 21, 2023 15:07

[#44] adding a flag to use file path to find aggregation

6f2747c

[#44] compute aggregation path for updated aggregation

5efe36c

[#44] fixing tests for aggregation data objects

feb1e84

[#44] updating with master

d1355ca

[#44] adding example notebook for aggregation data object operations

429bec2

pkdash marked this pull request as ready for review June 2, 2023 21:54

pkdash added 2 commits June 2, 2023 23:26

[#44] updating the github action yml file to install all optional python packages

14e9d7e

[#44] reverting packaging yml change for optional dependencies

cf29bce

[#44] run github workflow job using PR source branch

94909d6

sblack-usu reviewed Jun 7, 2023

sblack-usu reviewed Jun 7, 2023

sblack-usu approved these changes Jun 7, 2023

pkdash added 3 commits June 7, 2023 22:31

[#44] removing priority search on aggregation path

2e0ed39

[#44] rolling back changes to github action workflow file

f3fa653

[#44] removing some pycharm specific blocks from notebook file

2c14b5a

pkdash merged commit fd11a10 into master Jun 8, 2023
