-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ingest COGs directly into GEE #587
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## develop #587 +/- ##
========================================
Coverage 81.23% 81.23%
========================================
Files 130 130
Lines 5857 5857
========================================
Hits 4758 4758
Misses 1099 1099
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for the missing call at line 118, this looks good. I didn't try to verify that the GEE api calls are correct, but they seem reasonable.
batch/python/export_to_gee.py
Outdated
gcs_path = upload_cog_to_gcs(dataset, implementation) | ||
create_cog_backed_asset(dataset, implementation, gcs_path, service_account) | ||
asset_id = (dataset, implementation, gcs_path, service_account) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing the call to ingest_in_gee here! I think you mean:
asset_id = ingest_in_gee(dataset, implementation, gcs_path)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@danscales fixed your comments, and tested the script, and it now works. The only issue I couldn't figure out is how to set the ACL for the new asset to allow public reading. For the COG backed asset, the asset is created immediately, and then you can set the ACL. For the direct ingestion, the asset doesn't actually exist until it's finished ingesting, so you can't modify the ACL. Because we just fire and forget, we don't wait around until we can modify it... not sure what to do here, but unfortunately don't think I can figure it out today. If someone could pick this up and figure out a way to make sure it's publicly accessible after creation, then we could merge this PR. |
If don't have access to the GEE project, talk to @tanderegg |
@danscales alternatively, once this item is completed for DIST alerts, we can just split the integrated alerts COG into a mosaic where each COG is under the header limit, and then upload those as an image collection: https://gfw.atlassian.net/browse/GTC-2980 You can check the header size with this script: https://github.com/rouault/cog_validator/blob/master/validate_cloud_optimized_geotiff.py |
Pull request checklist
Please check if your PR fulfills the following requirements:
Pull request type
Please check the type of change your PR introduces:
What is the current behavior?
Currently, we upload data to GEE as COG-backed assets. Some important, large datasets like integrated alerts hit a limit of 10 MB for metadata size, causing it to fail.
Issue Number: GTC-2953
What is the new behavior?
Just directly ingest image to GEE using GEE API. This is "fire-and-forget"; this kicks off the ingestion job on GEE, but doesn't check if it succeeds, since it can take multiple hours. For integrated alerts, which is updated daily, this is fine.
Does this introduce a breaking change?
Other information