Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest COGs directly into GEE #587

Merged
merged 3 commits into from
Nov 4, 2024
Merged

Ingest COGs directly into GEE #587

merged 3 commits into from
Nov 4, 2024

Conversation

jterry64
Copy link
Member

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
  • Make sure you are making a pull request against the develop branch (left side). Also you should start your branch off our develop.
  • Check the commit's or even all commits' message styles matches our requested structure.
  • Check your code additions will fail neither code linting checks nor unit test.

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

Currently, we upload data to GEE as COG-backed assets. Some important, large datasets like integrated alerts hit a limit of 10 MB for metadata size, causing it to fail.

Issue Number: GTC-2953

What is the new behavior?

Just directly ingest image to GEE using GEE API. This is "fire-and-forget"; this kicks off the ingestion job on GEE, but doesn't check if it succeeds, since it can take multiple hours. For integrated alerts, which is updated daily, this is fine.

Does this introduce a breaking change?

  • Yes
  • No

Other information

@codecov-commenter
Copy link

codecov-commenter commented Sep 24, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.23%. Comparing base (14f51c5) to head (458125e).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #587   +/-   ##
========================================
  Coverage    81.23%   81.23%           
========================================
  Files          130      130           
  Lines         5857     5857           
========================================
  Hits          4758     4758           
  Misses        1099     1099           
Flag Coverage Δ
unittests 81.23% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@danscales danscales left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the missing call at line 118, this looks good. I didn't try to verify that the GEE api calls are correct, but they seem reasonable.

gcs_path = upload_cog_to_gcs(dataset, implementation)
create_cog_backed_asset(dataset, implementation, gcs_path, service_account)
asset_id = (dataset, implementation, gcs_path, service_account)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the call to ingest_in_gee here! I think you mean:

asset_id = ingest_in_gee(dataset, implementation, gcs_path)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@jterry64
Copy link
Member Author

@danscales fixed your comments, and tested the script, and it now works. The only issue I couldn't figure out is how to set the ACL for the new asset to allow public reading. For the COG backed asset, the asset is created immediately, and then you can set the ACL.

For the direct ingestion, the asset doesn't actually exist until it's finished ingesting, so you can't modify the ACL. Because we just fire and forget, we don't wait around until we can modify it... not sure what to do here, but unfortunately don't think I can figure it out today. If someone could pick this up and figure out a way to make sure it's publicly accessible after creation, then we could merge this PR.

@jterry64
Copy link
Member Author

If don't have access to the GEE project, talk to @tanderegg

@jterry64
Copy link
Member Author

@danscales alternatively, once this item is completed for DIST alerts, we can just split the integrated alerts COG into a mosaic where each COG is under the header limit, and then upload those as an image collection: https://gfw.atlassian.net/browse/GTC-2980

You can check the header size with this script: https://github.com/rouault/cog_validator/blob/master/validate_cloud_optimized_geotiff.py

@danscales danscales merged commit 7e9eef9 into develop Nov 4, 2024
2 checks passed
@danscales danscales deleted the gtc-2953/ingest_to_gee branch November 4, 2024 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants