Hello from the ASDC developer team! I am creating this Issue to track a PR [1].
Issue
As Cumulus operators, we experienced ingest failures with the Discover Granules workflow. We found that some of the files specified in the Cumulus Collection configuration file were missing from the workflow's input.
The root cause is that data provider teams do not always supply all of the files a Granule needs for successful ingestion.
Proposed solution
To mitigate failures, especially when ingesting multiple granules, we have incorporated behavior to detect and remove Granules from the current workflow that do not satisfy their Cumulus Collection configuration.
This PR introduces an optional allFilesPresent flag, which checks for the necessary Granule files before continuing with ingest workflow tasks. When set in the Cumulus Collection configuration, this new field acts as a filter, removing Granules that are missing files specified in the Cumulus Collection configuration. The flag can be added to the meta field for a Cumulus Collection, as seen in the example section below.
This change allows Discover Granules to run workflows without failures due to missing files. This is desirable when large numbers of Granules are ingested at once and we do not wish to fail otherwise good Granules.
This change has been incorporated into our own repository [2], and we have been pulling the Discover Granules task from this location where the proposed changes have already been made. We have been successfully using this implementation since October '22.
Example
Consider a Cumulus Collection that enforces that all files must be present, as seen in the meta field's "allFilesPresent": true configuration.
So data providers submitting these Granules will experience the following behavior.
All files present (files matching ..\\d{8}$, ..*\\.met$, and ..*\\.cmr\\.json$ are detected) -> Successful DiscoverGranules workflow
Any file missing -> Successful DiscoverGranules workflow, but the Granule is removed from it
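The filtering behavior described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the type shapes, the function name filterCompleteGranules, the file names, and the exact regex strings are all assumptions made for the example, and the real Cumulus Collection and Granule schemas carry more fields.

```typescript
// Hypothetical, simplified shapes (assumptions for this sketch only).
interface CollectionFile { regex: string }
interface Collection {
  files: CollectionFile[];
  meta?: { allFilesPresent?: boolean };
}
interface Granule { granuleId: string; files: { name: string }[] }

// Illustrative collection: three required file patterns plus the new flag.
const collection: Collection = {
  files: [
    { regex: "^.*\\d{8}$" },
    { regex: "^.*\\.met$" },
    { regex: "^.*\\.cmr\\.json$" },
  ],
  meta: { allFilesPresent: true },
};

// Keep a granule only if every collection pattern matches at least one of its files.
// When the flag is absent or false, the input is returned unchanged.
function filterCompleteGranules(granules: Granule[], coll: Collection): Granule[] {
  if (!coll.meta?.allFilesPresent) return granules;
  const patterns = coll.files.map((f) => new RegExp(f.regex));
  return granules.filter((g) =>
    patterns.every((p) => g.files.some((file) => p.test(file.name)))
  );
}

// Two made-up granules: one complete, one missing its .met file.
const discovered: Granule[] = [
  {
    granuleId: "complete",
    files: [
      { name: "data_20221001" },
      { name: "data_20221001.met" },
      { name: "data_20221001.cmr.json" },
    ],
  },
  {
    granuleId: "missing-met",
    files: [{ name: "data_20221002" }, { name: "data_20221002.cmr.json" }],
  },
];

const kept = filterCompleteGranules(discovered, collection);
console.log(kept.map((g) => g.granuleId));
```

In this sketch the incomplete granule is dropped rather than causing an error, so the remaining granules continue through the workflow.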
Other noteworthy behavior when allFilesPresent is set to true
If there is only one Granule in S3 and it gets removed, Discover Granules discovers nothing, so no workflow follows. A Granule that is missing a file is not discovered at all; it is filtered out completely.
Incomplete Granules can be removed from the S3 location or left there. The missing files for a Granule may appear in S3 later; every time Discover Granules is run, it checks all the files again.
If a single Granule is removed from the workflow, nothing else should happen to it; it does not proceed to the next stage. Only Granules with all files present proceed to the next stage.
[1] #3200
[2] https://git.earthdata.nasa.gov/projects/ASDCCLOUD/repos/asdc-cumulus/pull-requests/71/overview