-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUMULUS-304: Add AWS API throttling to pdr-status-check task #256
Conversation
* @param {string} executionArn - step function execution arn | ||
* @returns {Promise.<Object>} - an object describing the status of the exection | ||
*/ | ||
function describeExecutionStatus(executionArn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this matters but this is kind of duplicating functionality we have in https://github.com/cumulus-nasa/cumulus/blob/master/packages/api/endpoints/execution-status.js - we could think about putting this in @cumulus/common/aws
and sharing code between these 2 modules (cumulus/tasks/pdr-status-check/index.js
and packages/api/endpoints/execution-status.js
) from @cumulus/common/aws
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it might be we want different enough behavior that keeping them separate makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks similar, but combining them doesn't seem save much code.
return aws.sfn().describeExecution({ executionArn }).promise() | ||
.catch((e) => { | ||
if (e.code === 'ExecutionDoesNotExist') { | ||
return { executionArn: executionArn, status: 'RUNNING' }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this to me? If we get an error ExecutionDoesNotExist
why do we return a status of RUNNING
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The granule stats in the pdr have the following lists: running, completed, failed. Initially, the PDR queues these granules and puts the executions to running list. So, the executions which haven't been executed are still belong to running list.
} | ||
} | ||
const promisedExecutionDescriptions = runningExecutionArns.map((executionArn) => | ||
limit(() => describeExecutionStatus(executionArn))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is my understanding correct that this limits concurrency to 10 such that if there are 100 running executions, it would only fetch 10 statuses at a time?
Related question - is there any retry functionality built into this? Like if one of the connections fails we retry a request for the execution arn? Not required for this fix but just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, only 10 sfn.DescribeExecution can be run at a time.
It's not retried, if any sfn.DescribeExecution fails, the lambda function will fail. I think we could configure the lambda function to be retried in case of failure, although I haven't done one.
@@ -122,3 +125,40 @@ test('returns the correct results in the nominal case', (t) => { | |||
}); | |||
}); | |||
}); | |||
|
|||
test('test concurrency limit setting on sfn api calls', (t) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice test!
@jennyhliu PR looks good to me! But need a fix of merge conflict - I think just the CHANGELOG is out of date. |
@abarciauskas-bgse Aimee, updated branch. |
CHANGELOG.md
Outdated
@@ -5,6 +5,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) | |||
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). | |||
|
|||
## [Unreleased] | |||
- **CUMULUS-260: "PDR page on dashboard only shows zeros."** The PDR stats in LPDAAC are all 0s, even if the dashboard has been fixed to retrieve the correct fields. The current version of pdr-status-check has a few issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is part of the 1.2.0 release: https://github.com/cumulus-nasa/cumulus/pull/256/files#diff-4ac32a78649ca5bdd8e0ba38b7006a1eR21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed duplicate entry.
Summary: Summary of changes
Addresses CUMULUS-304: Add AWS API throttling to pdr-status-check task
Changes
Test Plan
Things that should succeed before merging.
./bin/eslint-ratchet
and verify that eslint errors have not increasedeslint-ratchet error: eslint errors have increased by 10. But the code I wrote doesn't have any eslint errors.
.eslint-ratchet-high-water-mark
if the score has improvedRelease PR
If this is a release PR, make sure you branch name starts with
release-
prefix, e.g.release-v-1.1.2
.Reviewers: @tag-your-friends