Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUMULUS-304: Add AWS API throttling to pdr-status-check task #256

Merged
merged 6 commits into from
Mar 21, 2018

Conversation

jennyhliu
Copy link
Contributor

@jennyhliu jennyhliu commented Mar 15, 2018

Summary: Summary of changes

Addresses CUMULUS-304: Add AWS API throttling to pdr-status-check task

Changes

  • Added concurrency limit on SFN API calls

Test Plan

Things that should succeed before merging.

  • Unit tests
  • Adhoc testing
  • Update CHANGELOG
  • Run ./bin/eslint-ratchet and verify that eslint errors have not increased
    eslint-ratchet error: eslint errors have increased by 10. But the code I wrote doesn't have any eslint errors.
    • Commit .eslint-ratchet-high-water-mark if the score has improved

Release PR

If this is a release PR, make sure you branch name starts with release- prefix, e.g. release-v-1.1.2.

Reviewers: @tag-your-friends

* @param {string} executionArn - step function execution arn
* @returns {Promise.<Object>} - an object describing the status of the exection
*/
function describeExecutionStatus(executionArn) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this matters but this is kind of duplicating functionality we have in https://github.com/cumulus-nasa/cumulus/blob/master/packages/api/endpoints/execution-status.js - we could think about putting this in @cumulus/common/aws and sharing code between these 2 modules (cumulus/tasks/pdr-status-check/index.js and packages/api/endpoints/execution-status.js) from @cumulus/common/aws

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it might be we want different enough behavior that keeping them separate makes sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks similar, but combining them doesn't seem save much code.

return aws.sfn().describeExecution({ executionArn }).promise()
.catch((e) => {
if (e.code === 'ExecutionDoesNotExist') {
return { executionArn: executionArn, status: 'RUNNING' };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this to me? If we get an error ExecutionDoesNotExist why do we return a status of RUNNING?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The granule stats in the pdr have the following lists: running, completed, failed. Initially, the PDR queues these granules and puts the executions to running list. So, the executions which haven't been executed are still belong to running list.

}
}
const promisedExecutionDescriptions = runningExecutionArns.map((executionArn) =>
limit(() => describeExecutionStatus(executionArn)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is my understanding correct that this limits concurrency to 10 such that if there are 100 running executions, it would only fetch 10 statuses at a time?

Related question - is there any retry functionality built into this? Like if one of the connections fails we retry a request for the execution arn? Not required for this fix but just curious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, only 10 sfn.DescribeExecution can be run at a time.
It's not retried, if any sfn.DescribeExecution fails, the lambda function will fail. I think we could configure the lambda function to be retried in case of failure, although I haven't done one.

@@ -122,3 +125,40 @@ test('returns the correct results in the nominal case', (t) => {
});
});
});

test('test concurrency limit setting on sfn api calls', (t) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test!

@abarciauskas-bgse
Copy link
Contributor

@jennyhliu PR looks good to me! But need a fix of merge conflict - I think just the CHANGELOG is out of date.

@jennyhliu
Copy link
Contributor Author

@abarciauskas-bgse Aimee, updated branch.

CHANGELOG.md Outdated
@@ -5,6 +5,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]
- **CUMULUS-260: "PDR page on dashboard only shows zeros."** The PDR stats in LPDAAC are all 0s, even if the dashboard has been fixed to retrieve the correct fields. The current version of pdr-status-check has a few issues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed duplicate entry.

@jennyhliu jennyhliu merged commit ee37f0b into master Mar 21, 2018
@jennyhliu jennyhliu deleted the pdrstatsplimit branch March 21, 2018 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants