-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUMULUS-260 update pdr stats #235
Conversation
Actually, we can call indexer.partialRecordUpdate to update the pdr.stats only from pdr-status-check. I will try that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add changelog here?
CHANGELOG.md
Outdated
- **CUMULUS-260: "PDR page on dashboard only shows zeros."** The PDR stats in LPDAAC are all 0s, even if the dashboard has been fixed to retrieve the correct fields. The current version of pdr-status-check has a few issues. | ||
- pdr is not included in the input/output schema. It's available from the input event. So the pdr status and stats are not updated when the ParsePdr workflow is complete. Adding the pdr to the input/output of the task will fix this. | ||
- pdr-status-check doesn't update pdr stats which prevent the real time pdr progress from showing up in the dashboard. To solve this, added lambda function sf-sns-report which is copied from @cumulus/api/lambdas/sf-sns-broadcase with modification, sf-sns-report can be used to report step function status anywhere inside a step function. So add step sf-sns-report after each pdr-status-check, we will get the PDR status progress at real time. | ||
- It's possible a execution is still in the queue and doesn't exist in sfn yet. Added code to handle 'ExecutionDoesNotExist' error when checking the execution status. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor grammar fix: It's possible an execution is still in the queue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
try { | ||
const execution = await aws.sfn().describeExecution({ executionArn }).promise(); | ||
executions.push(execution); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should } catch (e) {
should be on one line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eslint doesn't like the one-line style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird that seems inconsistent with https://github.com/airbnb/javascript#blocks--cuddled-elses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because of this override: https://github.com/cumulus-nasa/cumulus/blob/master/.eslintrc.json#L86.
Our eslintrc is requiring the "stroustrup" style for braces: https://eslint.org/docs/rules/brace-style
// it's ok if a execution is still in the queue and has not be executed | ||
if (e.code === 'ExecutionDoesNotExist') { | ||
executions.push({ executionArn: executionArn, status: 'RUNNING' }); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here
} else {
throw e;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, our eslint doesn't like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as the other instance:
That's because of this override: https://github.com/cumulus-nasa/cumulus/blob/master/.eslintrc.json#L86.
Our eslintrc is requiring the "stroustrup" style for braces: https://eslint.org/docs/rules/brace-style
cumulus/tasks/sf-sns-report/index.js
Outdated
} | ||
// Error and error keys are not part of the cumulus message | ||
// and if they appear in the message something is seriously wrong | ||
else if (event.Error || event.error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know an instance of this occurring? A better option might be to whitelist the top-level keys we expect from a cumulus message, otherwise this seems unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't. I don't know if removing this logic will break something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was reported in an earlier bug report and this is the fix for that bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok - do you recall if that bug was before or after cumulus message adapter integration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abarciauskas-bgse brings up a very good point. When you're using the message adapter, the only keys that the simplified event is going to have are "config" and "input". This is never going to trigger.
cumulus/tasks/sf-sns-report/index.js
Outdated
* @returns {boolean} true if there was an exception, false otherwise | ||
*/ | ||
function eventFailed(event) { | ||
if (event.exception) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe flatten these nested conditionals?
if (event.exception && typeof(event.exception) === 'object' && Object.keys(event.exception.length) > 0) {
return true
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. this is neat
cumulus/tasks/sf-sns-report/index.js
Outdated
|
||
/** | ||
* if the cumulus message shows that a previous step failed, | ||
* this function extract the error message from the cumulus message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function extracts the error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
cumulus/tasks/sf-sns-report/index.js
Outdated
/** | ||
* if the cumulus message shows that a previous step failed, | ||
* this function extract the error message from the cumulus message | ||
* and fail the function with that information. This ensures that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and fails the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
cumulus/tasks/sf-sns-report/index.js
Outdated
* @returns {Promise} updated event object | ||
*/ | ||
function handler(event, context, callback) { | ||
return StepFunction.pullEvent(event).then((message) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wrap in curly brace:
return StepFunction.pullEvent(event).then((message) => {
cumulusMessageAdapter.runCumulusTask(publishSnsMessage, message, context, callback));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
}); | ||
|
||
|
||
test('send report when sfn is finished and granule succeed', async (t) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
send report when sfn is finished and granule has succeeded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@jennyhliu you mentioned in the PR description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asked for a small change
@@ -40,6 +40,7 @@ | |||
] | |||
}, | |||
"dependencies": { | |||
"@cumulus/api": "^1.1.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not used any where. Please remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -0,0 +1,55 @@ | |||
# @cumulus/queue-pdrs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
@abarciauskas-bgse I think we should retire that other lambda function in the api package. But retiring it will break existing deployments. so we should come up with a plan for that. |
@abarciauskas-bgse @scisco updated based on comments. |
await exports.s3().putObject(bucket, key, '', null, { executionArn, status }).promise(); | ||
const params = { Bucket: bucket, Key: key }; | ||
params.Metadata = { executionArn, status }; | ||
await exports.s3().putObject(params).promise(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've filed a ticket to take a look at this functionality: https://bugs.earthdata.nasa.gov/browse/CUMULUS-420
CHANGELOG.md
Outdated
@@ -17,6 +17,11 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. | |||
- `@cumulus/deployment` deploys DynamoDB streams for the Collections, Providers and Rules tables as well as a new lambda function called `dbIndexer`. The `dbIndexer` lambda has an event source mapping which listens to each of the DynamoDB streams. The dbIndexer lambda receives events referencing operations on the DynamoDB table and updates the elasticsearch cluster accordingly. | |||
- The `@cumulus/api` endpoints for collections, providers and rules _only_ query DynamoDB, with the exception of LIST endpoints and the collections' GET endpoint. | |||
|
|||
- **CUMULUS-260: "PDR page on dashboard only shows zeros."** The PDR stats in LPDAAC are all 0s, even if the dashboard has been fixed to retrieve the correct fields. The current version of pdr-status-check has a few issues. | |||
- pdr is not included in the input/output schema. It's available from the input event. So the pdr status and stats are not updated when the ParsePdr workflow is complete. Adding the pdr to the input/output of the task will fix this. | |||
- pdr-status-check doesn't update pdr stats which prevent the real time pdr progress from showing up in the dashboard. To solve this, added lambda function sf-sns-report which is copied from @cumulus/api/lambdas/sf-sns-broadcase with modification, sf-sns-report can be used to report step function status anywhere inside a step function. So add step sf-sns-report after each pdr-status-check, we will get the PDR status progress at real time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"broadcast", not "broadcase"
cumulus/tasks/sf-sns-report/index.js
Outdated
function eventFailed(event) { | ||
// event has exception | ||
// and it is needed to avoid flagging cases like "exception: {}" or "exception: 'none'" | ||
if (event.exception && (typeof event.exception === 'object') && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first check for event.exception is not necessary. If it isn't set, then (typeof event.exception === 'object')
will be false. Can just be:
function eventFailed(event) {
// event has exception
// and it is needed to avoid flagging cases like "exception: {}" or "exception: 'none'"
if ((typeof event.exception === 'object') &&
(Object.keys(event.exception).length > 0)) return true;
// Error and error keys are not part of the cumulus message
// and if they appear in the message something is seriously wrong
if (event.Error || event.error) return true;
return false;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is also formatted so it's a little easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually ``(event.exception)check is need to take care of the case
event.exception=null`, otherwise `Object.keys(event.exception)` will throw exception.
* @param {Object} event - aws event object | ||
* @returns {undefined} throws an error and does not return anything | ||
*/ | ||
function makeLambdaFunctionFail(event) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little hard to follow what this function is doing. It might be easier with something like:
function buildError(type, cause) {
let ErrorClass;
if (Object.keys(errors).includes(type)) ErrorClass = errors[type];
else if (type === 'TypeError') ErrorClass = TypeError;
else ErrorClass = Error;
return new ErrorClass(cause);
}
function makeLambdaFunctionFail(event) {
const error = event.exception || event.error;
if (!error) throw new Error('Step Function failed for an unknown reason.');
throw buildError(error.Error, error.Cause);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This makes the code much clear. Sorry I didn't try to understand the code.
cumulus/tasks/sf-sns-report/index.js
Outdated
* @returns {Promise} updated event object | ||
*/ | ||
function handler(event, context, callback) { | ||
return StepFunction.pullEvent(event).then((message) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulling the event is handled by the message adapter and should not be required here.
"description": "Describes the config used by the sf-sns-report task", | ||
"type": "object", | ||
"additionalProperties": false, | ||
"properties": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that all of these properties are required, so they should be marked as required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the code, they are only needed when the task is the last step of the step function and a granule state is set.
cumulus/tasks/sf-sns-report/index.js
Outdated
* schema that is passed to the next task in the workflow | ||
*/ | ||
async function publishSnsMessage(event) { | ||
const config = get(event, 'config', []); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config is an Object, not an Array. That being said, message adapter will ensure that config is always set, so there's no need for a default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
cumulus/tasks/sf-sns-report/index.js
Outdated
*/ | ||
async function publishSnsMessage(event) { | ||
const config = get(event, 'config', []); | ||
const message = get(event, 'input', []); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input is an Object, not an Array. That being said, message adapter will ensure that config is always set, so there's no need for a default value.
cumulus/tasks/sf-sns-report/index.js
Outdated
if (granuleId) { | ||
await setGranuleStatus( | ||
granuleId, | ||
get(config, 'stack', null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any of these values are not set, you're passing null
in their place. What is setGranuleStatus
going to do with a null bucket, or any of the other parameters for that matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code and schema, there is no good way to prevent the user from misconfiguring the parameters.
@@ -0,0 +1,87 @@ | |||
'use strict'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these tests are verifying that the correct message was sent to SNS. They are only looking at the return value from the function. Localstack has SNS support, please update the tests to verify that the correct messages are being sent to SNS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the function to return the sns response. If the MessageId is returned, then the message is sent to SNS. There is no easy way to retrieve the message sent to SNS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've included a number of comments that need to be addressed before this can be merged.
cumulus/tasks/sf-sns-report/index.js
Outdated
makeLambdaFunctionFail(message); | ||
} | ||
|
||
return message; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jennyhliu I think we should NOT return the whole message here. If the user forgets to include the outputs
key in the cumulus config, the message ends up getting larger and larger.
Could you instead just return the output of the sns aws call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code to return the output from sns response, and the README.md
@scisco @yjpa7145 made updates based on comments. please review |
cumulus/tasks/sf-sns-report/index.js
Outdated
if (granuleId) { | ||
await setGranuleStatus( | ||
granuleId, | ||
get(config, 'stack'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you aren't looking deep into an object, there is no need for get()
here. Simply saying config.stack
accomplishes the same thing. The same goes for the next few lines as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👏
}, | ||
"oneOf": [ | ||
{ | ||
"required": ["stack", "bucket", "stateMachine", "executionTime"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack, bucket, and stateMachine should always be required. Should executionTime
actually be executionName
? I don't see executionTime
anywhere in the code, but I do see executionName
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sf-sns-report
task needs to have an input.json schema file defined.
@jennyhliu the input.json should accept any object. Basically whatever you pass to the task it will publish to sns. |
The concensus is that this all looks good.
Summary: Summary of changes
Addresses CUMULUS-260: PDR page on dashboard only shows zero
Currently the pdr status stays in 'running' even if the granules are processed and completed.
Changes
Test Plan
Things that should succeed before merging.
Reviewers: @scisco @laurenfrederick