
Update the queue-granules task to fetch collection configs for each granule from S3 #284

Merged · 30 commits · Apr 9, 2018

Conversation

marchuffnagle (Contributor) commented Mar 30, 2018

Summary: Update the queue-granules task to fetch collection configs for each granule from S3

Addresses CUMULUS-450: Restore functionality to get correct collection for IngestGranule

Changes

  • Updated the queue-granules task to fetch collection configs for each granule from S3 instead of relying on the single collection config specified in the event config (see the sketch below).
  • Updated the parse-pdr task to use the correct granuleIdExtraction in cases where the granules in a PDR don't all belong to the same collection.
  • Removed the deprecated findTmpTestDataDirectory function from @cumulus/common/test-utils.
  • Updated the sftp mixin to export sftpMixin instead of using a default export. This matches the mixins for the other protocols.
  • Refactored the parsePdr() function.
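
A minimal sketch of the first change above, assuming the CollectionConfigStore shown later in this review (constructed with a bucket and stack name, and keyed by data type). The require path, the get() method, and the queueSingleGranule helper are illustrative assumptions, not the code in this PR:

const { CollectionConfigStore } = require('@cumulus/common'); // require path assumed

async function queueGranulesByCollection(event) {
  // Build the store from the deployment's internal bucket and stack name
  const collectionConfigStore = new CollectionConfigStore(
    event.config.internal, // config field names assumed for illustration
    event.config.stackName
  );

  // Look up each granule's collection config by its dataType instead of
  // reading a single "collection" object from the event config
  return Promise.all(event.input.granules.map(async (granule) => {
    const collectionConfig = await collectionConfigStore.get(granule.dataType);
    return queueSingleGranule(granule, collectionConfig); // hypothetical helper
  }));
}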

Test Plan

Things that should succeed before merging.

  • Unit tests
  • Have LPDAAC run a workflow using these updates
  • Update CHANGELOG
  • Run ./bin/eslint-ratchet and verify that eslint errors have not increased
    • Commit .eslint-ratchet-high-water-mark if the score has improved

marchuffnagle changed the title from WIP CUMULUS-450 to CUMULUS-450 on Mar 30, 2018
marchuffnagle changed the title from CUMULUS-450 to Update the queue-granules task to fetch collection configs for each granule from S3 on Mar 30, 2018
CHANGELOG.md Outdated
@@ -6,6 +6,23 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]

### Changed
- The **queue-granules** task no longer takes a "collection" in its config.
Contributor:

Add the ticket number to these notes please. Also this would be a breaking change, correct?

Contributor:

Maybe we should put all these changes in bullets under one line item and call out what changes will need to be made to the workflow configuration to support this.

// get all the file specs in each group
const specs = fileGroup.objects('FILE_SPEC');
// FIXME This is a very generic error
if (specs.length === 0) throw new Error();
Contributor:

Probably should at least put a message in the error?
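
One way that message might read (a sketch only; the wording is an assumption, not the actual fix):

if (specs.length === 0) throw new Error('FILE_GROUP contains no FILE_SPEC entries');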

Contributor:

And test the error somehow


if (!dataType) throw new PDRParsingError('DATA_TYPE is missing');
const files = specs.map(parseSpec.bind(null, pdrName));
const granuleId = extractGranuleId(files[0].name, granuleIdExtraction);
Contributor:

Is it possible files would be empty here? Is that something we should check for?
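
If such a check were added, it might look something like this (a sketch only; the use of PDRParsingError mirrors the DATA_TYPE error in the same function, and the message wording is an assumption):

if (files.length === 0) {
  throw new PDRParsingError(`No files found for a file group in ${pdrName}`);
}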


// Get all the file groups
const fileGroups = parsed.objects('FILE_GROUP');
function granuleFromFileGroup(fileGroup, granuleIdExtraction, pdrName) {
Contributor:

This function needs docs.

const Crypto = require('./crypto').DefaultProvider;
const log = require('@cumulus/common/log');
Contributor:

Shouldn't this be const { log } = require('@cumulus/common')?

return [dataType, JSON.parse(collectionConfig)];
}

async function fetchCollectionConfigs(stackName, Bucket, dataTypes) { // eslint-disable-line require-jsdoc, max-len
Contributor:

Document this function
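
A sketch of the requested JSDoc, inferred from the signature above; the parameter descriptions and return shape are assumptions:

/**
 * Fetch collection configs from S3 for a set of data types.
 *
 * @param {string} stackName - deployment stack name used to build the S3 key prefix (assumed)
 * @param {string} Bucket - S3 bucket where the collection configs are stored
 * @param {Array<string>} dataTypes - the data types to fetch configs for
 * @returns {Promise<Object>} map of data type to its parsed collection config (assumed shape)
 */

With a doc block like this in place, the require-jsdoc part of the eslint-disable comment on the signature could presumably be dropped as well.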

recursivelyDeleteS3Bucket(t.context.templateBucket),
sqs().deleteQueue({ QueueUrl: t.context.event.config.queueUrl }).promise()
]);
});

function uploadCollectionConfig(testContext, dataType, collectionConfig) {
Contributor:

document

// Get all the file groups
const fileGroups = parsed.objects('FILE_GROUP');
function granuleFromFileGroup(fileGroup, granuleIdExtraction, pdrName) {
if (!fileGroup.get('DATA_TYPE')) throw new PDRParsingError('DATA_TYPE is missing');
Contributor:

Can we test this error?

jennyhliu (Contributor) left a comment:

The workflow needs to be updated.
https://github.com/cumulus-nasa/cumulus-integration-tests/blob/master/workflows.yml
Also, in the integration test, it would be better to have a PDR which contains granules from different datatypes.


dataType = dataType.value;
const files = specs.map(parseSpec.bind(null, pdrName));
const granuleId = extractGranuleId(files[0].name, granuleIdExtraction);
Contributor:

Should we throw an error on empty files?

Contributor Author:

I've created CUMULUS-507 to handle that.

await S3.put(process.env.internal, key, JSON.stringify(item));
const collectionConfigStore =
new CollectionConfigStore(process.env.internal, process.env.stackName);
await collectionConfigStore.put(item.name, item);
Contributor Author:

We now have one central piece of code that manages the storage and retrieval of collection configs in S3.
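
For the retrieval side of that central store (a sketch; a get() method mirroring the put() above is an assumption):

// Later, e.g. in queue-granules, the same config can be fetched by data type
const collectionConfig = await collectionConfigStore.get(item.name);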
