Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the queue-granules task to fetch collection configs for each granule from S3 #284

Merged
merged 30 commits into from
Apr 9, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
54a9a1a
Refactored parse-pdr
Mar 29, 2018
0b91ef4
Update sftp mixin to remove default export
Mar 29, 2018
ac3504a
Remove unused findTmpTestDataDirectory function from test-utils
Mar 29, 2018
1363fed
Add a simple parse-pdr test
Mar 29, 2018
cca6efc
Merge branch 'master' into CUMULUS-450
Mar 29, 2018
8994bad
CUMULUS-450 Fix incorrect collection configs in queue-granules
Mar 30, 2018
70c75fb
Updated changelog
Mar 30, 2018
a56240e
Ratchet-down eslint score
Mar 30, 2018
fdbd682
Updates based on PR feedback
Mar 30, 2018
e4d9842
Added a reminder to document the granuleFromFileGroup function
Mar 30, 2018
ab21eb8
Ratchet-down eslint score
Mar 30, 2018
267681a
Create 1.2.1-alpha.1 release of ingest and queue-granules
Apr 2, 2018
2e68f00
Merge branch 'master' of https://github.com/cumulus-nasa/cumulus
Apr 2, 2018
731c205
Merge branch 'master' into CUMULUS-450
Apr 2, 2018
d49a7ad
ParsePdr task uses correct granuleExtractionId if multiple dataTypes …
Apr 4, 2018
86a9180
Merge branch 'master' into CUMULUS-450
Apr 4, 2018
74449e9
Add comments to collection-config-store
Apr 5, 2018
5aa0aaa
Revert index reorganization
Apr 5, 2018
ef39309
Add comments to collection-config-store tests
Apr 5, 2018
23c3cd3
Revert import order changes
Apr 5, 2018
27b9844
Updated changelog
Apr 5, 2018
299a4ea
Revert PDR import order changes
Apr 5, 2018
d758e44
Revert SFTP import order changes
Apr 5, 2018
bcb3798
Revert queue-granules array order
Apr 5, 2018
b0c67d5
Update eslint ratchet score
Apr 5, 2018
d599326
Fix e2e tests
Apr 5, 2018
1ef5f0b
Updates to support cumulus integration tests
Apr 6, 2018
7e90148
Updates to get integration tests working
Apr 9, 2018
ac8d9e1
Merge branch 'master' into CUMULUS-450
Apr 9, 2018
8dcaecd
Merge branch 'master' into CUMULUS-450
Apr 9, 2018
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .eslint-ratchet-high-water-mark
Original file line number Diff line number Diff line change
@@ -1 +1 @@
1306
1145
14 changes: 3 additions & 11 deletions .eslintrc.json
Original file line number Diff line number Diff line change
Expand Up @@ -51,16 +51,7 @@
"generator-star-spacing": "off",
"import/no-extraneous-dependencies": "off",
"import/newline-after-import": "off",
"no-warning-comments": [
2,
{
"terms": [
"TODO",
"fixme"
],
"location": "anywhere"
}
],
"no-warning-comments": "off",
"no-unused-vars": [
"error",
{ "argsIgnorePattern": "^_" }
Expand Down Expand Up @@ -97,6 +88,7 @@
"ignorePattern": "(https?:|JSON\\.parse|[Uu]rl =)"
}
],
"arrow-parens": ["error", "always"]
"arrow-parens": ["error", "always"],
"prefer-destructuring": "off"
}
}
29 changes: 29 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,35 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- fixed pdr-status-check schema, the failed execution contains arn and reason
- **CUMULUS-206** make sure homepage and repository urls exist in package.json files of tasks and packages

### Changed
- [CUMULUS-450](https://bugs.earthdata.nasa.gov/browse/CUMULUS-450) - Updated
the config schema of the **queue-granules** task
- The config no longer takes a "collection" property
- The config now takes an "internalBucket" property
- The config now takes a "stackName" property
- [CUMULUS-450](https://bugs.earthdata.nasa.gov/browse/CUMULUS-450) - Updated
the config schema of the **parse-pdr** task
- The config no longer takes a "buckets" property
- The config no longer takes a "collection" property
- The config now takes an "internalBucket" property
- The "stack", "provider", and "internalBucket" config properties are now
required

### Removed
- Removed the `findTmpTestDataDirectory()` function from
`@cumulus/common/test-utils`

### Fixed
- [CUMULUS-450](https://bugs.earthdata.nasa.gov/browse/CUMULUS-450)
- The **queue-granules** task now enqueues a **sync-granule** task with the
correct collection config for that granule based on the granule's
data-type. It had previously been using the collection config from the
config of the **queue-granules** task, which was a problem if the granules
being queued belonged to different data-types.
- The **parse-pdr** task now handles the case where a PDR contains granules
with different data types, and uses the correct granuleIdExtraction for
each granule.

### Added
- **CUMULUS-448** Add code coverage checking using [nyc](https://github.com/istanbuljs/nyc).
- **CUMULUS-469** Added a lambda to the API package to prototype creating an S3 bucket policy for direct, in-region S3 access for the prototype bucket
Expand Down
7 changes: 4 additions & 3 deletions packages/api/models/collections.js
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
'use strict';

const { CollectionConfigStore } = require('@cumulus/common');
const { S3 } = require('@cumulus/ingest/aws');
const Manager = require('./base');
const collectionSchema = require('./schemas').collection;
Expand Down Expand Up @@ -44,9 +45,9 @@ class Collection extends Manager {
}

async create(item) {
// write the record to S3
const key = `${process.env.stackName}/collections/${item.name}.json`;
await S3.put(process.env.internal, key, JSON.stringify(item));
const collectionConfigStore =
new CollectionConfigStore(process.env.internal, process.env.stackName);
await collectionConfigStore.put(item.name, item);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have one central piece of code that manages the storage and retrieval of collection configs in S3.


return super.create(item);
}
Expand Down
87 changes: 87 additions & 0 deletions packages/common/collection-config-store.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
'use strict';

const { s3 } = require('./aws');

/**
* Store and retrieve collection configs in S3
*/
class CollectionConfigStore {
/**
* Initialize a CollectionConfigFetcher instance
*
* @param {string} bucket - the bucket where collection configs are stored
* @param {string} stackName - the Cumulus deployment stack name
*/
constructor(bucket, stackName) {
this.bucket = bucket;
this.stackName = stackName;
this.cache = {};
}

/**
* Fetch a collection config from S3 (or cache if available)
*
* @param {string} dataType - the name of the collection config to fetch
* @returns {Object} the fetched collection config
*/
async get(dataType) {
// Check to see if the collection config has already been cached
if (!this.cache[dataType]) {
let response;
try {
// Attempt to fetch the collection config from S3
response = await s3().getObject({
Bucket: this.bucket,
Key: this.configKey(dataType)
}).promise();
}
catch (err) {
if (err.code === 'NoSuchKey') {
throw new Error(`A collection config for data type "${dataType}" was not found.`);
}

if (err.code === 'NoSuchBucket') {
throw new Error(`Collection config bucket does not exist: ${this.bucket}`);
}

throw err;
}

// Store the fetched collection config to the cache
this.cache[dataType] = JSON.parse(response.Body.toString());
}

return this.cache[dataType];
}

/**
* Store a collection config to S3
*
* @param {string} dataType - the name of the collection config to store
* @param {Object} config - the collection config to store
* @returns {Promise<null>} resolves when the collection config has been written
* to S3
*/
async put(dataType, config) {
this.cache[dataType] = config;

return s3().putObject({
Bucket: this.bucket,
Key: `${this.stackName}/collections/${dataType}.json`,
Body: JSON.stringify(config)
}).promise().then(() => null); // Don't leak implementation details to the caller
}

/**
* Return the S3 key pointing to the collection config
*
* @param {string} dataType - the datatype
* @returns {string} the S3 key where the collection config is located
*
* @private
*/
configKey(dataType) {
return `${this.stackName}/collections/${dataType}.json`;
}
}
module.exports = CollectionConfigStore;
1 change: 1 addition & 0 deletions packages/common/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@
exports.log = require('./log');
exports.aws = require('./aws');
exports.task = require('./task');
exports.CollectionConfigStore = require('./collection-config-store');
4 changes: 2 additions & 2 deletions packages/common/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
"url": "https://github.com/cumulus-nasa/cumulus"
},
"scripts": {
"build": "true",
"lint": "eslint .",
"mocha": "mocha-webpack test/**/*.js",
"mocha-coverage": "nyc mocha-webpack test/**/*.js",
Expand Down Expand Up @@ -58,7 +59,7 @@
"ajv-cli": "^1.1.1",
"async": "^2.0.0",
"aws-sdk": "^2.4.11",
"babel-core": "^6.13.2",
"babel-core": "^6.26.0",
"babel-eslint": "^8.2.2",
"babel-polyfill": "^6.13.0",
"babel-preset-env": "^1.6.1",
Expand Down Expand Up @@ -87,7 +88,6 @@
"webpack-node-externals": "^1.5.4"
},
"devDependencies": {
"ajv": "^5.5.2",
"ava": "^0.25.0",
"nyc": "^11.6.0"
}
Expand Down
11 changes: 0 additions & 11 deletions packages/common/test-utils.js
Original file line number Diff line number Diff line change
Expand Up @@ -177,17 +177,6 @@ async function findGitRepoRootDirectory(dirname) {
}
exports.findGitRepoRootDirectory = findGitRepoRootDirectory;

/**
* Determine the path of the .tmp-test-data directory
*
* @returns {Promise.<string>} - the filesystem path of the .tmp-test-data directory
*/
function findTmpTestDataDirectory() {
return exports.findGitRepoRootDirectory(process.cwd())
.then((gitRepoRoot) => path.join(gitRepoRoot, '.tmp-test-data'));
}
exports.findTmpTestDataDirectory = findTmpTestDataDirectory;

/**
* Determine the path of the packages/test-data directory
*
Expand Down
5 changes: 5 additions & 0 deletions packages/common/tests/.eslintrc.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
{
"rules": {
"no-param-reassign": "off"
}
}
115 changes: 115 additions & 0 deletions packages/common/tests/collection-config-store.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
'use strict';

const test = require('ava');
const { recursivelyDeleteS3Bucket, s3 } = require('../aws');
const { randomString } = require('../test-utils');
const CollectionConfigStore = require('../collection-config-store');

test.beforeEach(async (t) => {
t.context.stackName = randomString();
t.context.dataType = randomString();
t.context.collectionConfig = { name: randomString() };

t.context.bucket = randomString();
await s3().createBucket({ Bucket: t.context.bucket }).promise();

// Utility function to return the S3 key of a collection config
t.context.collectionConfigKey = (dataType) =>
`${t.context.stackName}/collections/${dataType}.json`;
});

test.afterEach(async (t) => {
try {
await recursivelyDeleteS3Bucket(t.context.bucket);
}
catch (err) {
// Some tests delete the bucket before this "afterEach" hook is run
if (err.code !== 'NoSuchBucket') throw err;
}
});

test('get() fetches a collection config from S3', async (t) => {
await s3().putObject({
Bucket: t.context.bucket,
Key: t.context.collectionConfigKey(t.context.dataType),
Body: JSON.stringify(t.context.collectionConfig)
}).promise();

const collectionConfigStore = new CollectionConfigStore(t.context.bucket, t.context.stackName);
const fetchedCollectionConfig = await collectionConfigStore.get(t.context.dataType);

t.deepEqual(fetchedCollectionConfig, t.context.collectionConfig);
});

test('get() does not hit S3 for a cached collection config', async (t) => {
await s3().putObject({
Bucket: t.context.bucket,
Key: t.context.collectionConfigKey(t.context.dataType),
Body: JSON.stringify(t.context.collectionConfig)
}).promise();

const collectionConfigStore = new CollectionConfigStore(t.context.bucket, t.context.stackName);

// Fetch the collection config once so it's in the cache
await collectionConfigStore.get(t.context.dataType);

// Delete the S3 bucket so the config can't be fetched from S3
await recursivelyDeleteS3Bucket(t.context.bucket);

// This get() should use the cache
const fetchedCollectionConfig = await collectionConfigStore.get(t.context.dataType);

t.deepEqual(fetchedCollectionConfig, t.context.collectionConfig);
});

test('get() throws an exception if the collection config could not be found', async (t) => {
const invalidDataType = randomString();
const collectionConfigStore = new CollectionConfigStore(t.context.bucket, t.context.stackName);

try {
await collectionConfigStore.get(invalidDataType);
t.fail('Expected an error to be thrown');
}
catch (err) {
t.is(err.message, `A collection config for data type "${invalidDataType}" was not found.`);
}
});

test('get() throws an exception if the bucket does not exist', async (t) => {
const invalidBucket = randomString();
const collectionConfigStore = new CollectionConfigStore(invalidBucket, t.context.stackName);

try {
await collectionConfigStore.get(t.context.dataType);
t.fail('Expected an error to be thrown');
}
catch (err) {
t.is(err.message, `Collection config bucket does not exist: ${invalidBucket}`);
}
});

test('put() stores a collection config to S3', async (t) => {
const collectionConfigStore = new CollectionConfigStore(t.context.bucket, t.context.stackName);
await collectionConfigStore.put(t.context.dataType, t.context.collectionConfig);

const getObjectResponse = await s3().getObject({
Bucket: t.context.bucket,
Key: t.context.collectionConfigKey(t.context.dataType)
}).promise();

const storedCollectionConfig = JSON.parse(getObjectResponse.Body.toString());
t.deepEqual(storedCollectionConfig, t.context.collectionConfig);
});

test('put() updates the cache with the new collection config', async (t) => {
const collectionConfigStore = new CollectionConfigStore(t.context.bucket, t.context.stackName);
await collectionConfigStore.put(t.context.dataType, t.context.collectionConfig);

// Delete the S3 bucket so the config can't be fetched from S3
await recursivelyDeleteS3Bucket(t.context.bucket);

// This get() should use the cache
const fetchedCollectionConfig = await collectionConfigStore.get(t.context.dataType);

t.deepEqual(fetchedCollectionConfig, t.context.collectionConfig);
});
8 changes: 4 additions & 4 deletions packages/ingest/granule.js
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,10 @@ const urljoin = require('url-join');
const cksum = require('cksum');
const checksum = require('checksum');
const errors = require('@cumulus/common/errors');
const sftpMixin = require('./sftp');
const ftpMixin = require('./ftp').ftpMixin;
const httpMixin = require('./http').httpMixin;
const s3Mixin = require('./s3').s3Mixin;
const { sftpMixin } = require('./sftp');
const { ftpMixin } = require('./ftp');
const { httpMixin } = require('./http');
const { s3Mixin } = require('./s3');
const { baseProtocol } = require('./protocol');

class Discover {
Expand Down
Loading