-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EDSC-4265: Develop API endpoint to dynamically create jupyter notebook #1834
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1834 +/- ##
==========================================
+ Coverage 93.50% 93.52% +0.02%
==========================================
Files 772 774 +2
Lines 18650 18709 +59
Branches 4807 4806 -1
==========================================
+ Hits 17438 17497 +59
Misses 1131 1131
Partials 81 81 ☔ View full report in Codecov by Sentry. |
827182f
to
cdcd1cf
Compare
Writing here so I don't forget. We'll want to update the Something like:
|
|
||
if (process.env.IS_OFFLINE) { | ||
config.endpoint = 'http://localhost:4569' | ||
config.forcePathStyle = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On MMT we set this value for both offline and deployed versions, why is it only for offline mode here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was one of the things that was pointed out by NGAP. Having forcePathStyle = true
changes the way the AWS SDK generates S3 URL: https://s3.<region>.amazonaws.com/<bucket-name>/<key>
. But in the deployed app, CloudFront expects the S3 origin to use virtual-hosted-style URLs https://<bucket-name>.s3.<region>.amazonaws.com/<key>
.
According to AWS doc using forcePathStyle creates a conflict with CloudFront because it routes traffic based on the hostname, which must include the bucket name. Path-style URLs are incompatible with CloudFront's expected configuration.
For local where CloudFront isn't in use, forcePathStyle is needed to work to local S3.
serverless-configs/aws-functions.yml
Outdated
- http: | ||
method: post | ||
cors: ${file(./serverless-configs/${self:provider.name}-cors-configuration.yml)} | ||
path: generateNotebook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We like to stick with snake case for paths, so generate-notebook
would be preferred here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our lambdas actually use underscores instead of snake-case (not sure why). But lets do generate_notebook
to be consistent with the other lambda paths
const parsedNotebook = JSON.parse(renderedNotebookString) | ||
|
||
// Generates notebook key | ||
const key = `notebook/rendered_notebook_${granuleId}.ipynb` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should tweak this file name. Users don't really know what a CMR concept id is. A timestamp would prevent collisions in the case we get downloads for the same granule with different parameters.
Something like {granule name}-sample-notebook_{timestamp}
might work well.
"source": [ | ||
"# Define the bounding area\n", | ||
"\n", | ||
"# Select the data within the bounding box applied in Earthdata Search at the time of generation.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section isnt quite right. I think we can make it a little more useful for the user.
When the user has a bounding box applied, the section should read:
# Select the data within the bounding box that was applied in Earthdata Search.
min_lon = -92.17969
min_lat = 22.19104
max_lon = -80.89453
max_lat = 31.12491
# To select data for the granule encompassing the entire globe, remove the variables above and uncomment the following variable declarations for the coordinate points.
# min_lon = -90
# min_lat = -180
# max_lon = 90
# max_lat = 180
When a custom bounding box is not applied, the section should read:
# Select the data by setting variable declarations for the coordinate points to encompass the entire globe. These values can be updated to subset the data to a different area of interest. The values can be set manually by changing the values or by setting a bounding box before generating a notebook in Earthdata Search.
min_lon = -90
min_lat = -180
max_lon = 90
max_lat = 180
"cell_type": "markdown", | ||
"id": "508dcd76-0e18-4f37-ba4f-dd0466ddc7cb", | ||
"metadata": {}, | ||
"source": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to add a small section to this bit of text reading "If a bounding box is applied in Earthdata Search when generating this notebook, the bounding box coordinates will be used below."
The updated code should looks something like this:
"source": [
"## Select a subset of the data using `xarray.DataTree.sel()`\n",
"\n",
"The `xarray.DataTree.sel()` function can be used to return a new dataset which has been indexed to a specific bounding area. For large datasets, this can result in improved performance when doing analysis and plotting. \n",
"\n",
"If a bounding box is applied in Earthdata Search when generating this notebook, the bounding box coordinates will be used below. \n",
"\n",
"Find more information about `xarray.DataTree.sel()` and its parameters in the [xarray.DataTree.sel documentation](https://docs.xarray.dev/en/latest/generated/xarray.DataTree.sel.html)."
]
EDSC-4265: Fixes quotes EDSC-4265: Adds CLOUDFRONT_OAI_ID as an env variable EDSC-4265: Testing S3 env and only us-east for bucket policy EDSC-4265: Use VPC values from NGAP bucket policies EDSC-4265: Adds missing sourceVPC EDSC-4265: Adds support for generateNotebook Lambda
}) | ||
}) | ||
|
||
describe('when bounding field are is provided', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably change this to something like "when a bounding box is provided"
Overview
What is the feature?
Creating a Lambda that can dynamically create a Jupyter Notebook based on parameter passed in.
What is the Solution?
Created a Node Lambda that will dynamically generate a notebook and saves it to S3 bucket.
Testing
Endpoint:
http://localhost:3001/dev/generateNotebook
Returns a 307 with URL to download the file in response header.
Also deployed my branch to SIT and verify the functionality as expected.
Checklist