EDSC-4265: Develop API endpoint to dynamically create jupyter notebook #1834

dmistry1 · 2024-11-20T20:29:48Z

Overview

What is the feature?

Adds a Lambda that dynamically creates a Jupyter Notebook based on the parameters passed in.

What is the Solution?

Created a Node Lambda that dynamically generates a notebook and saves it to an S3 bucket.

Workflow
1. Created a POST `/generateNotebook` endpoint
2. Takes `granuleId`, `boundingBox`, `variableId`, and `referrerUrl` as parameters
3. Calls CMR GraphQL to retrieve granule information
4. Generates the Jupyter Notebook using the JavaScript Handlebars library
5. Saves the notebook to an S3 bucket on AWS
6. Returns a signed URL for the notebook in the bucket as the response, for download

Changes:
- Implemented new Lambda function `generateNotebook`
- Added GraphQL query to fetch granule information
- Integrated Handlebars for notebook template rendering
- Set up S3 bucket for notebook storage
- Implemented signed URL generation for secure notebook access
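
The workflow above can be sketched roughly as follows. This is a minimal illustration, not the handler from this PR: the four injected helpers (`fetchGranule`, `renderNotebook`, `saveNotebook`, `getSignedUrl`) are hypothetical names standing in for the CMR GraphQL call, the Handlebars rendering, the S3 upload, and the signed URL generation, and returning the URL in a `Location` header is an assumption based on how 307 redirects normally work.

```javascript
// Hypothetical sketch of the generateNotebook flow. The dependencies are
// injected so the example runs without AWS or CMR access; their names are
// illustrative and are not the PR's actual helpers.
const generateNotebook = async (event, deps) => {
  const { fetchGranule, renderNotebook, saveNotebook, getSignedUrl } = deps

  // Step 2: read the parameters from the POST body
  const { granuleId, boundingBox, variableId, referrerUrl } = JSON.parse(event.body)

  // Step 3: look up granule information (CMR GraphQL in the PR)
  const granule = await fetchGranule(granuleId)

  // Step 4: render the notebook template (Handlebars in the PR)
  const notebook = renderNotebook({ granule, boundingBox, variableId, referrerUrl })

  // Step 5: store the rendered notebook (S3 in the PR) and get back its key
  const key = await saveNotebook(granuleId, notebook)

  // Step 6: redirect the caller to a signed download URL
  const signedUrl = await getSignedUrl(key)

  return {
    statusCode: 307,
    headers: { Location: signedUrl } // assumed header name for the redirect
  }
}
```

Injecting the helpers keeps the control flow readable and lets each step be mocked in unit tests.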

Testing

Endpoint: http://localhost:3001/dev/generateNotebook

{
    "granuleId": "G3269187397-POCLOUD",
    "boundingBox": "-86.44922, 24.58316, -81.03516, 30.49084",
    "variableId": "V2028632042-POCLOUD",
    "referrerUrl": "https://search.earthdata.nasa.gov/search/granules?p=C1996881146-POCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&q=GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)&sb[0]=-90.5625%2C22.7481%2C-81.49219%2C30.85594&qt=2024-10-09T00%3A00%3A00.000Z%2C2024-10-10T23%3A59%3A59.999Z&tl=1729527755!3!!&lat=26.28792637835305&long=-92.8916015625&zoom=5"
}

Returns a 307 with the URL to download the file in the response header.
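
For anyone testing from a script rather than a REST client, the request above could be built along these lines. `buildGenerateNotebookRequest` is a hypothetical helper, not part of this PR, and reading the redirect target from a `location` header is an assumption.

```javascript
// Builds the fetch() arguments for the local endpoint shown above.
// The helper name and default base URL are illustrative only.
const buildGenerateNotebookRequest = (params, baseUrl = 'http://localhost:3001/dev') => ({
  url: `${baseUrl}/generateNotebook`,
  options: {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(params),
    // Do not follow the 307 automatically, so the redirect can be inspected
    redirect: 'manual'
  }
})

// Usage (requires the local API to be running):
// const { url, options } = buildGenerateNotebookRequest({ granuleId: 'G3269187397-POCLOUD' })
// const response = await fetch(url, options)
// console.log(response.status, response.headers.get('location'))
```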

Also deployed my branch to SIT and verified that the functionality works as expected.

Checklist

- I have added automated tests that prove my fix is effective or that my feature works
- New and existing unit tests pass locally with my changes
- I have performed a self-review of my own code
- I have commented my code, particularly in hard-to-understand areas
- I have made corresponding changes to the documentation
- My changes generate no new warnings

codecov · 2024-11-20T20:33:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.52%. Comparing base (c20fbfa) to head (6e5cd37).
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1834      +/-   ##
==========================================
+ Coverage   93.50%   93.52%   +0.02%     
==========================================
  Files         772      774       +2     
  Lines       18650    18709      +59     
  Branches     4807     4806       -1     
==========================================
+ Hits        17438    17497      +59     
  Misses       1131     1131              
  Partials       81       81


serverless.yml

eudoroolivares2016 · 2024-11-21T22:27:09Z

Writing here so I don't forget. We'll want to update the Deployment section in the README with the VPC values that NGAP has given us for bucket policies.

Something like:
`This application requires known VPC values from NASA Internet Services to properly set up S3 bucket policies

Internet_Services_East_VPC
Internet_Services_West_VPC`

serverless-configs/aws-resources.yml

serverless/src/generateNotebook/handler.js

macrouch · 2024-11-23T19:55:11Z

static/src/js/util/getS3Client.js

+
+  if (process.env.IS_OFFLINE) {
+    config.endpoint = 'http://localhost:4569'
+    config.forcePathStyle = true


On MMT we set this value for both offline and deployed versions. Why is it only set for offline mode here?

This was one of the things that was pointed out by NGAP. Setting forcePathStyle = true makes the AWS SDK generate path-style S3 URLs: https://s3.<region>.amazonaws.com/<bucket-name>/<key>. But in the deployed app, CloudFront expects the S3 origin to use virtual-hosted-style URLs: https://<bucket-name>.s3.<region>.amazonaws.com/<key>.

According to the AWS documentation, using forcePathStyle creates a conflict with CloudFront because CloudFront routes traffic based on the hostname, which must include the bucket name. Path-style URLs are incompatible with CloudFront's expected configuration.

For local development, where CloudFront isn't in use, forcePathStyle is needed to work with the local S3 endpoint.
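
A minimal sketch of the distinction being discussed, assuming the config object is later passed to the AWS SDK S3 client. `buildS3Config` and the region fallback are hypothetical; the `IS_OFFLINE` check, endpoint, and `forcePathStyle` setting mirror the diff above.

```javascript
// Sketch of the offline-only path-style setting. The function name and the
// region default are assumptions made for illustration.
const buildS3Config = (env = process.env) => {
  const config = { region: env.AWS_REGION || 'us-east-1' }

  if (env.IS_OFFLINE) {
    // Local S3 emulator: there are no bucket subdomains, so use path-style
    // URLs of the form http://localhost:4569/<bucket-name>/<key>
    config.endpoint = 'http://localhost:4569'
    config.forcePathStyle = true
  }

  // Deployed: forcePathStyle stays unset, so the SDK keeps its default
  // virtual-hosted-style URLs, which is what CloudFront expects
  return config
}
```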

serverless/src/generateNotebook/handler.js

package.json

bin/deploy-bamboo.sh

serverless/src/generateNotebook/handler.js

README.md

trevorlang · 2024-11-26T14:48:09Z

serverless-configs/aws-functions.yml

+      - http:
+          method: post
+          cors: ${file(./serverless-configs/${self:provider.name}-cors-configuration.yml)}
+          path: generateNotebook


We like to stick with kebab case for paths, so generate-notebook would be preferred here.

Our lambdas actually use underscores (snake case) instead of kebab case (not sure why). But let's do generate_notebook to be consistent with the other lambda paths.

trevorlang · 2024-11-26T14:57:15Z

serverless/src/generateNotebook/handler.js

+    const parsedNotebook = JSON.parse(renderedNotebookString)
+
+    // Generates notebook key
+    const key = `notebook/rendered_notebook_${granuleId}.ipynb`


I think we should tweak this file name. Users don't really know what a CMR concept id is. A timestamp would prevent collisions in the case we get downloads for the same granule with different parameters.

Something like {granule name}-sample-notebook_{timestamp} might work well.
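
One way the suggested naming could look, as a sketch only. `buildNotebookKey`, the character sanitizing, and the compact timestamp format are all assumptions; the `notebook/` prefix comes from the diff above.

```javascript
// Hypothetical key builder following the {granule name}-sample-notebook_{timestamp} idea.
const buildNotebookKey = (granuleName, now = new Date()) => {
  // Replace characters that are awkward in file names or object keys
  const safeName = granuleName.replace(/[^A-Za-z0-9._-]/g, '_')

  // Compact UTC timestamp such as 20241126T144809Z (second resolution)
  const timestamp = now.toISOString().replace(/[-:]/g, '').replace(/\.\d{3}Z$/, 'Z')

  return `notebook/${safeName}-sample-notebook_${timestamp}.ipynb`
}
```

A second-resolution timestamp makes collisions unlikely but not impossible; two requests for the same granule within the same second would still produce the same key.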

trevorlang · 2024-11-26T15:28:55Z

serverless/src/generateNotebook/notebookTemplate.ipynb

+   "source": [
+    "# Define the bounding area\n",
+    "\n",
+    "# Select the data within the bounding box applied in Earthdata Search at the time of generation.\n",


This section isn't quite right. I think we can make it a little more useful for the user.

When the user has a bounding box applied, the section should read:

# Select the data within the bounding box that was applied in Earthdata Search.
min_lon = -92.17969
min_lat = 22.19104
max_lon = -80.89453
max_lat = 31.12491

# To select data for the granule encompassing the entire globe, remove the variables above and uncomment the following variable declarations for the coordinate points.
# min_lon = -180
# min_lat = -90
# max_lon = 180
# max_lat = 90

When a custom bounding box is not applied, the section should read:

# Select the data by setting variable declarations for the coordinate points to encompass the entire globe. These values can be updated to subset the data to a different area of interest. The values can be set manually by changing the values or by setting a bounding box before generating a notebook in Earthdata Search.
min_lon = -180
min_lat = -90
max_lon = 180
max_lat = 90
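
The two cases could be selected when rendering roughly like this. It is a plain JavaScript sketch rather than the Handlebars template used in the PR. `renderBoundingBoxSource` is a hypothetical name, the comments are shortened versions of the wording above, and the code assumes `boundingBox` arrives as a "west, south, east, north" string and that the global defaults are longitude -180 to 180 and latitude -90 to 90.

```javascript
// Global extent, assuming longitude in [-180, 180] and latitude in [-90, 90]
const GLOBAL_BOUNDS = { minLon: -180, minLat: -90, maxLon: 180, maxLat: 90 }

// Returns the Python source for the bounding-area cell as a single string.
const renderBoundingBoxSource = (boundingBox) => {
  const { minLon: gMinLon, minLat: gMinLat, maxLon: gMaxLon, maxLat: gMaxLat } = GLOBAL_BOUNDS

  if (!boundingBox) {
    // No custom bounding box: default to the whole globe
    return [
      '# Select the data by setting the coordinate points to encompass the entire globe.',
      `min_lon = ${gMinLon}`,
      `min_lat = ${gMinLat}`,
      `max_lon = ${gMaxLon}`,
      `max_lat = ${gMaxLat}`
    ].join('\n')
  }

  // Assumed order: west, south, east, north
  const [minLon, minLat, maxLon, maxLat] = boundingBox.split(',').map((value) => Number(value.trim()))

  return [
    '# Select the data within the bounding box that was applied in Earthdata Search.',
    `min_lon = ${minLon}`,
    `min_lat = ${minLat}`,
    `max_lon = ${maxLon}`,
    `max_lat = ${maxLat}`,
    '',
    '# To select data for the entire globe, remove the variables above and uncomment the following.',
    `# min_lon = ${gMinLon}`,
    `# min_lat = ${gMinLat}`,
    `# max_lon = ${gMaxLon}`,
    `# max_lat = ${gMaxLat}`
  ].join('\n')
}
```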

trevorlang · 2024-11-26T15:29:00Z

serverless/src/generateNotebook/notebookTemplate.ipynb

+   "cell_type": "markdown",
+   "id": "508dcd76-0e18-4f37-ba4f-dd0466ddc7cb",
+   "metadata": {},
+   "source": [


I want to add a small section to this bit of text reading "If a bounding box is applied in Earthdata Search when generating this notebook, the bounding box coordinates will be used below."

The updated code should look something like this:

"source": [
 "## Select a subset of the data using `xarray.DataTree.sel()`\n",
 "\n",
 "The `xarray.DataTree.sel()` function can be used to return a new dataset which has been indexed to a specific bounding area. For large datasets, this can result in improved performance when doing analysis and plotting. \n",
 "\n",
 "If a bounding box is applied in Earthdata Search when generating this notebook, the bounding box coordinates will be used below. \n",
 "\n",
 "Find more information about `xarray.DataTree.sel()` and its parameters in the [xarray.DataTree.sel documentation](https://docs.xarray.dev/en/latest/generated/xarray.DataTree.sel.html)."
]

EDSC-4265: Fixes quotes
EDSC-4265: Adds CLOUDFRONT_OAI_ID as an env variable
EDSC-4265: Testing S3 env and only us-east for bucket policy
EDSC-4265: Use VPC values from NGAP bucket policies
EDSC-4265: Adds missing sourceVPC
EDSC-4265: Adds support for generateNotebook Lambda

trevorlang · 2024-11-26T19:15:53Z

serverless/src/generateNotebook/__tests__/handler.test.js

+    })
+  })
+
+  describe('when bounding field are is provided', () => {


Should probably change this to something like "when a bounding box is provided"

eudoroolivares2016 changed the title ~~Edsc 4265~~ EDSC-4265: Develop API endpoint to dynamically create jupyter notebook Nov 20, 2024

eudoroolivares2016 reviewed Nov 20, 2024

dmistry1 requested review from macrouch, trevorlang, bnp26, dpesall, daniel-zamora and eudoroolivares2016 November 21, 2024 19:44

dmistry1 marked this pull request as ready for review November 21, 2024 19:45

dmistry1 force-pushed the EDSC-4265 branch 3 times, most recently from 827182f to cdcd1cf Compare November 21, 2024 20:43

macrouch reviewed Nov 23, 2024

dmistry1 force-pushed the EDSC-4265 branch from b651517 to 0939fd7 Compare November 25, 2024 19:08

eudoroolivares2016 reviewed Nov 25, 2024

macrouch reviewed Nov 25, 2024

dmistry1 force-pushed the EDSC-4265 branch from 2d934ef to 83de855 Compare November 25, 2024 21:32

eudoroolivares2016 reviewed Nov 25, 2024

macrouch approved these changes Nov 25, 2024

trevorlang requested changes Nov 26, 2024

dmistry1 requested a review from trevorlang November 26, 2024 16:44

eudoroolivares2016 approved these changes Nov 26, 2024

dmistry1 added 6 commits November 26, 2024 13:38

EDSC-4265: Fixes tests and remove console.log()

c2ea636

EDSC-4265: Removes local test --coverage=false

58b6a6b

EDSC-4265: Adds Internet_Services_East_VPC

c5eef09

EDSC-4265: Adds missing test coverage and addresses PR feedback

1e4518e

EDSC-4265: Adds a troubleshooting section to the notebook template

f5771a0

dmistry1 added 2 commits November 26, 2024 13:38

EDSC-4265: Update readme with INTERNET_SERVICE_EAST_VPC

5b528c6

EDSC-4265: Addresses PR feedback

23a1dc3

dmistry1 force-pushed the EDSC-4265 branch from a13ea69 to 23a1dc3 Compare November 26, 2024 18:38

dmistry1 added 2 commits November 26, 2024 13:47

EDSC-4265: Testing S3 dir structure

b78d6f0

EDSC-4265: Updates tests with ShortName mocks

8686403

trevorlang reviewed Nov 26, 2024

EDSC-4265: Adds console log with S3 bucket UUID and updates test title

6e5cd37

trevorlang approved these changes Nov 26, 2024

dmistry1 merged commit 2d59b50 into main Nov 26, 2024
11 checks passed

dmistry1 deleted the EDSC-4265 branch November 26, 2024 19:44
