EDSC-4265: Develop API endpoint to dynamically create jupyter notebook #1834

Open
wants to merge 3 commits into
base: main

Conversation

dmistry1
Contributor

dmistry1 commented Nov 20, 2024

Overview

What is the feature?

A Lambda that dynamically creates a Jupyter Notebook based on the parameters passed in.

What is the Solution?

Created a Node Lambda that dynamically generates a notebook and saves it to an S3 bucket.

Workflow
1. Exposes a POST `/generateNotebook` endpoint
2. Takes `granuleId`, `boundingBox`, `variableId`, and `referrerUrl` as parameters
3. Calls CMR GraphQL to retrieve granule information
4. Generates the Jupyter Notebook using the Handlebars JavaScript library
5. Saves the notebook to an S3 bucket on AWS
6. Returns a signed URL for the notebook in the bucket so it can be downloaded
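
The workflow above could be sketched roughly as follows. This is not the code from this PR: `createGenerateNotebookHandler` and the four injected functions are hypothetical stand-ins for the CMR GraphQL call, the Handlebars render, the S3 upload, and the signed URL generation, and the object key format and the `Location` header name are assumptions.

```javascript
// Sketch only: every dependency below is a hypothetical stand-in, injected so
// the flow can be exercised without CMR, Handlebars, or AWS.
const createGenerateNotebookHandler = ({
  fetchGranule, // (granuleId, variableId) => Promise<object>, stands in for the CMR GraphQL call
  renderNotebook, // (context) => string, stands in for the Handlebars render
  uploadNotebook, // (key, contents) => Promise<void>, stands in for the S3 upload
  getSignedUrl // (key) => Promise<string>, stands in for signed URL generation
}) => async (event) => {
  try {
    // Step 2: read the parameters from the POST body
    const {
      boundingBox,
      granuleId,
      referrerUrl,
      variableId
    } = JSON.parse(event.body)

    // Step 3: look up the granule information
    const granule = await fetchGranule(granuleId, variableId)

    // Step 4: render the notebook from the template
    const notebook = renderNotebook({ boundingBox, granule, referrerUrl })

    // Step 5: store the notebook (this key naming scheme is an assumption)
    const key = `notebook/${granuleId}.ipynb`
    await uploadNotebook(key, notebook)

    // Step 6: redirect the caller to the signed download URL
    const signedUrl = await getSignedUrl(key)

    return { statusCode: 307, headers: { Location: signedUrl } }
  } catch (error) {
    // Any failure (bad JSON, failed lookup, failed upload) becomes a 500
    return { statusCode: 500, body: JSON.stringify({ errors: [String(error.message)] }) }
  }
}
```

Injecting the dependencies keeps each step replaceable in unit tests. The real Lambda may well wire these together differently.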

Changes:
- Implemented new Lambda function `generateNotebook`
- Added GraphQL query to fetch granule information
- Integrated Handlebars for notebook template rendering
- Set up S3 bucket for notebook storage
- Implemented signed URL generation for secure notebook access

Testing

Endpoint: http://localhost:3001/dev/generateNotebook

{
    "granuleId": "G3269187397-POCLOUD",
    "boundingBox": "-86.44922, 24.58316, -81.03516, 30.49084",
    "variableId": "V2028632042-POCLOUD",
    "referrerUrl": "https://search.earthdata.nasa.gov/search/granules?p=C1996881146-POCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&q=GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)&sb[0]=-90.5625%2C22.7481%2C-81.49219%2C30.85594&qt=2024-10-09T00%3A00%3A00.000Z%2C2024-10-10T23%3A59%3A59.999Z&tl=1729527755!3!!&lat=26.28792637835305&long=-92.8916015625&zoom=5"
}
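
Since `boundingBox` arrives as a comma-separated string, it may be worth validating it before use. A minimal sketch, assuming the order is west, south, east, north (the PR does not state the order, and `parseBoundingBox` is a hypothetical helper, not part of this change):

```javascript
// Parse a "west, south, east, north" string into numbers.
// The coordinate order is an assumption, not something the PR states.
const parseBoundingBox = (boundingBox) => {
  const parts = boundingBox.split(',').map((part) => part.trim())

  // Reject missing, empty, or non-numeric components
  const isInvalid = (part) => part === '' || !Number.isFinite(Number(part))
  if (parts.length !== 4 || parts.some(isInvalid)) {
    throw new Error(`Expected four comma-separated numbers, received "${boundingBox}"`)
  }

  const [west, south, east, north] = parts.map(Number)

  if (south < -90 || north > 90 || south > north) {
    throw new Error('Latitudes must satisfy -90 <= south <= north <= 90')
  }

  // west > east is allowed so that boxes crossing the antimeridian still parse
  if ([west, east].some((longitude) => longitude < -180 || longitude > 180)) {
    throw new Error('Longitudes must be between -180 and 180')
  }

  return { west, south, east, north }
}
```

With the payload above, `parseBoundingBox('-86.44922, 24.58316, -81.03516, 30.49084')` returns `{ west: -86.44922, south: 24.58316, east: -81.03516, north: 30.49084 }`.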

Returns a 307 with the URL to download the file in the response header.
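
One way to check the redirect from a script is sketched below. It assumes Node 18+ (global `fetch`) and that the download URL is carried in the standard `Location` header. `requestNotebookUrl` and its `fetchImpl` parameter are hypothetical names used for illustration.

```javascript
// Sketch: POST the payload and read the redirect target without following it.
// `fetchImpl` defaults to the global fetch available in Node 18+.
const requestNotebookUrl = async (endpoint, payload, fetchImpl = fetch) => {
  const response = await fetchImpl(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
    redirect: 'manual' // keep the 307 visible instead of following it
  })

  if (response.status !== 307) {
    throw new Error(`Expected a 307 redirect, received ${response.status}`)
  }

  // Assumption: the signed URL is returned in the Location header
  return response.headers.get('location')
}
```

Calling `requestNotebookUrl('http://localhost:3001/dev/generateNotebook', payload)` with the JSON body above should resolve to the signed URL if the endpoint behaves as described.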

Also deployed my branch to SIT and verified that the functionality works as expected.

Checklist

  • I have added automated tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings


codecov bot commented Nov 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.48%. Comparing base (cd4f064) to head (7158539).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1834      +/-   ##
==========================================
+ Coverage   93.46%   93.48%   +0.01%     
==========================================
  Files         770      772       +2     
  Lines       18615    18670      +55     
  Branches     4800     4779      -21     
==========================================
+ Hits        17398    17453      +55     
+ Misses       1169     1168       -1     
- Partials       48       49       +1     


@eudoroolivares2016 changed the title from "Edsc 4265" to "EDSC-4265: Develop API endpoint to dynamically create jupyter notebook" on Nov 20, 2024
serverless.yml (outdated, resolved)
@eudoroolivares2016
Contributor

eudoroolivares2016 commented Nov 21, 2024

Writing here so I don't forget. We'll want to update the Deployment section in the README with the VPC values that NGAP has given us for bucket policies.

Something like:
`This application requires known VPC values from NASA Internet Services to properly set up S3 bucket policies:

  • Internet_Services_East_VPC
  • Internet_Services_West_VPC`

@@ -174,11 +174,60 @@ Resources:
- "s3:ListBucket"
- "s3:ListAllMyBuckets"
- "s3:GetObject"
- "s3:PutObject"
Contributor

Was this needed after you added the S3 actions to EDSCLambdaBase?

const params = JSON.parse(body)

const {
boundingBox,
Contributor

Should this work with other spatial types?

const { name: variableName } = variableItems[0]

// Read the Jupyter notebook template file
const templateContent = fs.readFileSync('serverless/src/generateNotebook/notebookTemplate.ipynb', 'utf-8')
Contributor

Can we import this file like we would a JSON file? So we wouldn't have to use fs.readFileSync and just have esbuild include it in the build?


if (process.env.IS_OFFLINE) {
config.endpoint = 'http://localhost:4569'
config.forcePathStyle = true
Contributor

On MMT we set this value for both offline and deployed versions. Why is it only set for offline mode here?

})

const { data: responseData } = granuleResponse
const { data } = responseData
Contributor

Does the try/catch wrapping this logic handle the case where the GraphQL response contains an `errors` key?
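
For illustration, an explicit check along these lines would surface such errors. GraphQL servers commonly answer with HTTP 200 even when the body carries an `errors` array, so an HTTP client may not throw on its own. `extractGraphQlData` is a hypothetical helper, not code from this PR, and it assumes `responseData` has the usual `{ data, errors }` shape.

```javascript
// Sketch: turn a GraphQL `errors` array into a thrown Error so that a
// surrounding try/catch actually sees the failure.
const extractGraphQlData = (responseData) => {
  const { data, errors } = responseData

  if (Array.isArray(errors) && errors.length > 0) {
    const messages = errors.map(({ message }) => message).join('; ')
    throw new Error(`GraphQL request failed: ${messages}`)
  }

  return data
}
```

In the snippet above, `const { data } = responseData` could then become `const data = extractGraphQlData(responseData)`.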

3 participants