Lambda Bundles #7

Closed
7 of 10 tasks
eladb opened this issue Dec 8, 2019 · 7 comments
Labels
devex Developer Experience status/done Implementation complete

Comments

@eladb
Contributor

eladb commented Dec 8, 2019

PR Champion
# @eladb

Description

  • NodeFunction, JavaFunction, PythonFunction, etc.
  • Automatic publishing of environment variables
  • Combine "grant" & publish (e.g. bucket.grantXxx(lambda) will also add BUCKET_ARN environment variable)
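A minimal sketch of the "grant & publish" idea in the last bullet. The classes `FunctionSketch` and `BucketSketch` are hypothetical, simplified stand-ins written for illustration; they are not the aws-cdk API, and the exact behavior of a real `grantXxx` method is an assumption.

```typescript
// Hypothetical stand-in for a Lambda function construct (not the real CDK class).
class FunctionSketch {
  readonly environment: Record<string, string> = {};
  readonly grantedActions: string[] = [];

  addEnvironment(key: string, value: string): void {
    this.environment[key] = value;
  }
}

// Hypothetical stand-in for a bucket construct (not the real CDK class).
class BucketSketch {
  readonly bucketArn: string;

  constructor(bucketArn: string) {
    this.bucketArn = bucketArn;
  }

  // Records the grant and, in the same call, publishes the ARN
  // to the function's environment as BUCKET_ARN.
  grantRead(target: FunctionSketch): void {
    target.grantedActions.push(`s3:GetObject on ${this.bucketArn}/*`);
    target.addEnvironment('BUCKET_ARN', this.bucketArn);
  }
}

const sketchFunction = new FunctionSketch();
new BucketSketch('arn:aws:s3:::my-bucket').grantRead(sketchFunction);
console.log(sketchFunction.environment['BUCKET_ARN']);
```

The point of the design is that the runtime code can read `BUCKET_ARN` without the infrastructure code wiring the variable separately from the permission.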

Progress

  • Tracking Issue Created
  • RFC PR Created
  • Core Team Member Assigned
  • Initial Approval / Final Comment Period
  • Ready For Implementation
    • implementation issue 1
  • Resolved
@eladb eladb added the devex Developer Experience label Dec 8, 2019
@MrArnoldPalmer MrArnoldPalmer added the status/proposed Newly proposed RFC label Jan 4, 2020
@eladb eladb changed the title Runtime & infrastructure code integration Lambda Bundles Feb 23, 2020
@eladb
Contributor Author

eladb commented Feb 23, 2020

The new NodeJsFunction construct creates AWS Lambda bundles for Node.js only during synthesis.

If NodeJsFunction is used inside a 3rd-party construct library, the bundling will only happen when the app that consumes this library (directly or indirectly) is synthesized.

This means that, for example, bundling tools such as parcel or the docker image for building lambda native modules (which we wanted to introduce in #6323) will need to be installed in the environment of the top-level app.

In a sense, this is somewhat aligned with how docker assets work. We only include the source of the image in the cloud assembly, and only during publishing, we actually build the docker image.

I am wondering if perhaps the right approach is to move Lambda bundling into the publishing stage. This means that during synthesis, we will only copy the sources to the cloud assembly and we will add some hooks to the publishing stage (cdk-assets) which will allow processing these sources and producing an eventual bundle.

There is an interesting synergy related to Docker. We eventually want to bundle lambda functions inside a docker container that matches the Lambda environment, and the publishing environment supports docker.

Maybe the right, general purpose, solution is to basically treat these more like a docker asset than a .zip asset.

Copy: @rix0rrr, @jogold

@ran-isenberg

As a developer who uses CDK, I'd rather get Lambda creation errors as soon as possible (i.e. at cdk synth) and not wait until cdk deploy, where it also deploys other items.

To me, a solution would be that during the python lambda creation a docker container (with python dep and pipenv/poetry) will run, and pull the requirements into an output folder which can be then zipped. Each language, nodejs/python etc, will just spin up a different builder docker image but the logic can be the same overall.

@eladb
Contributor Author

eladb commented Feb 27, 2020

Yes, I retract my proposal to do this during publishing. It won't work because you need dependencies from your project and those won't be available in the cloud assembly.

So this needs to happen either during build, before synth (and then during synth we will basically have a bundle that we can just reference as a .zip file) or it can happen during synth, but we need some way to abstract away any dependencies.

Let's examine these two options.

Before synth

In this option, the preparation of the bundle happens sometime before we call cdk synth. This means that as far as the CDK app is concerned, the asset is just a .zip file. For example, this is how we publish the lambda bundle for the @aws-cdk/aws-s3-deployment module. The published module includes a .zip file that contains the lambda bundle as-is.

This is technically already supported, but requires that users will codify this in their library build process (see the prebuild configuration in s3-deployment's package.json and the actual lambda build script).

It's not too hard, but also not a great developer experience.

It is important to notice that when building libraries the CDK CLI is not involved at all. The CLI is only used by applications, not when building and publishing libraries. A construct library is just a normal TypeScript library.

Therefore, in this approach, we basically need to vend another command line tool that users will be able to integrate into their build system which will prepare these bundles and allow them to be referenced by the CDK library.

It won't be possible to relay configuration from the app to this new tool because the app is never executed when you build a library, so we will need some additional configuration that will be read by the CDK to identify the bundled assets.

A downside of this approach is also that the eventual library can technically be pretty big because it will include the compiled zip file with all its dependencies, so we are not leveraging the standard dependency mechanisms.

During synth

In this approach we are basically saying that bundling only happens when the app is actually synthesized. This is similar to how NodeJsFunction works today (where parcel is only executed during synth) but we must find a way to abstract these dependencies.

One way to do that would be to always require that bundling happens inside a Docker container. This has the benefit of reducing the dependency surface area (consumers only need docker during synth) and will also allow us to actually build Lambda functions in a lambda-compatible container (like sam build), so native modules will be supported.

The main benefits of this approach:

  • Smaller library size (they contain only source, not artifacts)
  • Bundling configuration is self-contained inside the CDK code
  • No custom tools required to build libraries

Downsides:

  • Longer synth time

@jogold

jogold commented Feb 27, 2020

I think that the CDK should offer the best possible developer experience so I would definitely go for the during synth option.

Building inside a Docker container is indeed the right solution for maximum compatibility. But the main question is what kind of build workflow will run inside the container. The workflows here https://github.com/awslabs/aws-lambda-builders/tree/develop/aws_lambda_builders/workflows (= sam build) are an excellent source of inspiration and show how complex it can be when considering all the details of each language.

As far as JavaScript/TypeScript is concerned I'm not sure that the workflow offered by aws-lambda-builders gives the best developer experience:

  • potentially large Lambda package size: all the production dependencies are installed whether used in the Lambda function's source code or not. Also, for a project with multiple Lambda functions, they all end up in each Lambda package unless you start with complex include/exclude logic or a specific directory structure with a package.json file for each Lambda
  • no transpiling: a real problem for JS developers using modern syntax, less a problem for TS where tsc will almost always run before synth
  • no monorepo support: copying the package.json of a single module and running npm install simply doesn't work here

We should maybe start with a minimum set of requirements: what is important/critical? what use cases do we want to support?

@eladb
Contributor Author

eladb commented Mar 1, 2020

I tend to agree that synth is more in line with how we want the CDK experience to work. Ideally we should offer some kind of an open framework for building assets inside docker images during synthesis.

The minimal surface can be something like "run this command inside a docker image with two mounted volumes: /src with the project source tree, and /asset mounted to where the asset output should be emitted (could be a directory or a file)".
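A rough sketch of that minimal surface, expressed as a function that assembles the arguments for `docker run`. The `DockerBundlingSketch` type, the `node:12` image and the `npm run bundle` command are placeholders chosen for illustration, not an existing CDK API; only the `/src` and `/asset` mount points come from the description above.

```typescript
// Hypothetical options type; field names are illustrative only.
interface DockerBundlingSketch {
  image: string;       // builder image to run
  command: string[];   // command executed inside the container
  sourceDir: string;   // host path of the project source tree
  assetOutput: string; // host path where the asset should be emitted
}

// Builds the `docker run` argument list with the two mounted volumes.
function dockerRunArgs(opts: DockerBundlingSketch): string[] {
  return [
    'run', '--rm',
    '-v', `${opts.sourceDir}:/src`,     // project source tree
    '-v', `${opts.assetOutput}:/asset`, // asset output location
    '-w', '/src',                       // run the command from the source tree
    opts.image,
    ...opts.command,
  ];
}

const bundleArgs = dockerRunArgs({
  image: 'node:12',
  command: ['npm', 'run', 'bundle'], // placeholder for the actual bundler invocation
  sourceDir: '/home/me/project',
  assetOutput: '/home/me/project/cdk.out/asset.abc',
});
console.log(['docker', ...bundleArgs].join(' '));
```

A parcel-based bundler or a sam build based builder would then differ only in the image and command they pass in.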

Then, we can implement our parcel bundler using something like this, and also perhaps implement an additional builder that leverages sam build.

@jogold

jogold commented Mar 3, 2020

Hey @eladb, I'm sure you've seen aws/aws-cdk#6535... I've been working further on this to come up with the best possible developer experience for JS/TS Lambda functions.

I have now a working construct offering the following API:

[image: screenshot of the construct API, not recoverable]

Code can be defined inside the construct and it can use any top level dependency imported in the file (another file or an external module). This offers a really great developer experience. I'm using the TypeScript Compiler API to analyze the AST for this.

It can also easily support props like externals (= a list of modules that should not be bundled, like aws-sdk) and natives (= a list of modules that should be included/installed in the node_modules folder, which can be done in a Lambda compatible docker image). The whole process could be dockerized.

We can discuss this further when you're available.

@jogold

jogold commented Mar 11, 2020

@eladb can we start the discussion with this?

NodejsCodeFunction integ test

It works like this:

  1. Find the defining file
  2. Extract top level import/require statements from this file
  3. Write the top level import/require statements and the handler's code to a temporary file
  4. Collect identifiers in this new temporary file to check for unused import/require statements
  5. Remove unused top level import/require statements in the temporary file
  6. Give this to parcel
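
Steps 2 to 5 above could be sketched roughly as follows. This is a deliberately simplified illustration that uses a regular expression and a word-boundary search, whereas the construct described here analyzes the AST with the TypeScript Compiler API. The function name `buildEntry` and the restriction to single-binding import/require lines are assumptions of the sketch; finding the defining file (step 1) and running parcel (step 6) are left out.

```typescript
// Matches only single-binding forms such as:
//   import x from 'mod';   import * as x from 'mod';   const x = require('mod');
const IMPORT_LINE =
  /^(?:import\s+(?:\*\s+as\s+)?(\w+)\s+from\s+['"][^'"]+['"]|const\s+(\w+)\s*=\s*require\(['"][^'"]+['"]\));?\s*$/;

// Keeps the top level import/require lines whose bound identifier appears
// in the handler code and prepends them to that code.
// Note: a plain text search can also match inside strings or comments.
function buildEntry(definingFileSource: string, handlerSource: string): string {
  const kept: string[] = [];
  for (const line of definingFileSource.split('\n')) {
    const match = IMPORT_LINE.exec(line.trim());
    if (!match) continue; // not a recognized top level import/require
    const identifier = match[1] || match[2];
    const used = new RegExp(`\\b${identifier}\\b`).test(handlerSource);
    if (used) kept.push(line.trim());
  }
  return [...kept, handlerSource].join('\n');
}

const definingSource = [
  "import * as path from 'path';",
  "import fs from 'fs';",
  "const AWS = require('aws-sdk');",
].join('\n');
const handlerText = "export const handler = async () => path.join('a', 'b');";
console.log(buildEntry(definingSource, handlerText));
```

In this example only the `path` import survives, because `fs` and `AWS` are never referenced by the handler.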

Moreover we have the following features:

  • Support for externals (typically aws-sdk)
  • Support for includes: modules that should not be bundled but included as "real" installs in a node_modules folder in the build dir. For this, versions are extracted from the package.json, and if there is a lock file (package-lock.json or yarn.lock) it is used and the install runs with the matching installer (npm or yarn). The install process can optionally run in a Lambda compatible container (currently using lambci docker images).
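
The includes logic above could look roughly like this. The helper names `pickInstaller` and `includedVersions` are hypothetical, and two behaviors are assumptions of the sketch rather than something stated in this thread: falling back to npm when there is no lock file, and preferring yarn when both lock files exist.

```typescript
type Installer = 'npm' | 'yarn';

// Chooses the installer from the lock file found in the project directory.
// Assumption: yarn wins if yarn.lock exists, otherwise npm is used.
function pickInstaller(projectFiles: string[]): Installer {
  if (projectFiles.indexOf('yarn.lock') !== -1) return 'yarn';
  return 'npm';
}

// Looks up the declared version of each included module in package.json.
function includedVersions(
  packageJson: { dependencies?: Record<string, string> },
  includes: string[],
): Record<string, string> {
  const deps = packageJson.dependencies || {};
  const result: Record<string, string> = {};
  for (const moduleName of includes) {
    const version = deps[moduleName];
    if (version === undefined) {
      throw new Error(`Cannot find a version for "${moduleName}" in package.json`);
    }
    result[moduleName] = version;
  }
  return result;
}

const samplePackageJson = { dependencies: { sharp: '^0.25.0', lodash: '4.17.15' } };
console.log(pickInstaller(['package.json', 'yarn.lock']));
console.log(includedVersions(samplePackageJson, ['sharp']));
```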

4 participants