This repository provides an AWS Lambda layer that runs Puppeteer in headless Chromium, a utility library for crawling pages, and a sample Lambda with an AWS CloudFormation template.
- The AWS CLI
- The AWS Serverless Application Model (SAM) CLI
- Node.js 12.x
To build and deploy the layer and the sample crawler, run:

```shell
sam build --template-file cloudformation-template.yml
sam deploy --template-file .aws-sam/build/template.yaml --stack-name lambda-crawler3 --resolve-s3 --capabilities "CAPABILITY_NAMED_IAM"
```
```javascript
const crawler = require("crawler");

exports.handler = async (event, context, callback) => {
  try {
    await crawler(process.env.URLS.split(","), (data) => {
      // <Your code here>
    });
    return;
  } catch (err) {
    console.log(err);
    throw err;
  }
};
```
The `crawler` function takes two parameters:

- an array of URLs to crawl; the pages can be either regular web pages or PDF files
- a callback function with one parameter, `data`

After each page is loaded, `data` contains five properties:
- `id` - a short hash of the URL
- `url` - the URL of the page that was crawled
- `title` - the page title for HTML or the name of the PDF
- `content` - the content of the page
- `page` - the Puppeteer page object
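As a sketch of what a callback might do with these properties, the snippet below builds a one-line log entry from a crawl result. The `summarize` function and the `sample` object are hypothetical stand-ins for illustration (the real `data` would also carry the live `page` object, omitted here):

```javascript
// Hypothetical helper: format a crawl result as a single log line.
// Neither summarize nor sample is part of the library; they only
// illustrate the shape of the data object described above.
function summarize(data) {
  return `${data.id} ${data.url} "${data.title}" (${data.content.length} chars)`;
}

// Stand-in for the object the crawler passes to your callback.
const sample = {
  id: "a1b2c3d4",
  url: "https://example.com",
  title: "Example Domain",
  content: "This domain is for use in illustrative examples.",
};

console.log(summarize(sample));
```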
The CrawlPage sample Lambda crawls the pages specified by the `URLS` environment variable on a schedule defined by a CloudWatch Events rule and saves their content to the local filesystem.
You can also use Puppeteer directly by adding a require statement at the beginning of your code:

```javascript
const puppeteer = require("puppeteer");
```
See the official Puppeteer documentation for usage.
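For instance, your callback could use the `data.page` property with standard Puppeteer `Page` methods such as `page.title()` and `page.$eval()`. The sketch below is hypothetical: `extractHeading` is illustrative, and a minimal stub stands in for the live `Page` object so no browser is needed to run it:

```javascript
// Hypothetical helper: read the title and first <h1> from a page.
// page.title() and page.$eval() are standard Puppeteer Page methods;
// in a real handler you would pass data.page here.
async function extractHeading(page) {
  const title = await page.title();
  const h1 = await page.$eval("h1", (el) => el.textContent);
  return `${title}: ${h1}`;
}

// Minimal stub mimicking the two Page methods used above.
const stubPage = {
  title: async () => "Example Domain",
  $eval: async (selector, fn) => fn({ textContent: "Hello" }),
};

extractHeading(stubPage).then((s) => console.log(s));
```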