Introduction

This project contains the source code and supporting files for a serverless application that automatically extracts text from scanned PDF files using Amazon Textract.

Prerequisites

Complete the following steps before you start this guide:

Set up an AWS account.
Install the AWS CLI.
Configure the AWS CLI with user credentials.
Install the AWS SAM CLI.
Install jq (optional, used to pretty-print JSON responses).

Deployment

$ sam deploy --capabilities CAPABILITY_NAMED_IAM --guided

Test

After you upload a PDF file to the deployed S3 bucket, a text file is created automatically in the same bucket.
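
The upload-and-check step can also be scripted. The following is a minimal sketch in Python, assuming boto3 is installed and AWS credentials are configured. The helper names, the bucket name, and the file name are hypothetical and not part of this project, and the key of the generated text file is not specified here, so the sketch simply lists the bucket contents.

```python
from pathlib import Path
from typing import List


def upload_pdf(s3_client, bucket: str, pdf_path: str) -> str:
    """Upload a local PDF to the bucket and return the object key used."""
    # Assumption: the object is stored under the file's base name.
    key = Path(pdf_path).name
    s3_client.upload_file(str(pdf_path), bucket, key)
    return key


def list_keys(s3_client, bucket: str) -> List[str]:
    """Return the object keys in the bucket (first page only, up to 1000)."""
    response = s3_client.list_objects_v2(Bucket=bucket)
    # "Contents" is absent from the response when the bucket is empty.
    return [obj["Key"] for obj in response.get("Contents", [])]


# Example usage (hypothetical bucket and file names):
#   import boto3
#   s3 = boto3.client("s3")
#   upload_pdf(s3, "my-textract-bucket", "scan.pdf")
#   print(list_keys(s3, "my-textract-bucket"))
```

Text extraction runs after the upload, so the text file may take some time to appear. Listing the bucket again after a short wait is usually enough for a manual test.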

You can also call the API Gateway endpoint path /textract to get the Textract result for a job ID, e.g.:

$ curl -d '{"jobId":"xxxxx2bd5ad43875edxxxx5aee29b65f273fxxxxx"}' -H "Content-Type: application/json" https://xxxx.execute-api.ap-southeast-2.amazonaws.com/textract | jq '.'
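
The same request can be made from Python using only the standard library. This is a sketch, not part of the project: the function names are hypothetical, the base URL is a placeholder for your own API Gateway endpoint, and the response is assumed to be JSON because the curl example pipes it to jq.

```python
import json
import urllib.request


def build_textract_request(base_url: str, job_id: str) -> urllib.request.Request:
    """Build the POST request that mirrors the curl example."""
    payload = json.dumps({"jobId": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/textract",
        data=payload,  # curl -d sends a request body, which implies POST
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_textract_result(base_url: str, job_id: str, timeout: float = 30.0):
    """Send the request and decode the response body as JSON."""
    request = build_textract_request(base_url, job_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage (replace both placeholders with your own values):
#   result = fetch_textract_result(
#       "https://xxxx.execute-api.ap-southeast-2.amazonaws.com", "<job id>"
#   )
#   print(json.dumps(result, indent=2))
```

Keeping request construction separate from the network call makes it possible to inspect the URL and payload without contacting the endpoint.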
