Commit 5b43b74

feat: add lambda-ddb-mysql-etl-pipeline python service (#134)

* Contributing new Python CDK example project
* feat: Add context vars & fix: README update
* feat: added layers, context vars, fixed mysql exmpl & gen refactor

1 parent 61a72f8 · commit 5b43b74

File tree

12 files changed: +703 −0 lines changed
python/lambda-ddb-mysql-etl-pipeline/README.md
Lines changed: 123 additions & 0 deletions
# lambda-ddb-mysql-etl-pipeline

<!--BEGIN STABILITY BANNER-->
---

![Stability: Stable](https://img.shields.io/badge/stability-Stable-success.svg?style=for-the-badge)

> **This is a stable example. It should successfully build out of the box.**
>
> This example is built on Construct Libraries marked "Stable" and does not have any infrastructure prerequisites to build.

---
<!--END STABILITY BANNER-->
This is a CDK Python ETL Pipeline example that produces the AWS resources necessary to achieve the following:

1) Dynamically deploy CDK apps to different environments.
2) Make an API request to a NASA asteroid API (see the sketch after this list).
3) Process and write the response content to both .csv and .json files.
4) Upload the files to S3.
5) Trigger an S3 event for object retrieval after the object is put to S3.
6) Process the retrieved object and dynamically write to either DynamoDB or a MySQL instance.
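To make step 2 concrete, a request to NASA's NeoWs feed endpoint might look roughly like the sketch below. This is an illustration, not the project's `lambda/asteroids.py` code; the endpoint, query parameters, and response handling are assumptions based on NASA's public API documentation.

```python
import os
import requests

def fetch_asteroid_feed(start_date: str, end_date: str) -> dict:
    """Return the raw near-earth-object feed for a date range as a dict."""
    resp = requests.get(
        "https://api.nasa.gov/neo/rest/v1/feed",
        params={
            "start_date": start_date,            # e.g. "2020-01-01"
            "end_date": end_date,                # the feed covers at most a few days per call
            "api_key": os.environ["NASA_KEY"],   # injected via the stack's Lambda environment
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```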
*The `__doc__` strings are (overly) verbose. Please read them carefully, as exceptions
and considerations have been included to provide a more comprehensive example.

**Please don't forget to read the 'Important Notes' section at the bottom of this README.
I've also included additional links to useful documentation there.
## Project Directory Rundown

`README.md` — The introductory README for this project.

`etl_pipeline_cdk` — A Python module directory containing the core stack code.

`etl_pipeline_cdk_stack.py` — A custom CDK stack construct that is the core of the CDK application.
It is where we bring the core stack components together before synthesizing our CloudFormation template.

`requirements.txt` — Pip uses this file to install all of the dependencies for this CDK app.
In this case, it contains only `-e .` (see the one-line illustration after this rundown), which tells pip to install the requirements
specified in `setup.py`, where I have all requirements listed.
It also tells pip to run `python setup.py develop` to install the code in the `etl_pipeline_cdk` module so that it can be edited in place.

`setup.py` — Defines how this Python package is constructed and what its dependencies are.

`lambda` — Contains all Lambda handler code for the example. See the `__doc__` strings for specifics.

`layers` — Contains the Lambda layer archives (requests, pandas, pymysql) created for this project.
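For reference, the entire `requirements.txt` for a setup like this is typically just the editable-install line below (shown for illustration; the actual file contents are not part of this excerpt):

```
-e .
```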
## Pre-requisites

#### Keys, Copy & Paste

1) Submit a request for a NASA API key here (it arrives quickly!): https://api.nasa.gov/
2) Navigate to the `etl_pipeline_cdk_stack.py` file and replace the text `<nasa_key_here>`
with the NASA key that was emailed to you.**
3) Navigate to the `app.py` file and replace the text `<acct_id>` with your AWS account id
and `<region_id>` with the region you plan to work in, e.g. `us-west-2` for Oregon or `us-east-1` for N. Virginia.
4) Via the macOS CLI, run this command to set the `preprod` env variable: `export AWS_CDK_ENV=preprod`

**Yes, this is not best practice. We should be using Secrets Manager to store these keys.
I have included the required code to extract them, along with some commented notes on how this is achieved
(a rough sketch of the lookup follows below). I just haven't had the time to "plug them in" yet, and skipping
that makes this example a bit easier to follow.
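For reference, pulling database credentials from Secrets Manager inside a Lambda could look roughly like this sketch. It is not the code shipped in this example; the secret's JSON layout and key names are assumptions.

```python
import json
import os
import boto3

def get_db_credentials() -> dict:
    """Fetch and parse the DB secret referenced by the DB_SECRETS env var."""
    client = boto3.client("secretsmanager", region_name=os.environ["REGION"])
    resp = client.get_secret_value(SecretId=os.environ["DB_SECRETS"])
    # Assumes the secret is stored as a JSON string, e.g. {"username": ..., "password": ..., "host": ...}
    return json.loads(resp["SecretString"])
```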
## AWS Instructions for env setup

This project is set up like a standard Python project. The initialization
process also creates a virtualenv within this project, stored under the .env
directory. To create the virtualenv it assumes that there is a `python3`
(or `python` for Windows) executable in your path with access to the `venv`
package. If for any reason the automatic creation of the virtualenv fails,
you can create the virtualenv manually.
To manually create a virtualenv on MacOS and Linux:

```
$ python3 -m venv .env
```

After the init process completes and the virtualenv is created, you can use the following
step to activate your virtualenv.

```
$ source .env/bin/activate
```

If you are on a Windows platform, you would activate the virtualenv like this:

```
% .env\Scripts\activate.bat
```

Once the virtualenv is activated, you can install the required dependencies.
**I've listed all required dependencies in `setup.py`, thus the `-e .`.

```
$ pip install -r requirements.txt
```

At this point you can synthesize the CloudFormation template for this code.

```
$ cdk synth
```

To add additional dependencies, for example other CDK libraries, just add
them to your `setup.py` file and rerun the `pip install -r requirements.txt`
command.

# Useful commands

* `cdk ls` list all stacks in the app
* `cdk synth` emits the synthesized CloudFormation template
* `cdk deploy` deploy this stack to your default AWS account/region (see the context-override example below)
* `cdk diff` compare deployed stack with current state
* `cdk docs` open CDK documentation

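Because all of the per-environment values live in the `cdk.json` context block, any of them can be overridden per invocation with the CDK CLI's `-c`/`--context` flag, which is one way to target another environment. For example, a prod deployment could look like this (the prod values shown are only illustrative):

```
$ cdk deploy -c STAGE=prod -c stack_name=EtlPipelineCdkStackProd
```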
# Important Notes:

Destroying Resources:

After you are finished with this app, you can run `cdk destroy` to quickly remove the majority
of the stack's resources. However, some resources will NOT automatically be destroyed and require
some manual intervention. Here is a list of what you must do:
1) S3 bucket: You must first delete all files in the bucket (see the CLI example after this list). Changes to the current policy, which forbids
bucket deletion if files are present, are in development and can be found here: https://github.com/aws/aws-cdk/issues/3297
2) CloudWatch Log Groups for Lambda logging. Found with the filter: `/aws/lambda/Etl`
3) The S3 CDK folder with your CloudFormation templates. Delete at your discretion.
4) Your bootstrap stack asset S3 folder will have some assets in it. Delete/save at your discretion.
**Don't delete the bootstrap stack, nor the S3 asset bucket, if you plan to continue using CDK.
5) Both Lambdas are set to run at `logging.DEBUG`; switch if too verbose. See CloudWatch Logs for logs.
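For item 1, one quick way to empty the bucket before running `cdk destroy` is the AWS CLI; the bucket name below assumes the `preprod` stage used in this example:

```
$ aws s3 rm s3://asteroids-preprod --recursive
```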
python/lambda-ddb-mysql-etl-pipeline/app.py
Lines changed: 18 additions & 0 deletions
#!/usr/bin/env python3

from aws_cdk import core
from etl_pipeline_cdk.etl_pipeline_cdk_stack import EtlPipelineCdkStack

app = core.App()

# Deploy-time configuration comes from the "context" block in cdk.json
STAGE = app.node.try_get_context("STAGE")
ENV = {
    "region": app.node.try_get_context("REGION"),
    "account": app.node.try_get_context("ACCTID")
}
stack_name = app.node.try_get_context("stack_name")

EtlPipelineCdkStack(app,
                    stack_name,
                    env=ENV,
                    stage=STAGE)
app.synth()
python/lambda-ddb-mysql-etl-pipeline/cdk.json
Lines changed: 13 additions & 0 deletions
{
  "context": {
    "STAGE": "preprod",
    "REGION": "us-east-1",
    "ACCTID": "<INSERTACCTID>",
    "NASA_KEY": "<INSERTNASAKEY>",
    "SCHEMA": "DB_NAME_PREPROD",
    "DB_SECRETS_REF": "<INSERTSECRETS_MGR_REF>",
    "TOPIC_ARN": "<INSERT_TOPIC_ARN>",
    "stack_name": "EtlPipelineCdkStackPreProd"
  },
  "app": "python3 app.py"
}

python/lambda-ddb-mysql-etl-pipeline/etl_pipeline_cdk/__init__.py

Whitespace-only changes.
python/lambda-ddb-mysql-etl-pipeline/etl_pipeline_cdk/etl_pipeline_cdk_stack.py
Lines changed: 142 additions & 0 deletions
from aws_cdk import (
    core,
    aws_s3 as s3,
    aws_s3_notifications as s3n,
    aws_lambda as _lambda,
    aws_dynamodb as ddb,
    aws_events as events,
    aws_events_targets as targets,
)

class EtlPipelineCdkStack(core.Stack):
    """Define the custom CDK stack construct class that inherits from the core.Stack base class.

    Notes:
        This is the meat of our stack that will be built in app.py of the CDK application.

        Lambda inline code example => you can use the following in lieu of AssetCode() for inline code:
            with open("lambda/dbwrite.py", encoding="utf8") as fp:
                handler_code = fp.read()
            ...code=_lambda.InlineCode(handler_code)...
        *Please consider the character limits when writing inline code.
    """

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        """Invoke the base class constructor via super with the received scope, id, and props.

        Args:
            scope: Defines the scope in which this custom construct stack is created.
            id (str): Defines the local identity of the construct. Must be unique amongst constructs
                within the same scope, as it's used to formulate the CloudFormation logical ID for
                each resource defined in this scope.
            kwargs: Additional keyword arguments; 'stage' is consumed here and the rest
                (e.g. env) are passed through to the base Stack.
        """
        # Example of passing app.py-level params to the stack class: pop our custom
        # 'stage' kwarg so the remaining kwargs still reach the base Stack constructor.
        self.stage = kwargs.pop('stage')

        super().__init__(scope, id, **kwargs)

        # Resources to create
        s3_bucket = s3.Bucket(
            self, "Bucket",
            bucket_name=f"asteroids-{self.stage}",
            versioned=False,
            removal_policy=core.RemovalPolicy.DESTROY  # NOT recommended for production code
        )

        ddb_asteroids_table = ddb.Table(
            self, "Table",
            table_name="asteroids_table",
            partition_key={
                "name": "id",
                "type": ddb.AttributeType.STRING
            },
            removal_policy=core.RemovalPolicy.DESTROY  # NOT recommended for production code
        )

        # Lambdas and layers
        requests_layer = _lambda.LayerVersion(
            self, "requests",
            code=_lambda.AssetCode('layers/requests.zip'))
        pandas_layer = _lambda.LayerVersion(
            self, "pandas",
            code=_lambda.AssetCode('layers/pandas.zip'))
        pymysql_layer = _lambda.LayerVersion(
            self, "pymysql",
            code=_lambda.AssetCode('layers/pymysql.zip'))

        process_asteroid_data = _lambda.Function(
            self, "ProcessAsteroidsLambda",
            runtime=_lambda.Runtime.PYTHON_3_7,
            code=_lambda.AssetCode("lambda"),
            handler="asteroids.handler",
            layers=[requests_layer],
            environment={
                "S3_BUCKET": s3_bucket.bucket_name,
                "NASA_KEY": self.node.try_get_context("NASA_KEY"),
            }
        )

        db_write = _lambda.Function(
            self, "DbWriteLambda",
            runtime=_lambda.Runtime.PYTHON_3_7,
            handler="dbwrite.handler",
            layers=[pandas_layer, pymysql_layer],
            code=_lambda.Code.asset('lambda'),
            environment={
                "ASTEROIDS_TABLE": ddb_asteroids_table.table_name,
                "S3_BUCKET": s3_bucket.bucket_name,
                "SCHEMA": self.node.try_get_context("SCHEMA"),
                "REGION": self.node.try_get_context("REGION"),
                "DB_SECRETS": self.node.try_get_context("DB_SECRETS_REF"),
                "TOPIC_ARN": self.node.try_get_context("TOPIC_ARN")
            }
        )

        # Rules and Events: fire the asteroid-processing Lambda every hour,
        # at minute 15 for the JSON run and minute 30 for the CSV run
        json_rule = events.Rule(
            self, "JSONRule",
            schedule=events.Schedule.cron(
                minute="15",
                hour="*",
                month="*",
                week_day="*",
                year="*"
            )
        )

        csv_rule = events.Rule(
            self, "CSVRule",
            schedule=events.Schedule.cron(
                minute="30",
                hour="*",
                month="*",
                week_day="*",
                year="*"
            )
        )

        # Add the Lambda function target as well as a custom trigger input to each rule
        json_rule.add_target(
            targets.LambdaFunction(
                process_asteroid_data,
                event=events.RuleTargetInput.from_text("json")
            )
        )
        csv_rule.add_target(
            targets.LambdaFunction(
                process_asteroid_data,
                event=events.RuleTargetInput.from_text("csv")
            )
        )

        # Create an s3 notification for the db_write function and assign it
        # to the bucket's 'OBJECT_CREATED' event type
        notify_lambda = s3n.LambdaDestination(db_write)
        s3_bucket.add_event_notification(s3.EventType.OBJECT_CREATED, notify_lambda)

        # Permissions
        s3_bucket.grant_read_write(process_asteroid_data)
        s3_bucket.grant_read_write(db_write)
        ddb_asteroids_table.grant_read_write_data(db_write)
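The Lambda handler code itself (`lambda/asteroids.py`, `lambda/dbwrite.py`) is not part of this excerpt. To show how the wiring above is consumed, here is a rough sketch of what a `dbwrite.handler` invoked by the S3 notification might look like; the routing rule, data shapes, and omitted MySQL branch are assumptions for illustration only.

```python
import json
import logging
import os

import boto3

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")


def handler(event, context):
    """Triggered by the bucket's OBJECT_CREATED notification; route the new object to a data store."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    logger.debug("Processing s3://%s/%s", bucket, key)

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    if key.endswith(".json"):
        # DynamoDB path: the stack grants this function read/write on the table
        table = dynamodb.Table(os.environ["ASTEROIDS_TABLE"])
        for item in json.loads(body):   # assumes the object is a JSON list of asteroid dicts
            table.put_item(Item=item)   # each dict must include the "id" partition key
    elif key.endswith(".csv"):
        # MySQL path: connect with pymysql (from the layer) using credentials pulled
        # from Secrets Manager via the DB_SECRETS reference; omitted in this sketch.
        pass
```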
