| 1 | +# lambda-ddb-mysql-etl-pipeline |
| 2 | +<!--BEGIN STABILITY BANNER--> |
| 3 | +--- |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +> **This is a stable example. It should successfully build out of the box** |
| 8 | +> |
| 9 | +> This example is built on Construct Libraries marked "Stable" and does not have any infrastructure prerequisites to build. |
| 10 | + |
| 11 | +--- |
| 12 | +<!--END STABILITY BANNER--> |
| 13 | + |
| 14 | +This is a CDK Python ETL Pipeline example that produces the AWS resources necessary to achieve the following: |
| 15 | +1) Dynamically deploy CDK apps to different environments. |
| 16 | +2) Make an API Request to a NASA asteroid API. |
| 17 | +3) Process and write response content to both .csv and .json files. |
| 18 | +4) Upload the files to S3. |
| 19 | +5) Trigger an S3 event on object put so that the new object can be retrieved. |
| 20 | +6) Process the object, then dynamically write to either DynamoDB or a MySQL instance. |
| 21 | +*The `__doc__` strings are (overly) verbose. Please read them carefully, as exceptions |
| 22 | +and considerations have been included to provide a more comprehensive example. |
| 23 | + |
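As a rough illustration of step 3, the sketch below writes the same list of records to both a `.csv` and a `.json` file using only the standard library. The function name `write_outputs`, the `stem` parameter, and the record shape are hypothetical and are not taken from the handler code in this project; inside Lambda the output directory would normally be under `/tmp`, the only writable path.

```python
import csv
import json
from pathlib import Path


def write_outputs(records, out_dir, stem="asteroids"):
    """Write a list of flat dicts to <stem>.csv and <stem>.json in out_dir.

    Hypothetical helper: names and record shape are assumptions for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{stem}.csv"
    json_path = out_dir / f"{stem}.json"

    # JSON: dump the records as-is.
    with json_path.open("w", encoding="utf-8") as fh:
        json.dump(records, fh)

    # CSV: use the union of all keys as the header so no field is dropped.
    fieldnames = sorted({key for row in records for key in row})
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)

    return csv_path, json_path
```

Both files could then be uploaded to S3 in step 4.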
| 24 | +**Please don't forget to read the 'Important Notes' section at the bottom of this README. |
| 25 | +I've also included links to useful documentation there. |
| 26 | + |
| 27 | +## Project Directory Rundown |
| 28 | +`README.md` — The introductory README for this project. |
| 29 | + |
| 30 | +`etl_pipeline_cdk` — A Python module directory containing the core stack code. |
| 31 | + |
| 32 | +`etl_pipeline_cdk_stack.py` — A custom CDK stack construct that is the core of the CDK application. |
| 33 | +It is where we bring the core stack components together before synthesizing our CloudFormation template. |
| 34 | + |
| 35 | +`requirements.txt` — Pip uses this file to install all of the dependencies for this CDK app. |
| 36 | +In this case, it contains only `-e .`, which tells pip to install the requirements |
| 37 | +specified in `setup.py`, where all requirements are listed. |
| 38 | +It also tells pip to run `python setup.py develop` to install the code in the `etl_pipeline_cdk` module so that it can be edited in place. |
| 39 | + |
| 40 | +`setup.py` — Defines how this Python package would be constructed and what the dependencies are. |
| 41 | + |
| 42 | +`lambda` — Contains all lambda handler code in the example. See `__doc__` strings for specifics. |
| 43 | + |
| 44 | +`layers` — Contains the requests layer archive, created for this project. |
| 45 | + |
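For orientation when reading the handlers in `lambda`, here is a minimal sketch of how an S3-triggered handler can pull the bucket name and object key out of the event it receives. The function name `extract_s3_objects` is hypothetical and this is not the project's handler code; the event layout follows the standard S3 notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 notification event.

    Hypothetical helper for illustration only.
    """
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event (spaces become '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

A handler could loop over the returned pairs and fetch each object before processing it.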
| 46 | +## Pre-requisites |
| 47 | +#### Keys, Copy & Paste |
| 48 | +1) Submit a request for a NASA API key here (it arrives quickly!): https://api.nasa.gov/ |
| 49 | +2) Navigate to the `etl_pipeline_cdk_stack.py` file and replace the text `<nasa_key_here>` |
| 50 | +with the NASA key that was emailed to you.** |
| 51 | +3) Navigate to the `app.py` file and replace the text `<acct_id>` with your AWS account ID |
| 52 | +and `<region_id>` with the region you plan to work in, e.g. `us-west-2` for Oregon or `us-east-1` for N. Virginia. |
| 53 | +4) From a macOS terminal, run this command to set the environment variable to `preprod`: `export AWS_CDK_ENV=preprod` |
| 54 | + |
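To make steps 3 and 4 concrete, the sketch below shows one way an `app.py` could pick its deployment target from `AWS_CDK_ENV`. It is an assumption about the general pattern, not a copy of this project's `app.py`: the `ENVIRONMENTS` table, the `prod` entry, and the `resolve_environment` function are all hypothetical.

```python
import os

# Hypothetical lookup table; replace the placeholders with real values.
ENVIRONMENTS = {
    "preprod": {"account": "<acct_id>", "region": "<region_id>"},
    "prod": {"account": "<acct_id>", "region": "<region_id>"},  # assumed second environment
}


def resolve_environment(environ=None):
    """Return (name, settings) for the environment named by AWS_CDK_ENV."""
    environ = os.environ if environ is None else environ
    name = environ.get("AWS_CDK_ENV")
    if name not in ENVIRONMENTS:
        known = ", ".join(sorted(ENVIRONMENTS))
        raise ValueError(f"AWS_CDK_ENV must be one of: {known} (got {name!r})")
    return name, ENVIRONMENTS[name]
```

The returned account and region could then be passed to the stack's `env` argument when the stack is instantiated. Failing fast on an unset or unknown value avoids deploying to an unintended account.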
| 55 | +**Yes, this is not best practice. We should be using Secrets Manager to store these keys. |
| 56 | +I have included the code required to extract them, along with some commented notes in my sample on how this is achieved. |
| 57 | +I just haven't had the time to "plug them in" yet, and hardcoding makes this example a bit easier to follow. |
| 58 | + |
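For readers who want to move the key out of the source, a minimal sketch of reading a secret with boto3 follows. This is not the author's included code; the function name `get_secret` and the secret name `nasa/api-key` are hypothetical, and the secret is assumed to be stored as a plain string. The `get_secret_value` call and the `SecretString` response field are part of the boto3 Secrets Manager client.

```python
def get_secret(secret_id, client=None):
    """Fetch a string secret from AWS Secrets Manager.

    A client can be injected for testing; otherwise boto3 creates one using
    the default credentials and region.
    """
    if client is None:
        import boto3  # imported lazily so the module loads without boto3 installed

        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return response["SecretString"]


# Example (hypothetical secret name, requires AWS credentials):
# nasa_key = get_secret("nasa/api-key")
```

The calling role would also need `secretsmanager:GetSecretValue` permission on that secret.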
| 59 | +## AWS Instructions for env setup |
| 60 | +This project is set up like a standard Python project. The initialization |
| 61 | +process also creates a virtualenv within this project, stored under the .env |
| 62 | +directory. To create the virtualenv it assumes that there is a `python3` |
| 63 | +(or `python` for Windows) executable in your path with access to the `venv` |
| 64 | +package. If for any reason the automatic creation of the virtualenv fails, |
| 65 | +you can create the virtualenv manually. |
| 66 | + |
| 67 | +To manually create a virtualenv on macOS and Linux: |
| 68 | + |
| 69 | +``` |
| 70 | +$ python3 -m venv .env |
| 71 | +``` |
| 72 | + |
| 73 | +After the init process completes and the virtualenv is created, you can use the following |
| 74 | +step to activate your virtualenv. |
| 75 | + |
| 76 | +``` |
| 77 | +$ source .env/bin/activate |
| 78 | +``` |
| 79 | + |
| 80 | +If you are on a Windows platform, you would activate the virtualenv like this: |
| 81 | + |
| 82 | +``` |
| 83 | +% .env\Scripts\activate.bat |
| 84 | +``` |
| 85 | + |
| 86 | +Once the virtualenv is activated, you can install the required dependencies. |
| 87 | +**I've listed all required dependencies in `setup.py`, hence the `-e` in `requirements.txt`. |
| 88 | + |
| 89 | +``` |
| 90 | +$ pip install -r requirements.txt |
| 91 | +``` |
| 92 | + |
| 93 | +At this point you can now synthesize the CloudFormation template for this code. |
| 94 | + |
| 95 | +``` |
| 96 | +$ cdk synth |
| 97 | +``` |
| 98 | + |
| 99 | +To add additional dependencies, for example other CDK libraries, just add |
| 100 | +them to your `setup.py` file and rerun the `pip install -r requirements.txt` |
| 101 | +command. |
| 102 | + |
| 103 | +## Useful commands |
| 104 | + |
| 105 | + * `cdk ls` lists all stacks in the app |
| 106 | + * `cdk synth` emits the synthesized CloudFormation template |
| 107 | + * `cdk deploy` deploys this stack to your default AWS account/region |
| 108 | + * `cdk diff` compares the deployed stack with the current state |
| 109 | + * `cdk docs` opens the CDK documentation |
| 110 | + |
| 111 | +## Important Notes |
| 112 | +Destroying Resources: |
| 113 | + |
| 114 | +After you are finished with this app, you can run `cdk destroy` to quickly remove the majority |
| 115 | +of the stack's resources. However, some resources will NOT be destroyed automatically and require |
| 116 | +some manual intervention. Here is a list of what you must do: |
| 117 | +1) S3 bucket: You must first delete all files in the bucket. Changes to the current policy, which forbids |
| 118 | +bucket deletion while files are present, are in development and can be tracked here: https://github.com/aws/aws-cdk/issues/3297 |
| 119 | +2) CloudWatch Log Groups for Lambda logging. Find them by filtering on `/aws/lambda/Etl`. |
| 120 | +3) S3 CDK folder with your CloudFormation templates. Delete at your discretion. |
| 121 | +4) Your bootstrap stack asset S3 folder will have some assets in it. Delete or save them at your discretion. |
| 122 | +**Don't delete the bootstrap stack or the S3 asset bucket if you plan to continue using CDK. |
| 123 | +5) Both Lambdas are set to log at `logging.DEBUG`; switch to a higher level if that is too verbose. The output is in CloudWatch Logs. |
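The manual cleanup in items 1 and 2 can also be scripted. The sketch below is one possible approach with boto3, not part of this project: the function names are hypothetical, the bucket name in the usage comment is a placeholder, and both functions permanently delete data, so review the targets before running them. On a versioned bucket, old object versions would also need to be removed (for example with `bucket.object_versions.delete()`).

```python
def empty_bucket(bucket_name, s3_resource=None):
    """Delete every object in the bucket so the bucket itself can be removed."""
    if s3_resource is None:
        import boto3  # lazy import: only needed when talking to AWS

        s3_resource = boto3.resource("s3")
    s3_resource.Bucket(bucket_name).objects.all().delete()


def delete_log_groups(prefix, logs_client=None):
    """Delete all CloudWatch log groups whose names start with prefix."""
    if logs_client is None:
        import boto3

        logs_client = boto3.client("logs")
    deleted = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page["logGroups"]:
            logs_client.delete_log_group(logGroupName=group["logGroupName"])
            deleted.append(group["logGroupName"])
    return deleted


# Example (requires AWS credentials; the bucket name is a placeholder):
# empty_bucket("<your_etl_bucket_name>")
# delete_log_groups("/aws/lambda/Etl")
```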