-
To add an AWS Glue job instead of an AWS Lambda function to either Stage A or Stage B of my SDLF pipeline, would the Glue code go in the datalakelibrary repository or somewhere else? And where would the best place for the Glue job's CloudFormation yaml go (pipeline, dataset, stage a/b, etc.)?
-
I would advise against storing the Glue job code in the datalakeLibrary repository. The content of that repository is built into a Lambda layer intended to be used by the Lambda functions that are part of Stage A and Stage B. The Glue job code is of no use to them and would only make the layer bigger, which can be an issue. What usually happens is storing the Glue job code, together with the Glue job's CloudFormation template, in a repository distinct from all the SDLF repositories. This repository can be called `sdlf-transforms`, for example. Stage B (using the datalakeLibrary) runs the Glue job by a specific name, so that's the only thing you need to be careful about. There are alternatives to that, although they're not perfect: …
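As a rough illustration, a minimal template in that separate repository might look like the sketch below. The parameter names, job name, and script path here are placeholders I made up for the example, not SDLF conventions; adapt them to your own setup.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Example Glue job deployed from a separate transforms repository (illustrative sketch)

Parameters:
  pArtifactsBucket:
    Type: String
    Description: S3 bucket holding the Glue job script (hypothetical parameter)
  pGlueJobRoleArn:
    Type: String
    Description: IAM role ARN assumed by the Glue job (hypothetical parameter)

Resources:
  rExampleGlueJob:
    Type: AWS::Glue::Job
    Properties:
      # Stage B looks the job up by name, so this must match the name
      # the Stage B transform is configured to run.
      Name: sdlf-example-glue-job
      Role: !Ref pGlueJobRoleArn
      GlueVersion: "4.0"
      Command:
        Name: glueetl
        ScriptLocation: !Sub s3://${pArtifactsBucket}/transforms/example_job.py
      DefaultArguments:
        "--job-language": "python"
      MaxRetries: 0
      NumberOfWorkers: 2
      WorkerType: G.1X
```

Since Stage B only needs the job's name, whatever naming you choose in this template is what you point the Stage B transform at.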
-
Also check out this example under sdlf-utils, which focuses on Glue job deployment: https://github.com/awslabs/aws-serverless-data-lake-framework/tree/main/sdlf-utils/pipeline-examples/glue-jobs-deployer