Best practices for chaining transformations #122

Answered by cnfait
ystoneman asked this question in Q&A
The easiest solution is to write the stage B transformation so that it targets only the table you're interested in, as you suggest. You can also edit the postupdate lambda of stage A so that it doesn't send the other tables to stage B.
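As a rough illustration of the filtering idea, here is a minimal sketch. It assumes the stage A output keys contain the table name as one of their path segments; the key layout, the `TARGET_TABLE` value, and the `keys_for_table` helper are all hypothetical and not part of the framework.

```python
from typing import Iterable, List

# Hypothetical name of the one table stage B should process.
TARGET_TABLE = "orders"


def keys_for_table(keys: Iterable[str], table: str = TARGET_TABLE) -> List[str]:
    """Keep only the object keys that have `table` as an exact path segment.

    Assumes keys look like '<prefix>/<team>/<dataset>/<table>/<file>';
    adjust the check to match your actual key layout.
    """
    return [key for key in keys if table in key.split("/")]
```

The same check could run either at the top of the stage B transformation or in the stage A postupdate lambda before the keys are handed over, depending on where you prefer to drop the unwanted tables.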

The alternative you mention is definitely more work, as it would likely involve:

  • declaring a new dataset for the table
  • copying the file from the stage bucket (output from stage A) back into the raw bucket under the prefix of the new dataset
  • creating a new pipeline with a single stage looking like stage B.
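The copy step in the list above could be sketched roughly as follows. This assumes the raw bucket expects keys of the form `<team>/<dataset>/<file>`; that layout, the bucket names, and the helper names are assumptions for illustration only. The S3 client is passed in so the function can be exercised without AWS credentials; `copy_object` is the standard boto3 S3 client call.

```python
from typing import Any


def build_raw_key(source_key: str, team: str, dataset: str) -> str:
    """Build the destination key under the new dataset's prefix.

    Assumes a '<team>/<dataset>/<file>' layout in the raw bucket.
    """
    filename = source_key.rsplit("/", 1)[-1]
    return f"{team}/{dataset}/{filename}"


def copy_to_raw(
    s3_client: Any,
    stage_bucket: str,
    source_key: str,
    raw_bucket: str,
    team: str,
    dataset: str,
) -> str:
    """Copy one stage A output object back into the raw bucket."""
    dest_key = build_raw_key(source_key, team, dataset)
    s3_client.copy_object(
        Bucket=raw_bucket,
        Key=dest_key,
        CopySource={"Bucket": stage_bucket, "Key": source_key},
    )
    return dest_key
```

In a real lambda you would pass `boto3.client("s3")` as `s3_client`. Note that `copy_object` only handles objects up to 5 GB in a single call; larger files need a multipart copy.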

(As an aside, we are working on making it easier to chain stages using AWS EventBridge, but there is no ETA yet.)

Answer selected by ystoneman