I am working on an SDLF pipeline. The pipeline includes a dataset defined in the dataset repository, which creates a database from a folder in the staging bucket and then creates multiple tables (one table per file from that parent folder). The stage A transformation applies to the entire dataset, i.e. the whole database and all its tables. If I want to apply a stage B transform to only one of the tables within the "dataset", would it be better to create a new dataset from the existing one (if that is even possible), or should I write code in the stage B transformation file that targets the specific table I'm interested in?
Replies: 1 comment
The easiest solution is to write code in the stage B transformation targeting only the table you're interested in, as you suggest. You can even edit the postupdate lambda of stage A to avoid sending the other tables to stage B.
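For illustration, here is a minimal sketch of that kind of filtering inside the stage B transform. It assumes the transform receives the list of S3 keys being processed and that each table's files sit under a `<dataset>/<table>/` prefix in the stage bucket; names such as `TransformHandler`, `transform_object`, and `TARGET_TABLE` are made up for the example and are not SDLF's actual blueprint API:

```python
import os

# Hypothetical: the table of interest corresponds to a folder prefix in the
# stage bucket, e.g. s3://<stage-bucket>/<team>/<dataset>/<table>/...
TARGET_TABLE = os.environ.get("TARGET_TABLE", "my_table")


def keys_for_target_table(keys, dataset, table=TARGET_TABLE):
    """Keep only the S3 keys that belong to the target table's prefix."""
    prefix = f"{dataset}/{table}/"
    return [key for key in keys if prefix in key]


class TransformHandler:
    def transform_object(self, bucket, keys, team, dataset):
        # Drop every object that is not part of the target table, so the rest
        # of the stage B logic runs only on that table's files.
        selected = keys_for_target_table(keys, dataset)
        if not selected:
            # Nothing to do for this batch: another table triggered stage B.
            return []

        processed_keys = []
        for key in selected:
            # ... actual stage B transformation of s3://{bucket}/{key} here ...
            processed_keys.append(key)
        return processed_keys
```

The same predicate could instead run in the stage A postupdate lambda, so that keys for the other tables are never sent to stage B in the first place.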
The alternative you mention is definitely more work, as it would likely involve:
(as an aside, we are working on making it easier to chain stages using AWS EventBridge, but there is no ETA yet)