Closed
Description
Hi,
I'm developing a Glue job that runs nightly and applies some transforms for a set of users (user1, user2, etc.) every day.
Can Glue load data from files grouped by day and user (or another identifier), organized like this?
20.05.2020/user1/relation.json
20.05.2020/user1/accounts.json
20.05.2020/user1/profiles.json
20.05.2020/user2/relation.json
20.05.2020/user2/accounts.json
20.05.2020/user2/profiles.json
etc
I see that create_dynamic_frame_from_options can take a list of paths, but it's not clear how to get the data partitioned by identifier (user), or how to get a separate DataFrame per data type (relation, accounts, etc.).
df = glueContext.create_dynamic_frame_from_options(
    "s3",
    # 'recurse' belongs inside the connection options dict, next to 'paths'
    {"paths": ["s3://bucket/20.05.2020"], "recurse": True},
    format="json",
).toDF()
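To make the goal concrete, here is a rough sketch in plain Python (no Glue calls) of the grouping I'm after: split each key into day, user and data type, then collect the paths per data type. The helper names `parse_key` and `group_by_type`, the `bucket` default and the hard-coded key list are my own placeholders for illustration, not anything Glue provides.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_key(key):
    """Split a 'day/user/type.json' key into (day, user, data_type)."""
    parts = PurePosixPath(key).parts
    if len(parts) != 3:
        raise ValueError(f"unexpected key layout: {key!r}")
    day, user, filename = parts
    # The data type is the file name without its extension.
    return day, user, PurePosixPath(filename).stem


def group_by_type(keys, bucket="bucket"):
    """Map each data type to the list of s3:// paths that contain it."""
    grouped = defaultdict(list)
    for key in keys:
        _, _, data_type = parse_key(key)
        grouped[data_type].append(f"s3://{bucket}/{key}")
    return dict(grouped)


# Hypothetical listing of one day's keys, matching the layout above.
keys = [
    "20.05.2020/user1/relation.json",
    "20.05.2020/user1/accounts.json",
    "20.05.2020/user1/profiles.json",
    "20.05.2020/user2/relation.json",
    "20.05.2020/user2/accounts.json",
    "20.05.2020/user2/profiles.json",
]
paths_by_type = group_by_type(keys)
```

Presumably each list in `paths_by_type` could then be passed as `'paths'` to create_dynamic_frame_from_options to get one frame per data type, but the user would still have to be recovered from the path somehow, and I'm not sure this is the intended approach.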
Could you give me some pointers on loading data like this with Glue? :)
Thanks!