Loading data from different files grouped by date  #70

@OperationalFallacy

Hi,

I'm developing a Glue job that runs nightly and applies some transforms for a set of users (user1, user2, etc.) every day.

Can Glue load data from files grouped by day and user (or another identifier), organized like this?

20.05.2020/user1/relation.json
20.05.2020/user1/accounts.json
20.05.2020/user1/profiles.json
20.05.2020/user2/relation.json
20.05.2020/user2/accounts.json
20.05.2020/user2/profiles.json

etc

I see that create_dynamic_frame_from_options can take a list of paths, but it's not clear how to get the data partitioned by identifier (user), or how to get a separate DataFrame per data type (relation, accounts, etc.).

# Read every JSON file under the date prefix, including nested user folders
df = glueContext.create_dynamic_frame_from_options("s3",
    {'paths': ["s3://bucket/20.05.2020"],
     'recurse': True},
    format="json") \
    .toDF()

Could you give me some clues for Glue's data loads? :)

Thanks!
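
To make the question concrete, here is a minimal sketch of one possible workaround, not a confirmed Glue feature: parse the object keys in plain Python and build one list of paths per data type, so each list could be passed as 'paths' in a separate create_dynamic_frame_from_options call. The names KEY_PATTERN, parse_key and group_paths_by_type are hypothetical, and the sketch assumes keys follow exactly the date/user/type.json layout shown above. The user for each row would still have to be recovered separately, for example from the source file name.

```python
import re
from collections import defaultdict

# Hypothetical pattern for keys such as "20.05.2020/user1/relation.json".
KEY_PATTERN = re.compile(
    r"^(?P<date>\d{2}\.\d{2}\.\d{4})/(?P<user>[^/]+)/(?P<dtype>[^/]+)\.json$"
)


def parse_key(key):
    """Return {'date', 'user', 'dtype'} for a matching key, else None."""
    match = KEY_PATTERN.match(key)
    return match.groupdict() if match else None


def group_paths_by_type(keys, bucket="bucket"):
    """Group full S3 paths by data type (relation, accounts, ...)."""
    grouped = defaultdict(list)
    for key in keys:
        parts = parse_key(key)
        if parts is None:
            continue  # skip keys that do not follow the assumed layout
        grouped[parts["dtype"]].append(f"s3://{bucket}/{key}")
    return dict(grouped)


keys = [
    "20.05.2020/user1/relation.json",
    "20.05.2020/user1/accounts.json",
    "20.05.2020/user2/relation.json",
    "20.05.2020/user2/accounts.json",
]
paths_by_type = group_paths_by_type(keys)

# Inside a Glue job, each list could then be loaded on its own
# (left as a comment because it needs the Glue runtime):
# relation_df = glueContext.create_dynamic_frame_from_options(
#     "s3", {"paths": paths_by_type["relation"]}, format="json").toDF()
```

With the sample keys above, paths_by_type["relation"] holds the two relation.json paths for user1 and user2, which keeps each data type in its own DataFrame.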
