small change to legislators to support NDJSON #68

Open
mariandumitrascu opened this issue Jun 8, 2022 · 0 comments
Labels
question Further information is requested

Comments

Contributor

mariandumitrascu commented Jun 8, 2022

Dear SDLF Team,

I recently came across SDLF, and I love the concept and the architecture. I have three items regarding the "Testing the Framework" section of the Serverless Data Lake Workshop.

  1. The code provided in the legislators pipeline expects each input file to be a fully valid JSON document, that is, an array of dictionaries, while the original example from the AWS Glue documentation (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html ) deals with NDJSON, that is, one JSON object per line (the newline is the separator; there is no comma separator). In the real world, NDJSON is the most common case. For example, AWS Connect log files are NDJSON.

  2. One of the input files in ./sdlf-utils/pipeline-examples/legislators/data, namely regions.json, is corrupted, and I understand this is by design. This is explained in the closed issue "region.json data set in legislator dataset is invalid" (#28). However, I think this should be explained more explicitly in a README file associated with the example.

  3. May I open a pull request to modify the legislators example so that it handles NDJSON files and detects which type of JSON an input file contains? This project seems somewhat inactive; I would like to be active in it and also make some additions, such as more examples and a UI, in the near future.
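
To illustrate item 3, format detection could look roughly like the sketch below. This is only a minimal sketch under my own assumptions, not the existing legislators code: the function name `load_json_records` is hypothetical, it uses only the Python standard library `json` module, and it assumes the file content fits in memory as a string. It first tries to parse the whole text as one JSON document and falls back to line-by-line NDJSON parsing if that fails.

```python
import json


def load_json_records(text):
    """Return a list of records from either a JSON document or NDJSON text.

    Hypothetical helper for illustration; not part of SDLF.
    """
    stripped = text.strip()
    if not stripped:
        return []
    try:
        # Case 1: the whole text is a single valid JSON document.
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Case 2: fall back to NDJSON, one JSON value per non-empty line.
        # A line that is itself invalid still raises json.JSONDecodeError.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    # A top-level array is treated as the list of records; any other
    # value (for example a single object) is wrapped as one record.
    return parsed if isinstance(parsed, list) else [parsed]
```

With this approach a genuinely corrupted file still fails loudly in the fallback branch, so the intentionally invalid file from item 2 would not be silently accepted.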

Thank you,
-Marian

@mariandumitrascu mariandumitrascu added the question Further information is requested label Jun 8, 2022
2 participants