Dear SDLF Team,
I recently came across SDLF and I love the concept and the architecture. I have three items about the "Testing the Framework" section of the Serverless Data Lake Workshop.
The code provided in the legislators pipeline expects each input file to be a single valid JSON document, that is, an array of dictionaries, while the original example from the AWS Glue documentation (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html ) deals with NDJSON, that is, one JSON object per line (newline as the separator, no comma between objects). In the real world, NDJSON is the most common case. For example, Amazon Connect log files are NDJSON.
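To make the difference concrete, here is a minimal sketch in plain Python. The sample records are made up for illustration and are not taken from the legislators data set.

```python
import json

# Hypothetical sample records, not taken from the legislators data set.
records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

# Layout 1: a single JSON document, an array of objects.
json_array_text = json.dumps(records)

# Layout 2: NDJSON, one JSON object per line, no brackets or commas between objects.
ndjson_text = "\n".join(json.dumps(record) for record in records)

# json.loads parses the array layout in one call.
from_array = json.loads(json_array_text)

# The same call fails on multi-line NDJSON, because the parser
# finds extra data after the first object.
try:
    json.loads(ndjson_text)
    ndjson_parsed_as_single_document = True
except json.JSONDecodeError:
    ndjson_parsed_as_single_document = False

# NDJSON has to be parsed line by line instead.
from_ndjson = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
```

Both layouts carry the same records, but code written for one will not read the other without changes.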
One of the input files in ./sdlf-utils/pipeline-examples/legislators/data, namely regions.json, is corrupted, and I understand this is by design. This is explained in the closed issue "region.json data set in legislator dataset is invalid" (#28). However, I think this should be explained more explicitly in a README file associated with the example.
Can I open a pull request to modify the legislators example so that it handles NDJSON files and detects which type of JSON an input file contains? It looks like this project is somewhat inactive; I would like to be active in it and also make some additions, such as more examples and a UI, in the near future.
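As a rough idea of what such detection could look like, here is a minimal sketch in plain Python. It is not the pipeline's actual Glue code, and `load_records` is a hypothetical helper name that does not exist in SDLF. It assumes the whole file fits in memory as a string.

```python
import json


def load_records(text):
    """Return a list of records from either a JSON array or NDJSON text.

    Hypothetical helper for illustration only; not part of SDLF.
    """
    stripped = text.strip()
    if not stripped:
        return []
    try:
        # First attempt: the whole text is one JSON document.
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Fallback: treat the text as NDJSON, one JSON object per line.
        # A truly corrupted file still raises json.JSONDecodeError here.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    # A single document is either an array of records or one lone record.
    return parsed if isinstance(parsed, list) else [parsed]
```

Trying the single-document parse first keeps existing array-style inputs working unchanged, and the line-by-line fallback only runs when that parse fails.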
Thank you,
-Marian