This is the companion code for the Unskew data blog post here.
We use PySpark to process data and write it to S3 as a Delta Lake table. Later, we discuss how to deploy this PySpark application to EMR.
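For orientation, here is a minimal sketch of what writing a DataFrame to S3 as a Delta Lake table can look like in PySpark. It is not the code from this repository: the helper names (`delta_session_conf`, `write_delta`, `main`), the app name, the toy rows and the bucket path are all hypothetical, and it assumes the Delta Lake jars are already available to Spark.

```python
def delta_session_conf():
    """Spark settings that enable Delta Lake's SQL extension and catalog."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def write_delta(df, table_path, mode="overwrite"):
    """Write a DataFrame to table_path in Delta format."""
    df.write.format("delta").mode(mode).save(table_path)


def main():
    # Imported here so the helpers above can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("unskew-example")  # hypothetical app name
    for key, value in delta_session_conf().items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Toy rows standing in for the real input data (assumption).
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # Hypothetical bucket and prefix; replace with your own.
    # On EMR the s3:// scheme is typical; elsewhere s3a:// is usually needed.
    write_delta(df, "s3://my-example-bucket/tables/example")
    spark.stop()
```

Calling `main()` from a script submitted to a cluster with Delta Lake and S3 access configured would run the write end to end. The session settings are kept in a separate function so they can be reused or overridden at submit time.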
As always, please write to us with any questions, comments or improvements.