- installation_steps contains the shell commands needed to install the required software.
- After installation, the database and table can be created with the DDL statements in db_table_creation.
- Once the table is created, random_data_generation.py should be run to generate random transaction data and populate the table with it. A small subset of this data is also exported to transactions.csv, just to give a visual sense of what the randomly generated data looks like (a rough sketch of this step appears after this list).
- After setting up Hadoop and Sqoop (instructions in installation_steps), sqoop_job creates the Sqoop job that pushes incremental updates from the table to HDFS.
- pyspark_job.py contains the Spark code that generates the user-level metrics (a sketch of this step also appears after this list).
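
The real schema and generation logic live in db_table_creation and random_data_generation.py; as a minimal sketch, the random data generation might look like the following (the column names, value ranges, and row counts here are assumptions, not the repository's actual schema):

```python
import csv
import random
from datetime import datetime, timedelta

# Hypothetical product list and columns; the real schema is defined in db_table_creation.
PRODUCTS = ["book", "laptop", "phone", "headphones", "monitor"]

def random_transaction(txn_id):
    """Build one fake transaction row with a random user, product, amount and timestamp."""
    ts = datetime(2023, 1, 1) + timedelta(seconds=random.randint(0, 365 * 24 * 3600))
    return {
        "transaction_id": txn_id,
        "user_id": random.randint(1, 1000),
        "product": random.choice(PRODUCTS),
        "amount": round(random.uniform(5.0, 500.0), 2),
        "created_at": ts.strftime("%Y-%m-%d %H:%M:%S"),
    }

if __name__ == "__main__":
    rows = [random_transaction(i) for i in range(1, 10001)]

    # The actual script also inserts these rows into the database table;
    # here only a small subset is exported so the shape of the data is visible.
    with open("transactions.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows[:100])
```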
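
pyspark_job.py is the source of truth for the user-level metrics; as a rough illustration, an aggregation over the data that the Sqoop job pushed into HDFS might look like this (the HDFS paths, column names, and metric choices below are assumptions):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("user_level_metrics").getOrCreate()

# Read the transaction data from HDFS (path and format are assumptions;
# the actual locations are defined by the Sqoop job).
transactions = spark.read.csv(
    "hdfs:///user/hive/transactions", header=True, inferSchema=True
)

# Aggregate per user: transaction count, total and average spend, last activity.
user_metrics = (
    transactions.groupBy("user_id")
    .agg(
        F.count("*").alias("txn_count"),
        F.sum("amount").alias("total_spend"),
        F.avg("amount").alias("avg_spend"),
        F.max("created_at").alias("last_txn_at"),
    )
)

user_metrics.write.mode("overwrite").parquet("hdfs:///user/hive/user_metrics")
```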