![Screenshot 2024-01-26 at 12 50 44 AM](https://private-user-images.githubusercontent.com/56320349/299783024-a2997131-ce46-4a3b-b20d-e5d08a7e4c6e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAzODk4ODQsIm5iZiI6MTcyMDM4OTU4NCwicGF0aCI6Ii81NjMyMDM0OS8yOTk3ODMwMjQtYTI5OTcxMzEtY2U0Ni00YTNiLWIyMGQtZTVkMDhhN2U0YzZlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzA3VDIxNTk0NFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWI1YjJiZDA5MjQ3NWIxOGRkZGFjZDRkOTg3YjllZmRmY2M1NmRhYmYwY2YwMzI5MzVkOWIzNmYyNmQ0YThiZmUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.M4NtwFRvWgBFCtrAPeJvBpalzHCWzJPSbcKQpqQzkf8)
![Screenshot 2024-01-26 at 12 53 23 AM](https://private-user-images.githubusercontent.com/56320349/299783589-97ae6570-851c-472c-b545-e3e633b2ad61.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAzODk4ODQsIm5iZiI6MTcyMDM4OTU4NCwicGF0aCI6Ii81NjMyMDM0OS8yOTk3ODM1ODktOTdhZTY1NzAtODUxYy00NzJjLWI1NDUtZTNlNjMzYjJhZDYxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzA3VDIxNTk0NFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNjN2Y0ODFjOWU1NDUxM2YwMDU3YzVlNWEzOTQyMmUyMWU5MjViNmIxOGVkZTBjYTg0YWY5YTU5MGViMzcwOWUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.SY2H6CRavRtKoDnEub6diJcursE6kQREh04ufUfFOws)
- Designing Data-Intensive Applications
- Fundamentals of Data Engineering
- The Data Warehouse Toolkit
- Cracking the Data Engineering Interview
- Data Engineering with Python
- Data Pipelines with Apache Airflow
- Big Data: Principles and Best Practices of Scalable Real-Time Data Systems
- Basic Skills: Linux, Git & GitHub, Computer Networking, Cloud Computing, Network & Security, Agile Development
- Advanced Skills (Good to Know): Data Lake & Data Warehouse Concepts, REST APIs, Databases (SQL & NoSQL)
- Programming Languages: Python, SQL, Java, Scala
- Databases: PostgreSQL, MongoDB, Neo4j, Redis, Cassandra, Apache HBase, Snowflake, InfluxDB
- Data Ingestion: Apache Kafka, Flume, Logstash, Airbyte, Apache Spark, Talend, Informatica
- Data Transformation: Python, Pandas, SQL, Apache Spark, Hive, dbt, Matillion, Pig
- Data Preprocessing: Apache Spark, Apache Hadoop, Apache Flink
- Data Orchestration: Apache Airflow, Luigi
- Data Storage:
  - Data Lake: AWS S3, Azure Blob Storage, Google Cloud Storage
  - Data Warehouse: Snowflake, Google BigQuery, Amazon Redshift, Apache Hive
- Data Visualization: Tableau, Power BI, Looker
- DataOps: Docker, Kubernetes, Jenkins
- 🐍 Python
- 📊 SQL
- 🛠️ MySQL
- 🌳 MongoDB
- 🔥 PySpark
- 🎈 Bash
- 🌬️ Airflow
- ☕ Apache Kafka
- 🐙 Git
- 🐈 GitHub
- ⚙️ CI/CD basics
- 🏬 Data Warehousing
- 🛠️ dbt
- 🌊 Data Lakes
- 📘 Databricks
- ☁️ Azure Databricks
- ❄️ Snowflake
- 🌪️ Apache NiFi
- 🌐 Debezium
- Master Python: https://lnkd.in/d-pZPyf5
- Learn SQL: https://lnkd.in/dzAiRF-x
- Get hands-on with MySQL: https://lnkd.in/ddpSkUhc
- Dive into MongoDB: https://lnkd.in/dHQ4VC2E
- Master PySpark: https://lnkd.in/d7fgs7dE
- Discover Bash, Airflow & Kafka: https://lnkd.in/dDhuEqQE
- Master Git & GitHub: https://lnkd.in/dqJ7J3kN
- Understand CI/CD basics: https://lnkd.in/dcfKBmCa
- Decode Data Warehousing: https://lnkd.in/dPVRDJT5
- Learn dbt: https://lnkd.in/eG9eaEuE
- Understand Data Lakes: https://lnkd.in/dtZKJ4d6
- Explore Databricks: https://lnkd.in/dCBiQXPR
- Learn Azure Databricks: https://lnkd.in/dzmwBs4Y
- Master Snowflake: https://lnkd.in/dDBeddVy
- Explore Apache NiFi: https://lnkd.in/de7bvnSt

| Tools | Link | Used for | Official Docs | YouTube |
|---|---|---|---|---|
| DBMS | MySQL, MongoDB | | | |
| SQL | https://lnkd.in/dzAiRF-x | | | |
| Python | https://lnkd.in/d-pZPyf5 | | | |
| Linux | | | | |
| Data Warehouse & Lake Concepts | Data Warehouse, Data Lakes | | | |
| Data Pipelines | | | | |
| dbt | https://lnkd.in/eG9eaEuE | | | |
| PySpark | https://lnkd.in/d7fgs7dE | | | |
| Kafka | | | | |
| Apache NiFi | https://lnkd.in/de7bvnSt | | | |
| Airflow | | | | |
| Databricks | https://lnkd.in/dCBiQXPR | | | |
| Snowflake | https://lnkd.in/dDBeddVy | | | |
| Cloud Computing Concepts | | | | |
| Distributed Systems fundamentals | | | | |
| AWS | | | | |
| Azure | | | | |
| GCP | | | | |
| Git & GitHub | https://lnkd.in/dqJ7J3kN | | | |
| CI/CD | https://lnkd.in/dcfKBmCa | | | |
| Jenkins | | | | |
| GitHub Actions | | | | |
| Terraform | | | | |
| SonarQube | | | | |
| Docker | | | | |
| Kubernetes | | | | |
| Power BI | | | | |
| Tableau | | | | |
| Apache Superset | | | | |
| Prometheus | | | | |
| Grafana | | | | |
| Datadog | | | | |

- Netflix - https://netflixtechblog.medium.com/
- AWS - https://aws.amazon.com/solutions/case-studies/
- GCP - https://cloud.google.com/customers
- Azure - https://azure.microsoft.com/en-us/resources/customer-stories/
- Spotify - https://engineering.atspotify.com/category/data/
- MongoDB - https://www.mongodb.com/blog/all
- Swiggy
  - https://bytes.swiggy.com/the-swiggy-delivery-challenge-part-one-6a2abb4f82f6
  - https://bytes.swiggy.com/swiggy-distance-service-9868dcf613f4
  - https://bytes.swiggy.com/the-tech-that-brings-you-your-food-1a7926229886
- Zomato - https://blog.zomato.com/