- About
- Learning Goals
- Curriculum Overview
- How to Use This Curriculum
- Extra Bibliography
- References
- Notes and Clarifications
This Self-Taught Data Science Curriculum is a structured learning roadmap designed to provide a comprehensive education in data science and analytics, leveraging free online resources. Initially created as a personal guide, it is now shared for anyone who wishes to follow a similar path.
The program aims to give an end-to-end understanding of key topics, including programming, mathematics, statistics, machine learning, deep learning, and big data. It is built on high-quality, freely available courses and learning resources.
If you're looking to transition into data science or enhance your knowledge, this roadmap can serve as a solid foundation.
By completing this curriculum, you will develop proficiency in:
- Python: Data manipulation, visualization, and machine learning.
- R: Statistical modeling and advanced data analysis.
- Linear Algebra, Calculus, Probability, and Inferential Statistics.
- Bayesian Methods, Regression, and Machine Learning Theory.
- SQL and NoSQL Databases.
- Data lakes and cloud computing solutions.
- Big Data processing with Spark and Hadoop.
- Supervised and Unsupervised Learning.
- Neural Networks and Natural Language Processing (NLP).
- Reinforcement Learning and AI Ethics.
The curriculum is divided into well-structured sections, each covering essential areas of data science:
- Fundamentals - Basic concepts and data literacy (~40h).
- Mathematics & Statistics - Essential mathematical foundations (~90h).
- Programming - Python & R for data science (~215h).
- Data Mining - Extracting insights and patterns (~120h).
- Databases - SQL and database management (~80h).
- Big Data - Processing large-scale datasets (~85h).
- Machine Learning - Core ML concepts and models (~120h).
- Deep Learning - Advanced AI techniques (~125h).
- Data Warehousing - Data integration and storage (~300h).
- Cloud Computing - Cloud solutions for data science (~120h).
A detailed breakdown of each section, including recommended courses, can be found in the repository.
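To get a rough sense of the overall time commitment, the section estimates above can be added up and divided by a weekly study budget. The sketch below is only an illustration: the helper names `total_hours` and `weeks_needed` and the 10 hours per week pace are assumptions, not part of the curriculum, and the hour figures are the approximate values from the list.

```python
import math

# Approximate hours per section, copied from the overview list above.
SECTION_HOURS = {
    "Fundamentals": 40,
    "Mathematics & Statistics": 90,
    "Programming": 215,
    "Data Mining": 120,
    "Databases": 80,
    "Big Data": 85,
    "Machine Learning": 120,
    "Deep Learning": 125,
    "Data Warehousing": 300,
    "Cloud Computing": 120,
}


def total_hours(skip=()):
    """Sum the estimated hours, leaving out any sections named in `skip`."""
    unknown = set(skip) - set(SECTION_HOURS)
    if unknown:
        raise KeyError(f"Unknown sections: {sorted(unknown)}")
    return sum(h for name, h in SECTION_HOURS.items() if name not in skip)


def weeks_needed(hours_per_week, skip=()):
    """Weeks to finish at a steady weekly pace (hypothetical planning helper)."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    return math.ceil(total_hours(skip) / hours_per_week)


if __name__ == "__main__":
    print(total_hours())                           # 1295
    print(weeks_needed(10))                        # 130 (assumed 10 h/week)
    print(weeks_needed(10, skip=["Programming"]))  # 108
```

Since the per-section figures are themselves approximate, treat the result as an order-of-magnitude planning aid rather than a schedule.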
This roadmap is flexible and can be adapted based on your learning pace and background:
- Follow it sequentially if you're starting from scratch.
- Skip sections if you already have knowledge in a particular area.
- Combine different resources, projects, and additional readings.
Each module contains curated courses with estimated effort and certification options when available.
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Data – What It Is, What We Can Do With It | Johns Hopkins University | ~11h | Certificate of Completion | ✅ |
What is Data Science? | IBM Skills Network | ~11h | Certificate of Completion | ✅ |
The Data Scientist's Toolbox | Johns Hopkins University | ~18h | Certificate of Completion | ✅ |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Linear Algebra for Machine Learning and Data Science | DeepLearning.AI | ~34h | -- | -- |
Calculus for Machine Learning and Data Science | DeepLearning.AI | ~25h | -- | -- |
Probability and Statistics for Machine Learning and Data Science | DeepLearning.AI | ~33h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Introduction to Data Science in Python | University of Michigan | ~34h | -- | -- |
Applied Plotting, Charting & Data Representation in Python | University of Michigan | ~24h | -- | -- |
Applied Machine Learning in Python | University of Michigan | ~31h | -- | -- |
Applied Text Mining in Python | University of Michigan | ~25h | -- | -- |
Applied Social Network Analysis in Python | University of Michigan | ~26h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
R Programming | Johns Hopkins University | ~27h | -- | -- |
Advanced R Programming | Johns Hopkins University | ~18h | -- | -- |
Building R Packages | Johns Hopkins University | ~20h | -- | -- |
Building Data Visualization Tools | Johns Hopkins University | ~12h | -- | -- |
Mastering Software Development in R | Johns Hopkins University | ~3h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Data Visualization | University of Illinois Urbana-Champaign | ~15h | -- | -- |
Text Retrieval and Search Engines | University of Illinois Urbana-Champaign | ~30h | -- | -- |
Text Mining and Analysis | University of Illinois Urbana-Champaign | ~33h | -- | -- |
Pattern Discovery in Data Mining | University of Illinois Urbana-Champaign | ~17h | -- | -- |
Cluster Analysis in Data Mining | University of Illinois Urbana-Champaign | ~16h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Relational Database Design | University of Colorado | ~34h | -- | -- |
The Structured Query Language (SQL) | University of Colorado | ~26h | -- | -- |
Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Introduction to Big Data | University of California San Diego | ~17h | -- | -- |
Big Data Modeling and Management Systems | University of California San Diego | ~13h | -- | -- |
Big Data Integration and Processing | University of California San Diego | ~17h | -- | -- |
Machine Learning with Big Data | University of California San Diego | ~23h | -- | -- |
Graph Analytics for Big Data | University of California San Diego | ~13h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h | -- | -- |
Advanced Learning Algorithms | DeepLearning.AI | ~34h | -- | -- |
Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Neural Networks and Deep Learning | DeepLearning.AI | ~24h | -- | -- |
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization | DeepLearning.AI | ~23h | -- | -- |
Structuring Machine Learning Projects | DeepLearning.AI | ~6h | -- | -- |
Convolutional Neural Networks | DeepLearning.AI | ~35h | -- | -- |
Sequence Models | DeepLearning.AI | ~37h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Database Management Essentials | Colorado Boulder | ~122h | -- | -- |
Data Warehouse Concepts, Design, and Data Integration | Colorado Boulder | ~62h | -- | -- |
Relational Database Support for Data Warehouses | Colorado Boulder | ~71h | -- | -- |
Business Intelligence Concepts, Tools, and Applications | Colorado Boulder | ~21h | -- | -- |
Design and Build a Data Warehouse for Business Intelligence Implementation | Colorado Boulder | ~31h | -- | -- |
Course | Offered by | Effort | Certificate, if applicable | Status |
---|---|---|---|---|
Cloud Concepts 1 | University of Illinois Urbana-Champaign | ~24h | -- | -- |
Cloud Concepts 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
Cloud Applications 1 | University of Illinois Urbana-Champaign | ~15h | -- | -- |
Cloud Applications 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
Cloud Networks | University of Illinois Urbana-Champaign | ~22h | -- | -- |
Cloud Computing Project | University of Illinois Urbana-Champaign | ~21h | -- | -- |
If you're looking for deeper insights, consider these additional resources:
- The Elements of Statistical Learning - Hastie, Tibshirani, Friedman.
- An Introduction to Statistical Learning - James, Witten, Hastie, Tibshirani.
- Bayesian Statistics - Peter M. Lee.
- Artificial Intelligence: A Modern Approach - Stuart Russell, Peter Norvig.
- Deep Learning Papers Reading Roadmap - Collection of AI research papers.
- SQL for Smarties - Joe Celko.
- The Missing Semester of Your CS Education - MIT.
These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.
- Course durations are approximate and based on platform estimates.
- Some books were accessed through university partnerships. If you don't have such access, look for alternatives such as public libraries or editions that authors host for free. If possible, support authors by purchasing their books.
- The curriculum is continuously evolving as new resources become available.
Sources used to structure this curriculum:
- OSSU Data Science - Open-source university model.
- AI Expert Roadmap - AI & Data Science roadmap.
- Roadmap SH - Learning paths for various tech disciplines.
- USP Statistics Course - Inspiration for course selection.