A curated list of free courses from reputable universities that meet the requirements of an undergraduate curriculum in Data Science, excluding general education. It includes projects and supporting materials in an organized structure.

marcoshsq/The_Self-taught_Data_Scientist

Developer Roadmap

Advanced Data Science and Analytics Self-Taught Program


📌 Summary


🧠 About

This Self-Taught Data Science Curriculum is a structured learning roadmap designed to provide a comprehensive education in data science and analytics, leveraging free online resources. Initially created as a personal guide, it is now shared for anyone who wishes to follow a similar path.

This program provides an end-to-end understanding of key topics, including programming, mathematics, statistics, machine learning, deep learning, and big data. The curriculum is built on high-quality, freely available courses and learning resources.

If you're looking to transition into data science or enhance your knowledge, this roadmap can serve as a solid foundation.


🎯 Learning Goals

By completing this curriculum, you will develop proficiency in:

1️⃣ Programming for Data Science

  • Python: Data manipulation, visualization, and machine learning.
  • R: Statistical modeling and advanced data analysis.

2️⃣ Mathematics & Statistics for Data Science

  • Linear Algebra, Calculus, Probability, and Inferential Statistics.
  • Bayesian Methods, Regression, and Machine Learning Theory.

3️⃣ Databases, Data Warehousing, and Big Data

  • SQL and NoSQL Databases.
  • Data lakes and cloud computing solutions.
  • Big Data processing with Spark and Hadoop.

4️⃣ Machine Learning & Deep Learning

  • Supervised and Unsupervised Learning.
  • Neural Networks and Natural Language Processing (NLP).
  • Reinforcement Learning and AI Ethics.

📚 Curriculum Overview

The curriculum is divided into well-structured sections, each covering essential areas of data science:

  1. Fundamentals - Basic concepts and data literacy (~40h).
  2. Mathematics & Statistics - Essential mathematical foundations (~90h).
  3. Programming - Python & R for data science (~215h).
  4. Data Mining - Extracting insights and patterns (~120h).
  5. Databases - SQL and database management (~80h).
  6. Big Data - Processing large-scale datasets (~85h).
  7. Machine Learning - Core ML concepts and models (~120h).
  8. Deep Learning - Advanced AI techniques (~125h).
  9. Data Warehousing - Data integration and storage (~300h).
  10. Cloud Computing - Cloud solutions for data science (~120h).

A detailed breakdown of each section, including recommended courses, can be found in the repository.
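As a rough planning aid, the section estimates listed in the overview can be added up and turned into a study timeline. The snippet below is only a minimal sketch: the hours come from the overview list, while the 10 hours-per-week pace and the `weeks_to_finish` helper are assumptions made for illustration and are not part of the curriculum.

```python
import math

# Approximate hours per section, copied from the Curriculum Overview list.
SECTION_HOURS = {
    "Fundamentals": 40,
    "Mathematics & Statistics": 90,
    "Programming": 215,
    "Data Mining": 120,
    "Databases": 80,
    "Big Data": 85,
    "Machine Learning": 120,
    "Deep Learning": 125,
    "Data Warehousing": 300,
    "Cloud Computing": 120,
}


def weeks_to_finish(total_hours: float, hours_per_week: float) -> int:
    """Hypothetical helper: whole weeks needed at a constant weekly pace."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    return math.ceil(total_hours / hours_per_week)


total_hours = sum(SECTION_HOURS.values())

# Assumed pace of 10 hours per week; adjust to your own schedule.
weeks = weeks_to_finish(total_hours, hours_per_week=10)

print(f"Total estimated effort: ~{total_hours}h")
print(f"At 10h per week: about {weeks} weeks")
```

Because the per-section figures are platform estimates, treat the result as an order-of-magnitude guide rather than a deadline. Skipping sections you already know only requires deleting the matching entries from `SECTION_HOURS`.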


📌 How to Use This Curriculum

This roadmap is flexible and can be adapted based on your learning pace and background:

  • ✅ Follow it sequentially if you're starting from scratch.
  • ✅ Skip sections if you already have knowledge in a particular area.
  • ✅ Combine different resources, projects, and additional readings.

Each module contains curated courses with estimated effort and certification options when available.


Section 01 - Fundamentals (~40h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Data – What It Is, What We Can Do With It | Johns Hopkins University | ~11h | Certificate of Completion | ✓ |
| What is Data Science? | IBM Skills Network | ~11h | Certificate of Completion | ✓ |
| The Data Scientist's Toolbox | Johns Hopkins University | ~18h | Certificate of Completion | ✓ |

Section 02 - Mathematics and Statistics for Data Science (~90h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Linear Algebra for Machine Learning and Data Science | DeepLearning.AI | ~34h | -- | -- |
| Calculus for Machine Learning and Data Science | DeepLearning.AI | ~25h | -- | -- |
| Probability and Statistics for Machine Learning and Data Science | DeepLearning.AI | ~33h | -- | -- |

Section 03 - Programming for Data Science

Section 03-A - Python Language for Data Analysis (~140h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Introduction to Data Science in Python | University of Michigan | ~34h | -- | -- |
| Applied Plotting, Charting & Data Representation in Python | University of Michigan | ~24h | -- | -- |
| Applied Machine Learning in Python | University of Michigan | ~31h | -- | -- |
| Applied Text Mining in Python | University of Michigan | ~25h | -- | -- |
| Applied Social Network Analysis in Python | University of Michigan | ~26h | -- | -- |

Section 03-B - R Language for Statistical Analysis and Modeling (~75h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| R Programming | Johns Hopkins University | ~27h | -- | -- |
| Advanced R Programming | Johns Hopkins University | ~18h | -- | -- |
| Building R Packages | Johns Hopkins University | ~20h | -- | -- |
| Building Data Visualization Tools | Johns Hopkins University | ~12h | -- | -- |
| Mastering Software Development in R | Johns Hopkins University | ~3h | -- | -- |

Section 04 - Data Mining (~120h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Data Visualization | University of Illinois Urbana-Champaign | ~15h | -- | -- |
| Text Retrieval and Search Engines | University of Illinois Urbana-Champaign | ~30h | -- | -- |
| Text Mining and Analysis | University of Illinois Urbana-Champaign | ~33h | -- | -- |
| Pattern Discovery in Data Mining | University of Illinois Urbana-Champaign | ~17h | -- | -- |
| Cluster Analysis in Data Mining | University of Illinois Urbana-Champaign | ~16h | -- | -- |

Section 05 - Databases and SQL (~80h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Relational Database Design | University of Colorado | ~34h | -- | -- |
| The Structured Query Language (SQL) | University of Colorado | ~26h | -- | -- |
| Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h | -- | -- |

Section 06 - Big Data (~85h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Introduction to Big Data | University of California | ~17h | -- | -- |
| Big Data Modeling and Management Systems | University of California | ~13h | -- | -- |
| Big Data Integration and Processing | University of California | ~17h | -- | -- |
| Machine Learning with Big Data | University of California | ~23h | -- | -- |
| Graph Analytics for Big Data | University of California | ~13h | -- | -- |

Section 07 - Machine Learning (~120h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h | -- | -- |
| Advanced Machine Learning Algorithms | DeepLearning.AI | ~34h | -- | -- |
| Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h | -- | -- |

Section 08 - Deep Learning (~125h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Neural Networks and Deep Learning | DeepLearning.AI | ~24h | -- | -- |
| Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization | DeepLearning.AI | ~23h | -- | -- |
| Structuring Machine Learning Projects | DeepLearning.AI | ~6h | -- | -- |
| Convolutional Neural Networks | DeepLearning.AI | ~35h | -- | -- |
| Sequence Models | DeepLearning.AI | ~37h | -- | -- |

Section 09 - Data Warehousing (~300h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Database Management Essentials | Colorado Boulder | ~122h | -- | -- |
| Data Warehouse Concepts, Design, and Data Integration | Colorado Boulder | ~62h | -- | -- |
| Relational Database Support for Data Warehouses | Colorado Boulder | ~71h | -- | -- |
| Business Intelligence Concepts, Tools, and Applications | Colorado Boulder | ~21h | -- | -- |
| Design and Build a Data Warehouse for Business Intelligence Implementation | Colorado Boulder | ~31h | -- | -- |

Section 10 - Cloud Computing (~120h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
| --- | --- | --- | --- | --- |
| Cloud Concepts 1 | University of Illinois Urbana-Champaign | ~24h | -- | -- |
| Cloud Concepts 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
| Cloud Applications 1 | University of Illinois Urbana-Champaign | ~15h | -- | -- |
| Cloud Applications 2 | University of Illinois Urbana-Champaign | ~19h | -- | -- |
| Cloud Networks | University of Illinois Urbana-Champaign | ~22h | -- | -- |
| Cloud Computing Project | University of Illinois Urbana-Champaign | ~21h | -- | -- |

📖 Extra Bibliography

If you're looking for deeper insights, consider these additional resources:

Mathematics

Machine Learning & AI

Programming & Databases

These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.

📝 Notes and Clarifications

  • Course durations are approximate and based on platform estimates.
  • Some books were accessed through university partnerships, but if you don't have access... well, explore alternative ways. If possible, support authors by purchasing them.
  • The curriculum is continuously evolving as new resources become available.

🔗 References

Sources used to structure this curriculum:


Developer Roadmap

