References:
- Fast Forward Labs
- https://github.com/asjad99/Algorithms-for-data-products
- CS168
- Advanced analytics in Spark
- Data science foundations book by CMU
Welcome to the Fundamental Concepts in Data Science Repo!
This repo is a collection of Jupyter notebooks that teach key concepts in data science, from foundational mathematics to practical data analysis techniques. Each notebook is designed as a standalone learning resource, with explanations, examples, and exercises for hands-on practice with the topic.
Below is an index of the notebooks included in this repository. Links to the individual notebooks will be added soon.
| Topic | Description | Link |
|---|---|---|
| 1. Exploratory Data Analysis (EDA) | EDA Notebook 1: Introduction to EDA and basic data summary techniques<br>EDA Notebook 2: Understanding distributions, central tendency, and variability<br>EDA Notebook 3: Univariate and multivariate relationships in data | [Link Placeholder] |
| 2. Data Munging | Data Cleaning and Preprocessing: Handling missing data, data transformation, and feature engineering<br>Data Wrangling: Merging, reshaping, and dealing with categorical data | [Link Placeholder] |
| 3. Linear Algebra | Vectors and Matrices: Concepts of vectors, operations on matrices, and matrix factorizations<br>Applications in Data Science: How linear algebra is used in machine learning models | [Link Placeholder] |
| 4. Data Visualization Techniques | Basic Plotting: Introduction to Matplotlib and Seaborn for visualizing data<br>Advanced Visualization: Creating interactive visualizations and dashboards | [Link Placeholder] |
| 5. Statistics | Descriptive Statistics: Measures of central tendency and variability<br>Inferential Statistics: Hypothesis testing, confidence intervals, and p-values | [Link Placeholder] |
| 6. Experimental Design and Analysis | Experimental Design: Concepts of controlled experiments, A/B testing, and sample size determination<br>Analysis Techniques: Methods for analyzing experimental results | [Link Placeholder] |
| 7. Dimensionality Reduction | PCA and t-SNE: Introduction to Principal Component Analysis and t-Distributed Stochastic Neighbor Embedding<br>Feature Selection: Techniques for selecting important features in a dataset | [Link Placeholder] |
| 8. Clustering | K-Means and Hierarchical Clustering: Understanding unsupervised learning and cluster formation<br>Clustering Evaluation: Techniques to evaluate clustering effectiveness | [Link Placeholder] |
| 9. Graphs | Introduction to Graphs: Understanding nodes, edges, and types of graphs<br>Network Analysis: Concepts like centrality, shortest path, and community detection | [Link Placeholder] |
| 10. Numerical Optimization | Optimization Basics: Gradient descent, learning rates, and optimization algorithms<br>Applications: Optimization techniques in machine learning models | [Link Placeholder] |
| 11. Storytelling with Data (Our World in Data) | Data Storytelling Techniques: How to effectively communicate insights using real-world datasets<br>Examples from Our World in Data: Exploring global datasets to tell compelling stories | [Link Placeholder] |
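
As a taste of the kind of material the Numerical Optimization topic covers, here is a minimal sketch of gradient descent in plain Python. It is not taken from the notebooks: the function names, the example objective `f(x) = (x - 3)^2`, the learning rate, and the step count are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Hypothetical objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
def grad_f(x):
    return 2.0 * (x - 3.0)


# Start far from the minimum and let the updates pull x toward 3.
x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)
```

With this learning rate each step shrinks the distance to the minimum by a constant factor, so the result lands very close to 3. A larger learning rate (above 1.0 for this objective) would make the iterates diverge instead, which is the kind of behavior worth exploring hands-on.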
Contributions are welcome! If you would like to add new notebooks, suggest changes, or fix any issues, please feel free to submit a pull request.
- Advanced Algorithms: https://github.com/asjad99/Algorithms-for-data-products
- Everyday DS Tools: https://github.com/asjad99/Data-Science-Tools
- Case Studies/Applications: https://github.com/asjad99/Data-Science-Applications
- https://computationalthinking.mit.edu/Fall24/
- Jeremy Kun etc