Questions, answers, and materials I use to study for data science interviews
Some credits to: this repo
Note: contributions via PRs are welcome! The repo is quite messy at the moment, sorry.
- Used Stratascratch a lot for data analytics interview preparation
- Leetcode - Database
- HackerRank
- Grind75
- Leetcode - Top 150 Questions
- deep-ml
- StatQuest - Machine Learning / Statistics / Deep Learning
- 3b1b - Math for Machine Learning
- Andrej Karpathy - LLM Legend
- Indently - Good coding practices
- Linear Regression
- https://www.analyticsvidhya.com/blog/2021/06/linear-regression-in-machine-learning/
- Used for regression tasks to predict a continuous target variable.
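As a quick refresher, the one-feature case has a closed-form least-squares solution (slope = covariance / variance). A minimal sketch on made-up data, no library assumed:

```python
def fit_linear(xs, ys):
    """Return (slope, intercept) minimizing squared error for 1-D inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])  # data lies on y = 2x + 1
```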
- Logistic Regression
- https://www.analyticsvidhya.com/blog/2021/10/building-an-end-to-end-logistic-regression-model/
- Used for binary classification problems.
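The core idea is a sigmoid over a linear score, trained by gradient descent on the log-loss. A toy one-feature sketch (stochastic gradient descent, made-up data):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit 1-D logistic regression by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # gradient of the log-loss w.r.t. w and b is (p - y) * x and (p - y)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

w, b = fit_logistic([0, 1, 2, 3], [0, 0, 1, 1])
# the learned boundary separates the two classes; sigmoid(w*x + b) > 0.5 means class 1
```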
- Decision Trees
- https://www.analyticsvidhya.com/blog/2021/08/decision-tree-algorithm/
- A tree-based model for both classification and regression tasks.
- Random Forest
- https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/
- An ensemble method based on multiple decision trees.
- Support Vector Machines (SVM)
- https://www.analyticsvidhya.com/blog/2021/10/support-vector-machinessvm-a-complete-guide-for-beginners/
- Used for classification tasks by finding the optimal hyperplane.
- K-Nearest Neighbors (KNN)
- https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
- A simple classification algorithm that assigns a class based on the majority class of nearest neighbors.
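KNN is easy to sketch end-to-end, since there is no training step beyond storing the data. A minimal illustration with Euclidean distance and made-up points:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of ((features...), label) pairs."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
knn_predict(train, (0.5, 0.5))  # all three nearest neighbors are labeled "a"
```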
- Naive Bayes
- https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
- A probabilistic classifier based on Bayes' theorem with strong independence assumptions.
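A sketch of the categorical case: the class score is the log prior plus a sum of per-feature log likelihoods, which is exactly the "naive" independence assumption. Data and feature names below are invented for illustration:

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Categorical naive Bayes with Laplace (+1) smoothing.
    rows: list of feature tuples; labels: parallel list of classes."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (feature_index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feat_counts[(i, y)][v] += 1

    def predict(row):
        best, best_score = None, float("-inf")
        n = len(labels)
        for y, cy in class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(cy / n)
            for i, v in enumerate(row):
                vocab = len({r[i] for r in rows})
                score += math.log((feat_counts[(i, y)][v] + 1) / (cy + vocab))
            if score > best_score:
                best, best_score = y, score
        return best
    return predict

predict = fit_naive_bayes(
    [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")],
    ["no", "no", "yes", "yes"])
predict(("rainy", "mild"))
```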
- Gradient Boosting Machines (GBM)
- https://www.analyticsvidhya.com/blog/2021/09/gradient-boosting-algorithm-a-complete-guide-for-beginners/
- An ensemble method that builds models sequentially to reduce errors.
- AdaBoost
- https://www.analyticsvidhya.com/blog/2021/09/adaboost-algorithm-a-complete-guide-for-beginners/
- https://www.analyticsvidhya.com/blog/2021/03/introduction-to-adaboost-algorithm-with-python/
- A boosting method that combines weak learners to form a strong classifier.
- XGBoost
- https://www.analyticsvidhya.com/blog/2018/09/an-end-to-end-guide-to-understand-the-math-behind-xgboost/
- https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/
- A popular gradient-boosting framework optimized for performance.
- K-Means Clustering
- https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-k-means-clustering/
- A popular clustering algorithm that partitions data into K clusters.
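The algorithm alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. A minimal Lloyd's-algorithm sketch (naive "first k points" initialization; real implementations use k-means++):

```python
def kmeans(points, k, iters=20):
    """Minimal Lloyd's algorithm; points are tuples of numbers."""
    centroids = list(points[:k])  # naive init: first k points
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # update step: each centroid moves to the mean of its cluster
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if its cluster emptied
                centroids[i] = tuple(sum(v) / len(cluster) for v in zip(*cluster))
    return centroids

centroids = kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)
# converges to one centroid per well-separated group
```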
- Hierarchical Clustering
- https://www.analyticsvidhya.com/blog/2022/11/hierarchical-clustering-in-machine-learning/
- https://www.analyticsvidhya.com/blog/2021/06/single-link-hierarchical-clustering-clearly-explained/
- A clustering technique that builds a hierarchy of clusters.
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- https://www.analyticsvidhya.com/blog/2021/06/understand-the-dbscan-clustering-algorithm/
- https://www.analyticsvidhya.com/blog/2020/09/how-dbscan-clustering-works/
- Clustering algorithm that groups together closely packed points and marks outliers.
- Principal Component Analysis (PCA)
- https://www.analyticsvidhya.com/blog/2016/03/pca-practical-guide-principal-component-analysis-python/
- https://towardsdatascience.com/the-mathematics-behind-principal-component-analysis-fff2d7f4b643
- A dimensionality reduction technique used to project data onto fewer dimensions.
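The principal components are the eigenvectors of the data's covariance matrix. A sketch that recovers just the leading component via power iteration (purely illustrative; libraries compute the full decomposition with SVD):

```python
def first_pc(data, iters=200):
    """Leading principal component via power iteration on the covariance matrix."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    d = len(means)
    # sample covariance matrix of the centered data
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # multiply by the covariance matrix and renormalize
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

pc = first_pc([(1, 1), (2, 2.1), (3, 2.9), (4, 4.2)])
# the points lie near the line y = x, so the component is roughly (0.7, 0.7)
```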
- t-SNE (t-distributed Stochastic Neighbor Embedding)
- https://medium.com/@sachinsoni600517/mastering-t-sne-t-distributed-stochastic-neighbor-embedding-0e365ee898ea
- A non-linear dimensionality reduction method for visualizing high-dimensional data.
- Intro: https://ketanhdoshi.github.io/Reinforcement-Learning-Intro/
- Solution Approaches: https://ketanhdoshi.github.io/Reinforcement-Learning-Solutions/
- Model Free Solution: https://ketanhdoshi.github.io/Reinforcement-Learning-Model/
- Q-Learning
- https://ketanhdoshi.github.io/Reinforcement-Learning-Q-Learning/
- A model-free reinforcement learning algorithm based on learning a Q-value function.
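The update rule is the part worth memorizing: Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. A tabular sketch on an invented corridor environment (all names and parameters here are illustrative):

```python
import random

def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, eps=0.5, seed=0):
    """Tabular Q-learning on a toy corridor: actions 0 = left, 1 = right,
    reward 1 only for reaching the last state."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] >= q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # off-policy update: bootstrap off the best next action
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning()
# after training, "right" has the higher Q-value in every non-terminal state
```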
- Deep Q-Networks (DQN)
- https://ketanhdoshi.github.io/Reinforcement-Learning-Deep-Q-Network/
- A combination of Q-learning and deep neural networks.
- Policy Gradient Methods
- https://ketanhdoshi.github.io/Reinforcement-Learning-Policy-Gradients/
- Learn policies directly instead of learning a value function.
- Proximal Policy Optimization (PPO)
- https://towardsdatascience.com/proximal-policy-optimization-ppo-explained-abed1952457b
- A modern, stable policy optimization method used in reinforcement learning.
- SARSA (State-Action-Reward-State-Action)
- An on-policy reinforcement learning algorithm that updates Q-values using the action actually taken by the current policy (unlike Q-learning's greedy max).
- Artificial Neural Networks (ANN)
- The basic feedforward network of weighted neurons; the building block for the deep learning models below.
- Convolutional Neural Networks (CNN)
- Use convolutional filters to learn spatial features; the standard choice for image tasks.
- Recurrent Neural Networks (RNN)
- https://www.analyticsvidhya.com/blog/2017/12/introduction-to-recurrent-neural-networks/
- https://www.analyticsvidhya.com/blog/2022/03/a-brief-overview-of-recurrent-neural-networks-rnn/
- https://karpathy.github.io/2015/05/21/rnn-effectiveness/
- Used for sequential data tasks like time series or natural language processing.
- Long Short-Term Memory Networks (LSTM)
- https://medium.com/@ottaviocalzone/an-intuitive-explanation-of-lstm-a035eb6ab42c
- https://colah.github.io/posts/2015-08-Understanding-LSTMs/
- https://weberna.github.io/blog/2017/11/15/LSTM-Vanishing-Gradients.html
- https://data-science-blog.com/blog/2020/09/07/back-propagation-of-lstm/
- A type of RNN capable of learning long-term dependencies.
- Transformer Networks
- An attention-based deep learning architecture behind modern NLP models (e.g., BERT, GPT).
- Generative Adversarial Networks (GANs)
- https://www.analyticsvidhya.com/blog/2021/10/an-end-to-end-introduction-to-generative-adversarial-networksgans/
- A framework involving two neural networks to generate new data.
- https://towardsdatascience.com/recommender-systems-a-complete-guide-to-machine-learning-models-96d3f94ea748
- Collaborative Filtering: recommends items based on the preferences of users with similar interaction histories.
- Content-Based Filtering: recommends items whose features are similar to those a user has liked before.
- Bagging
- Trains several base models on bootstrap resamples of the data and combines their predictions by averaging or voting (e.g., Random Forest).
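Bagging can be sketched with any weak base learner; here a decision stump (single-threshold classifier) is trained on each bootstrap resample and the stumps vote. The data and helper names are invented for illustration:

```python
import random
from collections import Counter

def fit_stump(data):
    """Best single-threshold classifier on 1-D labeled data [(x, y), ...]."""
    best = None
    for t in sorted({x for x, _ in data}):
        for left, right in ((0, 1), (1, 0)):
            preds = [(left if x <= t else right) for x, _ in data]
            acc = sum(p == y for p, (_, y) in zip(preds, data))
            if best is None or acc > best[0]:
                best = (acc, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagged_predict(data, query, n_models=25, seed=0):
    """Bagging: train stumps on bootstrap resamples, then majority-vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample
        votes[fit_stump(sample)(query)] += 1
    return votes.most_common(1)[0][0]

data = [(0, 0), (1, 0), (2, 0), (8, 1), (9, 1), (10, 1)]
label = bagged_predict(data, 9)  # the stump ensemble votes on the class
```

The vote averages away the variance of the individual stumps, which is the whole point of bagging.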
- Boosting
- Sequentially builds models that correct the errors of previous models (e.g., XGBoost, AdaBoost).
- Stacking
- Combines multiple models by training a meta-model on their predictions.