Welcome to the central repository for Omdena's resources. This repository serves as a comprehensive guide to help Data Scientists and Machine Learning Engineers at all levels, from beginners to advanced practitioners.
This repository includes tutorials, code examples, notebooks, and libraries for a variety of topics in Data Science and Machine Learning.
- Overview of Data Science & Machine Learning
- Key differences between Data Science and Machine Learning
- Importance of Data in Decision Making
- Python for Data Science
  - Libraries: Pandas, NumPy, Matplotlib, Seaborn
  - Data Cleaning and Preprocessing
  - Exploratory Data Analysis (EDA)
- Statistics & Probability
  - Descriptive Statistics
  - Probability Distributions
  - Hypothesis Testing
  - A/B Testing
- Data Visualization
  - Matplotlib, Seaborn
  - Plotly, Dash
  - Tableau (introductory tutorials)
- Supervised Learning
  - Regression Models: Linear Regression, Lasso, Ridge, ElasticNet
  - Classification Models: Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM)
  - Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, ROC Curve
- Unsupervised Learning
  - Clustering: K-Means, DBSCAN, Hierarchical Clustering
  - Dimensionality Reduction: PCA, t-SNE
- Model Tuning and Hyperparameter Optimization
  - Cross-Validation
  - Grid Search, Random Search
  - Bayesian Optimization
- Deep Learning
  - Artificial Neural Networks (ANN)
  - Convolutional Neural Networks (CNN)
  - Recurrent Neural Networks (RNN)
  - Transformers and Attention Mechanisms
  - Generative Adversarial Networks (GANs)
- Natural Language Processing (NLP)
  - Text Preprocessing
  - Tokenization, Lemmatization, Stemming
  - Text Classification
  - Word Embeddings (Word2Vec, GloVe)
  - Transformers (BERT, GPT)
- Reinforcement Learning
  - Markov Decision Processes (MDP)
  - Q-Learning
  - Policy Gradient Methods
- Deep Learning Libraries
  - TensorFlow
  - Keras
  - PyTorch
  - FastAI
- Data Manipulation and Analysis
  - Pandas
  - NumPy
  - SciPy
  - Dask
- Model Deployment
  - Flask/Django for API Development
  - FastAPI
  - Streamlit for Interactive Dashboards
  - Docker for Containerization
  - Kubernetes for Orchestration
  - MLflow and DVC for Model Versioning
- Other Useful Libraries
  - scikit-learn for Classical Machine Learning
  - XGBoost, LightGBM, and CatBoost for Gradient Boosting
  - Optuna for Hyperparameter Optimization
  - Plotly, Matplotlib, and Seaborn for Visualization
  - SQL for Data Querying
- Code Style and Documentation
- Version Control with Git
- Collaborative Work on GitHub (Forking, Pull Requests, Issues)
- Writing Tests for Machine Learning Models
- Model Interpretability (LIME, SHAP)
- Deployment Pipelines (CI/CD)
- Beginner Notebooks
  - Introduction to Python and Data Science Libraries
  - Basic EDA on Sample Datasets
  - Implementing Linear Regression
- Intermediate Notebooks
  - K-Means Clustering Example
  - Hyperparameter Tuning with GridSearchCV
  - Building a Random Forest Classifier
- Advanced Notebooks
  - Neural Network for Image Classification (CNN)
  - Time Series Forecasting with ARIMA
  - BERT for Text Classification
  - Training an RL Agent with Q-Learning
- Data Preprocessing
  - Handling Missing Values
  - Feature Engineering
  - Scaling and Normalization
- Machine Learning
  - Model Evaluation and Selection
  - Overfitting vs. Underfitting
  - Feature Importance Analysis
- Deep Learning
  - Building a Neural Network from Scratch
  - Implementing CNNs and RNNs
  - Transfer Learning
- Predictive Analytics for Business
- Image Classification with CNNs
- Natural Language Processing for Sentiment Analysis
- Recommendation Systems (Collaborative Filtering, Content-Based Filtering)
We welcome contributions to this repository. If you have any ideas, improvements, or new content, feel free to fork the repository and submit a pull request.
This repository is licensed under the MIT License.
For any queries or suggestions, please contact our Head of Community, Tushar Aggarwal, or raise an issue.