Outline
Intro
- Core ideas
- Prediction
- Basic Interpretation
- Adding Complexity
- Assumptions of linear (and other) models
Question: how to fit classification here without spilling into other topics?
- General fit
- Model Comparison
- Model Selection
- Model debugging
Misc:
- Model transparency (e.g. model cards)
- Model fairness
- OLS, MLE
- Classification
- Penalized
- Optimization, SGD
- Bayesian
- GLM, GAM, Mixed Models
- Optimization/Linear Programming/Hungarian Algorithm?
- Latent Linear Models
- Mixture models/Clustering
- CV, metrics
- Lasso, Ridge, Elastic Net
- Trees
- RF
- GBM
- DL
- NN
- Autoencoders
- Reinforcement learning
- Ensemble models
- Bayesian inference
- Bootstrap
- Conformal Predictions
- Feature and Target Transformations
- Missing data
- Recommend assessing how similar predicted data are to observed data
- Data quality and reliability/measurement
- Sparsity
- Outliers
- Imbalanced data
- 'Big' Data, Scalability
- Data types
- Categorical
- Ordinal
- Continuous
- Time series
- Text
- Images
- Audio
- Video
- Geospatial
- etc.
- Feature Engineering/Pre-processing/Categorical Embeddings/Dimensionality Reduction/Feature Selection/Feature Extraction
- Misc. feature types: ordinal, zero-inflated, etc.
- Transformations: std, log, max
- Data leakage
- Data drift
- Data bias (lack of representativeness) vs. statistical bias
- Misc:
- Data privacy, security, ethics
- Data provenance, governance
- Causality
- Causal inference
- Techniques: experimental design, matching, meta-learners, uplift modeling, etc.