A classic binary classification problem, Kaggle's 'hello world' (
You can find my step-by-step notes here. The project is split into two standalone Jupyter notebooks.
| Description | Link | Notable |
| --- | --- | --- |
| Examine the data. Build a robust, reproducible preprocessing pipeline with feature transformation and selection. Visualize the data. | | custom Transformer classes, ColumnTransformer, FeatureUnion, Pipeline; feature-importance tests; PCA, TSNE |
| Fit and tune classifiers using grid search with cross-validation. Report accuracy, confusion matrix, and ROC curve. Select the best model based on an independent validation score. | | GridSearchCV, LogisticRegression, NaiveBayes, GradientBoosting, AdaBoost, VotingClassifier |
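The two notebooks summarized above can be sketched together in a few lines. This is a minimal illustration, not the notebooks' actual code: the synthetic data, the column names (`age`, `fare`, `group`), and the parameter grid are all assumptions chosen so the example runs on its own. It shows a `ColumnTransformer` wrapped in a `Pipeline`, tuned with `GridSearchCV`, and scored on a held-out validation split.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; the column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.normal(30, 10, n),
    "fare": rng.exponential(30, n),
    "group": rng.choice(["a", "b", "c"], n),
})
X.loc[rng.choice(n, 20, replace=False), "age"] = np.nan  # inject missing values
y = ((X["fare"] > 25) ^ (X["group"] == "a")).astype(int)

# Numeric columns: impute missing values, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: one-hot encode, ignoring unseen categories.
categorical = OneHotEncoder(handle_unknown="ignore")

preprocess = ColumnTransformer([
    ("num", numeric, ["age", "fare"]),
    ("cat", categorical, ["group"]),
])

# Preprocessing and classifier live in one pipeline, so each
# cross-validation fold refits the preprocessing on its own training part.
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hold out an independent validation set before tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed grid: only the regularization strength of the classifier.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

val_accuracy = search.score(X_val, y_val)
print(search.best_params_, round(val_accuracy, 3))
```

Keeping the preprocessing inside the pipeline is what makes the cross-validated score trustworthy: imputation and scaling statistics are never computed on the fold being evaluated.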
There is also a snippet that prepares a Kaggle-submittable CSV: