Taking part in Kaggle challenges or simply picking random datasets and working on them

paschok/Exploring-Datasets

Exploring datasets from Kaggle or other open-source directories

Commit rules:

  1. Dataset name: the work done - what to do next / what is needed
  2. README.md : update

Car-Evaluation Dataset

Evaluating the well-known Car Evaluation Data Set by Marko Bohanec with the following two models.

  1. A Decision Tree model reached roughly 78% accuracy with both the entropy and the Gini index criteria.
  2. A Support Vector Machine model tuned with GridSearch did better: an SVM with a ninth-degree polynomial kernel reached roughly 90% accuracy. A cleaner presentation of this method is available in this Kaggle notebook

Both models are stored in one car.ipynb file.
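
To make the terms above concrete, here is a minimal plain-Python sketch of the two split criteria (entropy and Gini index) and of a polynomial kernel. It is not taken from car.ipynb. The function names, the sample labels, and the `gamma` and `coef0` defaults are assumptions chosen for illustration, and the kernel follows the common form `(gamma * <x, y> + coef0) ** degree`.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def poly_kernel(x, y, degree=9, gamma=1.0, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Hypothetical labels for one tree node (illustration only, not real results).
sample = ["unacc", "unacc", "acc", "good"]
print(entropy(sample))  # 1.5
print(gini(sample))     # 0.625

# Similarity of two hypothetical feature vectors under a ninth-degree kernel.
print(poly_kernel([1.0, 2.0], [0.5, 0.5], degree=9, coef0=1.0))
```

Both impurity measures are zero for a pure node and grow as classes mix, which may explain why the two criteria gave similar accuracy here.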


Student Performance Dataset

Evaluating the well-known Student Performance Data Set by Paulo Cortez.

  1. Two notebooks:
     • Student performance - main file with all explanations and procedure
     • briefly testing models - simple draft for testing models
  2. Detailed visualization of feature engineering
  3. Detailed explanation of important topics such as scaling, encoding, and feature selection
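
As a rough illustration of the scaling and encoding steps listed above, the sketch below standardizes a numeric column and one-hot encodes a categorical one in plain Python. It is not the notebook's code. The helper names and the sample values are hypothetical, and the notebook may use different scalers or encoders.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale numeric values to zero mean and unit variance.

    Assumes the values are not all identical (otherwise the deviation is zero).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """One-hot encode categorical values; columns follow sorted category order."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical column values for illustration only.
absences = [0, 2, 4, 6]
school = ["GP", "MS", "GP", "GP"]

print(standardize(absences))
print(one_hot(school))
```

Scaling keeps features with large ranges from dominating distance-based models, and one-hot encoding avoids implying an order between categories.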

_neural networks

Directory dedicated to learning ANN

  1. Two notebooks:
     • Minimal Intro with TensorFlow 2 - implementing a simple linear formula with the simplest ANN
     • Simple TensorFlow Example - TF as a low-level API
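
The idea behind fitting a simple linear formula with the simplest ANN can be sketched without TensorFlow: a single linear neuron trained by gradient descent on mean squared error. This is a plain-Python stand-in, not the notebook's TensorFlow code. The target formula `y = 2x - 1`, the learning rate, and the epoch count are assumptions picked for illustration.

```python
def train_linear_neuron(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b with one linear neuron via batch gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample under the current weights.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data generated from y = 2x - 1.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x - 1.0 for x in xs]

w, b = train_linear_neuron(xs, ys)
print(w, b)  # approaches w = 2, b = -1
```

A framework such as TensorFlow automates the gradient computation and the update step that are written out by hand here.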