This repo contains various Data Science projects involving image, text, tabular and graph dataset with classical ML as well as Deep Learning.

ankishb/ml-projects

ml-projects

This repository contains my data science and ML-contest projects, along with my personal fun projects. I have dealt with a diverse set of problems, data types and metrics. The following is a brief summary of each project: the type of dataset, the type of problem and my approach to handling it. More details can be found in each subdirectory.

If you want to view the following summary in table format, click here

  • DataSet:
    • Image
  • Objective:
    • Bounding Box prediction
  • My Approach:
    • Designed a visual feature pipeline with attention on the object in the image
    • Data augmentation applied to each image together with its bounding box
    • Used Single Stage Detector Approach
    • Focal Loss with YOLO and SSD
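As a minimal sketch of the focal loss mentioned above, the per-example binary form is FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults `gamma=2.0` and `alpha=0.25` below are commonly used values and are an assumption here, not settings taken from this project. The function name `focal_loss` is illustrative only.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class.
    y: ground-truth label, 0 or 1.
    gamma, alpha: assumed defaults, not values from this repository.
    """
    p = min(max(p, eps), 1.0 - eps)           # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples the model already gets right
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.95, 1)   # well-classified positive
hard = focal_loss(0.10, 1)   # badly misclassified positive
print(easy < hard)
```

With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy for positive examples, which is a quick way to sanity-check an implementation.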
  • DataSet:
    • Text
  • Objective:
    • Classification
  • My Approach:
    • Data cleaning/feature engineering
    • Linear/Non-Linear Model
    • Deep Learning Attention Model
    • Pretrained Bert Model
    • Ensemble
  • DataSet:
    • 2500 unknown predictors
  • Objective:
    • Classification
  • My Approach:
    • Feature Understanding(EDA)
    • feature engineering
    • designed feature interaction tools
    • ensemble model using xgboost/lightgbm/catboost and simple linear/non-linear models
    • statistical model to understand the feature importance using p-values
  • DataSet:
    • Very large dataset (45M observations, graph edge representation)
    • Relational Feature
    • Category + Numerical
  • Objective:
    • Link Prediction
  • My Approach:
    • Graph-based features such as Adamic-Adar, common-resource-allocation, ...
    • SVD features for each user
    • Community clustering
    • Subsemble (done after the competition ended, to understand more about sampling and model building)
    • Neighbour-based features (removed high-cardinality features)
    • Also tried a deep learning approach (graph embedding), but could not get it working properly at the time
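The graph-based features above can be sketched for a small edge list as follows. This assumes that "common-resource-allocation" refers to the resource allocation index computed over common neighbours. The helper names (`build_adjacency`, `adamic_adar`, `resource_allocation`) and the toy edges are hypothetical and not taken from the project code.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the common neighbours of u and v."""
    common = adj.get(u, set()) & adj.get(v, set())
    # A common neighbour of two distinct nodes has degree >= 2, so log > 0;
    # the guard only protects against self-loops in messy data.
    return sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)


def resource_allocation(adj, u, v):
    """Sum of 1 / degree over the common neighbours of u and v."""
    common = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / len(adj[z]) for z in common)


edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
adj = build_adjacency(edges)
print(adamic_adar(adj, "a", "b"), resource_allocation(adj, "a", "b"))
```

Both scores reward candidate links whose shared neighbours have low degree. For a 45M-edge graph an in-memory dictionary of sets may be too heavy, so this is only meant to show the arithmetic.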
  • DataSet:
    • Category + Numerical
    • Relational Dataset
  • Objective:
    • Regression
  • My Approach:
    • Feature engineering
      1. date-time based feature
      2. Aggregation based feature
      3. Relational Features
    • Ensemble using different sets of transformed target spaces
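One way to read "ensemble over transformed target spaces" is: fit a model on several transformed versions of the target, map each prediction back to the original scale, and average. The sketch below assumes that reading. The transforms (identity, `log1p`, square root) and the deliberately trivial mean predictor `fit_mean` are placeholders, not the models or transforms used in the project.

```python
import math

# (forward, inverse) pairs; an assumed, illustrative choice of target transforms
TRANSFORMS = {
    "identity": (lambda y: y, lambda z: z),
    "log1p": (math.log1p, math.expm1),
    "sqrt": (math.sqrt, lambda z: z * z),
}


def fit_mean(targets):
    """Placeholder 'model': predicts the mean of the training targets."""
    return sum(targets) / len(targets)


def ensemble_prediction(y_train):
    """Average the back-transformed predictions from each target space."""
    preds = []
    for forward, inverse in TRANSFORMS.values():
        model = fit_mean([forward(y) for y in y_train])  # fit in transformed space
        preds.append(inverse(model))                     # back to original scale
    return sum(preds) / len(preds)


skewed = [1.0, 1.0, 100.0]
print(ensemble_prediction(skewed), fit_mean(skewed))
```

On a skewed target the log and square-root spaces pull the prediction below the plain mean, which is why mixing target spaces can help when the metric penalises relative error.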
  • DataSet:
    • Image
  • Objective:
    • Comparison between ResNet and my modified feature pipeline
    • Classification
  • My Approach:
    • Developed a weighted feature pipeline using global and local features.
    • The global feature constrains the local features so that the model focuses specifically on features of the object in the image.
    • Produced a better attention map around the object, reflecting its learned features.
    • Improved score by 1.37% over ResNet
  • DataSet:
    • Image
  • Objective:
    • Face Verification
  • My Approach:
    • Matching Network Approach
    • Built student-attendance hardware using Arduino
    • Hard mining approach (generated all permutations between classes to handle the small dataset)
    • Network-in-network approach to handle overfitting, as I had a very small dataset.
    • Achieved 93% accuracy
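The pair-generation step described above can be sketched as exhaustive enumeration of same-identity and cross-identity pairs. This only covers enumeration and does not select the hardest pairs. The function `make_pairs` and the sample identifiers are hypothetical.

```python
from itertools import combinations


def make_pairs(samples):
    """Split all unordered sample pairs into positives and negatives.

    samples: list of (sample_id, identity) tuples.
    Returns (positives, negatives), each a list of (sample_id, sample_id).
    """
    positives, negatives = [], []
    for (a, id_a), (b, id_b) in combinations(samples, 2):
        if id_a == id_b:
            positives.append((a, b))   # same person: should verify as a match
        else:
            negatives.append((a, b))   # different people: should not match
    return positives, negatives


samples = [
    ("img0", "alice"), ("img1", "alice"),
    ("img2", "bob"), ("img3", "bob"),
    ("img4", "carol"),
]
pos, neg = make_pairs(samples)
print(len(pos), len(neg))
```

With n images this yields n * (n - 1) / 2 training pairs, which is how a very small face dataset can still supply many verification examples.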
  • DataSet:
    • Image
  • Objective:
    • Classification (training on very small dataset)
  • My Approach:
    • Prototype Algorithm implementation
    • There is more to this(will update in future)
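Assuming "Prototype Algorithm" refers to a prototypical-network style classifier, its classification rule can be sketched as: average the embeddings of each class into a prototype, then assign a query to the nearest prototype. The embeddings below are hand-written toy vectors; in practice they would come from a trained network. All names are illustrative.

```python
def class_prototypes(support):
    """Mean embedding per class.

    support: dict mapping label -> list of embedding vectors (lists of floats).
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, protos):
    """Return the label whose prototype is closest to the query embedding."""
    return min(protos, key=lambda label: squared_distance(query, protos[label]))


support = {
    "cat": [[0.0, 0.0], [0.2, 0.0]],
    "dog": [[1.0, 1.0], [1.0, 0.8]],
}
protos = class_prototypes(support)
print(classify([0.2, 0.1], protos), classify([0.9, 1.0], protos))
```

Because each class is summarised by a single mean vector, the rule works with only a handful of labelled examples per class.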
  • DataSet:
    • Category + Numerical
  • Objective:
    • Regression
  • My Approach:
    • Date based feature and Dummy feature
    • Interaction based feature
    • Bayesian optimization
    • Out-of-fold predictions to generate meta-features for the ensemble
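Out-of-fold prediction can be sketched as follows: each row's meta-feature comes from a model that never saw that row during training. The single-feature least-squares model and the interleaved fold assignment are simplifying assumptions for illustration. A real pipeline would normally shuffle the rows and use a stronger base model.

```python
def out_of_fold_predictions(X, y, fit, predict, n_folds=5):
    """Return one prediction per row, each made by a model trained without that row."""
    n = len(X)
    oof = [None] * n
    for k in range(n_folds):
        valid_idx = [i for i in range(n) if i % n_folds == k]   # held-out fold
        train_idx = [i for i in range(n) if i % n_folds != k]   # remaining rows
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in valid_idx:
            oof[i] = predict(model, X[i])
    return oof


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def predict_line(model, x):
    a, b = model
    return a * x + b


X = [float(i) for i in range(10)]
y = [2.0 * x + 1.0 for x in X]
oof = out_of_fold_predictions(X, y, fit_line, predict_line, n_folds=5)
print(oof)
```

The `oof` list can then be appended as an extra column for a second-level model without leaking the target into its training data.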
  • DataSet:
    • Text
  • Objective:
    • User-Problem Rating Prediction
  • My Approach:
    • My main concern was to handle the following questions carefully:
      1. What are the user's strongest and weakest areas?
      2. What is the difficulty level of the problem?
      3. Which problem has the user just solved?
      4. If the user gets stuck on the current problem, which problem would help them (to gain confidence and improve skill in that area)?
      5. Exploration and exploitation strategy in recommending problems
      6. And many more.
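The exploration and exploitation trade-off in item 5 can be illustrated with an epsilon-greedy rule. This is one common strategy and not necessarily the one used in the project. The `estimated_value` scores, the problem ids and the `recommend` function are hypothetical.

```python
import random


def recommend(estimated_value, epsilon=0.1, rng=random):
    """Pick a problem id with an epsilon-greedy rule.

    estimated_value: dict mapping problem_id -> estimated benefit for the user.
    epsilon: probability of exploring a random problem instead of the best one.
    """
    problems = list(estimated_value)
    if rng.random() < epsilon:
        return rng.choice(problems)                  # explore: try something new
    return max(problems, key=estimated_value.get)    # exploit: best known option


values = {"arrays-easy": 0.4, "graphs-medium": 0.9, "dp-hard": 0.2}
print(recommend(values, epsilon=0.0))
```

Setting `epsilon=0.0` always returns the highest-scoring problem, while larger values occasionally surface problems whose value for the user is still uncertain.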
  • DataSet:
    • Category + Numerical
  • Objective:
    • Classification
  • My Approach:
  • DataSet:
    • Image
  • Objective:
    • Segmentation
  • My Approach:
    • Implemented a U-Net architecture on a blood cell dataset.
    • Implemented a fully convolutional network on a traffic-street dataset.
    • Finally, experimented with a generative adversarial network for better generalization with a limited dataset.
  • DataSet:
    • Relational feature
    • Time-Series Feature
    • Categorical + Numerical
  • Objective:
    • Future sales prediction for different stores in different cities
  • My Approach:
  • DataSet:
    • Image
  • Objective:
    • Classification
  • My Approach:
    • EDA
    • Feature Engineering
  • DataSet:
    • Time-Series stock prices
  • Objective:
    • Future price prediction
    • Regression
  • My Approach:
    • Deep learning approach using RNN and LSTM
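Before a price series can be fed to an RNN or LSTM it is usually cut into fixed-length windows, each paired with the next value as the target. The sketch below shows only that preparation step. The window length of 3 and the function `make_windows` are illustrative assumptions, not the project's settings.

```python
def make_windows(prices, window=3):
    """Turn a price series into (input window, next price) training pairs."""
    inputs, targets = [], []
    for start in range(len(prices) - window):
        inputs.append(prices[start:start + window])   # the last `window` prices
        targets.append(prices[start + window])        # the price to predict
    return inputs, targets


prices = [10.0, 11.0, 12.0, 13.0, 14.0]
X, y = make_windows(prices, window=3)
print(X, y)
```

Each window then becomes one input sequence for the recurrent model, and the split into training and validation data should respect time order to avoid look-ahead leakage.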
