GitHub - DEV270201/Employee-Salary-Classifier

Employee Salary Classifier

This project trains classification models that estimate the likelihood that an individual earns an annual income of $80,000 or more. The models use work experience, industry, job title, state, and several other features to make the prediction.

Dataset

The dataset contains 17 columns and around 28,000 rows. Its features include work experience, industry, job title, state, and education degree.

Data Preprocessing

Used FuzzyWuzzy to match the different spellings of "USA", because the country field was entered manually by different users.
Grouped the education degrees into the 4 most common degree categories, and did the same for Gender and Race.
Kept the top 10 industries and replaced all other industries with "Other" for convenience.
Kept the top 500 job titles and replaced all other job titles with "Other".
Scaled the Bonus column using RobustScaler.
Used frequency encoding for State, Industry, Job title, and Race because of their high cardinality.
Used label encoding for the target variable.
Used ordinal encoding for Education degree, Work experience, and Age.
Used one-hot encoding for Gender.
Removed outliers from the data.
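
A few of the steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: it uses the standard library's difflib as a stand-in for FuzzyWuzzy and a hand-written median/IQR scaling as a stand-in for RobustScaler. The function names, the list of USA spellings, and the 0.85 threshold are all assumptions made for the example, and the README does not say whether frequency encoding used raw counts or proportions (proportions are assumed here).

```python
from collections import Counter
from difflib import SequenceMatcher
from statistics import quantiles

# Hypothetical list of canonical spellings; the real data may need more.
USA_VARIANTS = ("usa", "us", "united states", "united states of america")


def normalize_country(raw, threshold=0.85):
    """Map free-text spellings of the USA to one label (difflib stand-in)."""
    cleaned = raw.strip().lower().replace(".", "")
    best = max(SequenceMatcher(None, cleaned, v).ratio() for v in USA_VARIANTS)
    return "USA" if best >= threshold else raw.strip()


def keep_top_n(values, n, other="Other"):
    """Keep the n most frequent categories and replace the rest with `other`."""
    top = {value for value, _ in Counter(values).most_common(n)}
    return [value if value in top else other for value in values]


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[value] / total for value in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(value - median) / iqr for value in values]


if __name__ == "__main__":
    print([normalize_country(c) for c in ["U.S.A.", "United Sates", "Canada"]])
    industries = keep_top_n(["Tech", "Tech", "Law", "Art"], n=1)
    print(industries)
    print(frequency_encode(industries))
    print(robust_scale([1, 2, 3, 4, 100]))
```

Median and IQR scaling is used for a column like Bonus because a few very large values barely move the median or the quartiles, so the bulk of the data keeps a sensible scale.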

Model Training

On this dataset, Random Forest achieved the highest accuracy at 81%, followed by Decision Tree at 78%. Naive Bayes, Logistic Regression, and KNN each reached around 76%.
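
A comparison of this kind could be set up roughly as follows with scikit-learn. This is a sketch under assumptions, not the notebook's code: it runs on synthetic data from make_classification rather than salary.csv, uses default hyperparameters, and picks GaussianNB as the Naive Bayes variant because the README does not name one. Its accuracies will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed salary features and binary target.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The five model families named in this section, with assumed settings.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```

Scoring every model on the same held-out split keeps the comparison fair, since each one sees identical training and test rows.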

Developed with ❤️ by Devansh Shah, Smit Vora & Harsh Shah