Skip to content

Repository to save code done competing on Kaggle competitions

Notifications You must be signed in to change notification settings

xmaps/kaggle_submissions

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 

Repository files navigation

kaggle_submissions

There are three main ways we can improve our submissions:

  • Use a better machine learning algorithm.
  • Generate better features.
  • Combine multiple machine learning algorithms.

Feature engineering is the most important part of any machine learning task, and there are a lot more features we could calculate. However, we also need a way to figure out which features are the best.

One way to accomplish this is to use univariate feature selection. This approach essentially involves reviewing a data set column by column to identify the ones that correlate most closely with what we're trying to predict.

One thing we can do to improve the accuracy of our predictions is ensemble different classifiers. Ensembling means generating predictions based on information from a set of classifiers, instead of just one. In practice, this means that we average their predictions.

Generally speaking, the more diverse the models we ensemble, the higher our accuracy will be. Diversity means that the models generate their results from different columns, or use very different methods to generate predictions. Ensembling a random forest classifier with a decision tree probably won't work extremely well, because they're very similar. On the other hand, ensembling a linear regression with a random forest can yield very good results.

One caveat with ensembling is that the classifiers we use have to be about the same in terms of accuracy. Ensembling one classifier that's much less accurate than the other will probably make the final result worse.

About

Repository to save code done competing on Kaggle competitions

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages