An implementation of AdaBoost for a university assignment (Machine Learning, Universidade Federal de Minas Gerais).
Adaptive Boosting (AdaBoost) is a boosting algorithm that combines simple models (weak learners) into a strong learner. Although sensitive to noisy data and outliers, it is often less prone to overfitting than many other learning algorithms.
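The core loop can be sketched as follows. This is a minimal illustration of AdaBoost with decision stumps as weak learners, not the code in this repository:

```python
import numpy as np

def adaboost_train(X, y, n_rounds):
    """Train AdaBoost with decision stumps.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns a list of (feature, threshold, polarity, alpha) stumps.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)           # uniform initial sample weights
    ensemble = []
    for _ in range(n_rounds):
        best, best_err = None, np.inf
        # exhaustive search for the stump with the lowest weighted error
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = np.sum(w[pred != y])
                    if err < best_err:
                        best_err, best = err, (j, thr, pol)
        err = max(best_err, 1e-10)    # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        j, thr, pol = best
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        # raise the weight of misclassified samples, then renormalize
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = np.zeros(X.shape[0])
    for j, thr, pol, alpha in ensemble:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)
```

The exhaustive stump search is quadratic in practice but keeps the sketch self-contained; real implementations usually sort each feature once and scan thresholds.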
- Python (2.7)
- NumPy
- Matplotlib
The main.py file accepts the following command line arguments:
- -t: number of iterations (max: 27).
- -i: input file.
- -o: output file (errors line plot).
- -h: shows possible command line arguments.
Example of execution command:
./main.py -t 27 -i data_formated.csv -o errors.out
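The repository's actual flag handling is not shown here; a minimal sketch of how these options could be parsed with Python's standard argparse module (which supplies -h automatically):

```python
import argparse

def parse_args(argv=None):
    # Hypothetical sketch; main.py in this repo may parse flags differently.
    parser = argparse.ArgumentParser(description="AdaBoost trainer")
    parser.add_argument("-t", type=int, default=27,
                        help="number of boosting iterations (max: 27)")
    parser.add_argument("-i", required=True,
                        help="input CSV file")
    parser.add_argument("-o", required=True,
                        help="output file for the errors line plot")
    return parser.parse_args(argv)
```

For example, `parse_args(["-t", "27", "-i", "data_formated.csv", "-o", "errors.out"])` yields a namespace with `t`, `i`, and `o` attributes.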
This implementation was adapted to classify Tic-Tac-Toe games from a dataset downloaded from the UC Irvine Machine Learning Repository. The data comprises 958 instances in which x is assumed to have played first; the class, positive or negative, indicates whether the x player won or lost, respectively.
The original dataset (data.csv) was formatted with the Vim commands listed below, resulting in the file data_formated.csv:
:%s/positive/1/g
:%s/negative/-1/g
:%s/b/0/g
:%s/x/1/g
:%s/o/2/g
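The same formatting can be reproduced without Vim. A sketch, assuming the raw CSV uses the literal tokens positive/negative and the board symbols b/x/o; note that the substitution order matters, since "positive" itself contains an "o":

```python
def format_line(line):
    # Apply the substitutions in the same order as the Vim commands:
    # class labels first, then the single-character board symbols.
    for old, new in [("positive", "1"), ("negative", "-1"),
                     ("b", "0"), ("x", "1"), ("o", "2")]:
        line = line.replace(old, new)
    return line
```

Applied line by line to data.csv, this produces the same output as the Vim commands, e.g. `format_line("x,o,b,x,positive")` returns `"1,2,0,1,1"`.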
As the image below shows, the test error E(out) tracks the training error E(in), decreasing when it decreases and increasing when it increases, as expected for AdaBoost.
The average accuracy was about 76.4%.