Prepare the data. There are 3 features: the first two are numerical and the last is nominal.
>>> import numpy as np
>>> X = np.array([[  1,   1, 0],
...               [101, 101, 0],
...               [103, 103, 0],
...               [  3,   3, 0],
...               [  5,   5, 0],
...               [107, 107, 0],
...               [109, 109, 0],
...               [  7,   7, 1],
...               [  8,   8, 1]])
>>> y = np.array([0, 1, 1, 0, 0, 1, 1, 2, 2])
Import the module
>>> from trees_and_forests import DecisionTreeClassifier
Initialise the classifier and fit it to the data
>>> clf = DecisionTreeClassifier()
>>> clf.fit(X, y)
Inference
>>> clf.predict(np.array([[1,1,0]]))
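As a sanity check, the same data can be run through scikit-learn's `DecisionTreeClassifier`. Assuming both trees are grown to purity, a training point such as `[1, 1, 0]` should be predicted with its own label:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[  1,   1, 0],
              [101, 101, 0],
              [103, 103, 0],
              [  3,   3, 0],
              [  5,   5, 0],
              [107, 107, 0],
              [109, 109, 0],
              [  7,   7, 1],
              [  8,   8, 1]])
y = np.array([0, 1, 1, 0, 0, 1, 1, 2, 2])

# scikit-learn treats every column as numerical; the nominal third
# feature is binary here, so a numeric threshold split is equivalent.
sk = DecisionTreeClassifier(random_state=0).fit(X, y)
print(sk.predict(np.array([[1, 1, 0]])))  # [1, 1, 0] is a training point with label 0
```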
Algorithms
- Decision tree classifier
- Decision tree regressor
- Simple bagging
- Random forest
- Extremely randomised trees
- AdaBoost
- Gradient boosting
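The tree-based algorithms above all share one core step: scanning candidate (feature, threshold) pairs for the split that minimises weighted Gini impurity. A minimal NumPy sketch of that step (illustrative only, not the package's implementation):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Exhaustive search for the (feature, threshold) pair with
    the lowest weighted Gini impurity over splits x[j] <= t."""
    best = (None, None, np.inf)
    n = y.size
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # dropping the max keeps both sides non-empty
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (j, t, score)
    return best

X = np.array([[1, 1, 0], [101, 101, 0], [7, 7, 1], [8, 8, 1]])
y = np.array([0, 1, 2, 2])
print(best_split(X, y))  # here the nominal third feature gives the purest split
```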
Software development
- Unit tests
- API design document
- Tutorial
Optimisations
- Cythonise/PyTorchify
- Performance against scikit-learn
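For the scikit-learn comparison, a small benchmark harness could look like the sketch below; the `trees_and_forests` classifier would be timed the same way. The synthetic dataset and its size are assumptions for illustration:

```python
import time
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels from a simple linear rule

clf = DecisionTreeClassifier(random_state=0)
t0 = time.perf_counter()
clf.fit(X, y)
elapsed = time.perf_counter() - t0

# training accuracy should be near-perfect for a fully grown tree
acc = clf.score(X, y)
print(f"fit: {elapsed:.4f}s, train accuracy: {acc:.3f}")
```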
References
- https://scikit-learn.org/stable/modules/tree.html
- http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf
- https://scikit-learn.org