Commit 488f2c9: L2 (#264)
* new L2 regularization config

* docs fix

* up

* up
jeremiedb authored Nov 10, 2023
1 parent cc1466e commit 488f2c9
Showing 6 changed files with 80 additions and 61 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
 name = "EvoTrees"
 uuid = "f6006082-12f8-11e9-0c9c-0d5d367ab1e5"
 authors = ["jeremiedb <[email protected]>"]
-version = "0.16.5"
+version = "0.16.6"
 
 [deps]
 BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
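Since the `L2` option first ships in this release, a quick way to make sure a project picks it up (assuming 0.16.6 is available in the General registry):

```julia
using Pkg
Pkg.add(name="EvoTrees", version="0.16.6")  # first version with the `L2` option
```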
19 changes: 9 additions & 10 deletions benchmarks/Higgs-logloss.jl
@@ -4,6 +4,7 @@ using CSV
 using DataFrames
 using StatsBase
 using Statistics: mean, std
+using CUDA
 using EvoTrees
 using Solage: Connectors
 using AWS: AWSCredentials, AWSConfig, @service
@@ -33,23 +34,23 @@ dtest = df_tot[end-500_000+1:end, :];
 config = EvoTreeRegressor(
     loss=:logloss,
     nrounds=5000,
-    eta=0.15,
-    nbins=128,
-    max_depth=9,
-    lambda=1.0,
+    eta=0.2,
+    nbins=224,
+    max_depth=11,
+    L2=1,
+    lambda=0.0,
     gamma=0.0,
     rowsample=0.8,
     colsample=0.8,
     min_weight=1,
     rng=123,
 )
 
-device = "gpu"
+device = "cpu"
 metric = "logloss"
 @time m_evo = fit_evotree(config, dtrain; target_name, fnames=feature_names, deval, metric, device, early_stopping_rounds=200, print_every_n=100);
 
 p_test = m_evo(dtest);
-@info extrema(p_test)
 logloss_test = mean(-dtest.y .* log.(p_test) .+ (dtest.y .- 1) .* log.(1 .- p_test))
 @info "LogLoss - dtest" logloss_test
 error_test = 1 - mean(round.(Int, p_test) .== dtest.y)
@@ -63,8 +64,8 @@ error_test = 1 - mean(round.(Int, p_test) .== dtest.y)
 @info "train"
 using XGBoost
 params_xgb = Dict(
-    :num_round => 4000,
-    :max_depth => 8,
+    :num_round => 2000,
+    :max_depth => 10,
     :eta => 0.15,
     :objective => "reg:logistic",
     :print_every_n => 5,
@@ -81,8 +82,6 @@ watchlist = Dict("eval" => DMatrix(select(deval, feature_names), deval.y));
 @time m_xgb = xgboost(dtrain_xgb; watchlist, nthread=Threads.nthreads(), verbosity=0, eval_metric="logloss", params_xgb...);
 
 pred_xgb = XGBoost.predict(m_xgb, DMatrix(select(deval, feature_names)));
-@info extrema(pred_xgb)
-# (1.9394008f-6, 0.9999975f0)
 logloss_test = mean(-dtest.y .* log.(pred_xgb) .+ (dtest.y .- 1) .* log.(1 .- pred_xgb))
 @info "LogLoss - dtest" logloss_test
 error_test = 1 - mean(round.(Int, pred_xgb) .== dtest.y)
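For readers without the Higgs dataset or the Solage/AWS connectors, here is a minimal self-contained sketch of the updated EvoTrees configuration. The synthetic data, column names, and smaller round counts are illustrative only, not part of the benchmark:

```julia
using DataFrames
using EvoTrees
using Statistics: mean

# Synthetic stand-in for the Higgs set: 10 numeric features, binary target.
nobs = 10_000
df = DataFrame(randn(nobs, 10), :auto)
feature_names = names(df)
df.y = Float64.(rand(nobs) .< 0.5)
dtrain, deval = df[1:8_000, :], df[8_001:end, :]

config = EvoTreeRegressor(
    loss=:logloss,
    nrounds=200,
    eta=0.2,
    nbins=224,
    max_depth=11,
    L2=1,       # new in this PR: flat penalty added to each node's hessian sum
    lambda=0.0, # pre-existing penalty, scaled by the node's weight
    rowsample=0.8,
    colsample=0.8,
)

m = fit_evotree(config, dtrain;
    target_name="y", fnames=feature_names, deval,
    metric="logloss", device="cpu", early_stopping_rounds=20, print_every_n=50)

p_eval = m(deval)
# Same manual logloss as the benchmark script:
logloss_eval = mean(-deval.y .* log.(p_eval) .+ (deval.y .- 1) .* log.(1 .- p_eval))
```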
81 changes: 43 additions & 38 deletions src/MLJ.jl
@@ -165,7 +165,8 @@ A model type for constructing an EvoTreeRegressor, based on [EvoTrees.jl](https://github.com/Evovest/EvoTrees.jl)
 - `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
 - `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
   A lower `eta` results in slower learning, requiring a higher `nrounds`, but typically improves model performance.
-- `lambda::T=0.0`: L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.
+- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
+- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
 - `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
 - `alpha::T=0.5`: Loss specific parameter in the [0, 1] range:
   - `:quantile`: target quantile for the regression.
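The "aggregate" vs. "individual" wording in the two new entries can be read directly off the updated `get_gain` code in src/loss.jl at the bottom of this diff. For gradient-regression losses, a node's split gain becomes

    gain = (∑gᵢ)² / (2 · (∑hᵢ + lambda · ∑wᵢ + L2))

so `lambda` enters scaled by the node's weight sum (an individual, per-observation penalty), while `L2` is a flat offset on the aggregated hessian, penalizing low-weight nodes proportionally more. A numeric illustration follows the src/loss.jl section below.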
@@ -286,22 +287,23 @@ EvoTreeClassifier is used to perform multi-class classification, using cross-entropy loss
 # Hyper-parameters
 - `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
 - `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
   A lower `eta` results in slower learning, requiring a higher `nrounds`, but typically improves model performance.
-- `lambda::T=0.0`: L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.
+- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
+- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
 - `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
 - `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
   A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
   Compute cost is proportional to `2^max_depth`. Typical optimal values are in the 3 to 9 range.
 - `min_weight=1.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default, or the sum of weights as provided by the `weights` vector. Must be > 0.
 - `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
 - `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
 - `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
 - `tree_type="binary"`: Tree structure to be used. One of:
   - `binary`: Each node of a tree is grown independently. Trees are built depthwise until max depth is reached, or until min weight or gain (see `gamma`) stops further node splits.
   - `oblivious`: A common splitting condition is imposed to all nodes of a given depth.
 - `rng=123`: Either an integer used as a seed to the random number generator, or an actual random number generator (`::Random.AbstractRNG`).
 
 # Internal API
@@ -410,23 +412,24 @@ EvoTreeCount is used to perform Poisson probabilistic regression on count target
 # Hyper-parameters
 - `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
 - `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
   A lower `eta` results in slower learning, requiring a higher `nrounds`, but typically improves model performance.
-- `lambda::T=0.0`: L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.
+- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
+- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
 - `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
 - `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
   A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
   Compute cost is proportional to `2^max_depth`. Typical optimal values are in the 3 to 9 range.
 - `min_weight=1.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default, or the sum of weights as provided by the `weights` vector. Must be > 0.
 - `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
 - `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
 - `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
 - `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value is the applicable constraint (-1=decreasing, 0=none, 1=increasing).
 - `tree_type="binary"`: Tree structure to be used. One of:
   - `binary`: Each node of a tree is grown independently. Trees are built depthwise until max depth is reached, or until min weight or gain (see `gamma`) stops further node splits.
   - `oblivious`: A common splitting condition is imposed to all nodes of a given depth.
 - `rng=123`: Either an integer used as a seed to the random number generator, or an actual random number generator (`::Random.AbstractRNG`).
 
 # Internal API
@@ -539,24 +542,25 @@ EvoTreeGaussian is used to perform Gaussian probabilistic regression, fitting μ and σ parameters
 # Hyper-parameters
 - `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
 - `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
   A lower `eta` results in slower learning, requiring a higher `nrounds`, but typically improves model performance.
-- `lambda::T=0.0`: L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.
+- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
+- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
 - `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
 - `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
   A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
   Compute cost is proportional to `2^max_depth`. Typical optimal values are in the 3 to 9 range.
 - `min_weight=8.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default, or the sum of weights as provided by the `weights` vector. Must be > 0.
 - `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
 - `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
 - `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
 - `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value is the applicable constraint (-1=decreasing, 0=none, 1=increasing).
   !Experimental feature: note that for Gaussian regression, constraints may not be enforced systematically.
 - `tree_type="binary"`: Tree structure to be used. One of:
   - `binary`: Each node of a tree is grown independently. Trees are built depthwise until max depth is reached, or until min weight or gain (see `gamma`) stops further node splits.
   - `oblivious`: A common splitting condition is imposed to all nodes of a given depth.
 - `rng=123`: Either an integer used as a seed to the random number generator, or an actual random number generator (`::Random.AbstractRNG`).
 
 # Internal API
@@ -676,24 +680,25 @@ EvoTreeMLE performs maximum likelihood estimation. Assumed distribution is specified through the `loss` kwarg
 `loss=:gaussian`: Loss to be minimized during training. One of:
 - `:gaussian` / `:gaussian_mle`
 - `:logistic` / `:logistic_mle`
 - `nrounds=10`: Number of rounds. It corresponds to the number of trees that will be sequentially stacked. Must be >= 1.
 - `eta=0.1`: Learning rate. Each tree's raw predictions are scaled by `eta` prior to being added to the stack of predictions. Must be > 0.
   A lower `eta` results in slower learning, requiring a higher `nrounds`, but typically improves model performance.
-- `lambda::T=0.0`: L2 regularization term on weights. Must be >= 0. Higher lambda can result in a more robust model.
+- `L2::T=0.0`: L2 regularization factor on aggregate gain. Must be >= 0. Higher L2 can result in a more robust model.
+- `lambda::T=0.0`: L2 regularization factor on individual gain. Must be >= 0. Higher lambda can result in a more robust model.
 - `gamma::T=0.0`: Minimum gain improvement needed to perform a node split. Higher gamma can result in a more robust model. Must be >= 0.
 - `max_depth=5`: Maximum depth of a tree. Must be >= 1. A tree of depth 1 is made of a single prediction leaf.
   A complete tree of depth N contains `2^(N - 1)` terminal leaves and `2^(N - 1) - 1` split nodes.
   Compute cost is proportional to `2^max_depth`. Typical optimal values are in the 3 to 9 range.
 - `min_weight=8.0`: Minimum weight needed in a node to perform a split. Matches the number of observations by default, or the sum of weights as provided by the `weights` vector. Must be > 0.
 - `rowsample=1.0`: Proportion of rows that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
 - `colsample=1.0`: Proportion of columns / features that are sampled at each iteration to build the tree. Should be in `]0, 1]`.
 - `nbins=32`: Number of bins into which each feature is quantized. Buckets are defined based on quantiles, hence resulting in equal weight bins. Should be between 2 and 255.
 - `monotone_constraints=Dict{Int, Int}()`: Specify monotonic constraints using a dict where the key is the feature index and the value is the applicable constraint (-1=decreasing, 0=none, 1=increasing).
   !Experimental feature: note that for MLE regression, constraints may not be enforced systematically.
 - `tree_type="binary"`: Tree structure to be used. One of:
   - `binary`: Each node of a tree is grown independently. Trees are built depthwise until max depth is reached, or until min weight or gain (see `gamma`) stops further node splits.
   - `oblivious`: A common splitting condition is imposed to all nodes of a given depth.
 - `rng=123`: Either an integer used as a seed to the random number generator, or an actual random number generator (`::Random.AbstractRNG`).
 
 # Internal API
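The docstring updates above live in the MLJ wrapper, so a natural sanity check is that the new `L2` keyword round-trips through the MLJ interface. A sketch, assuming MLJ's toy `make_regression` generator and arbitrary hyper-parameter values:

```julia
using MLJ
using EvoTrees

X, y = make_regression(1_000, 5)  # small synthetic regression task

# `L2` is passed like any other hyper-parameter of the wrapped model.
model = EvoTreeRegressor(nrounds=100, eta=0.1, L2=1.0, lambda=0.0)
mach = machine(model, X, y)
fit!(mach)
p = predict(mach, X)
```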
6 changes: 3 additions & 3 deletions src/loss.jl
@@ -144,13 +144,13 @@ end
 # GradientRegression
 function get_gain(params::EvoTypes{L}, ∑::AbstractVector) where {L<:GradientRegression}
     ϵ = eps(eltype(∑))
-    ∑[1]^2 / max(ϵ, (∑[2] + params.lambda * ∑[3])) / 2
+    ∑[1]^2 / max(ϵ, (∑[2] + params.lambda * ∑[3] + params.L2)) / 2
 end
 
 # GaussianRegression
 function get_gain(params::EvoTypes{L}, ∑::AbstractVector) where {L<:MLE2P}
     ϵ = eps(eltype(∑))
-    (∑[1]^2 / max(ϵ, (∑[3] + params.lambda * ∑[5])) + ∑[2]^2 / max(ϵ, (∑[4] + params.lambda * ∑[5]))) / 2
+    (∑[1]^2 / max(ϵ, (∑[3] + params.lambda * ∑[5] + params.L2)) + ∑[2]^2 / max(ϵ, (∑[4] + params.lambda * ∑[5] + params.L2))) / 2
 end
 
 # MultiClassRegression
@@ -159,7 +159,7 @@ function get_gain(params::EvoTypes{L}, ∑::AbstractVector{T}) where {L<:MLogLoss,T}
     gain = zero(T)
     K = (length(∑) - 1) ÷ 2
     @inbounds for k = 1:K
-        gain += ∑[k]^2 / max(ϵ, (∑[k+K] + params.lambda * ∑[end])) / 2
+        gain += ∑[k]^2 / max(ϵ, (∑[k+K] + params.lambda * ∑[end] + params.L2)) / 2
     end
     return gain
 end
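To make the difference between the two penalties concrete, here is a standalone toy version of the `GradientRegression` gain above (not library code; `∑` packs the node's gradient, hessian, and weight sums):

```julia
# Mirrors get_gain for GradientRegression:
# gain = (∑grad)^2 / (∑hess + lambda * ∑weight + L2) / 2
gain(∑; lambda=0.0, L2=0.0) = ∑[1]^2 / max(eps(eltype(∑)), ∑[2] + lambda * ∑[3] + L2) / 2

small = [2.0, 4.0, 10.0]     # ∑grad, ∑hess, ∑weight for a small node (10 obs)
large = [20.0, 40.0, 100.0]  # same per-observation statistics, 10x the size

gain(small)                  # 0.5
gain(small, L2=1.0)          # 0.4   -> flat L2 cuts the small node's gain by 20%
gain(large)                  # 5.0
gain(large, L2=1.0)          # ~4.88 -> but the large node's by only ~2.4%
gain(small, lambda=0.1)      # 0.4   -> lambda scales with ∑weight, so its
gain(large, lambda=0.1)      # 4.0      relative effect is size-independent
```

This size dependence is the sense in which the updated docstrings describe `L2` as acting on the aggregate gain and `lambda` on individual gain.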