Given a new sample, we denote it by $\mathbf{x} = (1, x_1, \dots, x_n)^T$,
where the first element is the bias term and the others are the feature values.
Binary problem
Consider a binary classification task with a positive class and a negative class.
Denote the weight vector by $\mathbf{w} = (w_0, w_1, \dots, w_n)^T$. Then the score of the new sample is $z = \mathbf{w}^T \mathbf{x}$,
and the probability that the new sample is positive is
$P(y = +1 \mid \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}}$.
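The binary prediction above can be sketched in a few lines of plain Python; the weights and sample values below are illustrative placeholders, not fitted values.

```python
import math

def predict_proba_positive(w, x):
    """Probability that sample x is positive under logistic regression.

    w: weight vector (bias weight first);
    x: feature vector with a leading 1 for the bias term.
    """
    z = sum(wi * xi for wi, xi in zip(w, x))  # z = w^T x
    return 1.0 / (1.0 + math.exp(-z))         # sigmoid(z)

w = [0.5, -1.2, 2.0]   # hypothetical fitted weights (bias first)
x = [1.0, 0.8, 0.3]    # new sample: leading 1 multiplies the bias weight
p = predict_proba_positive(w, x)
```

Note that a score of $z = 0$ maps to a probability of exactly 0.5, the usual decision threshold.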
Multiclass problem
Consider a multiclass classification task with classes $1, 2, \dots, K$.
Denote the weight vector of class $k$ by $\mathbf{w}_k$. Then the probability that the new sample belongs to class $k$ is
$P(y = k \mid \mathbf{x}) = \frac{e^{\mathbf{w}_k^T \mathbf{x}}}{\sum_{j=1}^{K} e^{\mathbf{w}_j^T \mathbf{x}}}$.
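The softmax computation above can be sketched as follows; the per-class weight vectors are illustrative placeholders, not fitted values.

```python
import math

def predict_proba_multiclass(W, x):
    """Class probabilities under multinomial logistic regression.

    W: list of per-class weight vectors w_k (bias weight first);
    x: feature vector with a leading 1 for the bias term.
    """
    scores = [sum(wk_i * xi for wk_i, xi in zip(wk, x)) for wk in W]
    m = max(scores)                         # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]        # softmax over the scores w_k^T x

W = [[0.1, 1.0], [0.4, -0.5], [0.0, 0.2]]  # hypothetical weights for K = 3 classes
probs = predict_proba_multiclass(W, [1.0, 2.0])
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged but avoids overflow for large scores.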
In both cases, $\mathbf{w}$ is a vector containing all weights,
and $C$ is a constant
that determines the strength of regularization.
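Concretely, the regularized objective has roughly the following form (a sketch following sklearn's convention, in which the data term is scaled by the penalty parameter $C$; the exact scaling of the norm term may differ):

$$\min_{\mathbf{w}} \; \tfrac{1}{2}\|\mathbf{w}\|_2^2 \;+\; C \sum_{i=1}^{m} \log\!\left(1 + e^{-y_i \mathbf{w}^T \mathbf{x}_i}\right)$$

with labels $y_i \in \{-1, +1\}$; with an L1 penalty the norm term becomes $\|\mathbf{w}\|_1$. Because $C$ multiplies the data term rather than the penalty, larger $C$ means weaker regularization.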
- penalty_type: the norm used in the regularization term (L1 or L2)
- penalty: inverse of regularization strength (i.e., larger values lead to weaker regularization)
- fit_intercept: whether to use a bias term
- intercept_scaling: scale of the bias term
- solver: learning algorithm used to optimize the loss function
- multi_class: mode for multiclass problems
- ovr: one-vs-rest (a separate binary classifier for each class)
- multinomial: a single classifier trained jointly over all classes
- class_weight: weights associated with the classes
- uniform: every class receives the same weight.
- balanced: class weights are inversely proportional to class frequencies.
Stopping criteria:
- tol: minimum reduction in loss required for optimization to continue.
- max_iter: maximum number of iterations allowed for the learning algorithm to converge.
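The two stopping criteria can be illustrated with a simplified gradient-descent sketch on the unregularized logistic loss; the actual solvers listed above (and their exact stopping rules) differ, and the learning rate and data below are illustrative.

```python
import math

def fit_logreg_gd(X, y, lr=0.1, tol=1e-6, max_iter=1000):
    """Gradient descent on the logistic loss, illustrating tol and max_iter.

    X: samples with a leading 1 for the bias term; y: labels in {0, 1}.
    Returns the weight vector and the number of iterations performed.
    """
    w = [0.0] * len(X[0])
    prev_loss = float("inf")
    for it in range(max_iter):                 # max_iter: hard cap on iterations
        grad = [0.0] * len(w)
        loss = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            loss += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
        if prev_loss - loss < tol:             # tol: minimum loss reduction to continue
            break
        prev_loss = loss
    return w, it + 1
```

Training stops as soon as an iteration reduces the loss by less than tol, or after max_iter iterations, whichever comes first.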
See the documentation listed below for attributes that are available in sklearn but not exposed to the user in this software.
- sklearn tutorial on linear models (including Logistic Regression).
- sklearn LogisticRegression documentation