@@ -278,6 +278,19 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=time-span]
278278====
279279end::chunking-config[]
280280
281+ tag::class-assignment-objective[]
282+ Defines the objective to optimize when assigning class labels:
283+ `maximize_accuracy` or `maximize_minimum_recall`. When maximizing accuracy,
284+ class labels are chosen to maximize the number of correct predictions. When
285+ maximizing minimum recall, labels are chosen to maximize the minimum recall
286+ for any class. Defaults to `maximize_minimum_recall`.
287+ end::class-assignment-objective[]
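
The difference between the two objectives described in `class-assignment-objective` is easiest to see on imbalanced data. The following is a minimal sketch, not the product's implementation: it assumes a binary problem in which labels are assigned by thresholding a predicted probability for class `1`, and the names `choose_threshold`, `recall_per_class`, and `accuracy` are hypothetical helpers introduced only for illustration.

```python
def recall_per_class(y_true, y_pred):
    """Return {class: recall} for every class present in y_true."""
    recalls = {}
    for cls in set(y_true):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls[cls] = hits / total
    return recalls


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def choose_threshold(y_true, scores, objective):
    """Pick the threshold on the class-1 score that optimizes the objective.

    Assumption: labels are assigned as 1 when score >= threshold, else 0.
    Ties keep the lowest threshold that reaches the best value.
    """
    best_threshold, best_value = None, -1.0
    for threshold in sorted(set(scores)):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        if objective == "maximize_accuracy":
            value = accuracy(y_true, y_pred)
        elif objective == "maximize_minimum_recall":
            value = min(recall_per_class(y_true, y_pred).values())
        else:
            raise ValueError(f"unknown objective: {objective}")
        if value > best_value:
            best_threshold, best_value = threshold, value
    return best_threshold, best_value


# Toy, made-up data: eight examples of class 0 and two of class 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.35, 0.55]

by_accuracy = choose_threshold(y_true, scores, "maximize_accuracy")
by_min_recall = choose_threshold(y_true, scores, "maximize_minimum_recall")
```

On this toy data, maximizing accuracy selects a high threshold that labels almost everything as the majority class, so the recall for class `1` drops to zero. Maximizing the minimum recall selects a lower threshold that keeps the recall of both classes at 0.5, at the cost of lower overall accuracy.
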
288+
289+ tag::compute-feature-influence[]
290+ Specifies whether the feature influence calculation is enabled. Defaults to
291+ `true`.
292+ end::compute-feature-influence[]
293+
281294tag::custom-rules[]
282295An array of custom rule objects, which enable you to customize the way detectors
283296operate. For example, a rule may dictate to the detector conditions under which
@@ -479,32 +492,15 @@ tag::dfas-downsample-factor[]
479492The value of the downsample factor.
480493end::dfas-downsample-factor[]
481494
482- tag::dfas-eta[]
483- The value of the eta hyperparameter.
484- end::dfas-eta[]
485-
486495tag::dfas-eta-growth[]
487496Specifies the rate at which the `eta` increases for each new tree that is added to the
488497forest. For example, a rate of `1.05` increases `eta` by 5%.
489498end::dfas-eta-growth[]
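
As a rough illustration of the `dfas-eta-growth` description, the sketch below computes the `eta` value for successive trees. It assumes that the growth rate compounds multiplicatively with each new tree, which the text suggests but does not state outright. The function name `eta_schedule` and the starting value `0.1` are hypothetical.

```python
def eta_schedule(initial_eta, eta_growth, n_trees):
    """Return the eta used for each of n_trees trees.

    Assumption: eta is multiplied by eta_growth once per added tree.
    """
    return [initial_eta * eta_growth ** i for i in range(n_trees)]


# With a growth rate of 1.05, each tree's eta is 5% larger than the previous one.
schedule = eta_schedule(0.1, 1.05, 3)
```
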
490499
491- tag::dfas-feature-bag-fraction[]
492- The fraction of features that is used when selecting a random bag for each
493- candidate split.
494- end::dfas-feature-bag-fraction[]
495-
496- tag::dfas-gamma[]
497- Regularization factor to penalize trees with large numbers of nodes.
498- end::dfas-gamma[]
499-
500500tag::dfas-iteration[]
501501The number of iterations on the analysis.
502502end::dfas-iteration[]
503503
504- tag::dfas-lambda[]
505- Regularization factor to penalize large leaf weights.
506- end::dfas-lambda[]
507-
508504tag::dfas-max-attempts[]
509505If the algorithm fails to determine a non-trivial tree (more than a single
510506leaf), this parameter determines how many of such consecutive failures are
@@ -519,10 +515,6 @@ The maximum number of steps is determined based on the number of undefined hyper
519515times the maximum optimization rounds per hyperparameter.
520516end::dfas-max-optimization-rounds[]
521517
522- tag::dfas-max-trees[]
523- The maximum number of trees in the forest.
524- end::dfas-max-trees[]
525-
526518tag::dfas-num-folds[]
527519The maximum number of folds for the cross-validation procedure.
528520end::dfas-num-folds[]
@@ -584,9 +576,9 @@ end::empty-bucket-count[]
584576
585577tag::eta[]
586578Advanced configuration option. The shrinkage applied to the weights. Smaller
587- values result in larger forests which have a better generalization error. However,
588- the smaller the value the longer the training will take. For more information,
589- about shrinkage, see
579+ values result in larger forests which have a better generalization error.
580+ However, the smaller the value the longer the training will take. For more
581+ information about shrinkage, see
590582https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article].
591583end::eta[]
592584
@@ -605,9 +597,15 @@ end::exclude-interim-results[]
605597
606598tag::feature-bag-fraction[]
607599Advanced configuration option. Defines the fraction of features that will be
608- used when selecting a random bag for each candidate split.
600+ used when selecting a random bag for each candidate split. By default, this
601+ value is calculated during hyperparameter optimization.
609602end::feature-bag-fraction[]
610603
604+ tag::feature-influence-threshold[]
605+ The minimum {olscore} that a document needs to have in order to calculate its
606+ {fiscore}. Value range: 0-1 (`0.1` by default).
607+ end::feature-influence-threshold[]
608+
611609tag::filter[]
612610One or more <<analysis-tokenfilters,token filters>>. In addition to the built-in
613611token filters, other plugins can provide more token filters. This property is
@@ -653,7 +651,8 @@ Advanced configuration option. Regularization parameter to prevent overfitting
653651on the training data set. Multiplies a linear penalty associated with the size of
654652individual trees in the forest. The higher the value the more training will
655653prefer smaller trees. The smaller this parameter the larger individual trees
656- will be and the longer training will take.
654+ will be and the longer training will take. By default, this value is calculated
655+ during hyperparameter optimization.
657656end::gamma[]
658657
659658tag::groups[]
@@ -785,6 +784,7 @@ more training will attempt to keep leaf weights small. This makes the prediction
785784function smoother at the expense of potentially not being able to capture
786785relevant relationships between the features and the {depvar}. The smaller this
787786parameter the larger individual trees will be and the longer training will take.
787+ By default, this value is calculated during hyperparameter optimization.
788788end::lambda[]
789789
790790tag::last-data-time[]
@@ -828,9 +828,18 @@ end::max-empty-searches[]
828828
829829tag::max-trees[]
830830Advanced configuration option. Defines the maximum number of trees the forest is
831- allowed to contain. The maximum value is 2000.
831+ allowed to contain. The maximum value is 2000. By default, this value is
832+ calculated during hyperparameter optimization.
832833end::max-trees[]
833834
835+ tag::method[]
836+ The method that {oldetection} uses. Available methods are `lof`, `ldof`,
837+ `distance_kth_nn`, `distance_knn`, and `ensemble`. The default value is
838+ `ensemble`, which means that {oldetection} uses an ensemble of different methods
839+ and normalises and combines their individual {olscores} to obtain the overall
840+ {olscore}.
841+ end::method[]
842+
834843tag::missing-field-count[]
835844The number of input documents that are missing a field that the {anomaly-job} is
836845configured to analyze. Input documents with missing fields are still processed
@@ -973,6 +982,14 @@ NOTE: To use the `multivariate_by_fields` property, you must also specify
973982--
974983end::multivariate-by-fields[]
975984
985+ tag::n-neighbors[]
986+ Defines the value for how many nearest neighbors each method of {oldetection}
987+ uses to calculate its {olscore}. When the value is not set, different values are
988+ used for different ensemble members. This default behavior helps improve the
989+ diversity in the ensemble; only override it if you are confident that the value
990+ you choose is appropriate for the data set.
991+ end::n-neighbors[]
992+
976993tag::node-address[]
977994The network address of the node.
978995end::node-address[]
@@ -1015,6 +1032,12 @@ order documents are discarded, since jobs require time series data to be in
10151032ascending chronological order.
10161033end::out-of-order-timestamp-count[]
10171034
1035+ tag::outlier-fraction[]
1036+ The proportion of the data set that is assumed to be outlying prior to
1037+ {oldetection} . For example, 0.05 means it is assumed that 5% of values are real
1038+ outliers and 95% are inliers.
1039+ end::outlier-fraction[]
1040+
10181041tag::over-field-name[]
10191042The field used to split the data. In particular, this property is used for
10201043analyzing the splits with respect to the history of all splits. It is used for
@@ -1143,6 +1166,13 @@ number of data points. If your data contains many sparse buckets, consider using
11431166a longer `bucket_span`.
11441167end::sparse-bucket-count[]
11451168
1169+ tag::standardization-enabled[]
1170+ If `true`, the following operation is performed on the columns before computing
1171+ {olscores}: (x_i - mean(x_i)) / sd(x_i). Defaults to `true`. For more
1172+ information about this concept, see
1173+ https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[Wikipedia].
1174+ end::standardization-enabled[]
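
The operation in `standardization-enabled`, (x_i - mean(x_i)) / sd(x_i), can be sketched in a few lines. This is an illustration only. It assumes the population standard deviation, because the text does not say whether the population or sample form is used, and it maps constant columns to `0.0` to avoid division by zero, which is also an assumption. The name `standardize_columns` is hypothetical.

```python
import statistics


def standardize_columns(rows):
    """Apply (x - mean) / sd to each column of a list of equal-length rows.

    Assumptions: sd is the population standard deviation, and a column
    with zero spread is mapped to 0.0 instead of dividing by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / sd if sd > 0 else 0.0 for x, m, sd in zip(row, means, sds)]
        for row in rows
    ]


# Made-up data: two features on very different scales.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
standardized = standardize_columns(rows)
```

After standardization, both columns have a mean of 0 and a standard deviation of 1, so features measured on different scales contribute comparably to distance-based outlier scores.
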
1175+
11461176tag::state-anomaly-job[]
11471177The status of the {anomaly-job}, which can be one of the following values:
11481178+