Commit 60203fc

Author: Nick Pentreath
[SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols parameter
Update user guide entry for `FeatureHasher` to match the Scala / Python doc, to describe the `categoricalCols` parameter.

## How was this patch tested?

Doc only

Author: Nick Pentreath <[email protected]>

Closes #20293 from MLnick/SPARK-23127-catCol-userguide.
1 parent 9c4b998 commit 60203fc

1 file changed: +3 -3 lines changed


docs/ml-features.md

Lines changed: 3 additions & 3 deletions
@@ -222,9 +222,9 @@ The `FeatureHasher` transformer operates on multiple columns. Each column may co
 numeric or categorical features. Behavior and handling of column data types is as follows:

 - Numeric columns: For numeric features, the hash value of the column name is used to map the
-feature value to its index in the feature vector. Numeric features are never treated as
-categorical, even when they are integers. You must explicitly convert numeric columns containing
-categorical features to strings first.
+feature value to its index in the feature vector. By default, numeric features are not treated
+as categorical (even when they are integers). To treat them as categorical, specify the relevant
+columns using the `categoricalCols` parameter.
 - String columns: For categorical features, the hash value of the string "column_name=value"
 is used to map to the vector index, with an indicator value of `1.0`. Thus, categorical features
 are "one-hot" encoded (similarly to using [OneHotEncoder](ml-features.html#onehotencoder) with
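
The behavior described in the updated text can be illustrated with a small pure-Python sketch. This is not Spark's implementation: the helper names `bucket` and `hash_row` are hypothetical, MD5 stands in for the hash function Spark actually uses, and summing values on index collisions is an assumption of the sketch. It only shows how a numeric column maps to one index via its column name by default, and how listing it in `categorical_cols` switches it to the "column_name=value" indicator encoding used for strings.

```python
import hashlib


def bucket(term: str, num_features: int) -> int:
    """Map a string to a vector index (MD5 is a stand-in hash, not Spark's)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_features


def hash_row(row: dict, num_features: int = 16, categorical_cols=()) -> dict:
    """Return a sparse feature vector {index: value} for a single row."""
    vec = {}
    for col, value in row.items():
        # Strings are always categorical; numeric columns only when listed.
        is_categorical = isinstance(value, str) or col in categorical_cols
        if is_categorical:
            # Hash "column_name=value" and store an indicator of 1.0.
            idx = bucket(f"{col}={value}", num_features)
            vec[idx] = vec.get(idx, 0.0) + 1.0
        else:
            # Hash the column name and store the numeric value itself.
            idx = bucket(col, num_features)
            vec[idx] = vec.get(idx, 0.0) + float(value)
    return vec


row = {"rating": 3, "country": "NZ"}
default_vec = hash_row(row)                                    # rating kept numeric
categorical_vec = hash_row(row, categorical_cols=("rating",))  # rating as category
```

In the sketch, `default_vec` stores `3.0` at the index for `"rating"`, while `categorical_vec` stores `1.0` at the index for `"rating=3"`. In PySpark the corresponding option is the `categoricalCols` argument of `pyspark.ml.feature.FeatureHasher`.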
