Commit 8e44ab8

committed
updated doc
1 parent adc7315 commit 8e44ab8

1 file changed: +4 -2 lines changed

docs/mllib-decision-tree.md

Lines changed: 4 additions & 2 deletions
@@ -77,15 +77,17 @@ bins if the condition is not satisfied.

 **Categorical features**

-For `$M$` categorical features, one could come up with `$2^(M-1)-1$` split candidates. For
+For `$M$` categorical feature values, one could come up with `$2^(M-1)-1$` split candidates. For
 binary classification, we can reduce the number of split candidates to `$M-1$` by ordering the
 categorical feature values by the proportion of labels falling in one of the two classes (see
 Section 9.2.4 in
 [Elements of Statistical Machine Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/) for
 details). For example, for a binary classification problem with one categorical feature with three
 categories A, B and C with corresponding proportion of label 1 as 0.2, 0.6 and 0.4, the categorical
 features are ordered as A followed by C followed B or A, B, C. The two split candidates are A \| C, B
-and A , B \| C where \| denotes the split.
+and A , B \| C where \| denotes the split. A similar ordering using impurity is performed
+for categorical feature values in multiclass classification when `$2^(M-1)-1$` is
+greater than the number of bins.

 ### Stopping rule
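
The ordering trick for binary classification described in the changed paragraph can be illustrated with a minimal Python sketch. This is not MLlib's implementation; the function names `ordered_split_candidates` and `unordered_split_count` are hypothetical, and the input is assumed to be a plain dictionary mapping each category to its proportion of label 1. Sorting the example proportions (0.2, 0.6, 0.4) in ascending order gives A, C, B, so under this assumption the two contiguous split candidates come out as A | C, B and A, C | B.

```python
def unordered_split_count(num_categories):
    """Number of distinct two-way partitions of M category values: 2^(M-1) - 1."""
    return 2 ** (num_categories - 1) - 1


def ordered_split_candidates(label1_proportion):
    """Return the M-1 split candidates obtained after ordering the categories.

    label1_proportion: dict mapping category -> proportion of label 1
    (an assumed input format for this illustration).
    """
    # Order the category values by their proportion of label 1 (ascending).
    ordered = sorted(label1_proportion, key=label1_proportion.get)
    # Only contiguous cuts of the ordered list are kept as candidates.
    return [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]


# Proportions of label 1 taken from the example in the text.
proportions = {"A": 0.2, "B": 0.6, "C": 0.4}
candidates = ordered_split_candidates(proportions)

for left, right in candidates:
    print(", ".join(left), "|", ", ".join(right))
```

For three categories the unordered count is `unordered_split_count(3) == 3`, while the ordered approach considers only `len(candidates) == 2` splits; the gap grows exponentially with the number of category values.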
