Commit c19680b
[SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle invalid data
## What changes were proposed in this pull request?
This PR maintains API parity with the changes made in SPARK-17498: the PySpark StringIndexer now
supports the new 'keep' option for handling unseen labels or NULL values.
Note: This is an updated version of apache#17237; the primary author of this PR is VinceShieh.
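
To make the three policies concrete, here is a minimal plain-Python sketch of the behaviour described above. It is not the PySpark implementation and does not use the Spark API. The names `fit_labels`, `transform`, and `handle_invalid` are hypothetical, and the alphabetical tie-break among equally frequent labels is an assumption of this sketch. It assumes that 'keep' places unseen labels and NULL values together in one extra bucket whose index equals the number of fitted labels, that 'skip' drops those rows, and that 'error' raises.

```python
from collections import Counter


def fit_labels(values):
    """Return distinct non-null labels, most frequent first.

    Ties are broken alphabetically; that tie-break is an assumption
    of this sketch, not a documented guarantee.
    """
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ordered]


def transform(values, labels, handle_invalid="error"):
    """Map each value to a label index under the chosen policy.

    handle_invalid (hypothetical parameter name) is one of:
      "error": raise on an unseen label or None
      "skip":  drop the offending value
      "keep":  send it to an extra bucket at index len(labels)
    """
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError("unknown policy: %r" % (handle_invalid,))
    index = {label: i for i, label in enumerate(labels)}
    out = []
    for v in values:
        if v in index:
            out.append(float(index[v]))
        elif handle_invalid == "keep":
            # Unseen labels and None share one additional bucket.
            out.append(float(len(labels)))
        elif handle_invalid == "skip":
            continue
        else:
            raise ValueError("unseen label: %r" % (v,))
    return out


labels = fit_labels(["a", "b", "c", "a", "a", "c"])   # ["a", "c", "b"]
kept = transform(["a", "d", None, "b"], labels, "keep")     # [0.0, 3.0, 3.0, 2.0]
skipped = transform(["a", "d", None, "b"], labels, "skip")  # [0.0, 2.0]
```

With 'keep', downstream stages see one more category than the number of fitted labels, so anything sized by the label count has to allow for that extra index.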
## How was this patch tested?
Unit tests.
Author: VinceShieh <[email protected]>
Author: Yanbo Liang <[email protected]>
Closes apache#18453 from yanboliang/spark-19852.

1 parent c605fee commit c19680b
2 files changed: 27 additions, 0 deletions.

| File | Lines added | Lines removed | New line numbers of added lines |
|---|---|---|---|
| First file (name not preserved) | 6 | 0 | 2135-2140 |
| Second file (name not preserved) | 21 | 0 | 554-574 |