Commit 9e26473
[SPARK-3159][ML] Add decision tree pruning
## What changes were proposed in this pull request?
Added subtree pruning in the translation from LearningNode to Node: a learning node having a single prediction value for all the leaves in the subtree rooted at it is translated into a LeafNode, instead of a (redundant) InternalNode
## How was this patch tested?
Added two unit tests under "mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala":
- test("SPARK-3159 tree model redundancy - classification")
- test("SPARK-3159 tree model redundancy - regression")
4 existing unit tests relying on the tree structure (existence of a specific redundant subtree) had to be adapted as the tested components in the output tree are now pruned (fixed by adding an extra _prune_ parameter which can be used to disable pruning for testing)
Author: Alessandro Solimando <[email protected]>
Closes #20632 from asolimando/master.1 parent 487377e commit 9e26473
File tree
5 files changed
+115
-65
lines changed- mllib/src
- main/scala/org/apache/spark/ml/tree
- impl
- test/scala/org/apache/spark
- mllib/tree
- ml
- classification
- tree/impl
5 files changed
+115
-65
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
22 | | - | |
23 | | - | |
| 22 | + | |
24 | 23 | | |
25 | 24 | | |
26 | 25 | | |
| |||
266 | 265 | | |
267 | 266 | | |
268 | 267 | | |
| 268 | + | |
| 269 | + | |
269 | 270 | | |
270 | 271 | | |
271 | 272 | | |
272 | | - | |
273 | | - | |
274 | | - | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
275 | 277 | | |
276 | | - | |
277 | | - | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
278 | 285 | | |
279 | 286 | | |
280 | 287 | | |
| |||
283 | 290 | | |
284 | 291 | | |
285 | 292 | | |
286 | | - | |
287 | 293 | | |
288 | 294 | | |
289 | 295 | | |
| |||
Lines changed: 6 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
92 | 92 | | |
93 | 93 | | |
94 | 94 | | |
| 95 | + | |
95 | 96 | | |
96 | 97 | | |
97 | 98 | | |
| |||
223 | 224 | | |
224 | 225 | | |
225 | 226 | | |
226 | | - | |
| 227 | + | |
227 | 228 | | |
228 | 229 | | |
229 | 230 | | |
230 | 231 | | |
231 | | - | |
| 232 | + | |
232 | 233 | | |
233 | 234 | | |
234 | 235 | | |
235 | 236 | | |
236 | 237 | | |
237 | | - | |
| 238 | + | |
238 | 239 | | |
239 | 240 | | |
240 | 241 | | |
241 | | - | |
| 242 | + | |
| 243 | + | |
242 | 244 | | |
243 | 245 | | |
244 | 246 | | |
| |||
Lines changed: 0 additions & 38 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
280 | 280 | | |
281 | 281 | | |
282 | 282 | | |
283 | | - | |
284 | | - | |
285 | | - | |
286 | | - | |
287 | | - | |
288 | | - | |
289 | | - | |
290 | | - | |
291 | | - | |
292 | | - | |
293 | | - | |
294 | | - | |
295 | | - | |
296 | | - | |
297 | | - | |
298 | | - | |
299 | | - | |
300 | | - | |
301 | | - | |
302 | | - | |
303 | | - | |
304 | | - | |
305 | | - | |
306 | | - | |
307 | | - | |
308 | | - | |
309 | | - | |
310 | | - | |
311 | | - | |
312 | | - | |
313 | | - | |
314 | | - | |
315 | | - | |
316 | | - | |
317 | | - | |
318 | | - | |
319 | | - | |
320 | | - | |
321 | 283 | | |
322 | 284 | | |
323 | 285 | | |
| |||
Lines changed: 90 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
20 | 21 | | |
21 | 22 | | |
22 | 23 | | |
| |||
38 | 39 | | |
39 | 40 | | |
40 | 41 | | |
| 42 | + | |
| 43 | + | |
41 | 44 | | |
42 | 45 | | |
43 | 46 | | |
| |||
320 | 323 | | |
321 | 324 | | |
322 | 325 | | |
323 | | - | |
324 | | - | |
325 | | - | |
326 | | - | |
| 326 | + | |
| 327 | + | |
| 328 | + | |
| 329 | + | |
327 | 330 | | |
328 | 331 | | |
329 | 332 | | |
| |||
362 | 365 | | |
363 | 366 | | |
364 | 367 | | |
365 | | - | |
366 | | - | |
367 | | - | |
368 | | - | |
| 368 | + | |
| 369 | + | |
| 370 | + | |
| 371 | + | |
369 | 372 | | |
370 | 373 | | |
371 | 374 | | |
| |||
407 | 410 | | |
408 | 411 | | |
409 | 412 | | |
410 | | - | |
| 413 | + | |
| 414 | + | |
411 | 415 | | |
412 | 416 | | |
413 | 417 | | |
| |||
631 | 635 | | |
632 | 636 | | |
633 | 637 | | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
| 654 | + | |
| 655 | + | |
| 656 | + | |
| 657 | + | |
| 658 | + | |
| 659 | + | |
| 660 | + | |
| 661 | + | |
| 662 | + | |
| 663 | + | |
| 664 | + | |
| 665 | + | |
| 666 | + | |
| 667 | + | |
| 668 | + | |
| 669 | + | |
| 670 | + | |
| 671 | + | |
| 672 | + | |
| 673 | + | |
| 674 | + | |
| 675 | + | |
| 676 | + | |
| 677 | + | |
| 678 | + | |
| 679 | + | |
| 680 | + | |
| 681 | + | |
| 682 | + | |
| 683 | + | |
| 684 | + | |
| 685 | + | |
| 686 | + | |
| 687 | + | |
| 688 | + | |
| 689 | + | |
| 690 | + | |
| 691 | + | |
| 692 | + | |
| 693 | + | |
| 694 | + | |
| 695 | + | |
| 696 | + | |
| 697 | + | |
| 698 | + | |
| 699 | + | |
| 700 | + | |
| 701 | + | |
634 | 702 | | |
635 | 703 | | |
636 | 704 | | |
637 | | - | |
638 | 705 | | |
639 | 706 | | |
640 | 707 | | |
641 | 708 | | |
642 | 709 | | |
| 710 | + | |
| 711 | + | |
| 712 | + | |
| 713 | + | |
| 714 | + | |
| 715 | + | |
| 716 | + | |
| 717 | + | |
| 718 | + | |
| 719 | + | |
| 720 | + | |
| 721 | + | |
| 722 | + | |
643 | 723 | | |
Lines changed: 5 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
363 | 363 | | |
364 | 364 | | |
365 | 365 | | |
366 | | - | |
367 | | - | |
368 | | - | |
369 | | - | |
| 366 | + | |
| 367 | + | |
| 368 | + | |
| 369 | + | |
370 | 370 | | |
371 | 371 | | |
372 | 372 | | |
| |||
541 | 541 | | |
542 | 542 | | |
543 | 543 | | |
544 | | - | |
| 544 | + | |
545 | 545 | | |
546 | 546 | | |
547 | 547 | | |
| |||
0 commit comments