-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-3162][MLlib] Local Tree Training Pt 1: Refactor RandomForest.scala into utility classes #19758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #83910 has finished for PR 19758 at commit
|
|
Test build #83911 has finished for PR 19758 at commit
|
|
Test build #83914 has finished for PR 19758 at commit
|
WeichenXu123
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work. I mainly reviewed the new added testsuite part.
| // label: 2 --> values: 2 | ||
| // Expected split: feature value 1 on the left, values (0, 2) on the right | ||
| val values = Array(1, 1, 0, 2, 2) | ||
| val featureArity = values.max + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to make the test more strict, can you increase the featureArity, numExamples and numClasses ? e.g., featureArity = 6 and numExamples = 10 and numClasses = 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WeichenXu123 thanks for the feedback! Definitely agree that the test is a little weak right now.
IMO it's mainly weak due to the low feature arity (there only three possible splits, so the right one could be picked by chance). I think increasing the number of classes/examples substantially might make the test harder to reason about, but not opposed to that either - let me know what you think.
What about something like:
val values = Array(0, 1, 2, 3, 2, 2, 4)
val labels = Array(0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0)
// label: 0 --> values: 0, 1
// label: 1 --> values: 2, 3
// label: 2 --> values: 2, 2, 4
// Expected split: feature values (0, 1) on the left, values (2, 3, 4) on the right
This way we still test multiclass classification & test the split-selection logic more rigorously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is OK. thanks!
|
Test build #84379 has finished for PR 19758 at commit
|
|
ping? |
|
Any updates? this PR seems to address critical issue: https://issues.apache.org/jira/browse/SPARK-3162 |
Closes apache#17422 Closes apache#17619 Closes apache#18034 Closes apache#18229 Closes apache#18268 Closes apache#17973 Closes apache#18125 Closes apache#18918 Closes apache#19274 Closes apache#19456 Closes apache#19510 Closes apache#19420 Closes apache#20090 Closes apache#20177 Closes apache#20304 Closes apache#20319 Closes apache#20543 Closes apache#20437 Closes apache#21261 Closes apache#21726 Closes apache#14653 Closes apache#13143 Closes apache#17894 Closes apache#19758 Closes apache#12951 Closes apache#17092 Closes apache#21240 Closes apache#16910 Closes apache#12904 Closes apache#21731 Closes apache#21095 Added: Closes apache#19233 Closes apache#20100 Closes apache#21453 Closes apache#21455 Closes apache#18477 Added: Closes apache#21812 Closes apache#21787 Author: hyukjinkwon <[email protected]> Closes apache#21781 from HyukjinKwon/closing-prs.
What changes were proposed in this pull request?
Breaks up #19433 to help unblock #19666; after this PR is merged, #19666 can be merged.
This PR contains the changes made to migrate functionality from RandomForest.scala into the following utility classes:
The PR also adds tests for split selection logic in TreeSplitUtilsSuite.
A follow-up PR will include the other changes from #19433:
How was this patch tested?
Adds unit tests for split selection logic in TreeSplitUtilsSuite