[SPARK-25586][MLlib][Core] Replace toString method with summarize m… #22604
…ethod in GeneralizedLinearRegressionTrainingSummary
After the change in SPARK-25118, which enables spark-shell to run with the default log level, test_glr_summary started failing with a StackOverflow error.
Cause: ClosureCleaner calls logDebug on various objects, and when it is called for GeneralizedLinearRegressionTrainingSummary, it starts a Spark job that runs into an infinite loop and fails with the below exception.
Fix: Remove the toString method and move the existing logic to a new method, "summarize". This is consistent with the other summary objects, which do not have a toString method either.
Testing Done: Ran the python "mllib" and scala/junit tests for this module.
|
Jenkins, ok to test |
|
This just removes Alternatively, we go back to the other JIRA and figure out why it was causing an infinite loop in the first place. |
|
A few variables (I think there are 3) printed in toString cause a Spark job to be started, mainly because those variables are lazily evaluated. I can remove those variables from toString altogether. An alternative fix would be to make those lazily evaluated variables eagerly evaluated, but that may cause other issues that I am not aware of. |
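To illustrate the point about lazily evaluated variables, here is a minimal sketch in plain Scala, with no Spark dependency. Every name in it (FakeCluster, SummaryWithToString, SummaryWithSummarize, deviance, runJob) is a hypothetical stand-in and not Spark's actual code. It only shows the general mechanism: a toString that reads a lazy val forces the expensive computation whenever the object is formatted, while an explicit method such as summarize defers it until someone asks.

```scala
// Hypothetical stand-ins; these are not Spark's actual classes.
object FakeCluster {
  var jobsStarted = 0
  // Pretends to run a distributed job and returns a made-up statistic.
  def runJob(): Double = { jobsStarted += 1; 1.5 }
}

// toString reads a lazy val, so merely formatting the object starts a job.
class SummaryWithToString {
  lazy val deviance: Double = FakeCluster.runJob()
  override def toString: String = s"deviance=$deviance"
}

// Same data, but the text is only built when summarize is called explicitly.
class SummaryWithSummarize {
  lazy val deviance: Double = FakeCluster.runJob()
  def summarize: String = s"deviance=$deviance"
}

object LazyToStringDemo {
  def main(args: Array[String]): Unit = {
    val before = new SummaryWithToString
    println(FakeCluster.jobsStarted)          // construction alone starts nothing

    val formatted = s"outer object: $before"  // interpolation calls toString
    println(FakeCluster.jobsStarted)          // the lazy val has now been forced

    val after = new SummaryWithSummarize
    val untouched = s"outer object: $after"   // default toString, lazy val untouched
    println(FakeCluster.jobsStarted)

    println(after.summarize)                  // explicit request forces the lazy val
  }
}
```

Making the field an eager val instead would move the cost to construction time, which is the trade-off raised in the comment above.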
|
Test build #96830 has finished for PR 22604 at commit
|
|
If I remove the following code, then the test succeeds and no spark jobs are started. Shall I do this instead? ` ` |
|
OK, backing up here, yes it's important to specify what the problem was that started this. It's exhibited in the error logs from your PR for SPARK-25118 (which is not submitted; might be worth clarifying your description). Yes, inside and that calls While I don't think it's ideal to call distributed jobs in While we can remove the But then, the value of this debug statement goes away almost entirely, I think, in Scala 2.12, where closures aren't implemented with outer classes. I think I favor just removing this last debug statement in Any other thoughts out there -- anyone care a lot about the debug info from the closure cleaner about outer class(es)? |
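As a rough sketch of why a debug statement can have this side effect, the following plain-Scala example models a logger whose message is a by-name parameter, so the string is only built when debug logging is enabled. The names here (ExpensiveToString, DebugLogSketch, logOuterObject, logOuterClass) are hypothetical and are not ClosureCleaner's code. Logging only the class name is shown as one possible alternative to removing the statement; it is an assumption for illustration, not necessarily what the follow-up PR does.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an object whose toString is expensive.
class ExpensiveToString {
  var toStringCalls = 0
  override def toString: String = {
    toStringCalls += 1
    "ExpensiveToString(...)"
  }
}

object DebugLogSketch {
  // Collects messages instead of printing, so the effect is easy to inspect.
  val messages: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // By-name message: it is only evaluated when debug logging is enabled.
  def logDebug(enabled: Boolean)(msg: => String): Unit =
    if (enabled) messages += msg

  // Interpolating the object itself calls its toString.
  def logOuterObject(enabled: Boolean, outer: AnyRef): Unit =
    logDebug(enabled)(s" + outer object: $outer")

  // Interpolating only the class name never calls the object's toString.
  def logOuterClass(enabled: Boolean, outer: AnyRef): Unit =
    logDebug(enabled)(s" + outer class: ${outer.getClass.getName}")
}
```

With debug logging disabled, neither helper touches the object, which would explain why the failure only surfaced once the log level changed.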
|
Makes sense and let me update the description about SPARK-25118 |
|
I have opened another PR for this, with the recommended changes: #22616 I had to change a few other logDebug statements as well. Please let me know if that works. |
|
OK, this PR should be closed then. |
…ethod in GeneralizedLinearRegressionTrainingSummary
What changes were proposed in this pull request?
After the change in SPARK-25118 (not submitted yet), which enables spark-shell to run with the default
log level, test_glr_summary started failing with a StackOverflow error.
Cause: ClosureCleaner calls logDebug on various objects, and when it is called
for GeneralizedLinearRegressionTrainingSummary, it starts a Spark job that runs
into an infinite loop and fails with the below exception.
Fix: Remove the toString method and move the existing logic to a new method,
"summarize". This is consistent with the other summary objects, which do not
have a toString method either.
How was this patch tested?
Ran the python "mllib" and scala/junit tests for this module