[SPARK-5955][MLLIB] add checkpointInterval to ALS #5076
Test build #28739 has finished for PR 5076 at commit
I've seen the first point before and thus I'm +1 for this change.
I kinda forget how checkpointing gets executed here. Is this count necessary, or is it for caching?
Ah, for implicit preference, this is not necessary because we are computing YtY anyway.
Test build #28801 has finished for PR 5076 at commit
test this please
Test build #28820 has finished for PR 5076 at commit
Test build #28903 has finished for PR 5076 at commit
test this please
Test build #28908 has finished for PR 5076 at commit
LGTM!
Thanks! Merged into master.
Add checkpointInterval to ALS to prevent:
1. StackOverflow exceptions caused by long lineage,
2. large shuffle files generated during iterations,
3. slow recovery when some nodes fail.

srowen coderxiang

Author: Xiangrui Meng <[email protected]>

Closes #5076 from mengxr/SPARK-5955 and squashes the following commits:

df56791 [Xiangrui Meng] update impl to reuse code
29affcb [Xiangrui Meng] do not materialize factors in implicit
20d3f7f [Xiangrui Meng] add checkpointInterval to ALS

(cherry picked from commit 6b36470)
Signed-off-by: Xiangrui Meng <[email protected]>

Conflicts:
mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
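To make the first point in the commit message concrete, here is a rough sketch of why a checkpoint interval bounds lineage length. It is a deliberately simplified model, not the ALS implementation: it assumes each iteration adds one step to the lineage and that a checkpoint resets it to zero. The object and function names are hypothetical.

```scala
object LineageDepthSketch {
  /**
   * Simplified model (an assumption, not Spark's actual bookkeeping):
   * every iteration extends the lineage by one step, and a checkpoint
   * taken every `k` iterations truncates it back to zero.
   * Returns the longest lineage seen over all iterations.
   */
  def maxLineageDepth(iterations: Int, checkpointInterval: Option[Int]): Int = {
    var depth = 0
    var maxDepth = 0
    for (iter <- 1 to iterations) {
      depth += 1
      maxDepth = math.max(maxDepth, depth)
      // Truncate the lineage when this iteration falls on a checkpoint.
      if (checkpointInterval.exists(k => k > 0 && iter % k == 0)) depth = 0
    }
    maxDepth
  }
}
```

Under this model the longest lineage grows with the iteration count when no checkpointing is done, and is capped at the interval otherwise, which is the intuition behind the StackOverflow fix.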
Merged this into branch-1.3 as well because this helps with scalability.
Hi guys, First of all, I would like to thank you for developing Spark and open-sourcing it so that we can use it. I'm new to Spark and Scala, and I'm working on a project involving matrix factorization in Spark. I have a problem running ALS in Spark: it hits a stack overflow due to a long lineage chain, according to comments on the internet. One of the suggestions is to use setCheckpointInterval so that the RDDs are checkpointed every 10-20 iterations, which prevents the error. I just want to ask for details on how to do checkpointing with ALS. I am using spark-kernel developed by IBM (https://github.com/ibm-et/spark-kernel) instead of spark-shell. Here are some of my specific questions regarding the details of checkpointing:
Thanks a lot!
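For readers with the same question, a minimal sketch of enabling checkpointing with the RDD-based `org.apache.spark.mllib.recommendation.ALS` might look like the following. It assumes a Spark build that includes this change; the checkpoint path, app name, local master, ratings, and hyperparameters are all made up for illustration. In spark-kernel or spark-shell a `SparkContext` named `sc` is normally provided already, so the context setup here would be skipped.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object ALSCheckpointSketch {
  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders for illustration only.
    val conf = new SparkConf().setAppName("als-checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Without a checkpoint directory the interval setting has no effect.
    // The path is hypothetical; on a cluster it should live on shared
    // storage such as HDFS.
    sc.setCheckpointDir("/tmp/als-checkpoints")

    // Tiny made-up ratings so the sketch is self-contained.
    val ratings = sc.parallelize(Seq(
      Rating(1, 10, 5.0), Rating(1, 20, 1.0),
      Rating(2, 10, 4.0), Rating(2, 30, 2.0),
      Rating(3, 20, 3.0), Rating(3, 30, 5.0)
    ))

    val model = new ALS()
      .setRank(5)
      .setIterations(50)          // long enough that lineage length matters
      .setLambda(0.01)
      .setCheckpointInterval(10)  // checkpoint the factors every 10 iterations
      .run(ratings)

    println(model.predict(1, 30))
    sc.stop()
  }
}
```

The key design point is the ordering: `setCheckpointDir` is called on the context before `run`, since the interval alone does nothing if no directory is configured.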
Add checkpointInterval to ALS to prevent:
@srowen @coderxiang