[SPARK-733] Add documentation on use of accumulators in lazy transformation #4022
Test build #25474 has started for PR 4022 at commit
Test build #25474 has finished for PR 4022 at commit
Test PASSed.
lgtm
docs/programming-guide.md (outdated)
I found this worded a bit confusingly: what would it mean for an accumulator to "maintain lineage"? I think this comes from @JoshRosen's PR description, but IMO it would be better to remove that particular phrasing. What about a slight rewording:
Accumulators do not change the lazy evaluation model of Spark. Their value is only
updated once the RDD in which they are being modified is computed as part of an
action. The below code fragment demonstrates this property:
I also didn't call it an "issue", because it's just a property of how accumulators work; I don't think it's necessarily a bug.
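The property in the suggested wording (an accumulator updated inside a transformation stays unchanged until an action computes that RDD) can be illustrated without a cluster. The sketch below is a hypothetical pure-Python toy model, not Spark's API: `ToyAccumulator`, `ToyRDD`, and `g` are invented names. In this model `map` only records a function and `collect` plays the role of an action. It also assumes no caching, so each action recomputes the whole pipeline.

```python
from typing import Callable, Iterable, List, Optional


class ToyAccumulator:
    """Hypothetical stand-in for an accumulator: a counter tasks only add to."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def add(self, amount: int) -> None:
        self.value += amount


class ToyRDD:
    """Hypothetical lazy dataset: transformations are recorded, actions run them."""

    def __init__(self, source: Iterable[int],
                 fns: Optional[List[Callable[[int], int]]] = None) -> None:
        self._source = list(source)
        self._fns = fns if fns is not None else []

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Transformation: return a new dataset with fn appended; nothing is evaluated.
        return ToyRDD(self._source, self._fns + [fn])

    def collect(self) -> List[int]:
        # Action: evaluate every recorded function on every element, right now.
        out = []
        for x in self._source:
            for fn in self._fns:
                x = fn(x)
            out.append(x)
        return out


accum = ToyAccumulator(0)
data = ToyRDD([1, 2, 3, 4])


def g(x: int) -> int:
    accum.add(x)  # side effect inside a transformation
    return x * 10


mapped = data.map(g)
print(accum.value)  # 0: map() was only recorded, g has not run yet

mapped.collect()
print(accum.value)  # 10: the action forced g to run on 1, 2, 3 and 4

mapped.collect()
print(accum.value)  # 20: no caching, so the second action re-runs g and adds again
```

In this toy model the second `collect()` re-applies `g`, so each element is counted twice. This is consistent with the Spark programming guide's caution that accumulator updates made inside transformations may be applied more than once if work is re-executed, whereas updates made inside actions are applied once per task.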
Thanks for the suggestion - I've updated the doc.
Test build #25668 has started for PR 4022 at commit
Test build #25668 has finished for PR 4022 at commit
Test FAILed.
Okay, new version LGTM! Jenkins, test this please.
Test build #25671 has started for PR 4022 at commit
Test build #25671 has finished for PR 4022 at commit
Test PASSed.
…mation

I've added documentation clarifying the particular lack of clarity highlighted in the relevant JIRA. I've also added code examples for this issue to clarify the explanation.

Author: Ilya Ganelin

Closes #4022 from ilganeli/SPARK-733 and squashes the following commits:

587def5 [Ilya Ganelin] Updated to clarify verbage
df3afd7 [Ilya Ganelin] Revert "Partially updated task metrics to make some vars private"
3f6c512 [Ilya Ganelin] Revert "Completed refactoring to make vars in TaskMetrics class private"
58034fb [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-733
4dc2cdb [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-733
3a38db1 [Ilya Ganelin] Verified documentation update by building via jekyll
33b5a2d [Ilya Ganelin] Added code examples for java and python
1fd59b2 [Ilya Ganelin] Updated documentation for accumulators to highlight lazy evaluation issue
5525c20 [Ilya Ganelin] Completed refactoring to make vars in TaskMetrics class private
c64da4f [Ilya Ganelin] Partially updated task metrics to make some vars private

(cherry picked from commit fd3a8a1)
Signed-off-by: Imran Rashid