Commit 1fd59b2

Author: Ilya Ganelin
Updated documentation for accumulators to highlight lazy evaluation issue
1 parent 5525c20 commit 1fd59b2

1 file changed: 8 additions, 0 deletions

docs/programming-guide.md

Lines changed: 8 additions & 0 deletions
@@ -1316,7 +1316,15 @@ For accumulator updates performed inside <b>actions only</b>, Spark guarantees t
 will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware
 that each task's update may be applied more than once if tasks or job stages are re-executed.

+In addition, accumulators do not maintain lineage for the operations that use them. Consequently, accumulator updates are not guaranteed to be executed when made within a lazy transformation like `map()`. Unless something has triggered the evaluation of the lazy transformation that updates the value of the accumulator, subsequent operations will not themselves trigger that evaluation and the value of the accumulator will remain unchanged. The code fragment below demonstrates this issue:

+<div data-lang="scala" markdown="1">
+{% highlight scala %}
+val acc = sc.accumulator(0)
+data.map { x => acc += x; f(x) }
+// Here, acc is still 0 because no action has caused the `map` to be computed.
+{% endhighlight %}
+</div>

 # Deploying to a Cluster

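The lazy-evaluation behaviour described in the added paragraph can be reproduced without a cluster. The following is a minimal sketch in plain Scala, not Spark code: a local `var` stands in for the accumulator, and the standard library's lazy `Iterator.map` stands in for an RDD transformation. The object name `LazyAccumulatorSketch` and the doubling function are hypothetical and chosen only for illustration.

```scala
object LazyAccumulatorSketch {
  // Returns (counter before forcing, counter after forcing, mapped result).
  def run(): (Int, Int, List[Int]) = {
    var acc = 0                      // hypothetical stand-in for a Spark accumulator
    val data = List(1, 2, 3, 4)

    // Iterator.map is lazy: the function body has not run for any element yet.
    val mapped = data.iterator.map { x => acc += x; x * 2 }
    val before = acc                 // the side effect has not happened

    // Consuming the iterator plays the role of a Spark action.
    val result = mapped.toList
    val after = acc                  // the side effect has now run once per element

    (before, after, result)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, result) = run()
    println(s"before=$before after=$after result=$result")
  }
}
```

In Spark itself, the analogous step would be calling an action such as `count()` or `collect()` on the mapped RDD before reading the accumulator. The sketch only mirrors the ordering problem. It does not model re-executed tasks, which is the separate issue described in the unchanged context lines of the diff.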
