Commit f5eff85
Merge pull request alteryx#83 from ewencp/pyspark-accumulator-add-method
Add an add() method to pyspark accumulators.
Add a regular method for adding a term to accumulators in
pyspark. Currently if you have a non-global accumulator, adding to it
is awkward. The += operator can't be used for non-global accumulators
captured via closure because it's involves an assignment. The only way
to do it is using __iadd__ directly.
Adding this method lets you write code like this:
def main():
sc = SparkContext()
accum = sc.accumulator(0)
rdd = sc.parallelize([1,2,3])
def f(x):
accum.add(x)
rdd.foreach(f)
print accum.value
where using accum += x instead would have caused UnboundLocalError
exceptions in workers. Currently it would have to be written as
accum.__iadd__(x).
(cherry picked from commit 747f538)
Signed-off-by: Reynold Xin <[email protected]>1 parent 59d6f06 commit f5eff85
1 file changed
+12
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
45 | 52 | | |
46 | 53 | | |
47 | 54 | | |
| |||
139 | 146 | | |
140 | 147 | | |
141 | 148 | | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
142 | 153 | | |
143 | 154 | | |
144 | | - | |
| 155 | + | |
145 | 156 | | |
146 | 157 | | |
147 | 158 | | |
| |||
0 commit comments