Commit 3f3201a
[SPARK-37203][SQL] Fix NotSerializableException when observe with TypedImperativeAggregate
### What changes were proposed in this pull request?
Currently,
```
val namedObservation = Observation("named")
val df = spark.range(100)
val observed_df = df.observe(
namedObservation, percentile_approx($"id", lit(0.5), lit(100)).as("percentile_approx_val"))
observed_df.collect()
namedObservation.get
```
throws exception as follows:
```
15:16:27.994 ERROR org.apache.spark.util.Utils: Exception encountered
java.io.NotSerializableException: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile$PercentileDigest
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55)
at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$1(TaskResult.scala:55)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1434)
at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:616)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```
This PR will fix the issue. After the change,
`assert(namedObservation.get === Map("percentile_approx_val" -> 49))`
`java.io.NotSerializableException` will not happen.
### Why are the changes needed?
Fix `NotSerializableException` when observe with `TypedImperativeAggregate`.
### Does this PR introduce _any_ user-facing change?
No. This PR change the implement of `AggregatingAccumulator` who uses serialize and deserialize of `TypedImperativeAggregate` now.
### How was this patch tested?
New tests.
Closes apache#34474 from beliefer/SPARK-37203.
Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>1 parent 2b4099f commit 3f3201a
File tree
5 files changed
+46
-8
lines changed- core/src/main/scala/org/apache/spark/util
- sql
- catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate
- core/src
- main/scala/org/apache/spark/sql/execution
- test/scala/org/apache/spark/sql
- streaming
5 files changed
+46
-8
lines changedLines changed: 5 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
157 | 157 | | |
158 | 158 | | |
159 | 159 | | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
160 | 164 | | |
161 | 165 | | |
162 | 166 | | |
| |||
179 | 183 | | |
180 | 184 | | |
181 | 185 | | |
182 | | - | |
| 186 | + | |
183 | 187 | | |
184 | 188 | | |
185 | 189 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
284 | 284 | | |
285 | 285 | | |
286 | 286 | | |
287 | | - | |
| 287 | + | |
288 | 288 | | |
289 | 289 | | |
290 | 290 | | |
| |||
Lines changed: 23 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
163 | 163 | | |
164 | 164 | | |
165 | 165 | | |
166 | | - | |
167 | | - | |
168 | | - | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
169 | 178 | | |
170 | 179 | | |
171 | 180 | | |
| |||
188 | 197 | | |
189 | 198 | | |
190 | 199 | | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
191 | 211 | | |
192 | 212 | | |
193 | 213 | | |
| |||
Lines changed: 13 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
754 | 754 | | |
755 | 755 | | |
756 | 756 | | |
| 757 | + | |
| 758 | + | |
| 759 | + | |
| 760 | + | |
| 761 | + | |
| 762 | + | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
| 766 | + | |
| 767 | + | |
| 768 | + | |
| 769 | + | |
757 | 770 | | |
758 | 771 | | |
759 | 772 | | |
| |||
Lines changed: 4 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
417 | 417 | | |
418 | 418 | | |
419 | 419 | | |
420 | | - | |
| 420 | + | |
| 421 | + | |
421 | 422 | | |
422 | 423 | | |
423 | 424 | | |
| |||
444 | 445 | | |
445 | 446 | | |
446 | 447 | | |
447 | | - | |
| 448 | + | |
448 | 449 | | |
449 | 450 | | |
450 | 451 | | |
451 | 452 | | |
452 | 453 | | |
453 | 454 | | |
454 | 455 | | |
455 | | - | |
| 456 | + | |
456 | 457 | | |
457 | 458 | | |
458 | 459 | | |
| |||
0 commit comments