Commit cc12cf6
committed
[SPARK-29378][R] Upgrade SparkR to use Arrow 0.15 API
### What changes were proposed in this pull request?
[[SPARK-29376] Upgrade Apache Arrow to version 0.15.1](#26133) upgrades to Arrow 0.15 at Scala/Java/Python. This PR aims to upgrade `SparkR` to use Arrow 0.15 API. Currently, it's broken.
### Why are the changes needed?
First of all, it turns out that our Jenkins jobs (including PR builder) ignores Arrow test. Arrow 0.15 has a breaking R API changes at [ARROW-5505](https://issues.apache.org/jira/browse/ARROW-5505) and we missed that. AppVeyor was the only one having SparkR Arrow tests but it's broken now.
**Jenkins**
```
Skipped ------------------------------------------------------------------------
1. createDataFrame/collect Arrow optimization (test_sparkSQL_arrow.R#25)
- arrow not installed
```
Second, Arrow throws OOM on AppVeyor environment (Windows JDK8) like the following because it still has Arrow 0.14.
```
Warnings -----------------------------------------------------------------------
1. createDataFrame/collect Arrow optimization (test_sparkSQL_arrow.R#39) - createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.sparkr.enabled' is set to true; however, failed, attempting non-optimization. Reason: Error in handleErrors(returnStatus, conn): java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:669)
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$3.readNextBatch(ArrowConverters.scala:243)
```
It is due to the version mismatch.
```java
int messageLength = MessageSerializer.bytesToInt(buffer.array());
if (messageLength == IPC_CONTINUATION_TOKEN) {
buffer.clear();
// ARROW-6313, if the first 4 bytes are continuation message, read the next 4 for the length
if (in.readFully(buffer) == 4) {
messageLength = MessageSerializer.bytesToInt(buffer.array());
}
}
// Length of 0 indicates end of stream
if (messageLength != 0) {
// Read the message into the buffer.
ByteBuffer messageBuffer = ByteBuffer.allocate(messageLength);
```
After upgrading this to 0.15, we are hitting ARROW-5505. This PR upgrades Arrow version in AppVeyor and fix the issue.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the AppVeyor.
This PR passed here.
- https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/28909044
```
SparkSQL Arrow optimization: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
................
```
Closes #26555 from dongjoon-hyun/SPARK-R-TEST.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>1 parent e88267c commit cc12cf6
3 files changed
+4
-6
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
166 | 166 | | |
167 | 167 | | |
168 | 168 | | |
169 | | - | |
| 169 | + | |
170 | 170 | | |
171 | | - | |
| 171 | + | |
172 | 172 | | |
173 | 173 | | |
174 | 174 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
242 | 242 | | |
243 | 243 | | |
244 | 244 | | |
245 | | - | |
| 245 | + | |
246 | 246 | | |
247 | 247 | | |
248 | 248 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
45 | | - | |
46 | | - | |
| 45 | + | |
47 | 46 | | |
48 | | - | |
49 | 47 | | |
50 | 48 | | |
51 | 49 | | |
| |||
0 commit comments