**Describe the bug**
ParquetCachedBatchSerializer crashes with a `java.lang.ArrayIndexOutOfBoundsException` when counting a cached DataFrame whose cached schema contains a TimestampType column.
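For context (assumed, not stated in the original report): `df.cache()` only goes through ParquetCachedBatchSerializer when that class is configured as the cache serializer. A minimal sketch of the session setup the repro below presumes, with the app name as a placeholder:

```scala
import org.apache.spark.sql.SparkSession

// Assumed setup, not part of the original report: the RAPIDS plugin must be
// loaded and ParquetCachedBatchSerializer selected as the cache serializer
// for df.cache() to hit the PCBS code path. In spark-shell the equivalent
// settings are passed with --conf on the command line.
val spark = SparkSession.builder()
  .appName("pcbs-timestamp-repro") // placeholder name
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.sql.cache.serializer",
    "com.nvidia.spark.rapids.ParquetCachedBatchSerializer")
  .getOrCreate()
import spark.implicits._ // needed for Seq(...).toDF in the repro
```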
**Steps/Code to reproduce bug**

```scala
scala> val df = Seq(1231989812323L, 21989893421L, 1989823523L, 123122312123L).toDF
df: org.apache.spark.sql.DataFrame = [value: bigint]

scala> df.selectExpr("cast(value as timestamp)").cache.count
```
```
23/05/11 15:25:30 WARN GpuOverrides:
! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec; not all expressions can be replaced
!Expression <AttributeReference> value#5 cannot run on GPU because expression AttributeReference value#5 produces an unsupported type TimestampType
23/05/11 15:25:30 WARN GpuOverrides:
*Exec <HashAggregateExec> will run on GPU
*Expression <AggregateExpression> count(1) will run on GPU
*Expression <Count> count(1) will run on GPU
*Expression <Alias> count(1)#13L AS count#14L will run on GPU
*Exec <ShuffleExchangeExec> will run on GPU
*Partitioning <SinglePartition$> will run on GPU
*Exec <HashAggregateExec> will run on GPU
*Expression <AggregateExpression> partial_count(1) will run on GPU
*Expression <Count> count(1) will run on GPU
!Exec <InMemoryTableScanExec> cannot run on GPU because unsupported data types in output: TimestampType [value]
23/05/11 15:25:30 WARN GpuOverrides:
*Exec <HashAggregateExec> will run on GPU
*Expression <AggregateExpression> count(1) will run on GPU
*Expression <Count> count(1) will run on GPU
*Expression <Alias> count(1)#13L AS count#14L will run on GPU
*Exec <ShuffleExchangeExec> will run on GPU
*Partitioning <SinglePartition$> will run on GPU
*Exec <HashAggregateExec> will run on GPU
*Expression <AggregateExpression> partial_count(1) will run on GPU
*Expression <Count> count(1) will run on GPU
!Exec <InMemoryTableScanExec> cannot run on GPU because unsupported data types in output: TimestampType [value]
23/05/11 15:25:30 WARN GpuOverrides:
*Exec <HashAggregateExec> will run on GPU
*Expression <AggregateExpression> count(1) will run on GPU
*Expression <Count> count(1) will run on GPU
*Expression <Alias> count(1)#13L AS count#14L will run on GPU
*Exec <ShuffleExchangeExec> will run on GPU
*Partitioning <SinglePartition$> will run on GPU
*Exec <HashAggregateExec> will run on GPU
*Expression <AggregateExpression> partial_count(1) will run on GPU
*Expression <Count> count(1) will run on GPU
!Exec <InMemoryTableScanExec> cannot run on GPU because unsupported data types in output: TimestampType [value]
23/05/11 15:25:30 WARN GpuOverrides:
*Exec <ShuffleExchangeExec> will run on GPU
*Partitioning <SinglePartition$> will run on GPU
*Exec <HashAggregateExec> will run on GPU
*Expression <AggregateExpression> partial_count(1) will run on GPU
*Expression <Count> count(1) will run on GPU
!Exec <InMemoryTableScanExec> cannot run on GPU because unsupported data types in output: TimestampType [value]
23/05/11 15:25:31 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ArrayIndexOutOfBoundsException: 0
at com.nvidia.spark.rapids.GpuColumnVector$GpuColumnarBatchBuilder.builder(GpuColumnVector.java:409)
at com.nvidia.spark.rapids.GpuColumnVector$GpuColumnarBatchBuilder.copyColumnar(GpuColumnVector.java:405)
at com.nvidia.spark.rapids.HostToGpuCoalesceIterator.$anonfun$addBatchToConcat$2(HostColumnarToGpu.scala:259)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
at com.nvidia.spark.rapids.HostToGpuCoalesceIterator.$anonfun$addBatchToConcat$1(HostColumnarToGpu.scala:258)
at com.nvidia.spark.rapids.HostToGpuCoalesceIterator.$anonfun$addBatchToConcat$1$adapted(HostColumnarToGpu.scala:256)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at com.nvidia.spark.rapids.HostToGpuCoalesceIterator.addBatchToConcat(HostColumnarToGpu.scala:256)
at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.addBatch(GpuCoalesceBatches.scala:590)
at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.populateCandidateBatches(GpuCoalesceBatches.scala:414)
at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$next$1(GpuCoalesceBatches.scala:549)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.next(GpuCoalesceBatches.scala:529)
at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.next(GpuCoalesceBatches.scala:248)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.aggregateInputBatches(aggregate.scala:604)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.$anonfun$next$2(aggregate.scala:556)
at scala.Option.getOrElse(Option.scala:189)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.next(aggregate.scala:553)
at com.nvidia.spark.rapids.GpuHashAggregateIterator.next(aggregate.scala:498)
at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:318)
at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:340)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
```
**Expected behavior**
The count (4) should be returned instead of the task failing with an exception.
**Additional Context**
Tested on Spark 3.3.2 and Spark 3.4.0.
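An untested workaround sketch (an assumption, not from the original report): since the failure occurs once the in-memory scan outputs TimestampType, caching the raw bigint column and applying the cast after the cache keeps TimestampType out of the cached schema; alternatively, removing the `spark.sql.cache.serializer` setting falls back to Spark's default cache serializer.

```scala
// Untested workaround sketch (assumption, not from the report): cache the
// raw bigint column, then cast to timestamp after reading from the cache,
// so the cached relation never exposes TimestampType.
val cached = df.cache()
cached.selectExpr("cast(value as timestamp)").count()
```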