[BUG] Spark-3.4 - Integration test failures due to GpuCreateDataSourceTableAsSelectCommand #8019

Closed
Tracked by #6991
nartal1 opened this issue Apr 3, 2023 · 4 comments · Fixed by #8177
Assignees: NVnavkumar
Labels: bug (Something isn't working) · Spark 3.4+ (Spark 3.4+ issues) · test (Only impacts tests)

Comments

nartal1 (Collaborator) commented Apr 3, 2023

Describe the bug
Some of the integration tests are failing due to an exception thrown from GpuCreateDataSourceTableAsSelectCommand.

Failing tests:

Exception stack trace:

E                   py4j.protocol.Py4JJavaError: An error occurred while calling o326.saveAsTable.
E                   : java.lang.UnsupportedOperationException: org.apache.spark.sql.rapids.shims.GpuCreateDataSourceTableAsSelectCommand does not support row-based execution
E                   	at com.nvidia.spark.rapids.GpuRunnableCommand.run(GpuRunnableCommandExec.scala:49)
E                   	at com.nvidia.spark.rapids.GpuRunnableCommand.run$(GpuRunnableCommandExec.scala:47)
E                   	at org.apache.spark.sql.rapids.shims.GpuCreateDataSourceTableAsSelectCommand.run(GpuCreateDataSourceTableAsSelectCommandShims.scala:37)
E                   	at com.nvidia.spark.rapids.GpuExecutedCommandExec.sideEffectResult$lzycompute(GpuExecutedCommandExec.scala:52)
E                   	at com.nvidia.spark.rapids.GpuExecutedCommandExec.sideEffectResult(GpuExecutedCommandExec.scala:50)
E                   	at com.nvidia.spark.rapids.GpuExecutedCommandExec.executeCollect(GpuExecutedCommandExec.scala:61)
E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)

Not certain if this is related, but we do see that CommandResultExec is not supported on the GPU:

! <CommandResultExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.CommandResultExec
  @Expression <AttributeReference> namespace#52 could run on GPU
  @Expression <AttributeReference> tableName#53 could run on GPU
  @Expression <AttributeReference> isTemporary#54 could run on GPU

Steps/Code to reproduce bug
Run the integration tests for the failing tests:
cd spark-rapids/integration_tests/
./run_pyspark_from_build.sh -k <name of test>    # example: ./run_pyspark_from_build.sh -k test_write_save_table
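For reference, the failing tests boil down to a DataFrame write via saveAsTable. Below is a minimal Scala sketch of the same operation (the table name and session setup are illustrative, not taken from the actual tests); with the plugin enabled on Spark 3.4 it drives the query through GpuCreateDataSourceTableAsSelectCommand and should hit the UnsupportedOperationException above:

  import org.apache.spark.sql.SparkSession

  // Illustrative repro only: overwrite a small DataFrame into a Parquet table,
  // mirroring the Coalesce(1) + Overwrite + Parquet shape in the plans posted below.
  val spark = SparkSession.builder().appName("ctas-repro").getOrCreate()
  val df = spark.range(10).toDF("_c0")
  df.coalesce(1)
    .write
    .mode("overwrite")
    .format("parquet")
    .saveAsTable("tmp_table_repro")   // hypothetical table name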

Expected behavior

  1. Tests should pass, OR
  2. XFAIL the tests or fall back to the CPU if there is a valid reason.

Environment details (please complete the following information)
local

nartal1 added the bug (Something isn't working), test (Only impacts tests), and Spark 3.4+ (Spark 3.4+ issues) labels on Apr 3, 2023
NVnavkumar self-assigned this on Apr 18, 2023
NVnavkumar (Collaborator) commented

Difference in Physical Plans (using logTransformations config):

Spark 3.3.0:

 23/04/20 17:43:47 WARN GpuOverrides: Transformed query:
Original Plan:
Execute CreateDataSourceTableAsSelectCommand `tmp_table_master_1058976670_1`, Overwrite, [_c0]
+- Coalesce 1
   +- Scan ExistingRDD[_c0#3]

Transformed Plan:
Execute GpuCreateDataSourceTableAsSelectCommand `tmp_table_master_1058976670_1`, Overwrite, [_c0], class org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat, com.nvidia.spark.rapids.GpuParquetFileFormat@6eb9737b, false, 0
+- GpuCoalesce 1
   +- Scan ExistingRDD[_c0#3]

23/04/20 17:43:47 WARN GpuTransitionOverrides: Transformed query:
Original Plan:
ColumnarToRow
+- Execute GpuCreateDataSourceTableAsSelectCommand `tmp_table_master_1058976670_1`, Overwrite, [_c0], class org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat, com.nvidia.spark.rapids.GpuParquetFileFormat@6eb9737b, false, 0
   +- GpuCoalesce 1
      +- RowToColumnar
         +- Scan ExistingRDD[_c0#3]

Transformed Plan:
GpuColumnarToRow false
+- Execute GpuCreateDataSourceTableAsSelectCommand `tmp_table_master_1058976670_1`, Overwrite, [_c0], class org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat, com.nvidia.spark.rapids.GpuParquetFileFormat@6eb9737b, false, 0
   +- GpuCoalesce 1
      +- GpuRowToColumnar targetsize(104857600)
         +- Scan ExistingRDD[_c0#3]

Spark 3.4.0:

23/04/20 17:50:43 WARN GpuOverrides: Transformed query:
Original Plan:
Execute CreateDataSourceTableAsSelectCommand
   +- CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`tmp_table_master_2129092913_1`, Overwrite, [_c0]
         +- Repartition 1, false
            +- LogicalRDD [_c0#3], false

Transformed Plan:
GpuExecute GpuCreateDataSourceTableAsSelectCommand
   +- GpuCreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`tmp_table_master_2129092913_1`, Overwrite, [_c0], class org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat, com.nvidia.spark.rapids.GpuParquetFileFormat@6d21e8a6, false, 0
      +- Repartition 1, false
         +- LogicalRDD [_c0#3], false

23/04/20 17:50:43 WARN GpuTransitionOverrides: Transformed query:
Original Plan:
GpuExecute GpuCreateDataSourceTableAsSelectCommand
   +- GpuCreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`tmp_table_master_2129092913_1`, Overwrite, [_c0], class org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat, com.nvidia.spark.rapids.GpuParquetFileFormat@6d21e8a6, false, 0
      +- Repartition 1, false
         +- LogicalRDD [_c0#3], false

Transformed Plan:
GpuExecute GpuCreateDataSourceTableAsSelectCommand
   +- GpuCreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`tmp_table_master_2129092913_1`, Overwrite, [_c0], class org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat, com.nvidia.spark.rapids.GpuParquetFileFormat@6d21e8a6, false, 0
      +- Repartition 1, false
         +- LogicalRDD [_c0#3], false

NVnavkumar (Collaborator) commented

Upon further investigation, Spark 3.3 uses DataWritingCommandExec to execute this command, while Spark 3.4 uses ExecutedCommandExec.

GpuDataWritingCommandExec's implementation of sideEffectResult:

  private lazy val sideEffectResult: Seq[ColumnarBatch] =
    cmd.runColumnar(sparkSession, child)

But GpuExecutedCommandExec's implementation of sideEffectResult:

  lazy val sideEffectResult: Seq[InternalRow] = {
    val converter = CatalystTypeConverters.createToCatalystConverter(schema)
    cmd.run(SparkSession.getActiveSession.orNull).map(converter(_).asInstanceOf[InternalRow])
  }

The call to cmd.run is what causes the exception thrown in these test failures.
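
Based on the stack trace in the description, GpuRunnableCommand appears to provide only a columnar path and to throw from its row-based run. A rough sketch of that shape (an illustration inferred from the error message, not the actual plugin source; the runColumnar signature is assumed from the GpuDataWritingCommandExec snippet above):

  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.execution.SparkPlan
  import org.apache.spark.sql.vectorized.ColumnarBatch

  trait GpuRunnableCommand {
    // the GPU path: produce the command's result as columnar batches
    def runColumnar(session: SparkSession, child: SparkPlan): Seq[ColumnarBatch]

    // the row-based entry point that GpuExecutedCommandExec ends up calling,
    // which surfaces as the failure in these tests
    def run(session: SparkSession): Seq[Row] =
      throw new UnsupportedOperationException(
        s"${getClass.getName} does not support row-based execution")
  }

So once Spark 3.4 routes the command through GpuExecutedCommandExec, the row-based run is the only entry point that gets invoked.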

There is a GpuRunnableCommandExec that is essentially the same implementation as GpuDataWritingCommandExec but is not currently used; it takes a GpuRunnableCommand, which is what would be needed here. I think we could use that instead of GpuExecutedCommandExec.

@jlowe any thoughts on this? GpuExecutedCommandExec would be the normal replacement, but its implementation is done the way it is due to Delta Lake.

jlowe (Contributor) commented Apr 20, 2023

IMO GpuRunnableCommand should not exist, see #7619.

GpuExecutedCommandExec is the right replacement here. The problem seems to be primarily with GpuDataWritingCommand. Per the discussion in #7619, that command arguably should return rows and not complain when run is called. GpuDataWritingCommandExec should arguably call cmd.run, and GpuDataWritingCommand should implement the existing run rather than a separate runColumnar.
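
A rough sketch of what that direction could look like (the gpuWrite hook and other names here are hypothetical placeholders illustrating the suggestion, not the eventual fix):

  import org.apache.spark.sql.{Row, SparkSession}

  trait GpuDataWritingCommand {
    // hypothetical hook that performs the write on the GPU; the real plugin
    // plumbing (child plan, output spec, metrics) is omitted in this sketch
    protected def gpuWrite(session: SparkSession): Unit

    // implement the standard row-based run so callers such as GpuExecutedCommandExec
    // can invoke it directly, with no separate runColumnar entry point required
    def run(session: SparkSession): Seq[Row] = {
      gpuWrite(session)
      Seq.empty[Row]   // write commands typically return no rows
    }
  }

That would let the command be executed through the regular row-based run call without the GPU path throwing.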

andygrove (Contributor) commented

@NVnavkumar I added the test test_non_empty_ctas to this issue since it fails with the same error as the other tests.
