@HyukjinKwon
Member

### What changes were proposed in this pull request?

This PR is a followup of #38970. It makes the test pass with ANSI mode on.

### Why are the changes needed?

To recover the build with ANSI mode on, which currently fails as follows:

```
======================================================================
ERROR [2.651s]: test_cast (pyspark.sql.tests.connect.test_connect_column.SparkConnectTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_column.py", line 119, in test_cast
    df.select(df.id.cast(x)).toPandas(), df2.select(df2.id.cast(x)).toPandas()
  File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1466, in toPandas
    return self._session.client._to_pandas(query)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 333, in _to_pandas
    return self._execute_and_fetch(req)
  File "/__w/spark/spark/python/pyspark/sql/connect/client.py", line 418, in _execute_and_fetch
    for b in self._stub.ExecutePlan(req, metadata=self._builder.metadata()):
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.9/dist-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "[DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION] Cannot resolve "id" due to data type mismatch: cannot cast "BIGINT" to "BINARY" with ANSI mode on.
If you have to cast "BIGINT" to "BINARY", you can set "spark.sql.ansi.enabled" as 'false'.;
'Project [unresolvedalias(cast(id#31L as binary), None)]
+- SubqueryAlias spark_catalog.default.test_connect_basic_table_1
   +- Relation spark_catalog.default.test_connect_basic_table_1[id#31L,name#32] parquet
"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:15002 {created_time:"2022-12-09T01:54:45.378316841+00:00", grpc_status:2, grpc_message:"[DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION] Cannot resolve \"id\" due to data type mismatch: cannot cast \"BIGINT\" to \"BINARY\" with ANSI mode on.\nIf you have to cast \"BIGINT\" to \"BINARY\", you can set \"spark.sql.ansi.enabled\" as \'false\'.;\n\'Project [unresolvedalias(cast(id#31L as binary), None)]\n+- SubqueryAlias spark_catalog.default.test_connect_basic_table_1\n   +- Relation spark_catalog.default.test_connect_basic_table_1[id#31L,name#32] parquet\n"}"
>
```

https://github.com/apache/spark/actions/runs/3671813752
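The failure comes down to cast rules that differ by mode: per the error above, casting `BIGINT` to `BINARY` is rejected at analysis time when `spark.sql.ansi.enabled` is on, but allowed when it is off. A minimal pure-Python sketch of how a test might pick its cast targets per mode is shown here. The helper `cast_targets` and the type lists are hypothetical and not taken from the actual test; only the `binary` exclusion is grounded in the error message.

```python
from typing import List

# Hypothetical target types (Spark SQL type names) assumed to be castable
# from BIGINT both with and without ANSI mode. Illustrative only.
COMMON_TARGETS: List[str] = ["string", "int", "bigint", "double", "decimal(10,2)"]

# The error above shows BIGINT -> BINARY is rejected when ANSI mode is on,
# so this target is only exercised when ANSI mode is off.
NON_ANSI_ONLY_TARGETS: List[str] = ["binary"]


def cast_targets(ansi_enabled: bool) -> List[str]:
    """Return the type names a test would cast a BIGINT column to."""
    if ansi_enabled:
        # Return a copy so callers cannot mutate the shared list.
        return list(COMMON_TARGETS)
    # List concatenation also builds a new list.
    return COMMON_TARGETS + NON_ANSI_ONLY_TARGETS


if __name__ == "__main__":
    for ansi in (False, True):
        print(f"ansi={ansi}: {cast_targets(ansi)}")
```

Keeping the mode-specific targets in a separate list makes it explicit which casts are only expected to succeed with ANSI mode off.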

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

This PR fixes the unit test so that it passes. I tested it manually.

@HyukjinKwon
Member Author

cc @amaliujia @zhengruifeng FYI

@zhengruifeng
Contributor

LGTM

@HyukjinKwon
Member Author

Merged to master.

@amaliujia
Contributor

late LGTM!

HyukjinKwon added a commit that referenced this pull request Dec 13, 2022
…ke the tests to pass with/without ANSI mode

### What changes were proposed in this pull request?

This PR is another followup of #39034 that instead makes the tests pass both with and without ANSI mode.

### Why are the changes needed?

Spark Connect uses an isolated Spark session, so setting the configuration on the PySpark side does not take effect. Therefore, the test still fails; see https://github.com/apache/spark/actions/runs/3681383627/jobs/6228030132.

For now, we should make the tests pass both with and without ANSI mode.
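Because a configuration set on the PySpark client side does not reach the server's isolated session, a test cannot simply toggle ANSI mode from PySpark. One option is to read the mode from the same `SPARK_ANSI_SQL_MODE` environment variable used in the `run-tests` command in this commit message and choose expectations accordingly. The sketch below assumes the server enables ANSI mode based on that variable; the helper `ansi_mode_from_env` is hypothetical and is not necessarily how this PR fixed the test.

```python
import os
from typing import Mapping, Optional


def ansi_mode_from_env(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when SPARK_ANSI_SQL_MODE is "true" (case-insensitive).

    Hypothetical helper: assumes the test server enables ANSI mode based on
    this variable, as in `SPARK_ANSI_SQL_MODE=true ./python/run-tests ...`.
    """
    if env is None:
        env = os.environ
    # Treat a missing variable, or any value other than "true", as ANSI off.
    return env.get("SPARK_ANSI_SQL_MODE", "").strip().lower() == "true"


if __name__ == "__main__":
    print("ANSI mode expected:", ansi_mode_from_env())
```

Accepting an explicit mapping keeps the helper easy to unit-test without touching the real process environment.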

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually tested via:

```bash
SPARK_ANSI_SQL_MODE=true ./python/run-tests --testnames 'pyspark.sql.tests.connect.test_connect_column'
```

Closes #39050 from HyukjinKwon/SPARK-41412.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…e on

Closes apache#39034 from HyukjinKwon/SPARK-41412-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…ke the tests to pass with/without ANSI mode

Closes apache#39050 from HyukjinKwon/SPARK-41412.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
HyukjinKwon deleted the SPARK-41412-followup branch January 15, 2024 00:49