Conversation

SCHJonathan (Contributor) commented Nov 21, 2025

What changes were proposed in this pull request?

In PR #53024, we added SDP support for spark.sql(...) inside a FlowFunction. For these calls, instead of eagerly executing the SQL, the Spark Connect server should return the raw logical plan to the client and defer execution to the flow function.

However, in that PR we constructed the response object but forgot to actually return it to the Spark Connect client, so the client received an empty response.

This went unnoticed in tests because, when the client sees an empty spark.sql(...) response, it falls back to creating an empty DataFrame holding the raw logical plan, which happens to match the desired behavior. This PR fixes the bug by returning the proper response instead of relying on that implicit fallback.
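
To illustrate the shape of the bug and of the fix, here is a minimal, self-contained Scala sketch. All names in it (SqlCommandResponseSketch, Relation, SqlCommandResult, handleBuggy, handleFixed, send) are illustrative stand-ins, not the actual Spark Connect server classes or the code changed by this PR; the only point is the difference between constructing a result and actually sending it back to the client.

object SqlCommandResponseSketch {
  final case class Relation(rawPlan: String)
  final case class SqlCommandResult(relation: Option[Relation])

  // Buggy shape: inside a pipeline flow function the result is constructed,
  // but the early return skips the step that sends it, so the client sees
  // an empty response.
  def handleBuggy(insidePipelineFlowFunction: Boolean, rawPlan: String)(
      send: SqlCommandResult => Unit): Unit = {
    if (insidePipelineFlowFunction) {
      val result = SqlCommandResult(Some(Relation(rawPlan))) // constructed...
      return                                                 // ...but never sent
    }
    send(SqlCommandResult(None)) // simplified stand-in for the eager-execution path
  }

  // Fixed shape: the constructed result is actually handed back to the client,
  // deferring execution of the raw logical plan to the flow function.
  def handleFixed(insidePipelineFlowFunction: Boolean, rawPlan: String)(
      send: SqlCommandResult => Unit): Unit = {
    if (insidePipelineFlowFunction) {
      send(SqlCommandResult(Some(Relation(rawPlan))))
      return
    }
    send(SqlCommandResult(None))
  }

  def main(args: Array[String]): Unit = {
    var received: Option[SqlCommandResult] = None
    handleBuggy(insidePipelineFlowFunction = true, rawPlan = "SELECT * FROM RANGE(5)")(r => received = Some(r))
    println(s"buggy handler -> $received") // None: the client saw an empty response
    handleFixed(insidePipelineFlowFunction = true, rawPlan = "SELECT * FROM RANGE(5)")(r => received = Some(r))
    println(s"fixed handler -> $received") // Some(...): the client gets the raw plan
  }
}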

Why are the changes needed?

Does this PR introduce any user-facing change?

This PR fixes a bug introduced in #53024 where the server did not return the constructed spark.sql(...) response to the client.

How was this patch tested?

New tests

Was this patch authored or co-authored using generative AI tooling?

No

dongjoon-hyun (Member) left a comment:
Hi, @SCHJonathan. FYI, the Apache Spark community does not use this kind of tag, '[bugfix]', in PR titles.

@SCHJonathan SCHJonathan changed the title [SPARK-54020][bugfix] empty response from SparkConnect server for spark.sql(...) inside FlowFunction [SPARK-54020]empty response from SparkConnect server for spark.sql(...) inside FlowFunction Nov 21, 2025
@SCHJonathan SCHJonathan changed the title [SPARK-54020]empty response from SparkConnect server for spark.sql(...) inside FlowFunction [SPARK-54020] empty response from SparkConnect server for spark.sql(...) inside FlowFunction Nov 21, 2025
SCHJonathan (Contributor, Author) commented:

fyi @vicennial

dongjoon-hyun (Member) commented:

cc @sryza

dongjoon-hyun (Member) left a comment:

If you don't mind, please file a new JIRA issue as a bug fix instead of reusing SPARK-54020, @SCHJonathan.

@SCHJonathan SCHJonathan changed the title [SPARK-54020] empty response from SparkConnect server for spark.sql(...) inside FlowFunction [SPARK-54452] empty response from SparkConnect server for spark.sql(...) inside FlowFunction Nov 21, 2025

test(
  "SPARK-54452: spark.sql() outside a pipeline flow function should return a " +
    "sql_command_result") {
dongjoon-hyun (Member) commented on this diff:

Why do we expect the same behavior? According to the log, it is guarded by the insidePipelineFlowFunction condition, isn't it?

if (insidePipelineFlowFunction) {
  result.setRelation(relation)
  return
}

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-54452] empty response from SparkConnect server for spark.sql(...) inside FlowFunction [SPARK-54452] Fix empty response from SparkConnect server for spark.sql(...) inside FlowFunction Nov 21, 2025
proto.Command
  .newBuilder()
  .setSqlCommand(
    proto.SqlCommand
dongjoon-hyun (Member) commented on this diff:

May I ask why we use a different setSqlCommand in the two test cases? Can we use the following form, in the same way as the other test case?

            .setSqlCommand(
              proto.SqlCommand
                .newBuilder()
                .setInput(proto.Relation
                  .newBuilder()
                  .setSql(proto.SQL.newBuilder().setQuery("SELECT * FROM RANGE(5)"))
                  .build())
                .build())
            .build())

dongjoon-hyun (Member) commented:

Gentle ping, @SCHJonathan.

dongjoon-hyun pushed a commit that referenced this pull request Nov 23, 2025
…sql(...)` inside FlowFunction

Closes #53156 from SCHJonathan/jonathan-chang_data/fix-spark-sql-bug.

Authored-by: Yuheng Chang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 997525c)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun (Member) commented:

Merged to master/4.1.

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…sql(...)` inside FlowFunction