@gjxdxh gjxdxh commented Aug 11, 2023

What changes were proposed in this pull request?

Add ProducedRowCount field to SparkListenerConnectOperationFinished

Why are the changes needed?

Needed to show the number of rows produced by an operation.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.

@HyukjinKwon HyukjinKwon changed the title from "[SPARK-44776] Add ProducedRowCount to SparkListenerConnectOperationFinished" to "[SPARK-44776][CONNECT] Add ProducedRowCount to SparkListenerConnectOperationFinished" Aug 11, 2023
@gjxdxh gjxdxh requested a review from jdesjean August 14, 2023 20:10
Contributor

@juliuszsompolski juliuszsompolski left a comment

Run

./build/mvn -Pscala-2.12 scalafmt:format -Dscalafmt.skip=false -Dscalafmt.validateOnly=false -Dscalafmt.changedOnly=false -pl connector/connect/common -pl connector/connect/server -pl connector/connect/client/jvm

to lint

Contributor

@juliuszsompolski juliuszsompolski left a comment

LGTM, two nits

/**
* Post @link org.apache.spark.sql.connect.service.SparkListenerConnectOperationFinished.
* @param producedRowsCountOpt
* Number of rows that are returned to the user. None is expected when the operation does not
Member

QQ: why not use 0 if no rows are returned?

Author

I think returning None makes it clear whether the corresponding query was expected to produce rows. For example, for a SELECT statement that returns no results, we would put 0 here, while for a query like INSERT INTO, we would put None instead of 0. This tells us more precisely whether produced rows are expected.

Contributor

Agreed with gjxdxh: None makes it clearer that no rows are expected for this operation.
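
To illustrate the distinction discussed in this thread, here is a minimal sketch of how a consumer might tell "no rows expected" apart from "zero rows produced". It assumes a hypothetical, simplified event class `OperationFinished` and a hypothetical helper `describe`; neither is the actual Spark class or API, and the real `SparkListenerConnectOperationFinished` carries more fields than shown.

```scala
// Hypothetical, simplified stand-in for the real listener event; the actual
// SparkListenerConnectOperationFinished carries more fields than shown here.
final case class OperationFinished(
    operationId: String,
    producedRowCount: Option[Long] = None)

object ProducedRowCountSketch {
  // Render the count so that "no rows expected" (None) and
  // "zero rows produced" (Some(0)) stay distinguishable.
  def describe(event: OperationFinished): String =
    event.producedRowCount match {
      case None => s"${event.operationId}: no result rows expected"
      case Some(0L) => s"${event.operationId}: query returned an empty result"
      case Some(n) => s"${event.operationId}: query returned $n row(s)"
    }

  def main(args: Array[String]): Unit = {
    // e.g. INSERT INTO: a command that does not return rows to the user
    println(describe(OperationFinished("op-insert")))
    // e.g. a SELECT whose result is empty
    println(describe(OperationFinished("op-empty-select", Some(0L))))
    // e.g. a SELECT returning three rows
    println(describe(OperationFinished("op-select", Some(3L))))
  }
}
```

With a plain `Long` defaulting to 0, the first two cases would be indistinguishable to a listener; the `Option` keeps that information.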

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM, but I would like @gengliangwang's comment to be addressed before merging it :-).

@gjxdxh gjxdxh requested a review from gengliangwang August 21, 2023 13:27
@HyukjinKwon
Member

Merged to master and branch-3.5.

HyukjinKwon pushed a commit that referenced this pull request Aug 22, 2023
…erationFinished

### What changes were proposed in this pull request?
Add ProducedRowCount field to SparkListenerConnectOperationFinished

### Why are the changes needed?
Needed to show the number of rows produced by an operation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added Unit test

Closes #42454 from gjxdxh/SPARK-44776.

Authored-by: Lingkai Kong <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 4646991)
Signed-off-by: Hyukjin Kwon <[email protected]>
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
dongjoon-hyun added a commit that referenced this pull request May 7, 2024
…-43923: commands send events - get_resources_command`

### What changes were proposed in this pull request?

This PR aims to disable a flaky test, `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`, temporarily.

To re-enable this, SPARK-48164 is created as a blocker issue for 4.0.0.

### Why are the changes needed?

This test case was added in `Apache Spark 3.5.0`, but it has been flaky and causes many retries in our GitHub Action CI environment.

- #42454

- https://github.com/apache/spark/actions/runs/8979348499/job/24661200052
```
[info] - SPARK-43923: commands send events ((get_resources_command {
[info] }
[info] ,None)) *** FAILED *** (35 milliseconds)
[info]   VerifyEvents.this.listener.executeHolder.isDefined was false (SparkConnectServiceSuite.scala:873)
```

This PR aims to stabilize CI first and to track this flaky test as a blocker-level issue in SPARK-48164, to be resolved before `Spark Connect GA` in Apache Spark 4.0.0.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46425 from dongjoon-hyun/SPARK-48163.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>