Skip to content

Conversation

@xupefei
Copy link
Contributor

@xupefei xupefei commented Oct 23, 2024

What changes were proposed in this pull request?

This PR tries to stabilise flaky tests which involve thread pools. Some tests are failing due to the thread pool having 2 threads instead of 3 or 4.

Why are the changes needed?

To let CI pass.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI will tell.

Was this patch authored or co-authored using generative AI tooling?

No.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this, @xupefei .

@HyukjinKwon HyukjinKwon changed the title [SPARK-48139] Try stabilising multi-thread tests in CI [SPARK-48139][CONNECT][TESTS] Try stabilising multi-thread tests in CI Oct 23, 2024
@dongjoon-hyun
Copy link
Member

To @xupefei , could you add multiple empty commits on this PR in order to verify your claim?

CI will tell.

@xupefei
Copy link
Contributor Author

xupefei commented Oct 28, 2024

An inrelavent test is failing:


TypeError: Channel.unary_stream() got an unexpected keyword argument '_registered_method'
[info] - python listener process: process terminates after listener is removed *** FAILED *** (2 seconds, 445 milliseconds)

@xupefei
Copy link
Contributor Author

xupefei commented Oct 30, 2024

Failing on unrelated tests.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

Thank you, @xupefei and @HyukjinKwon .

Yes, the irrelevant failures seem to block the testing. Let's merge this PR.


// TODO(SPARK-48139): Re-enable `SparkSessionE2ESuite.interrupt tag`
ignore("interrupt tag") {
test("interrupt tag") {
Copy link
Member

@dongjoon-hyun dongjoon-hyun Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, unfortunately, it turns out this doesn't help.

[info] - interrupt tag *** FAILED *** (1 minute)
[info]   Expected exception org.apache.spark.SparkException to be thrown, but java.util.concurrent.TimeoutException was thrown (SparkSessionE2ESuite.scala:250)

@dongjoon-hyun
Copy link
Member

Sorry but let me revert this because this failed twice a day already in master commit builder, @xupefei .

I guess there might be more failures on PR Builders after this PR.

@dongjoon-hyun
Copy link
Member

This is reverted via the following.

@xupefei xupefei deleted the ci-threadpool branch October 31, 2024 22:12
@xupefei
Copy link
Contributor Author

xupefei commented Oct 31, 2024

Sounds good!
We should refactor these tests to use only two threads. That would definitely resolve this issue.

@xupefei
Copy link
Contributor Author

xupefei commented Oct 31, 2024

@dongjoon-hyun Could you disable Cancellation APIs in SparkSession are isolated and link it to SPARK-48139?

@dongjoon-hyun
Copy link
Member

To @xupefei , I believe we should handle them separately. You can apply most of your patch if you exclude these lines.

-  // TODO(SPARK-48139): Re-enable `SparkSessionE2ESuite.interrupt tag`
-  ignore("interrupt tag") {
+  test("interrupt tag") {

In other words, you should file and use a different JIRA ID for the rest of your contribution because SPARK-48139 is dedicated to this ignore("interrupt tag") exactly.

@dongjoon-hyun
Copy link
Member

If you need the rest of patch, please open a new PR (by excluding ignore part) with new ID. Then, we can merge it back without brining back this flakiness.

@xupefei
Copy link
Contributor Author

xupefei commented Nov 1, 2024

It's weird that only the interrupt tag test is failing, as I believe they are suffering from the same issue (thread pool only got 2 threads instead of requested 4). If my change didn't fix interrupt tag then I think Cancellation APIs in SparkSession are isolated is still flaky. I'd disable it for now.

@xupefei
Copy link
Contributor Author

xupefei commented Nov 1, 2024

PR created at #48736.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants