Skip to content

Conversation

@xi-db
Copy link
Contributor

@xi-db xi-db commented Apr 12, 2024

What changes were proposed in this pull request?

Expired sessions are regularly checked and cleaned up by a maintenance thread. However, currently, this process is synchronous. Therefore, in rare cases, interrupting the execution thread of a query in a session can take hours, causing the entire maintenance process to stall, resulting in a large amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, avoiding synchronous joins of execution threads, and preventing the maintenance thread from stalling in the above scenarios. To be more specific, instead of calling runner.join() in ExecutorHolder.close(), we set a post-cleanup function as the callback through runner.processOnCompletion, which will be called asynchronously once the execution runner is completed or interrupted. In this way, the maintenance thread won't get blocked on joining an execution thread.

Why are the changes needed?

In the rare cases mentioned above, performance can be severely affected.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests and a new test Async cleanup callback gets called after the execution is closed in SparkConnectServiceE2ESuite.scala.

Was this patch authored or co-authored using generative AI tooling?

No.

@xi-db xi-db changed the title [SPARK-47819] Use asynchronous callback for execution cleanup [SPARK-47819][CONNECT] Use asynchronous callback for execution cleanup Apr 12, 2024
Copy link
Contributor

@vicennial vicennial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vicennial
Copy link
Contributor

cc @HyukjinKwon @hvanhovell

@hvanhovell
Copy link
Contributor

@vicennial @xi-db should we also fix this in 3.5?

hvanhovell pushed a commit that referenced this pull request Apr 24, 2024
…leanup

([Original PR](#46027))

### What changes were proposed in this pull request?

Expired sessions are regularly checked and cleaned up by a maintenance thread. However, currently, this process is synchronous. Therefore, in rare cases, interrupting the execution thread of a query in a session can take hours, causing the entire maintenance process to stall, resulting in a large amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, avoiding synchronous joins of execution threads, and preventing the maintenance thread from stalling in the above scenarios. To be more specific, instead of calling `runner.join()` in `ExecutorHolder.close()`, we set a post-cleanup function as the callback through `runner.processOnCompletion`, which will be called asynchronously once the execution runner is completed or interrupted. In this way, the maintenance thread won't get blocked on joining an execution thread.

### Why are the changes needed?

In the rare cases mentioned above, performance can be severely affected.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests and a new test `Async cleanup callback gets called after the execution is closed` in `ReattachableExecuteSuite.scala`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46064 from xi-db/SPARK-47819-async-cleanup-3.5.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…leanup (apache#383)

([Original PR](apache#46027))

### What changes were proposed in this pull request?

Expired sessions are regularly checked and cleaned up by a maintenance thread. However, currently, this process is synchronous. Therefore, in rare cases, interrupting the execution thread of a query in a session can take hours, causing the entire maintenance process to stall, resulting in a large amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, avoiding synchronous joins of execution threads, and preventing the maintenance thread from stalling in the above scenarios. To be more specific, instead of calling `runner.join()` in `ExecutorHolder.close()`, we set a post-cleanup function as the callback through `runner.processOnCompletion`, which will be called asynchronously once the execution runner is completed or interrupted. In this way, the maintenance thread won't get blocked on joining an execution thread.

### Why are the changes needed?

In the rare cases mentioned above, performance can be severely affected.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests and a new test `Async cleanup callback gets called after the execution is closed` in `ReattachableExecuteSuite.scala`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46064 from xi-db/SPARK-47819-async-cleanup-3.5.

Authored-by: Xi Lyu <[email protected]>

Signed-off-by: Herman van Hovell <[email protected]>
Co-authored-by: Xi Lyu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants