@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Aug 2, 2023

What changes were proposed in this pull request?

This PR proposes to remove session-based directories when the isolated session is evicted from the cache.

Why are the changes needed?

SPARK-44078 added a cache for isolated sessions, and SPARK-44348 added session-based directories for isolation.
When an isolated session is evicted from the cache, we should also remove its session-based directory so that it doesn't fail when the same session is used again; see also #41625 (comment).
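
To illustrate the idea, here is a minimal sketch of a TTL cache that removes a session's directory on eviction. It is not the code in this PR: `SessionDirCache`, its parameters, and the injected `clock` are hypothetical names. It also assumes that the failure comes from a leftover directory with the same name, which is modeled here with `os.mkdir` raising on an existing path.

```python
import os
import shutil
import tempfile
import time


class SessionDirCache:
    """Toy TTL cache mapping a session ID to its session-based directory.

    Hypothetical illustration only; this is not Spark's implementation.
    """

    def __init__(self, root, ttl_seconds, clock=time.monotonic):
        self.root = root
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the demo does not need to sleep
        self._last_access = {}  # session_id -> time of last use

    def _path(self, session_id):
        return os.path.join(self.root, session_id)

    def get(self, session_id):
        """Return the session directory, creating it on first use."""
        self.evict_expired()
        path = self._path(session_id)
        if session_id not in self._last_access:
            # Assumption: a stale directory left over from an earlier
            # lifetime of this session would make this call fail.
            os.mkdir(path)
        self._last_access[session_id] = self.clock()
        return path

    def evict_expired(self):
        """Evict idle sessions and remove their directories."""
        now = self.clock()
        expired = [
            sid for sid, last in self._last_access.items()
            if now - last >= self.ttl_seconds
        ]
        for sid in expired:
            del self._last_access[sid]
            # The cleanup step: drop the directory together with the entry.
            shutil.rmtree(self._path(sid), ignore_errors=True)


if __name__ == "__main__":
    fake_now = [0.0]
    cache = SessionDirCache(tempfile.mkdtemp(), ttl_seconds=60,
                            clock=lambda: fake_now[0])
    first = cache.get("session-a")
    fake_now[0] += 120  # simulate waiting past the TTL
    cache.evict_expired()
    assert not os.path.exists(first)
    # The same session can come back and reuse the same directory name.
    assert cache.get("session-a") == first
```

Without the `shutil.rmtree` call in `evict_expired`, the second `get("session-a")` in this sketch would raise `FileExistsError`, which mirrors the kind of failure this change is meant to avoid.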

Does this PR introduce any user-facing change?

No, not for end users, because the feature has not been released yet.

How was this patch tested?

I manually tested as described in #41292. In particular, I reduced the TTL to a few minutes and ran the following as the last step:

spark.range(10).select(plug_one("id")).show()
spark.range(10).select(plug_one("id")).show()
# Wait a few minutes
spark.range(10).select(plug_one("id")).show()

I verified that the same session can be added back to the cache and that it creates a directory with the same name, by reading the executor's stderr in the Spark UI.

@github-actions github-actions bot added the CORE label Aug 2, 2023
@HyukjinKwon
Member Author

cc @vicennial and @hvanhovell mind taking a look please?

@HyukjinKwon
Member Author

Merged to master and branch-3.5.

HyukjinKwon added a commit that referenced this pull request Aug 2, 2023
…isolated session cache is evicted

Closes #42289 from HyukjinKwon/SPARK-44631.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 35d4765)
Signed-off-by: Hyukjin Kwon <[email protected]>
@HyukjinKwon HyukjinKwon deleted the SPARK-44631 branch January 15, 2024 00:48