[CELEBORN-1339] Mark connection as timedOut in TransportClient.close #2400
Closed
Codecov Report
All modified and coverable lines are covered by tests ✅
@@            Coverage Diff             @@
##             main    #2400      +/-   ##
==========================================
+ Coverage   48.85%   48.85%   +0.01%
==========================================
  Files         209      209
  Lines       13101    13102       +1
  Branches     1134     1134
==========================================
+ Hits         6399     6400       +1
  Misses       6282     6282
  Partials      420      420
SteNicholas (Member) approved these changes on Mar 19, 2024.
LGTM.
cxzl25 approved these changes on Mar 19, 2024.
SteNicholas pushed a commit that referenced this pull request on Mar 20, 2024.
Closes #2400 from mridulm/add-SPARK-45375.
Authored-by: Mridul Muralidharan <mridulatgmail.com>
Signed-off-by: SteNicholas
(cherry picked from commit 21d5698)
Signed-off-by: SteNicholas
Member comment: Merging to main (v0.5.0) and branch-0.4 (v0.4.1).
cfmcgrady pushed a commit to cfmcgrady/incubator-celeborn that referenced this pull request on Aug 21, 2025.
cfmcgrady pushed another commit to cfmcgrady/incubator-celeborn that referenced this pull request on Aug 21, 2025.
What changes were proposed in this pull request?
Importing details from apache/spark#43162:
--
This PR fixes a race condition in which a connection that is in the process of being closed could still be returned by TransportClientFactory, only to be closed immediately afterwards and cause errors when the caller uses it.
This race condition is rare and not easily triggered. However, with the upcoming changes that introduce SSL connection support, closing a connection can take slightly longer, which makes the issue much easier to trigger.
Looking at the history of the code, I believe this was an oversight in apache/spark#9853.
--
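The title summarizes the fix: mark the connection as timed out in TransportClient.close. The following is a minimal sketch of why that removes the race. It assumes, following Spark's network-common code that Celeborn's transport layer mirrors, that the factory reuses a cached client only while isActive() returns true, and that isActive() checks a volatile timedOut flag together with the channel state. The classes SimulatedChannel, SketchClient and SketchClientFactory, and the whileClosing hook, are hypothetical illustrations and not Celeborn's actual API.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a network channel whose close is not instantaneous
// (for example, a TLS connection that must finish its shutdown handshake).
class SimulatedChannel {
  private volatile boolean open = true;
  // Runs while the close is "in flight", i.e. before the channel reports closed.
  // It stands in for whatever other threads do during that window.
  Runnable whileClosing = () -> { };

  boolean isOpen() {
    return open;
  }

  void close() {
    whileClosing.run(); // the channel still reports isOpen() == true here
    open = false;
  }
}

// Hypothetical, simplified client following the TransportClient pattern.
class SketchClient {
  final SimulatedChannel channel = new SimulatedChannel();
  private volatile boolean timedOut = false;

  boolean isActive() {
    return !timedOut && channel.isOpen();
  }

  void close() {
    // The fix: mark the client as timed out *before* the channel starts closing,
    // so lookups made during the close no longer treat it as reusable.
    timedOut = true;
    channel.close();
  }
}

// Hypothetical, simplified factory: a cached client is reused only while it is active.
class SketchClientFactory {
  private final ConcurrentHashMap<String, SketchClient> cache = new ConcurrentHashMap<>();

  SketchClient createClient(String address) {
    return cache.compute(address,
        (addr, cached) -> (cached != null && cached.isActive()) ? cached : new SketchClient());
  }
}
```

The whileClosing hook replaces real concurrency with a deterministic callback, so the window between the start and the end of close() is easy to observe. If the timedOut = true line is removed, a lookup made inside that window sees isActive() return true and receives the client that is being closed. That is the failure mode the pull request describes.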
Why are the changes needed?
We are working towards adding TLS support, largely based on the TLS support in Spark 4.0, and this is one of the fixes from that work.
(I have yet to file the overall TLS support JIRA, but this is enabling work.)
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests