[SPARK-49300][CORE] Fix Hadoop delegation token leak when tokenRenewalInterval is not set. #47800
Conversation
    val interval = newExpiration - getIssueDate(tokenKind, identifier)
    logInfo(log"Renewal interval is ${MDC(TOTAL_TIME, interval)} for" +
      log" token ${MDC(TOKEN_KIND, tokenKind)}")
    token.cancel(hadoopConf)
Have you tested it? It requires some time to propagate the new token from the driver to the executors; I think such a change may cause auth failures on the executor side.
Thanks for your review! These tokens are not actually used. The tokens that are truly distributed to each executor are obtained at line 57:
spark/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
Lines 57 to 64 in dd259b0
    val fetchCreds = fetchDelegationTokens(getTokenRenewer(sparkConf, hadoopConf), fileSystems,
      creds, fsToExclude)

    // Get the token renewal interval if it is not set. It will only be called once.
    if (tokenRenewalInterval == null) {
      tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, fileSystems)
    }
ha, that makes sense
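The point made in the exchange above can be sketched with stubbed types (all names below are hypothetical simplifications, not the real Spark/Hadoop API): the provider performs two independent token fetches, and only the first set of tokens is distributed to executors, so cancelling the probe tokens cannot break executor auth.

```scala
import scala.collection.mutable

// Stub standing in for a Hadoop delegation token (hypothetical, simplified).
final case class Token(id: Int)

// Stub filesystem that tracks which tokens are still outstanding server-side.
final class StubFileSystem {
  private var nextId = 0
  val outstanding: mutable.Set[Int] = mutable.Set.empty
  def fetchToken(): Token = { nextId += 1; outstanding += nextId; Token(nextId) }
  def cancel(t: Token): Unit = outstanding -= t.id
}

object TokenFetchSketch {
  // Two independent fetches, as in the provider: the first token goes into
  // `creds` and is shipped to executors; the second is a local-only probe
  // used to measure the renewal interval and is safe to cancel immediately.
  def obtainDelegationTokens(fs: StubFileSystem, creds: mutable.Buffer[Token]): Unit = {
    creds += fs.fetchToken()    // distributed to executors; must stay valid
    val probe = fs.fetchToken() // probe token, never leaves the driver
    fs.cancel(probe)            // the fix: cancel it so it is not leaked
  }
}
```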
Not sure why there is one method named …
@Hexiaoqiao I believe it gets explained in L135-L137
Well, for non-YARN mode (e.g. K8s mode), we do have a chance to save one fetchDelegationTokens call.
spark/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
Lines 93 to 99 in 79aeae1
Well, got it. In that case, I think @zhangshuyan0's PR should be one direct solution. Thanks again.
Looks reasonable.
Hi All, any progress here? Thanks.
sunchao left a comment
LGTM with one nit
    val interval = newExpiration - getIssueDate(tokenKind, identifier)
    logInfo(log"Renewal interval is ${MDC(TOTAL_TIME, interval)} for" +
      log" token ${MDC(TOKEN_KIND, tokenKind)}")
    token.cancel(hadoopConf)
nit: could you add some comments here just to make it easier to understand in future?
Thanks for your suggestion! I have added it.
+CC @attilapiros, @tgravescs
Thanks! Merged to master.
@zhangshuyan0 do you think we should backport this to branch-3.5 too? If so, feel free to create another PR for the backport. (I tried to just cherry-pick this commit but there are some conflicts.)
I think so. I'll create another PR for it. |
@sunchao, if there are additional folks tagged for review, it would be good to give them a chance to get to the PR, given that Apache Spark has a distributed community.
@mridulm I'm sorry. I didn't know you tagged them for review. Feel free to leave any comments on the PR. We can always address them in follow-ups.
When tokenRenewalInterval is not set, HadoopFSDelegationTokenProvider#getTokenRenewalInterval will fetch some tokens and renew them to get an interval value.
spark/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala
Lines 60 to 64 in dd259b0
cancel() is never called on these tokens, so a large number of leftover tokens accumulate on HDFS without being cleared in a timely manner, putting additional pressure on the HDFS server.
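The fixed probe can be sketched as follows, using stubbed types (the class and object names here are hypothetical, not the real Hadoop API): fetch tokens, renew each one once to learn the renewal interval, then cancel every probe token so nothing lingers on the NameNode.

```scala
// Stub standing in for a Hadoop delegation token (hypothetical, simplified).
final class StubToken(val issueDate: Long, renewalInterval: Long) {
  var cancelled = false
  def renew(): Long = issueDate + renewalInterval // returns the new expiration
  def cancel(): Unit = cancelled = true
}

object RenewalIntervalProbe {
  // Loosely mirrors HadoopFSDelegationTokenProvider#getTokenRenewalInterval:
  // the interval is (new expiration - issue date), minimized over all tokens.
  def getTokenRenewalInterval(tokens: Seq[StubToken]): Option[Long] = {
    val intervals = tokens.map { token =>
      val newExpiration = token.renew()
      val interval = newExpiration - token.issueDate
      token.cancel() // the fix: release the probe token immediately
      interval
    }
    intervals.reduceOption(_ min _)
  }
}
```

After the probe returns, every stub token is cancelled, which is exactly the server-side cleanup the patch adds.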