[CELEBORN-1648] Refine AppUniqueId with UUID suffix #2810
    }

    def appUniqueIdWithUUID(appUiqueId: String): String = {
      appUiqueId + "-" + UUID.randomUUID()
Could you add a config option that lets users enable the UUID suffix? IMO, not every case should add a UUID suffix, right?
Better to move appUniqueIdWithUUID(appUiqueId: String) to CelebornConf. Additionally, we can shorten the UUID by removing the dashes, and then use celebornConf.appUniqueIdWithUUIDSuffix:
    -   appUiqueId + "-" + UUID.randomUUID()
    + def appUniqueIdWithUUIDSuffix(appId: String): String = {
    +   if (clientApplicationUUIDSuffixEnabled) {
    +     appId + "-" + UUID.randomUUID().toString().replaceAll("-", "")
    +   } else {
    +     appId
    +   }
    + }
    if (lifecycleManager == null) {
      celebornAppId = FlinkUtils.toCelebornAppId(lifecycleManagerTimestamp, jobID);
      celebornAppId =
          Utils.appUniqueIdWithUUID(
Why does the Flink app id need a UUID suffix?
So you think there's no application id collision problem for Flink?
I didn't expect that the Flink app id (JobId + timestamp) might conflict. Did you actually run into this problem?
@SteNicholas please review it again, I reverted the change.
      .createWithDefault(0.4)

    val CLIENT_APPLICATION_UUID: ConfigEntry[Boolean] =
      buildConf("celeborn.client.application.uuid.enabled")
Prefer celeborn.client.application.uuidSuffix.enabled
    def clientExcludeReplicaOnFailureEnabled: Boolean =
      get(CLIENT_EXCLUDE_PEER_WORKER_ON_FAILURE_ENABLED)
    def clientMrMaxPushData: Long = get(CLIENT_MR_PUSH_DATA_MAX)
    def clientApplicationUUID: Boolean = get(CLIENT_APPLICATION_UUID)
clientApplicationUUID -> clientApplicationUUIDSuffixEnabled
    buildConf("celeborn.client.application.uuid.enabled")
      .categories("client")
      .version("0.6.0")
      .doc("When `true`, add uuid suffix to application id")
    - .doc("When `true`, add uuid suffix to application id")
    + .doc("Whether to add UUID suffix for application id for unique. When `true`, add UUID suffix for unique application id.")
    if (lifecycleManager == null) {
      celebornAppId = FlinkUtils.toCelebornAppId(lifecycleManagerTimestamp, jobID);
      LOG.info("CelebornAppId: {}", celebornAppId);
      String applicationId = FlinkUtils.toCelebornAppId(lifecycleManagerTimestamp, jobID);
Please revert this change. Meanwhile, remove line 87~89.
OK, let's revert it.
Should we note in the doc that the flag applies only to Spark and MR?
    synchronized (CelebornTierMasterAgent.class) {
      if (lifecycleManager == null) {
        celebornAppId = FlinkUtils.toCelebornAppId(lifecycleManagerTimestamp, jobID);
        String applicationId = FlinkUtils.toCelebornAppId(lifecycleManagerTimestamp, jobID);
Ditto.
Force-pushed: 09fe17a to ff6a9e9
RexXiong left a comment
LGTM, only a minor comment
    buildConf("celeborn.client.application.uuidSuffix.enabled")
      .categories("client")
      .version("0.6.0")
      .doc("Whether to add UUID suffix for application id for unique. When `true`, add UUID suffix for unique application id.")
This configuration only takes effect when using the Spark and MR engines, so we must specify this in the doc.
    buildConf("celeborn.client.application.uuidSuffix.enabled")
      .categories("client")
      .version("0.6.0")
      .doc("Whether to add UUID suffix for application id for unique. When `true`, add UUID suffix for unique application id. Currently, this only applies to Spark and MapPartition.")
MapPartition?
Force-pushed: 655bba2 to d7a1613
reswqa left a comment
Thanks for the update.
Thanks, merged to main (v0.6.0).
### What changes were proposed in this pull request?
Add a random UUID as a suffix to the application id so that it is unique.

### Why are the changes needed?
Currently, we cannot guarantee that the application id is really unique, which may lead to data issues.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested locally.

Closes apache#2810 from chenkovsky/feature/uuid_appid.

Authored-by: Chongchen Chen <[email protected]>
Signed-off-by: Shuang <[email protected]>
Circling back to this: is this a case where multiple application attempts were involved (in which case, using appId + attemptId is the right approach), or did different applications' application ids actually collide? Given that Spark (and MR?) already use application id + attempt id for a few things (like event files), I am trying to understand why a UUID was necessary.
Hi, @mridulm. In our cluster, the applicationId is unique most of the time, but id collisions occasionally take place. When one happens, it is very hard to find the root cause of the resulting missing-data problem. For Spark or MR, users can also override the appId, so adding a UUID suffix is also a fool-proof design.
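For readers following the thread, the suffix behaviour discussed above can be sketched in plain Java. This is only an illustration, not the merged Celeborn code: the class name AppIdSuffixSketch, the boolean parameter (standing in for the celeborn.client.application.uuidSuffix.enabled setting), and the sample application id are all assumptions made up for the example.

```java
import java.util.UUID;

// Illustrative sketch only; names here are hypothetical, not Celeborn's API.
final class AppIdSuffixSketch {

    // uuidSuffixEnabled stands in for the boolean config discussed in the review.
    static String appUniqueIdWithUUIDSuffix(String appId, boolean uuidSuffixEnabled) {
        if (!uuidSuffixEnabled) {
            // Flag off: the engine-provided id is used unchanged.
            return appId;
        }
        // Flag on: append a random UUID with its dashes removed (32 hex characters).
        return appId + "-" + UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        String appId = "application_1700000000000_0001"; // made-up id for the demo
        System.out.println(appUniqueIdWithUUIDSuffix(appId, false));
        // Two clients that reuse the same appId still end up with distinct ids.
        System.out.println(appUniqueIdWithUUIDSuffix(appId, true));
        System.out.println(appUniqueIdWithUUIDSuffix(appId, true));
    }
}
```

With the flag off the id passes through untouched, which matches the request that not every case should get the suffix. With the flag on, each call yields a different id even when the base appId is overridden or reused.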