initial changes for yarn integration for dynamic scaling #1

Draft
wants to merge 21 commits into
base: temporal-dynamic-scaling

Conversation

Blazer-007

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Owner

@phet phet left a comment

good start!

Comment on lines 45 to 47
scalingDirectives.forEach(directive -> this.workforceStaffing.reviseStaffing(
directive.getProfileName(), directive.getSetPoint(), directive.getTimestampEpochMillis())
);
Owner

remember: WorkforcePlan may reject some ScalingDirectives (e.g. with any kind of WorkforcePlan::IllegalRevisionException). accordingly, it's NOT safe to process them directly or to presume which ones took effect. by contrast, it IS safe to iterate over StaffingDeltas

anyway, it may be safer to do that only inside requestNewContainersForStaffingDeltas, once each requestContainersForWorkerProfile call succeeds
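
To illustrate the point about rejected directives, here is a minimal sketch. Every type in it (`Directive`, `Plan`, `calcDeltas`) is a simplified, hypothetical stand-in and not the real Gobblin `ScalingDirective` / `WorkforcePlan` API, and the rejection rule (a directive must be newer than the last one accepted for its profile) is an assumption made only for the example. It shows why acting on the deltas between the plan and the current staffing is safe, while acting on the raw directives is not.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-ins; NOT the real Gobblin classes. */
final class StaffingDeltaSketch {

  /** Stand-in for a scaling directive. */
  static final class Directive {
    final String profileName;
    final int setPoint;
    final long timestampEpochMillis;

    Directive(String profileName, int setPoint, long timestampEpochMillis) {
      this.profileName = profileName;
      this.setPoint = setPoint;
      this.timestampEpochMillis = timestampEpochMillis;
    }
  }

  /** Stand-in for the plan; assumed to reject out-of-order revisions. */
  static final class Plan {
    private final Map<String, Integer> setPoints = new HashMap<>();
    private final Map<String, Long> lastRevised = new HashMap<>();

    /** Throws if the directive is not newer than the last accepted one. */
    void revise(Directive d) {
      Long prev = lastRevised.get(d.profileName);
      if (prev != null && d.timestampEpochMillis <= prev) {
        throw new IllegalArgumentException("out-of-order revision: " + d.profileName);
      }
      setPoints.put(d.profileName, d.setPoint);
      lastRevised.put(d.profileName, d.timestampEpochMillis);
    }

    /** Applies every directive, skipping the ones the plan rejects. */
    void reviseAll(List<Directive> directives) {
      for (Directive d : directives) {
        try {
          revise(d);
        } catch (IllegalArgumentException rejected) {
          // A rejected directive changes nothing, so it must not drive staffing.
        }
      }
    }

    /** Per-profile difference between the planned and the current staffing. */
    Map<String, Integer> calcDeltas(Map<String, Integer> currentStaffing) {
      Map<String, Integer> deltas = new TreeMap<>();
      for (Map.Entry<String, Integer> e : setPoints.entrySet()) {
        int delta = e.getValue() - currentStaffing.getOrDefault(e.getKey(), 0);
        if (delta != 0) {
          deltas.put(e.getKey(), delta);
        }
      }
      return deltas;
    }
  }
}
```

With this shape, the caller would update its record of current staffing only for deltas whose container request actually went through, so a rejected directive never leaks into the staffing state.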

private synchronized void requestContainersForWorkerProfile(WorkerProfile workerProfile, int numContainers) {
int containerMemoryMbs = workerProfile.getConfig().getInt(GobblinYarnConfigurationKeys.CONTAINER_MEMORY_MBS_KEY);
int containerCores = workerProfile.getConfig().getInt(GobblinYarnConfigurationKeys.CONTAINER_CORES_KEY);
long allocationRequestId = allocationRequestIdGenerator.getAndIncrement();
Owner

I'm unclear on how these IDs are to be used... but wanting to verify it's OK to request multiple containers together w/ the same ID vs. having a unique ID for each container requested. given we ultimately may request N containers together but later want to decomm only a subset, I'd have thought per-container IDs to be clearer.

edit: I see, you're using it merely to look up the config for the container. for that, yes, a per-request ID looks fine. I was imagining using it for correlation, to guide which containers to terminate on scale-down. for that, we'll probably need per-container IDs.

Author

Yes, for terminating or scaling down we will use container.getId(), which is unique for every container.
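
The two kinds of ID discussed in this thread could be kept side by side roughly as follows. This is only a sketch under stated assumptions: `ContainerTrackingSketch` and all of its methods are hypothetical names, container IDs are modeled as plain strings rather than YARN `ContainerId` objects, and no real YARN request is made. The per-request ID resolves which worker profile a batch of containers belongs to, while the per-container ID is what gets tracked so that a subset can be released later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bookkeeping sketch; not the PR's actual implementation. */
final class ContainerTrackingSketch {
  private final AtomicLong allocationRequestIdGenerator = new AtomicLong(0);
  // One entry per request: N containers requested together share this ID.
  private final Map<Long, String> profileByRequestId = new HashMap<>();
  // One entry per container, in allocation order.
  private final Map<String, String> profileByContainerId = new LinkedHashMap<>();

  /** Records a request for containers of one profile and returns its ID. */
  synchronized long requestContainers(String profileName, int numContainers) {
    long allocationRequestId = allocationRequestIdGenerator.getAndIncrement();
    profileByRequestId.put(allocationRequestId, profileName);
    // Real code would submit numContainers requests tagged with this ID here.
    return allocationRequestId;
  }

  /** Called once per allocated container; looks up the profile via the request ID. */
  synchronized void onContainerAllocated(String containerId, long allocationRequestId) {
    String profileName = profileByRequestId.get(allocationRequestId);
    if (profileName == null) {
      throw new IllegalStateException("unknown allocation request: " + allocationRequestId);
    }
    profileByContainerId.put(containerId, profileName);
  }

  /** Picks up to 'count' containers of a profile for scale-down, oldest first. */
  synchronized List<String> pickContainersToRelease(String profileName, int count) {
    List<String> picked = new ArrayList<>();
    for (Map.Entry<String, String> e : profileByContainerId.entrySet()) {
      if (picked.size() >= count) {
        break;
      }
      if (e.getValue().equals(profileName)) {
        picked.add(e.getKey());
      }
    }
    profileByContainerId.keySet().removeAll(picked);
    return picked;
  }
}
```

Releasing the oldest containers first is an arbitrary choice made for the sketch; the thread does not say which containers should be preferred on scale-down.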

Owner

should WorkforceStaffing adjustment occur as part of the AMRMClientCallbackHandler?

4 participants