-
Notifications
You must be signed in to change notification settings - Fork 9.2k
YARN-11005. Implement the core QUEUE_LENGTH_THEN_RESOURCES OContainer allocation policy #3717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… allocation policy * Implement the `QUEUE_LENGTH_THEN_RESOURCES` `LoadComparator` * Add `Resource` requested to select node methods when allocating `OPPORTUNISTIC` containers * Add node capacity and node allocated resources to `ClusterNode` * Extend `DominantResourceCalculator` to be able to compute min-share * Add tests and extend existing ones to consider 3 resource dimensions
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
| } | ||
|
|
||
| @Test | ||
| public void testQueueLengthThenResourcesSort() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test case when the resource size of node changes. How its getting handled in O containers
|
@goiri and @bibinchundatt thanks for your reviews! |
|
🎊 +1 overall
This message was automatically generated. |
| capability, currAllocated); | ||
| if (resourceCalculator.fitsIn(requested, currAvailable)) { | ||
| final Resource newAllocated = Resources.add(currAllocated, requested); | ||
| c.setAllocatedResource(newAllocated); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with multiple application running in parallel the allocated resource update could be wrong here. I think we take clusterNode level lock when we are updating / reading and during checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I saw that since the original code also wasn't handling parallelism correctly I thought it might've been fine to not use locks, but ensuring correctness is probably better here.
|
@bibinchundatt thanks for the review! Added a new commit addressing your comments. |
| readLock.lock(); | ||
| try { | ||
| return this.capability; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} finally {
| readLock.lock(); | ||
| try { | ||
| return this.allocatedResource; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} finally {
| } | ||
| } | ||
|
|
||
| public boolean compareAndIncrementAllocation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this fit in one line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so? We need to set queueLength if it fits, and ignore it if it does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean in terms of style, I think:
public boolean compareAndIncrementAllocation(final int incrementQLen) {
Fits in less than 100 characters (it may even be 120 now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, good catch.
| int diff = 0; | ||
| switch (this) { | ||
| case QUEUE_LENGTH_THEN_RESOURCES: | ||
| diff = o1.getQueueLength() - o2.getQueueLength(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This math is a little intense.
It might be better to extract it to a function just for this so you can return things right away and have comments throughout.
| break; | ||
| } | ||
|
|
||
| // NOTE: cluster resource should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we just go ahead with leaving a note?
It doesn't seem very maintanable.
...rg/apache/hadoop/yarn/server/resourcemanager/scheduler/distributed/NodeQueueLoadMonitor.java
Show resolved
Hide resolved
|
🎊 +1 overall
This message was automatically generated. |
…on, add javadocs, formatting issues
|
@goiri thanks for the additional round of review! Latest commit should address your comments, please let me know if any other concerns. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM added minor comment .
| comparator == LoadComparator.QUEUE_LENGTH || | ||
| comparator == LoadComparator.QUEUE_LENGTH_THEN_RESOURCES)) { | ||
| clusterNode | ||
| .setQueueWaitTime(estimatedQueueWaitTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every second we are taking too many write locks of ClusterNode. IIUC that not the best optimized option. We should probably take the write lock once and update all the values like waitTime,SetQueueLength,labels etc .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will address in a new iteration.
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
… allocation policy (apache#3717)
Description of PR
QUEUE_LENGTH_THEN_RESOURCESLoadComparatorResourcerequested to select node methods when allocatingOPPORTUNISTICcontainersClusterNodeDominantResourceCalculatorto be able to compute min-shareHow was this patch tested?