Perf fix #2069
Conversation
src/ray/raylet/node_manager.cc
Outdated
The `.count() == 0` check is supposed to be idiomatically equivalent. Did you find your proposed way performs better?
`count()` means we have to look the item up in the map, and `cluster_resource_map_[client_id]` will look it up again. Although `find` is O(1) on average for an `unordered_map`, computing the hash and making the call still have a cost. The fix reduces the lookup from twice to once.
Yes, I know; I said "nevermind" below when I saw you reusing the iterator ;)
src/ray/raylet/node_manager.cc
Outdated
Oh, I see. nvm.
`add_handlers_` and `rem_handlers_` are vectors of function objects. Did you find that switching to a reference gave you a significant performance improvement?
Copying a `std::function` is not cheap: the `std::function` may wrap a lambda, and the lambda may capture variables that have to be copied along with it. You can step through in a debugger to see how a lambda is converted into a `std::function`.
Not only lambdas: `std::bind` also generates a very heavy `std::function` object. It is better to use a reference where possible.
Sounds good. I agree with that and have been pushing internally to use refs and const refs as much as possible... Just curious whether you actually saw performance differences in your testing environment.
I can't get the perf numbers now, because the current code differs too much from our internal branch. I made this change because I ran into a similar problem before.
src/ray/raylet/node_manager.cc
Outdated
No, here I explicitly want to work with a copy (as per the comment on L290), because we're calling RemoveTasks on the local queues below, which will change the contents of scheduled_tasks_. If this were a const ref, we'd be modifying a container we're iterating over.
How about changing the code to:

```cpp
std::vector<TaskID> dispatched_taskIDs;
dispatched_taskIDs.reserve(scheduled_tasks.size());
const auto &local_resources =
    cluster_resource_map_[my_client_id].GetAvailableResources();
for (const auto &task : scheduled_tasks) {
  const auto &task_resources = task.GetTaskSpecification().GetRequiredResources();
  if (!task_resources.IsSubset(local_resources)) {
    // Not enough local resources for this task right now, skip this task.
    continue;
  }
  dispatched_taskIDs.push_back(task.GetTaskSpecification().TaskId());
}
// We have enough resources for this task. Assign task.
// TODO(atumanov): perform the task state/queue transition inside AssignTask.
auto dispatched_tasks = local_queues_.RemoveTasks(dispatched_taskIDs);
for (auto &task : dispatched_tasks) {
  AssignTask(task);
}
```
This way we don't need to copy scheduled_tasks, and we only need to call RemoveTasks once.
src/ray/raylet/node_manager.cc
Outdated
in terms of correctness, it should be fine to batch dispatched ids and then dispatch them. The intention was to dispatch them ASAP, to minimize latency. I'm ok with this change for now.
Test FAILed.
Hey @eric-jj, you should be able to expedite the Travis builds by doing the following to test your PR. You can view the results at https://travis-ci.com/robertnishihara/ray-private-travis/branches.
I can reproduce the issue (stochastically) on Ubuntu with Python 3.5 by running …; will look into it a little.
```cpp
  return;
}
const ClientID &my_client_id = gcs_client_->client_table().GetLocalClientId();
const auto &local_resources =
```
There may be a bug here (or at least a change in behavior). Before, we called AssignTask inside the for loop, which presumably modified local_resources. Now we call AssignTask at the very end, so the check `if (!task_resources.IsSubset(local_resources))` may give different results.
I will try reverting some of this and see if that makes the problem I'm seeing go away.
I will investigate it.
@robertnishihara good catch. That's the reason why I didn't batch the AssignTask calls before.
`this->cluster_resource_map_[my_client_id].Acquire(spec.GetRequiredResources()));` — this line acquires resources when the task is assigned, which makes sure that the locally available resources are updated at each for-loop iteration.
Yes, you are right; I have reverted it.
I have no idea why I can't reopen the pull request after rebasing from the master branch, so I have created a new pull request.