-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Fix broken pipe callback #4513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix broken pipe callback #4513
Changes from all commits
da00fb5
6ce57c8
a668a66
a98c207
b59c81d
74929aa
343e1d3
e4c3b74
e15b6bf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -257,12 +257,15 @@ void ResourceIds::Release(const ResourceIds &resource_ids) { | |
| if (fractional_pair_it == fractional_ids_.end()) { | ||
| fractional_ids_.push_back(fractional_pair_to_return); | ||
| } else { | ||
| RAY_CHECK(fractional_pair_it->second < 1); | ||
| fractional_pair_it->second += fractional_pair_to_return.second; | ||
| RAY_CHECK(fractional_pair_it->second <= 1); | ||
| // If this makes the ID whole, then return it to the list of whole IDs. | ||
| if (fractional_pair_it->second == 1) { | ||
| if (fractional_pair_it->second >= 1) { | ||
| whole_ids_.push_back(resource_id); | ||
| fractional_ids_.erase(fractional_pair_it); | ||
| fractional_pair_it->second -= 1; | ||
| if (fractional_pair_it->second < 1e-6) { | ||
| fractional_ids_.erase(fractional_pair_it); | ||
| } | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not too sure if this is what we want. @robertnishihara could you also take a look at this part?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After reading more about this, I think this fix is correct. |
||
| } | ||
| } | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be
RAY_CHECK(fractional_pair_it->second < 1 + std::numeric_limits<double>::epsilon());I believe @romilbhardwaj is making this change in #4555.
However, all of the imprecision is just a temporary solution. There are too many places that we'd have to add
epsilonin order to make things correct. @williamma12 is working on a solution that simply makes the fractional resource bookkeeping exact (instead of approximate) by using integers instead of doubles (where the integer can represent 1/1000th of a resource).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem in this case isn't precision. The problem is that the remaining value and returned value can add up to be larger than 1. E.g., returning 0.3 to existing 0.8 will be 1.1 in total.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @raulchen is right. From my observation in
test_actor.py::test_resource_assignment, there will be 1.1 which is generated by adding 0.2 to 0.9.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raulchen @guoyuhong that shouldn't be possible. If that happens, then it is a bug.
Each fractional resource corresponds to a specific resource ID. Suppose there are two GPUs with IDs 0 and 1, and suppose there are two tasks that require 0.2 and 0.9 GPUs.
We won't add the 0.2 and the 0.9 together because they correspond to different IDs. It isn't possible for one worker to have (ID1, 0.2) and another worker to have (ID1, 0.9) because the total quantity of a given ID is 1.
Does my notation make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robertnishihara Thanks for the information. I looked at the code again. The
resource_idisint64_tnot astring. I may misunderstand the code. I run the test several time and cannot repro the case that adding 0.2 to 0.9, which is strange.Sorry to this bad change. Shall I revert the change or this will be fixed in #4555 by @romilbhardwaj?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @guoyuhong, we'll fix it in #4555.