-
Notifications
You must be signed in to change notification settings - Fork 7.2k
Add heartbeat test + Fix monitor.py #5191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test PASSed. |
|
Test PASSed. |
Initial implementation for heartbeat test
|
Test PASSed. |
|
Test PASSed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
@richardliaw sorry for introducing this bug. It should be fixed by the following patch. Do you want to include the patch in this PR? Or I can submit another fix as well. |
|
Test FAILed. |
|
Hi Hao,
Does this patch fix the test? Right now I think we are missing all resource
information when we receive a heartbeat message.
Feel free to push to this branch too.
…On Mon, Jul 15, 2019 at 6:30 AM UCB AMPLab ***@***.***> wrote:
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1674/
Test FAILed.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5191?email_source=notifications&email_token=ABCRZZKHVXLKSAGQRIKBLTLP7R3WNA5CNFSM4IDEOOPKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZ5WD7A#issuecomment-511402492>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABCRZZPHIIANRUJXP7KOQO3P7R3WNANCNFSM4IDEOOPA>
.
|
This reverts commit 60aebdd.
|
Hey @raulchen, thanks for the patch! I tried those changes, but they don't seem to solve the underlying issue. As Richard said, it seems like all the resource information is missing currently. |
|
Test FAILed. |
|
Test PASSed. |
|
Test PASSed. |
|
Test FAILed. |
|
Test PASSed. |
|
Test PASSed. |
hartikainen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
python/ray/monitor.py
Outdated
| # Update the load metrics for this raylet. | ||
| client_id = ray.utils.binary_to_hex(heartbeat_message.client_id) | ||
| ip = self.raylet_id_to_ip_map.get(client_id) | ||
| load_metrics_id = ip + "-" + client_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the best way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs a corresponding change to somewhere else. I currently see the following in my autoscaler logs:
2019-07-16 17:22:38,247 INFO autoscaler.py:198 -- LoadMetrics: Removed 1 stale ip mappings: {'10.138.0.90-e7a33a624c198a4a3280ed3ee9fbe3fdd13d287e'} not in {'10.138.0.94', '10.138.0.95', '10.138.0.91', '10.138.0.92', '10.138.0.90', '10.138.0.93'}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I think this might be wrong (or requires a larger change to get right)...
|
Hi, @richardliaw, it seems that the test in this PR has already covered my bug. Should I add another one? |
|
@raulchen, I think it would be useful to have a test for it on the backend side. What do you think? |
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
yeah, I agree that having such a test in C++ would be useful. However, it's not easy to do that. Because we don't have C++ workers. Currently, there are only some unit tests in C++ that test particular components of the backend. And integration tests are written in Python and Java. |
|
Test PASSed. |
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
|
Test PASSed. |
|
Test FAILed. |
|
Test FAILed. |
What do these changes do?
Related issue number
Linter
scripts/format.shto lint the changes in this PR.