Realm: missed detection of HIP hijack #1557
Comments
@elliottslaughter, please add this issue to #1032 |
I do not think this is a bug. Here is the output from circuit:
We use lazy initialization for hip hijack, the first warning |
Do you mean that the first warning should be disregarded or that the first Realm internal task should be issued on the |
The first warning can be disregarded.
Is there any plan to silence the initial warning (to avoid confusion for other users in the future) or should I close the issue? |
I think it is OK to have the warning because as long as |
Ok, I'll close the issue. Thanks for the clarification |
@eddy16112 I just want to confirm why the first warning appears? Are we actually launching a task that does not use the hijack? If so, where does it come from? |
A NOP task is launched by Legion to make sure the processor is ready (https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/realm/runtime_impl.cc#L2709), which is where the first warning comes from.
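To make the behaviour discussed here concrete, the following is a minimal, self-contained model of a lazily initialized hijack flag combined with an end-of-task check. It is only a sketch under stated assumptions: `HijackModel`, `hijacked_launch`, `run_task`, `simulate`, and the warning text are hypothetical names invented for illustration and are not Realm's actual code or API. It assumes the flag is set the first time any task launches work through the hijacked path and is never cleared afterwards.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the runtime's GPU module state (not Realm code).
struct HijackModel {
  // Assumption: set lazily on the first hijacked launch and never reset.
  bool hijack_active = false;
  std::vector<std::string> log;

  // Models a kernel launch that goes through the hijacked path.
  void hijacked_launch() { hijack_active = true; }

  // Runs a task body, then performs the end-of-task check.
  void run_task(const std::string& name,
                const std::function<void(HijackModel&)>& body) {
    body(*this);
    if (!hijack_active) {
      log.push_back("warning: hijack not active after " + name);
    }
  }
};

// Replays the sequence described in the thread and returns the warnings.
std::vector<std::string> simulate() {
  HijackModel m;
  // Internal NOP task: launches nothing, so the flag is still false.
  m.run_task("nop", [](HijackModel&) {});
  // First user task launches through the hijack and flips the flag.
  m.run_task("user_task_1", [](HijackModel& s) { s.hijacked_launch(); });
  // A later task that bypasses the hijack is not reported, because the
  // global flag is already set.
  m.run_task("user_task_2", [](HijackModel&) {});
  return m.log;
}
```

In this model only the NOP task produces a warning. It also shows why a single global flag cannot report a later task that misses the hijack, which is the concern raised in the following comments.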
Would it be appropriate to fetch the HIP stream from that task, so we can make the initial warning go away? In that case, I think we would want to disable the print on "HIP hijack is active" so that if the user misses the hijack in their own code, they'll get the "HIP hijack code not active" warning only and not think they're using the hijack when they aren't. |
We can do that, but I do not think it helps. Once the hijack is enabled, we cannot go back to non-hijack mode, so we cannot throw the "hijack code not active" warning. One solution is to completely remove the hijack and use the |
I'm not crazy about Could we instead treat the internal task as a special operation somehow, so that we know it does not launch any kernels? It really shouldn't have an impact on the hijack either way, since we know it's not doing anything. |
@eddy16112 Instead of requiring a global
Legion can then use this in its dummy tasks to avoid the warning, without poisoning the state of the application and requiring a problematic global state like |
Why not just call |
Ok. I guess I don't understand the semantics of |
It only influences the current task. By default, the |
This issue follows up on a discussion on Zulip (https://legion.zulipchat.com/#narrow/stream/187787-general/topic/Hip.20hijack).
It seems that the HIP hijack is not active even if Realm is compiled with REALM_USE_HIP_HIJACK and all the kernels are launched on the streams returned by hipGetTaskStream. In particular, the runtime always issues the following warning:
@eddy16112 has been able to reproduce the problem using circuit.
The values of cfg_task_context_sync and cudart_hijack_active printed at https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/realm/hip/hip_module.cc#L1515 are: