Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Realm: missed detection of HIP hijack #1557

Open
Tracked by #1032
mariodirenzo opened this issue Sep 30, 2023 · 16 comments
Open
Tracked by #1032

Realm: missed detection of HIP hijack #1557

mariodirenzo opened this issue Sep 30, 2023 · 16 comments

Comments

@mariodirenzo
Copy link

This issue follows up on a discussion on Zulip (https://legion.zulipchat.com/#narrow/stream/187787-general/topic/Hip.20hijack).
It seems that the HIP hijack is not active even if Realm is compiled with REALM_USE_HIP_HIJACK and if all the kernels are launched on the streams returned by hipGetTaskStream.
In particular, the runtime always issues the following warning:

[0 - 7f1fa3ef9000]    0.374056 {4}{hip}: HIP hijack code not active - device synchronizations required after every GPU task!

@eddy16112 has been able to reproduce the problem using circuit.

The values of cfg_task_context_sync and cudart_hijack_active printed at https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/realm/hip/hip_module.cc#L1515 are:

[0 - 1554499f3140]    0.251715 {4}{hip}: gpu_proc->gpu->module->config->cfg_task_context_sync = -1
[0 - 1554499f3140]    0.251741 {4}{hip}: cudart_hijack_active = 1
@mariodirenzo
Copy link
Author

@elliottslaughter, please add this issue to #1032

@eddy16112
Copy link
Contributor

I do not think this is a bug. Here is the output from circuit:

(base) wwu@g0003:/scratch2/wwu/legion-master/examples/circuit$ ./circuit -ll:gpu 1
[0 - 7f5afa950000]    0.149888 {4}{hip}: HIP hijack code not active - device synchronizations required after every GPU task!
[0 - 7f5b0254d000]    0.240584 {3}{circuit}: circuit settings: loops=2 pieces=4 nodes/piece=1024 wires/piece=1024 pct_in_piece=95 seed=12345
[0 - 7f5b0254d000]    0.251828 {3}{circuit}: Initializing circuit simulation...
[0 - 7f5b0254d000]    0.417142 {3}{circuit}: Finished initializing simulation...
Starting main simulation loop
[0 - 7f5afa950000]    0.502278 {4}{hip}: HIP hijack is active - device synchronizations not required after every GPU task!
SUCCESS!
ELAPSED TIME =   0.097 s
GFLOPS =  80.688 GFLOPS
[0 - 7f5b0254d000]    0.515076 {3}{circuit}: simulation complete - destroying regions

We use lazy initialization for hip hijack, the first warning {hip}: HIP hijack code not active - device synchronizations required after every GPU task! is triggered by a realm internal task without calling hipGetTaskStream. Then the circuit GPU kernels are launched with hipGetTaskStream, so we will see HIP hijack is active - device synchronizations not required after every GPU task!

@mariodirenzo
Copy link
Author

Do you mean that the first warning should be disregarded or that the first Realm internal task should be issued on the hipGetTaskStream?

@eddy16112
Copy link
Contributor

the fist warning can be disregarded

@mariodirenzo
Copy link
Author

Is there any plan to silence the initial warning (to avoid confusion for other users in the future) or should I close the issue?

@eddy16112
Copy link
Contributor

I think it is OK to have the warning because as long as hipGetTaskStream is used, we would see another output "HIP hijack is active - device synchronizations not required after every GPU task!"

@mariodirenzo
Copy link
Author

Ok, I'll close the issue. Thanks for the clarification

@elliottslaughter
Copy link
Contributor

@eddy16112 I just want to confirm why the first warning appears? Are we actually launching a task that does not use the hijack? If so, where does it come from?

@eddy16112
Copy link
Contributor

@eddy16112 I just want to confirm why the first warning appears? Are we actually launching a task that does not use the hijack? If so, where does it come from?

A NOP task is launched by Legion to make sure the processor is ready, https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/realm/runtime_impl.cc#L2709, which is where the first warning from.

@elliottslaughter
Copy link
Contributor

Would it be appropriate to fetch the HIP stream from that task, so we can make the initial warning go away?

In that case, I think we would want to disable the print on "HIP hijack is active" so that if the user misses the hijack in their own code, they'll get the "HIP hijack code not active" warning only and not think they're using the hijack when they aren't.

@eddy16112
Copy link
Contributor

Would it be appropriate to fetch the HIP stream from that task,

We can do that, but I do not think it helps. Once the hijack is enabled, we can not go back to the non-hijack mode, so we can not throw the "hijack code not active" warning. One solution is to completely remove the hijack. and use the get_task_hip_stream and set_task_ctxsync_required API, users can control which task need ctxsync by calling set_task_ctxsync_required inside the task body.

@elliottslaughter
Copy link
Contributor

I'm not crazy about set_task_ctxsync_required. I can see how it would help in special cases where you need to go back and forth, but in the common case of always using the hijack, it would be nice if things "just worked".

Could we instead treat the internal task as a special operation somehow, so that we know it does not launch any kernels? It really shouldn't have an impact on the hijack either way, since we know it's not doing anything.

@elliottslaughter
Copy link
Contributor

@eddy16112 Instead of requiring a global set_task_ctxsync_required, could we add an API like skip_task_ctxsync_this_task (name subject to change). The semantics of this API are:

  1. The caller task promises not to perform any work on the GPU whatsoever
  2. The caller does NOT promise anything about the rest of the application: i.e., Realm should not make assumptions about the application's behavior from this one task and should essentially just skip it for the purpose of determining whether to synchronize or not

Legion can then use this in its dummy tasks to avoid the warning, without poisoning the state of the application and requiring a problematic global state like set_task_ctxsync_required.

@eddy16112
Copy link
Contributor

@eddy16112 Instead of requiring a global set_task_ctxsync_required, could we add an API like skip_task_ctxsync_this_task (name subject to change). The semantics of this API are:

  1. The caller task promises not to perform any work on the GPU whatsoever
  2. The caller does NOT promise anything about the rest of the application: i.e., Realm should not make assumptions about the application's behavior from this one task and should essentially just skip it for the purpose of determining whether to synchronize or not

Legion can then use this in its dummy tasks to avoid the warning, without poisoning the state of the application and requiring a problematic global state like set_task_ctxsync_required.

Why not just call set_task_ctxsync_required(false) for the dummy task? I think the only issue is that it will record an event for the dummy task, which is not necessary.

@elliottslaughter
Copy link
Contributor

Ok. I guess I don't understand the semantics of set_task_ctxsync_required. Is it a global option or does it influence only the current task? If we set_task_ctxsync_required(false) in one task, will the runtime will auto-detect correctly in the next one?

@eddy16112
Copy link
Contributor

It only influences the current task. By default, the ThreadLocal::context_sync_required is set to -1 inside every task. In this case, context sync is always needed https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/realm/cuda/cuda_module.cc#L1642-1647.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants