Skip to content

Conversation

@gmlueck
Copy link
Contributor

@gmlueck gmlueck commented Oct 29, 2025

The example in the specification was using an old API that we removed in #14785. Update the example and the spec wording to use launch_config from sycl_ext_oneapi_enqueue_functions to pass the property with the size of the scratch memory.

The example in the specification was using an old API that we removed
in intel#14785.  Update the example and the spec wording to use
`launch_config` from sycl_ext_oneapi_enqueue_functions to pass the
property with the size of the scratch memory.
@gmlueck gmlueck requested a review from a team as a code owner October 29, 2025 20:13
local memory capacity as reported by `info::device::local_mem_size`
then the implementation must throw a synchronous exception with the
`errc::memory_allocation` error code from the kernel invocation command
(e.g. `parallel_for`). This check must take all APIs that allocation device

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we have a check to confirm whether the SLM size is set for certain kernel?
Say, if a kernel calls the syclex::get_work_group_scratch_memory() in kernel body, but we do not set the syclex::work_group_scratch_size in nd_launch/parallel_for, will an exception be thrown?
I'm asking this because looks like there will be no any warning or error in current implementation.

Copy link
Contributor Author

@gmlueck gmlueck Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense, and it seems to me that it would be easy to implement. I think we just need a check here to see if WorkGroupMemorySize has been set by the user.

https://github.com/intel/llvm/blob/sycl/sycl/source/detail/scheduler/commands.cpp#L2474-L2481

We should be careful, though, not to throw an exception just because the user set WorkGroupMemorySize to 0. Setting the size to 0 should be allowed.

@AlexeySachkov and @steffenlarsen do you see any problem with this? I assume this code is executed synchronously by the same application thread that launches the kernel, right? Or, is this code executed asynchronously by another thread? If that is the case, the error would be an async error.

I think we could throw errc::memory_allocation in this scenario too, and we can distinguish this case from the out-of-memory case via the message string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the top of my head, it should be possible to make this a synchronous exception as the information about the existence of the implicit local argument is present both in the compiler-generated specializations of DeviceKernelInfo and the kernel properties. For SYCLBIN or RTC compiled kernels, we should have the kernel object ahead of enqueuing the kernel launch, which should be enough to find its properties. How much overhead the lookup for the latter would be is TBD.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: The information is only in the properties, so the implementation will have to look up the kernel's information. However, it should just be a single map lookup so it should be relatively cheap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code I point to in the link above already seems to have all the necessary information. Essentially, the code looks like:

if (/*kernel has the implicit local arg*/) {
  /* Push "WorkGroupMemorySize" as an argument */
}

Therefore, the code already knows whether the kernel has the implicit argument, and it knows the value of the work_group_scratch_size property that was passed at launch time. Therefore, it seems like we just need to rewrite the code here like:

if (/*kernel has the implicit local arg*/) {
  if (WorkGroupMemorySize == -1) {
    /* report synchronous exception */
  }
  /* Push "WorkGroupMemorySize" as an argument */
}

And we need to trace back the calls to this code and arrange to pass -1 for WorkGroupMemorySize in the case when the property is not set. (Or, choose some other sentinel value other than -1.)

If we all agree, I think this should be done in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to drive this to some sort of decision. I think there are three options:

  1. We can add the error check I propose above. When I wrote that, I didn't realize that this would result in an asynchronous error, though. That means you won't get the error unless you call queue::wait_and_throw, which makes the error a lot less useful.

  2. We could add an error check to the "submit" code path, so that you get a synchronous exception. (This means you will get the error even without calling either queue::wait or queue::wait_and_throw.) This is a much better error diagnostic. However, we are really trying to optimize the submit code path. Therefore, I don't think we should do this if it adds any significant code to the hot path.

  3. If the error check is expensive, maybe we can do the error check only if the user compiles with some special -D flag.

I think option (1) would not be a great solution. @steffenlarsen what are your thoughts on options (2) and (3)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go for (2) and see how expensive it will be. I don't suspect the implementation will be incredibly difficult and we could implement it selectively at first while we evaluate the performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'd like to create a separate spec PR with the wording about the error condition. Let's keep this PR focused on the example.

If everything else is good, can I get an approval on this one? Note that I just made a small clarification to the example in 1cf4ff9.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the PR that updates the wording on error conditions: #20651

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steffenlarsen @gujinghui: I think this conversation is resolved now?

Can one of you approve this PR, so we can merge the update to the example?

These `constexpr` declarations aren't necessary, and they make the
example more confusing.  The purpose of this extension is to allocate
a runtime-defined amount of local memory, so the memory size shouldn't
be declared `constexpr`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants