[SYCL][Doc] Fix work_group_scratch_memory example #20506

gmlueck · 2025-10-29T20:13:03Z

The example in the specification was using an old API that we removed in #14785. Update the example and the spec wording to use `launch_config` from `sycl_ext_oneapi_enqueue_functions` to pass the property with the size of the scratch memory.

gujinghui · 2025-10-31T03:00:48Z

sycl/doc/extensions/experimental/sycl_ext_oneapi_work_group_scratch_memory.asciidoc

 local memory capacity as reported by `info::device::local_mem_size`
 then the implementation must throw a synchronous exception with the
 `errc::memory_allocation` error code from the kernel invocation command
 (e.g. `parallel_for`). This check must take all APIs that allocation device


Will there be a check to confirm that the SLM size has been set for a given kernel?
For example, if a kernel calls `syclex::get_work_group_scratch_memory()` in its body, but `syclex::work_group_scratch_size` is not set in `nd_launch`/`parallel_for`, will an exception be thrown?
I'm asking because the current implementation does not appear to issue any warning or error in this case.

gmlueck · 2025-10-31

I think this makes sense, and it seems to me that it would be easy to implement. I think we just need a check here to see if WorkGroupMemorySize has been set by the user.

https://github.com/intel/llvm/blob/sycl/sycl/source/detail/scheduler/commands.cpp#L2474-L2481

We should be careful, though, not to throw an exception just because the user set WorkGroupMemorySize to 0. Setting the size to 0 should be allowed.

@AlexeySachkov and @steffenlarsen do you see any problem with this? I assume this code is executed synchronously by the same application thread that launches the kernel, right? Or, is this code executed asynchronously by another thread? If that is the case, the error would be an async error.

I think we could throw errc::memory_allocation in this scenario too, and we can distinguish this case from the out-of-memory case via the message string.

steffenlarsen · 2025-11-05

Off the top of my head, it should be possible to make this a synchronous exception as the information about the existence of the implicit local argument is present both in the compiler-generated specializations of DeviceKernelInfo and the kernel properties. For SYCLBIN or RTC compiled kernels, we should have the kernel object ahead of enqueuing the kernel launch, which should be enough to find its properties. How much overhead the lookup for the latter would be is TBD.

steffenlarsen · 2025-11-05

Correction: The information is only in the properties, so the implementation will have to look up the kernel's information. However, it should just be a single map lookup so it should be relatively cheap.

gmlueck · 2025-11-05

The code I point to in the link above already seems to have all the necessary information. Essentially, the code looks like:

    if (/*kernel has the implicit local arg*/) {
      /* Push "WorkGroupMemorySize" as an argument */
    }

So the code already knows whether the kernel has the implicit argument, and it knows the value of the work_group_scratch_size property that was passed at launch time. It seems like we just need to rewrite the code here like:

    if (/*kernel has the implicit local arg*/) {
      if (WorkGroupMemorySize == -1) {
        /* report synchronous exception */
      }
      /* Push "WorkGroupMemorySize" as an argument */
    }

We also need to trace back the calls to this code and arrange to pass -1 for WorkGroupMemorySize when the property is not set. (Or choose a sentinel value other than -1.)
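
The proposed check can be modeled in standalone C++. This is only a sketch of the idea, not the runtime's code: `LaunchInfo`, `kUnsetScratchSize`, and `pushScratchSizeArg` are hypothetical names, `std::runtime_error` stands in for a `sycl::exception` carrying `errc::memory_allocation`, and the maximum `size_t` value is assumed as the sentinel in place of -1.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical sentinel meaning "work_group_scratch_size was not passed at
// launch". The thread suggests -1; the maximum size_t plays that role here.
constexpr std::size_t kUnsetScratchSize =
    std::numeric_limits<std::size_t>::max();

// Hypothetical stand-in for the launch state the runtime code inspects.
struct LaunchInfo {
  bool kernelHasImplicitLocalArg = false;
  std::size_t workGroupMemorySize = kUnsetScratchSize;
};

// Appends the scratch size to the argument list when the kernel expects it.
// Throws if the kernel uses scratch memory but no size was supplied.
// A size of 0 is valid and is pushed like any other value.
void pushScratchSizeArg(const LaunchInfo &info,
                        std::vector<std::size_t> &args) {
  if (!info.kernelHasImplicitLocalArg)
    return;
  if (info.workGroupMemorySize == kUnsetScratchSize)
    throw std::runtime_error(
        "kernel uses work-group scratch memory, but "
        "work_group_scratch_size was not set at launch");
  args.push_back(info.workGroupMemorySize);
}
```

Using a dedicated sentinel rather than testing for zero keeps an explicit size of 0 legal, which is the case the earlier comment asks to preserve.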

If we all agree, I think this should be done in a separate PR.

gmlueck · 2025-11-13

I'd like to drive this to some sort of decision. I think there are three options:

1. We can add the error check I propose above. When I wrote that, I didn't realize that this would result in an asynchronous error, though. That means you won't get the error unless you call queue::wait_and_throw, which makes the error a lot less useful.

2. We could add an error check to the "submit" code path, so that you get a synchronous exception. (This means you will get the error even without calling either queue::wait or queue::wait_and_throw.) This is a much better error diagnostic. However, we are really trying to optimize the submit code path. Therefore, I don't think we should do this if it adds any significant code to the hot path.

3. If the error check is expensive, maybe we can do the error check only if the user compiles with some special -D flag.

I think option (1) would not be a great solution. @steffenlarsen what are your thoughts on options (2) and (3)?
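
To make the difference between options (1) and (2) concrete, here is a toy model in plain C++. It does not use SYCL: `ToyQueue` and its member functions are hypothetical, and rethrowing from `waitAndThrow` is a simplification (a real SYCL queue hands asynchronous errors to its async handler instead).

```cpp
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical toy queue. It models only where an error surfaces.
class ToyQueue {
public:
  // Option (2): validate on the submitting thread, so the caller sees the
  // exception immediately at the submit call.
  void submitWithSyncCheck(bool scratchSizeSet) {
    if (!scratchSizeSet)
      throw std::runtime_error("scratch size not set");
  }

  // Option (1): the check runs later, so the failure is only recorded here
  // and the submit call itself returns normally.
  void submitWithDeferredCheck(bool scratchSizeSet) {
    if (!scratchSizeSet)
      pending_.push_back(std::make_exception_ptr(
          std::runtime_error("scratch size not set")));
  }

  // Reports the first recorded error, if any, and clears the list.
  void waitAndThrow() {
    if (pending_.empty())
      return;
    std::exception_ptr first = pending_.front();
    pending_.clear();
    std::rethrow_exception(first);
  }

private:
  std::vector<std::exception_ptr> pending_;
};
```

With the deferred variant, a caller that never calls `waitAndThrow` never learns about the missing property, which is the weakness of option (1) described above.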

steffenlarsen · 2025-11-14

I think we should go for (2) and see how expensive it will be. I don't suspect the implementation will be incredibly difficult and we could implement it selectively at first while we evaluate the performance.

gmlueck · 2025-11-14

OK. I'd like to create a separate spec PR with the wording about the error condition. Let's keep this PR focused on the example.

If everything else is good, can I get an approval on this one? Note that I just made a small clarification to the example in 1cf4ff9.

gmlueck · 2025-11-14

Here's the PR that updates the wording on error conditions: #20651

gmlueck · 2025-11-17

@steffenlarsen @gujinghui: I think this conversation is resolved now?

Can one of you approve this PR, so we can merge the update to the example?

gmlueck requested a review from a team as a code owner October 29, 2025 20:13

gujinghui reviewed Oct 31, 2025

Remove "constexpr" from example

1cf4ff9

These `constexpr` declarations aren't necessary, and they make the example more confusing. The purpose of this extension is to allocate a runtime-defined amount of local memory, so the memory size shouldn't be declared `constexpr`.
