Skip to content

Update spec to make kernel validation optional #2564

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

RossBrunton
Copy link
Contributor

Several adapters don't support validating kernel signatures when
enqueued. To handle this, we now allow urEnqueueKernelLaunch to return
SUCCESS even when parameters are invalid.

@github-actions github-actions bot added loader Loader related feature/bug conformance Conformance test suite issues. specification Changes or additions to the specification labels Jan 15, 2025
@RossBrunton RossBrunton force-pushed the ross/argval branch 4 times, most recently from 7c1118b to f249b9b Compare January 22, 2025 17:19
@github-actions github-actions bot added the cuda CUDA adapter specific issues label Jan 22, 2025
@RossBrunton RossBrunton force-pushed the ross/argval branch 3 times, most recently from 25f458d to 7affcd2 Compare January 31, 2025 12:38
@RossBrunton RossBrunton changed the title RFC: Update spec to make kernel validation optional Update spec to make kernel validation optional Jan 31, 2025
@RossBrunton RossBrunton marked this pull request as ready for review January 31, 2025 15:50
@RossBrunton RossBrunton requested review from a team as code owners January 31, 2025 15:50
ThreadsPerBlock[0], ThreadsPerBlock[1], ThreadsPerBlock[2], LocalSize,
CuStream, const_cast<void **>(ArgPointers.data()), nullptr));
} catch (ur_result_t Err) {
// Cuda returns UR_RESULT_ERROR_INVALID_VALUE if the args are incorrect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be true, but it's not only if the args are incorrect. I suspect cuLaunchKernel can return CUDA_ERROR_INVALID_VALUE -> UR_RESULT_ERROR_INVALID_VALUE for many reasons (the docs don't say, obviously). Would that be a problem, if we return UR_RESULT_ERROR_INVALID_KERNEL_ARGS mistakenly in some cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also just as a matter of style, should we say that "Cuda" returns a UR error code? Could that mislead people? Its' really UR_CHECK_ERROR that's translating CUDA_ERROR_INVALID_VALUE to UR_RESULT_ERROR_INVALID_VALUE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there's a better option; INVALID_VALUE isn't a correct return for the test (since all the arguments to enqueueKernelLaunch are valid - It's just the arguments passed to the kernel itself that are wrong).

Should I leave it like it is in this MR (and issues can be patched when they are found), or revert this change and mark the test as XFAIL until a better solution is found?

Copy link
Contributor

@frasercrmck frasercrmck Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've searched the internet a bit for cuLaunchKernel and INVALID_VALUE and all results I've seen are because people are compiling for a different architecture than the one they're running on ([1]) or they're launching with incorrect threads/dimensions ([2], [3]) or even CUDA context mismanagement ([4]). I've not seen anything saying that this error is returning when arguments are invalid.

There's also [5] which states:

The error CUDA_ERROR_INVALID_VALUE will be returned if kernel parameters are specified with both kernelParams and extra (i.e. both kernelParams and extra are non-NULL).

Of course, some of the situations I've encountered above may well never occur in the corresponding UR usage, because they'd be caught by other UR APIs.

But still my gut is telling me we're going to report false positives, which is less useful than just accepting the CUDA API's weaknesses. If we can't be sure, I'd lean towards keeping it a known test failure (XFAIL).

I'd take @npmiller's thoughts on this.

[1], [2], [3], [4], [5]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a tough one, in practice I suspect that error code mostly comes from [1], but we can, and likely already are, ensuring the sizes are correct in the adapter before calling cuLaunchKernel, and it still goes through UR_CHECK_ERROR so the actual error code information still goes in the logger no matter what code we actually return.

That being said I think I also lean towards just keeping the error code as-is and leaving the test XFAIL, ultimately there are still some cases where this error code might get returned that aren't about the kernel arguments and that would be very confusing. Since this is optional now I think it makes more sense to just not support it for the CUDA adapter as we can't support it properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make it XFAIL for the moment, and it can be fixed properly in a separate task.

Several adapters don't support validating kernel signatures when
enqueued. To handle this, we now allow urEnqueueKernelLaunch to return
`SUCCESS` even when parameters are invalid.

Some tests have also been updated. The CUDA adapter has also been
updated to handle invalid arguments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conformance Conformance test suite issues. cuda CUDA adapter specific issues loader Loader related feature/bug specification Changes or additions to the specification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants