Skip IPC mempool tests on WSL #1045
Wow, you both jumped on this PR fast. LOL. I had to double-check to make sure I didn't mark this as ready by mistake.
Co-authored-by: Phillip Cloud <[email protected]>
leofang
left a comment
In general we avoid skipping tests based only on the platform or on hard-coded conditions in cuda.core. Instead, we should query the driver to see whether a functionality is supported. The reason is that the driver can be updated to gain new capabilities, and hard-coded checks like this would not keep up.
@rparolin could you check what values are returned on WSL?
>>> dev = Device()
>>> dev.properties.mempool_supported_handle_types
9
>>> dev.properties.handle_type_posix_file_descriptor_supported
True
>>> dev.properties.handle_type_win32_handle_supported
False
>>> dev.properties.handle_type_win32_kmt_handle_supported
False
It could be that our existing skip condition is not sufficient:
cuda-python/cuda_core/tests/test_ipc_mempool.py
Lines 21 to 36 in 62d6963
@pytest.fixture(scope="function")
def ipc_device():
    """Obtains a device suitable for IPC-enabled mempool tests, or skips."""
    # Check if IPC is supported on this platform/device
    device = Device()
    device.set_current()
    if not device.properties.memory_pools_supported:
        pytest.skip("Device does not support mempool operations")
    # Note: Linux specific. Once Windows support for IPC is implemented, this
    # test should be updated.
    if not device.properties.handle_type_posix_file_descriptor_supported:
        pytest.skip("Device does not support IPC")
    return device
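When the attribute flags alone are not enough, one way to keep to the "query the driver" idea is to attempt the operation itself and skip if the driver rejects it. The sketch below is only an illustration of that pattern: `feature_works` is a hypothetical helper, and the `create_ipc_pool` callable mentioned in the comment is an assumed stand-in for whatever creates (and releases) an IPC-enabled pool; neither is part of cuda.core.

```python
from typing import Callable


def feature_works(probe: Callable[[], object]) -> bool:
    """Return True if probe() completes without raising, False otherwise.

    `probe` is expected to exercise the feature end to end (for example,
    create and then release an IPC-enabled memory pool), so a driver that
    advertises support but rejects the call is reported as unsupported.
    """
    try:
        probe()
    except Exception:
        return False
    return True


# Hypothetical use inside a fixture (create_ipc_pool is an assumed name):
#     if not feature_works(lambda: create_ipc_pool(device)):
#         pytest.skip("Driver rejects IPC-enabled mempool creation")
```

A probe like this keeps up with driver updates automatically, at the cost of running the operation once before the tests that depend on it.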
This is what I'm seeing on my Windows 11 24H2 WSL2 Ubuntu 24.04 workstation: Using current cuda-python main (at commit dbde2b4). Ubuntu side: Windows side:
/ok to test 57ead02
/ok to test 0a9be40
/ok to test f7deec9
/ok to test 90590ec
…hon into rparolin/skip_ipc_on_wsl
leofang
left a comment
@kkraus14 Sorry, but I think making this an installable package is a very poor UX. It means every single new contributor would stumble during their first test run and have to learn that there is a separate package that must be installed locally first.
Also, by doing so we need to touch the CI workflows so that the package is installed before testing. We seem to be shooting ourselves in the foot without justifiable gain?
I had similar thoughts but figured that it could be mitigated by our
/ok to test cd6b714
The CI shows what I noted earlier:
/ok to test baac405
/ok to test bbe82c8
This reverts commit baac405.
/ok to test 7726e05
I pushed commit bbe82c8 so that we don't get blocked by this package discussion. If the helper module is installed as a package, it's used. Otherwise, we find it via relative path. We can revisit this discussion later and find a way to avoid friction with local development and CI testing.
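The "installed package first, relative path second" lookup described here could look roughly like the following. This is a sketch of the general pattern, not the code from bbe82c8: `load_helper`, its parameters, and any module name passed to it are hypothetical.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def load_helper(module_name: str, fallback_dir: Path) -> ModuleType:
    """Import module_name if it is installed; otherwise load it from fallback_dir."""
    try:
        # Preferred path: the helper is installed as a regular package.
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        # Fallback: temporarily expose the directory that holds the helper.
        importlib.invalidate_caches()
        sys.path.insert(0, str(fallback_dir))
        try:
            return importlib.import_module(module_name)
        finally:
            sys.path.remove(str(fallback_dir))
```

In a test suite, `fallback_dir` would typically be computed relative to the calling file (for example from `Path(__file__).parent`), so a fresh checkout works without an extra install step while CI can still install the package explicitly.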
I agree that packaging this as a separate package is a poor developer UX. Maybe we should introduce a Regardless, we can do that in a follow-up PR.
Skip IPC mempool tests on WSL
On WSL2, cuMemPoolCreate with a POSIX file descriptor handle type returns CUDA_ERROR_INVALID_VALUE even though the capability flags indicate support. This is a platform/driver limitation, not a bug in our code.
I confirmed the failures are driver-related rather than a bug in our code. The host is WSL2 (kernel “microsoft-standard-WSL2”) with NVIDIA driver 581.15 (CUDA 13.0). Device attributes report mempools and POSIX FD handle support, yet a minimal, direct driver repro calling cuMemPoolCreate with handleTypes=CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR consistently returns CUDA_ERROR_INVALID_VALUE, while the same call with handleTypes=CU_MEM_HANDLE_TYPE_NONE succeeds.
Output:
Because this reproduces outside our code with the same CUmemPoolProps we set, the issue lies in the driver/runtime path under WSL2 (consistent with known IPC limitations there), not our implementation. Therefore, we skip IPC mempool tests on WSL to keep the suite portable and green, while leaving the tests enabled on native Linux and other environments where the driver accepts IPC pools.
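For illustration, a WSL check keyed on the kernel release string quoted above ("microsoft-standard-WSL2") might look like this. It is a minimal sketch under the assumption that the kernel release contains "microsoft" on WSL; `is_wsl` is a hypothetical name and the helper actually used in this PR may differ.

```python
import platform
from typing import Optional


def is_wsl(kernel_release: Optional[str] = None) -> bool:
    """Guess whether we are running under WSL from the kernel release string.

    Assumption: WSL kernels report a release containing "microsoft",
    e.g. "...-microsoft-standard-WSL2". Pass a string to check a specific
    release; by default the current machine's release is used.
    """
    if kernel_release is None:
        kernel_release = platform.uname().release
    return "microsoft" in kernel_release.lower()
```

A check like this could feed `pytest.mark.skipif(is_wsl(), reason=...)` on the IPC mempool tests, leaving them enabled on native Linux.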