Adapters/level_zero_batch_event_status.cpp fails in pre-commit on PVC #16695

Closed
KornevNikita opened this issue Jan 20, 2025 · 5 comments · Fixed by #16843
Labels: bug (Something isn't working), confirmed

Comments

@KornevNikita
Contributor

Describe the bug

test (E2E tests on Intel Ponte Vecchio GPU, ["Linux", "pvc"], ghcr.io/intel/llvm/ubuntu2404_intel... / E2E tests on Intel Ponte Vecchio GPU

******************** TEST 'SYCL :: Adapters/level_zero_batch_event_status.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 4
/__w/llvm/llvm/toolchain/bin//clang++  -Werror -fsycl -fsycl-targets=spir64  /__w/llvm/llvm/llvm/sycl/test-e2e/Adapters/level_zero_batch_event_status.cpp -o /__w/llvm/llvm/build-e2e/Adapters/Output/level_zero_batch_event_status.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/Adapters/level_zero_batch_event_status.cpp -o /__w/llvm/llvm/build-e2e/Adapters/Output/level_zero_batch_event_status.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 7
env SYCL_PI_LEVEL_ZERO_BATCH_SIZE=4 SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 SYCL_UR_TRACE=2 UR_L0_DEBUG=1 env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/Adapters/Output/level_zero_batch_event_status.cpp.tmp.out 2>&1 | /__w/llvm/llvm/toolchain/bin/FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Adapters/level_zero_batch_event_status.cpp
# executed command: env SYCL_PI_LEVEL_ZERO_BATCH_SIZE=4 SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 SYCL_UR_TRACE=2 UR_L0_DEBUG=1 env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/Adapters/Output/level_zero_batch_event_status.cpp.tmp.out
# note: command had no output on stdout or stderr
# executed command: /__w/llvm/llvm/toolchain/bin/FileCheck /__w/llvm/llvm/llvm/sycl/test-e2e/Adapters/level_zero_batch_event_status.cpp
# .---command stderr------------
# | /__w/llvm/llvm/llvm/sycl/test-e2e/Adapters/level_zero_batch_event_status.cpp:30:11: error: CHECK: expected string not found in input
# | // CHECK: ---> urEnqueueKernelLaunch
# |           ^
# | <stdin>:502:21: note: scanning from here
# |  ---> urEventGetInfo
# |                     ^
# | <stdin>:530:2: note: possible intended match here
# |  ---> urQueueRelease
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /__w/llvm/llvm/llvm/sycl/test-e2e/Adapters/level_zero_batch_event_status.cpp
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |           497: UR ---> CleanupCompletedEvent(Event, QueueLocked, true ) 
# |           498: UR <--- CleanupCompletedEvent(Event, QueueLocked, true )(UR_RESULT_SUCCESS) 
# |           499: UR ---> urEventReleaseInternal(Event) 
# |           500: UR <--- urEventReleaseInternal(Event)(UR_RESULT_SUCCESS) 
# |           501:  <--- urQueueFinish(.hQueue = 0x11eccdc0) -> UR_RESULT_SUCCESS; 
# |           502:  ---> urEventGetInfo 
# | check:30'0                         X error: no match found
# |           503:  <--- urEventGetInfo(.hEvent = 0x11ee4f50, .propName = UR_EVENT_INFO_COMMAND_EXECUTION_STATUS, .propSize = 4, .pPropValue = 0x7ffe09aec514 (UR_EVENT_STATUS_COMPLETE), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS; 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           504: Ev2 has completed 
# | check:30'0     ~~~~~~~~~~~~~~~~~~
# |           505: Test Pass 
# | check:30'0     ~~~~~~~~~~
# |           506:  ---> urEventRelease 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~
# |           507: UR ---> urEventReleaseInternal(Event) 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# |           525: Inserting 0x11ee4820 event (Host Visible: 0x11ee5b70, Profiling: 0, Counter: 0, Device: 0x11e8c970) into cache 0x12009b60 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           526: UR ---> urQueueReleaseInternal(Queue) 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           527: UR <--- urQueueReleaseInternal(Queue)(UR_RESULT_SUCCESS) 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           528: UR <--- urEventReleaseInternal(Event)(UR_RESULT_SUCCESS) 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           529:  <--- urEventRelease(.hEvent = 0x11ee4820) -> UR_RESULT_SUCCESS; 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           530:  ---> urQueueRelease 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~
# | check:30'1      ?                    possible intended match
# |           531: UR ---> Queue->synchronize() 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           532: UR <--- Queue->synchronize()(UR_RESULT_SUCCESS) 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           533: ZE ---> zeFenceDestroy(it->second.ZeFence) 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           534: UR ---> urQueueReleaseInternal(Queue) 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           535: ZE ---> zeCommandQueueDestroy(ZeQueue) 
# | check:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1


@aelovikov-intel
Contributor

@intel/unified-runtime-reviewers , can you follow up on this? I believe Adapters/ tests are owned by your team.

@KornevNikita changed the title from "Adapters/level_zero_batch_event_status.cpp fails in post-commit on PVC" to "Adapters/level_zero_batch_event_status.cpp fails in pre-commit on PVC" on Jan 29, 2025
@nrspruit
Contributor

Hello @KornevNikita ,

Is this sporadic? I cannot reproduce it on several PVC machines, and the test also passes in many pre-commit runs, for example:
https://github.com/intel/llvm/actions/runs/13020477272/job/36320344367?pr=16431
https://github.com/intel/llvm/actions/runs/13035060706/job/36365533811

This does not appear to be consistent. Do we have a full log showing the differences in the execution output? The test itself is actually passing; it is the text check that fails when it should not.

@aelovikov-intel
Contributor

Is this sporadic?

Yes, we see random failures in pre-commit jobs of unrelated changes.

@nrspruit
Contributor

Is this sporadic?

Yes, we see random failures in pre-commit jobs of unrelated changes.

OK, is there any way to get the full e2e.log from a test run that fails? I have been unable to reproduce this issue. The most likely fix is to make the text check in the test more flexible: the adapter has changed, and the test is too rigid about the prints it expects on the console.
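
To illustrate why a run that prints "Test Pass" can still fail the text check: a plain CHECK directive in FileCheck searches forward from the previous match, so if the trace emits the expected calls in a different order, a later pattern is never found. The C++ sketch below is a simplified, line-granular model of that behavior (real FileCheck resumes within the same line and supports regular expressions). The function name `firstUnmatchedCheck` is hypothetical and chosen for illustration only; it is not part of the test, the adapter, or FileCheck.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Simplified model of ordered CHECK matching (an assumption-laden sketch,
// not FileCheck's real implementation): each pattern must appear as a
// substring of some input line located after the line that matched the
// previous pattern.
//
// Returns the index of the first pattern that cannot be matched, or
// std::nullopt when every pattern is found in order.
std::optional<std::size_t>
firstUnmatchedCheck(const std::vector<std::string> &lines,
                    const std::vector<std::string> &patterns) {
  std::size_t pos = 0; // next input line that is still eligible for matching
  for (std::size_t p = 0; p < patterns.size(); ++p) {
    bool found = false;
    for (; pos < lines.size(); ++pos) {
      if (lines[pos].find(patterns[p]) != std::string::npos) {
        found = true;
        ++pos; // later patterns may only match on later lines
        break;
      }
    }
    if (!found)
      return p; // this pattern never appears after the previous match
  }
  return std::nullopt;
}
```

Given a trace in which `---> urEventGetInfo` has already been matched and no `---> urEnqueueKernelLaunch` follows it, the function returns the index of the kernel-launch pattern, which mirrors the `check:30` error in the log above. A more flexible check would avoid depending on the exact interleaving of those two calls.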
