zeCommandQueueCreate spontaneously segfaults when creating one queue per thread #568

Open
pengtu opened this issue Sep 29, 2022 · 4 comments

Comments

@pengtu

pengtu commented Sep 29, 2022

This is a cutdown case from CHIP-SPV/chipStar#146. When calling zeCommandQueueCreate from multiple threads, it spontaneously segfaults.

The call stack trace is:

Thread 101 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff88404700 (LWP 7105)]
0x00007ffff741f5c4 in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
(gdb) bt
#0 0x00007ffff741f5c4 in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
#1 0x00007ffff7153958 in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
#2 0x00007ffff714f3c5 in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
#3 0x00007ffff714f523 in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
#4 0x00007ffff714f69a in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
#5 0x00007ffff71567be in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
#6 0x00007ffff7ef56fd in zeCommandQueueCreate () from /usr/local/lib/libze_loader.so.1
#7 0x0000555555555479 in QueuePerThread () at test_queue.cc:28
#8 0x00005555555575aa in std::__invoke_impl<void, void (*)()> (__f=@0x55555585b308: 0x55555555543a <QueuePerThread()>) at /usr/include/c++/9/bits/invoke.h:60
#9 0x0000555555557542 in std::__invoke<void (*)()> (__fn=@0x55555585b308: 0x55555555543a <QueuePerThread()>) at /usr/include/c++/9/bits/invoke.h:95
#10 0x00005555555574d4 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x55555585b308) at /usr/include/c++/9/thread:244
#11 0x0000555555557491 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x55555585b308) at /usr/include/c++/9/thread:251
#12 0x0000555555557462 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x55555585b300) at /usr/include/c++/9/thread:195
#13 0x00007ffff7d9fde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007ffff7eb3609 in start_thread (arg=) at pthread_create.c:477
#15 0x00007ffff7bdb133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@pengtu
Author

pengtu commented Sep 29, 2022

Here is a reproducer:

#include <cassert>
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <level_zero/ze_api.h>
#include <iostream>
#include <vector>
#include <limits>
#include <thread>

#define check(ans)                                                             \
  { do_check((ans), __FILE__, __LINE__); }
void do_check(ze_result_t code, const char *file, int line) {
  if (code != ZE_RESULT_SUCCESS) {
    fprintf(stderr, "Failed: %d at %s %d\n", code, file, line);
    exit(1);
  }
}

ze_context_handle_t context;
ze_device_handle_t device;
ze_command_queue_desc_t cmdQueueDesc;

static void QueuePerThread()
{
  ze_command_queue_handle_t command_queue;
  check(zeCommandQueueCreate(context, device, &cmdQueueDesc, &command_queue));
}

int main()
{
  // Initialize driver
  check(zeInit(ZE_INIT_FLAG_GPU_ONLY));

  // Retrieve driver
  uint32_t driverCount = 0;
  check(zeDriverGet(&driverCount, nullptr));

  ze_driver_handle_t driverHandle;
  check(zeDriverGet(&driverCount, &driverHandle));

  ze_context_desc_t contextDesc = {};
  check(zeContextCreate(driverHandle, &contextDesc, &context));

  // Retrieve device
  uint32_t deviceCount = 0;
  check(zeDeviceGet(driverHandle, &deviceCount, nullptr));

  // ze_device_handle_t device;
  deviceCount = 1;
  check(zeDeviceGet(driverHandle, &deviceCount, &device));

  // Print some properties
  ze_device_properties_t deviceProperties = {};
  check(zeDeviceGetProperties(device, &deviceProperties));

  // Create command queue
  uint32_t numQueueGroups = 0;
  check(zeDeviceGetCommandQueueGroupProperties(device, &numQueueGroups, nullptr));
  if (numQueueGroups == 0)
  {
    return 1;
  }
  std::vector<ze_command_queue_group_properties_t> queueProperties(numQueueGroups);
  check(zeDeviceGetCommandQueueGroupProperties(device, &numQueueGroups,
                                               queueProperties.data()));

  ze_command_queue_handle_t command_queue;
  cmdQueueDesc = {};

  for (uint32_t i = 0; i < numQueueGroups; i++)
  {
    if (queueProperties[i].flags & ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COMPUTE)
    {
      cmdQueueDesc.ordinal = i;
    }
  }
  cmdQueueDesc.index = 0;
  cmdQueueDesc.mode = ZE_COMMAND_QUEUE_MODE_ASYNCHRONOUS;

  // 1) Create queue with the main thread
  QueuePerThread();

  // 2) Create queue with a different thread
  constexpr unsigned int MAX_THREAD_CNT = 100;
  std::vector<std::thread> threads(MAX_THREAD_CNT);

  for (auto &th : threads) {
    th = std::thread(QueuePerThread);
  }

  for (auto& th : threads) {
    th.detach();
  }
}

To compile, use "g++ -O0 -g test_queue.cc -lze_loader -lpthread" or clang++.

@jandres742

Thanks. Taking a look.

@pvelesko

Any updates on this? @pengtu @JablonskiMateusz

@pengtu
Author

pengtu commented Apr 26, 2023

@pvelesko: The bug was rejected by the driver team.

Quote of the analysis below:

This is a problem in your application.

You have the threads spawning here:

// 1) Create queue with the main thread
QueuePerThread();
// 2) Create queue with a different thread
constexpr unsigned int MAX_THREAD_CNT = 100;
std::vector<std::thread> threads(MAX_THREAD_CNT);
printf("spawning threads\n");
for (auto &th : threads) {
  th = std::thread(QueuePerThread);
}

Then, you are attempting to "detach" from the threads and have the main thread exit, i.e.:

for (auto& th : threads) {
  th.detach();
}
printf("done\n");

This is not a legal use of the resources: the L0 driver and device resources are shared between the threads, and they are allocated at zeInit, which occurs only once per process, not once per thread.

What is occurring is that the main program exits before all the threads have finished, so the device and driver resources are freed while your threads are still running.

The correct way to write this program is to change to the following:

// 1) Create queue with the main thread
QueuePerThread();
// 2) Create queue with a different thread
constexpr unsigned int MAX_THREAD_CNT = 100;
std::vector<std::thread> threads(MAX_THREAD_CNT);
printf("spawning threads\n");
for (auto &th : threads) {
  th = std::thread(QueuePerThread);
}
printf("finished with spawning threads\n");
for (auto& th : threads)
  th.join();
printf("finished joining the threads\n");
printf("done\n");

The main program must not start releasing the resources for the devices and driver before the threads have finished; otherwise, this segfault is expected.
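
The join-before-exit rule can be sketched without a GPU. This is a minimal illustration only: `FakeDriver`, `create_queue`, and `spawn_and_join` are hypothetical names that stand in for the process-wide driver state and for `QueuePerThread()`; they are not part of Level Zero.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the process-wide state that zeInit sets up.
struct FakeDriver {
  std::atomic<int> queues_created{0};
};

// Stand-in for QueuePerThread(): every worker touches the shared state.
void create_queue(FakeDriver &driver) {
  driver.queues_created.fetch_add(1, std::memory_order_relaxed);
}

// Spawns the workers and joins all of them before the shared state goes
// out of scope, so no worker can outlive the resource it uses.
int spawn_and_join(unsigned thread_count) {
  FakeDriver driver;
  std::vector<std::thread> threads;
  threads.reserve(thread_count);
  for (unsigned i = 0; i < thread_count; ++i) {
    threads.emplace_back(create_queue, std::ref(driver));
  }
  for (auto &th : threads) {
    th.join();  // with detach() here, `driver` could be destroyed too early
  }
  // After the joins, every worker has finished its update.
  assert(driver.queues_created.load() == static_cast<int>(thread_count));
  return driver.queues_created.load();
}
```

Calling `spawn_and_join(100)` mirrors the thread count in the reproducer. In the real program the shared state lives until process exit rather than until the end of a function, but the ordering requirement is the same.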

Basically, the L0 device resources were freed, so a thread attempted to create a queue when the data structures for the device were no longer available, i.e.:

process locked thread id read, 0x5575b7043e20
140001723520768
processLocked function
freeing allocation for reuse <- The L0 device was freed from memory and thus the allocation list for reuse was removed.
140000951785216
140000926607104
processLocked function
process locked thread id read, 0x5575b6fca3a8
140003760049024
process locked thread id read, 0x5575b7043e20
140001287329536140000012261120

processLocked function
process locked thread id read, 0x5575b7043e20
140000020653824
processLocked function
process locked thread id read, (nil) <- This "nil" should have not occurred, this means that the thread was still trying to allocate when the process exited removing the device resources.
140000951785216
processLocked function
process locked thread id read, 0x5575b7043e20
140000934999808
./test_queue.run: line 1: 3411044 Segmentation fault (core dumped) ./test_queue
failed program
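
The "(nil)" read in the log above can be loosely modelled with a `std::weak_ptr`. This is an analogy under stated assumptions, not how the driver is implemented: `FakeDevice`, `worker_sees_device`, and `run_scenario` are hypothetical names, and the real driver gives no such safe handle, which is why the actual failure is a segfault rather than a clean null check.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for the device state that is freed at process exit.
struct FakeDevice {
  int allocation_list = 42;
};

// Runs a worker thread and reports whether the device state was still alive
// when the worker looked at it.
bool worker_sees_device(std::weak_ptr<FakeDevice> device) {
  bool alive = false;
  std::thread worker([&alive, device] {
    // lock() returns an empty pointer once the owner has released the device.
    if (auto live = device.lock()) {
      alive = (live->allocation_list == 42);
    }
  });
  worker.join();
  return alive;
}

// Simulates the two orderings: teardown after the worker, or before it.
bool run_scenario(bool teardown_first) {
  auto device = std::make_shared<FakeDevice>();
  std::weak_ptr<FakeDevice> handle = device;
  if (teardown_first) {
    device.reset();  // models "main exits and frees the device"
  }
  assert(handle.expired() == teardown_first);
  return worker_sees_device(handle);
}
```

Here `run_scenario(false)` corresponds to the joined version of the program, and `run_scenario(true)` to the detached version, where a worker runs after teardown and finds nothing to use.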

Please fix your test program. This is not a bug, but a misunderstanding of the functionality.
