Yolo examples updates #13
base: main
Conversation
```cmake
find_package(absl CONFIG REQUIRED PATHS ${EXECUTORCH_ROOT}/cmake-out)
find_package(re2 CONFIG REQUIRED PATHS ${EXECUTORCH_ROOT}/cmake-out)
find_package(tokenizers CONFIG REQUIRED PATHS ${EXECUTORCH_ROOT}/cmake-out)
```
Why do we need tokenizers and other dependencies for YOLO?
Are you able to build it without these? The YOLO example doesn't use them directly, but I thought some of its dependencies might need them. I will check again whether we can build without them.
Without these I see the error below:

```
CMake Error at CMakeLists.txt:37 (find_package):
  Found package configuration file:

    /home/mcavus/executorch/executorch/cmake-out/lib/cmake/ExecuTorch/executorch-config.cmake

  but it set executorch_FOUND to FALSE so package "executorch" is considered
  to be NOT FOUND.  Reason given by package:

  The following imported targets are referenced, but are missing:

    tokenizers::tokenizers
```
Yes, the example was fully functional when it was merged, so that's a bit strange. How do you build the example?
You can find a test script here: https://github.com/pytorch/executorch/blob/main/.ci/scripts/test_yolo12.sh
I think the Meta folks could potentially help with that.
The build commands below should work, right? I can reproduce this with the main branch. By the way, I see a similar error with either the OpenVINO backend or XNNPACK. I will ping them on Discord.
```bash
rm -rf build
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DUSE_XNNPACK_BACKEND=OFF -DUSE_OPENVINO_BACKEND=ON ..
make -j$(nproc)
```
```cpp
// Fill the preprocessing queue: launch an async scale-with-padding
// task for each decoded frame, bounded by frame_queue_size.
while (!ready_q.empty() && scale_q.size() < frame_queue_size) {
  frame_ctx *scale_f = ready_q.front();
  scale_q.push(std::make_pair(
      scale_f,
      std::async(std::launch::async, scale_with_padding,
                 std::ref(scale_f->frame), &(scale_f->pad_x),
                 &(scale_f->pad_y), &(scale_f->scale), img_dims)));
  ready_q.pop();
}
```

```cpp
// Accumulate pure inference time for the stats reported at the end.
const et_timestamp_t after_execute = et_pal_current_ticks();
time_spent_executing += after_execute - before_execute;
iters++;

// Periodically report progress.
if (!(iters % progress_bar_tick)) {
  const int precent_ready = (100 * iters) / video_lenght;
  std::cout << iters << " out of " << video_lenght
            << " frames are processed (" << precent_ready << "%)"
            << std::endl;
```

```cpp
// Drain finished preprocessing tasks into the inference input queue,
// polling each future for at most 1 ms so the loop never blocks.
while (!scale_q.empty() && input_q.size() < frame_queue_size) {
  auto status = scale_q.front().second.wait_for(std::chrono::milliseconds(1));
  if (status == std::future_status::ready) {
    // ...
```
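As an aside on the last hunk: `wait_for` with a 1 ms timeout polls the future instead of blocking on `get()`, so the loop can keep feeding the other pipeline stages. A minimal, isolated sketch of that polling pattern, with a toy task standing in for the real preprocessing future:

```cpp
#include <chrono>
#include <future>
#include <iostream>

int main() {
  // Toy async task in place of a scale_with_padding future.
  auto task = std::async(std::launch::async, [] { return 42; });

  // Poll: wait at most 1 ms, then check whether the task finished,
  // rather than blocking on get() and stalling the rest of the loop.
  auto status = task.wait_for(std::chrono::milliseconds(1));
  if (status == std::future_status::ready) {
    std::cout << "result: " << task.get() << std::endl;
  } else {
    std::cout << "still running; do other pipeline work first" << std::endl;
    std::cout << "result: " << task.get() << std::endl;  // block only at the end
  }
}
```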
General questions:
- It looks like you are implementing an inference request queue. Is it possible to use the standard OpenVINO API for this somehow?
- This is a real-time demo: the data is streamed sequentially and should be consumed sequentially. How does that work with your update?
- I believe it is unfair to measure only model inference time, without pre- and post-processing, and report it as FPS. In a real application the pre- and post-processing will affect the FPS.

In general, could you please state the motivation behind this PR? What is its purpose, and what improvements does it introduce?
- We could try using an async call inside the OpenVINO backend. That way we could simply call the forward function from the ExecuTorch application and let OpenVINO schedule the tasks. But I found two issues with that (explained below), and I don't think we need it anyway: model inference still executes sequentially, since we hold a mutex lock around that part, and we don't need asynchronous model inference for this use case (more on this under your second question).
- We claim to support XNNPACK with this application as well, so we might need to add a lot of OpenVINO-only customizations in that case.
- A single ExecuTorch module seems to use the same output buffer for all executions, so an upcoming task may overwrite the result of the previous one. This seems risky (and it fails for XNNPACK). We could create multiple ExecuTorch modules, but then I don't know whether they would share the same OpenVINO backend object; if they don't, we may not be able to use async execution as intended, and we may incur additional memory overhead.
- We can assume the data is streamed and consumed sequentially. But we can still pipeline the preprocess, inference, and postprocess stages, which was the intention of this PR: as the first frame finishes preprocessing on the CPU and starts model execution on the GPU (or NPU), we can start preprocessing the second frame as soon as it is ready; once the first frame finishes on the GPU, the second frame can be assigned to the GPU while the first frame starts postprocessing. Also, in a real-time stream, it would be better to limit the size of the ready queue (maybe to 2 or even 1), since a larger ready queue can delay the output video. (A minimal sketch of this pipelining follows after this list.)
- I didn't understand this part. The time measurement should already cover the end-to-end object detection process: timing is collected before and after the whole while loop, `iters` is incremented only when a frame retires, and at the end the FPS is computed from the total loop time and the total number of frames retired.
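To make the pipelining idea concrete, here is a minimal, self-contained C++ sketch rather than the PR's actual code: `Frame`, `preprocess`, `infer`, and `postprocess` are hypothetical stand-ins for the example's real stages (`scale_with_padding`, the module's forward call, and detection drawing), while the mutex-serialized inference and the whole-loop FPS timing mirror the description above.

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <mutex>
#include <queue>

// Hypothetical stand-ins for the example's real stages.
struct Frame {};
Frame preprocess(Frame f) { return f; }        // CPU: scale + pad
Frame infer(const Frame& f) { return f; }      // accelerator: model forward
void postprocess(const Frame& f) { (void)f; }  // CPU: draw boxes, encode

std::mutex infer_mutex;  // inference itself stays sequential

int main() {
  std::queue<Frame> ready_q;               // decoded frames
  std::queue<std::future<Frame>> scale_q;  // preprocessing in flight
  const std::size_t frame_queue_size = 2;  // bound in-flight work
  for (int i = 0; i < 8; ++i) ready_q.push(Frame{});

  std::size_t frames_retired = 0;
  const auto start = std::chrono::steady_clock::now();  // whole-loop timer

  while (!ready_q.empty() || !scale_q.empty()) {
    // Stage 1: launch preprocessing for frames that are ready.
    while (!ready_q.empty() && scale_q.size() < frame_queue_size) {
      scale_q.push(std::async(std::launch::async, preprocess, ready_q.front()));
      ready_q.pop();
    }
    // Stages 2+3: run inference (serialized by the mutex) on a
    // preprocessed frame, then postprocess it, while later frames
    // keep preprocessing in the background.
    if (!scale_q.empty()) {
      Frame scaled = scale_q.front().get();
      scale_q.pop();
      Frame out;
      {
        std::lock_guard<std::mutex> lock(infer_mutex);
        out = infer(scaled);  // one forward() at a time
      }
      postprocess(out);
      ++frames_retired;
    }
  }

  // End-to-end FPS: total retired frames over total loop time,
  // including pre- and post-processing, as described above.
  const double secs =
      std::chrono::duration<double>(std::chrono::steady_clock::now() - start)
          .count();
  std::cout << "FPS: " << frames_retired / secs << std::endl;
}
```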
Got it, thanks
CPU: Intel(R) Core(TM) Ultra 5 238V
Model: Yolo12s
Model input size: 640x640