
How to evaluate/compare the GPU auto-scheduler performance btw Anderson2021 and Mullapudi2016? #7491

Closed
antonysigma opened this issue Apr 10, 2023 · 1 comment


@antonysigma
Contributor

antonysigma commented Apr 10, 2023

Hi, I noticed that PR #5602 has been merged into the main branch. Thank you!

If I were to compare the performance of Li2018 and Anderson2021, how should I configure machine_param=n_threads,cache_size,compute_cache_tradeoff to equalize the autoscheduler environments?

Similarly, how should I configure the machine_param to compare the performance of Mullapudi2016 and Anderson2021?


To clarify, the current version of Mullapudi2016 can only generate CPU schedules. But the original paper suggests a manual intervention step to generate GPU schedules: edit the generated *.schedule.h code to replace all .vectorize(...).parallel(...) calls with .gpu_threads(...).gpu_tile(...) in accordance with the rules below. In the past, I found this highly effective for algorithms with fewer than ~5 Halide stages.
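As a minimal sketch of what that hand edit looks like, here is a hypothetical stage scheduled both ways. The stage and variable names (blur, xo, yi, the tile sizes) are made up for illustration and are not taken from any generated *.schedule.h file:

```cpp
// Hypothetical illustration of the manual CPU-to-GPU schedule rewrite.
#include "Halide.h"
using namespace Halide;

void schedule(Func blur, Var x, Var y) {
    Var xo("xo"), yo("yo"), xi("xi"), yi("yi");

    // CPU schedule as Mullapudi2016 might emit it (tile, then
    // vectorize the inner loop and parallelize the outer loop):
    // blur.tile(x, y, xo, yo, xi, yi, 64, 64)
    //     .vectorize(xi, 8)
    //     .parallel(yo);

    // Hand-edited GPU schedule: the outer tile loops become the
    // CUDA block grid, the inner tile loops become the thread block.
    blur.tile(x, y, xo, yo, xi, yi, 16, 16)
        .gpu_blocks(xo, yo)
        .gpu_threads(xi, yi);
}
```

The 16x16 inner tile yields 256 threads per block, within the 128–2048 range the paper's TARGET_THREADS_PER_BLOCK/MAX_THREADS_PER_BLOCK parameters aim for.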

We configure the auto-scheduler to target the GPU by setting the PARALLELISM_THRESHOLD to 128, VECTOR_WIDTH to 32, and CACHE_SIZE to 48 KB. Additionally, we add two new parameters, TARGET_THREADS_PER_BLOCK and MAX_THREADS_PER_BLOCK, whose values are set to 128 and 2048 respectively. These parameters enable the auto-scheduler to avoid tiling configurations that generate too few or too many threads per GPU thread block. The inlining, tiling, and grouping processes are otherwise similar to the CPU case. Groups resulting from merging are mapped to CUDA kernels by designating the outer tile loops as GPU block grid dimensions and the inner tile loops as GPU thread block dimensions. All intermediate buffers within a group are allocated in GPU shared memory.
http://graphics.cs.cmu.edu/projects/halidesched/
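For reference, pre-Halide-16 generators accepted the machine params as a single comma-separated triple (parallelism, last-level cache size in bytes, compute/cache balance). A sketch of approximating the paper's GPU configuration that way, assuming a generator binary named my_generator and a pipeline named my_pipeline (both placeholders), with the balance value 40 being the conventional default rather than a value from the paper:

```shell
# Hedged sketch: legacy comma-separated machine_params invocation.
# parallelism=128 mirrors the paper's PARALLELISM_THRESHOLD;
# 49152 bytes = the paper's 48 KB CACHE_SIZE; 40 is a placeholder balance.
./my_generator -g my_pipeline -o out target=host \
    -p libautoschedule_mullapudi2016.so \
    -s Mullapudi2016 \
    auto_schedule=true \
    machine_params=128,49152,40
```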

@antonysigma antonysigma changed the title How to evaluate/compare the GPU auto-scheduler performance? How to evaluate/compare the GPU auto-scheduler performance btw Anderson2021 and Mullapudi2016? Apr 10, 2023
@antonysigma
Contributor Author

how do I configure machine_param=n_threads,cache_size,compute_cache_tradeoff accordingly to equalize the autoscheduler environment?

Found it: since Halide 16.0, the machine_param command-line argument has been replaced by a named, per-autoscheduler struct. Previously, we specified the params as a comma-separated tuple.

struct Anderson2021Params {
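With the Halide 16 interface, each field of the params struct is passed as a named autoscheduler setting instead of the old tuple. A sketch under the assumption that parallelism and beam_size are among the Anderson2021Params fields; the generator name and the values shown are placeholders, not recommendations:

```shell
# Hedged sketch: Halide 16 named autoscheduler parameters.
./my_generator -g my_pipeline -o out target=host-cuda \
    -p libautoschedule_anderson2021.so \
    autoscheduler=Anderson2021 \
    autoscheduler.parallelism=128 \
    autoscheduler.beam_size=32
```

Equalizing the environment across autoschedulers then amounts to giving each one's analogous named parameter the same value, rather than sharing one machine_param string.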

Closing this thread.
