Fixes the incorrect argument in the prefix-prefill test cases by sighingnow · Pull Request #3246 · vllm-project/vllm

sighingnow · 2024-03-07T03:01:12Z

See also comment in #3007

Signed-off-by: Tao He <sighingnow@gmail.com>

zhuohan123

LGTM! Thanks for the fix!

zhuohan123 · 2024-03-07T07:20:53Z

tests/kernels/test_prefix_prefill.py

+    # Need this, otherwise when we capture the graph the process for GPU 1 would run on both
+    # GPU0 and GPU1 and things would hang
+    #
+    # see also similar issue: https://github.com/Dao-AILab/flash-attention/issues/523
+    torch.cuda.set_device(device)
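A minimal sketch of how the hunk's call fits into a device-parametrized test, assuming a parametrization like the one the failing test IDs suggest (`cuda:0`, `cuda:1`). `cuda_devices` and `run_case_on` are hypothetical names, not the vLLM test code, and the Triton launch is left as a placeholder comment.

```python
from typing import List

try:
    import torch
except ImportError:  # PyTorch is only needed when a test case actually runs
    torch = None

# Hypothetical helpers for illustration; not the actual vLLM test code.


def cuda_devices(device_count: int, max_devices: int = 2) -> List[str]:
    """Device strings to parametrize over, e.g. ['cuda:0', 'cuda:1'] on a 2-GPU host."""
    return [f"cuda:{i}" for i in range(min(device_count, max_devices))]


def run_case_on(device: str) -> None:
    """Hypothetical body of one parametrized test case."""
    if torch is None or not torch.cuda.is_available():
        raise RuntimeError("this sketch needs a CUDA-enabled PyTorch build")
    # Make `device` the current CUDA device before allocating inputs or
    # launching kernels; otherwise the current device defaults to cuda:0.
    torch.cuda.set_device(device)
    query = torch.randn(4, 8, device=device)  # inputs live on `device`
    # ... launch the Triton kernel under test with `query` here ...
    assert query.device == torch.device(device)
```

In pytest this would typically be wired up with `@pytest.mark.parametrize("device", cuda_devices(torch.cuda.device_count()))`. Note that `torch.cuda.set_device` changes the current device for the rest of the process, so later test cases inherit it; `with torch.cuda.device(device):` is the scoped alternative that restores the previous device on exit.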


I am confused about why we need this. Can you give a more detailed example?

The test fails in environments with two GPUs: test_contexted_kv_attention[cuda:0-dtype0-128-64-64] passes, but test_contexted_kv_attention[cuda:1-dtype0-128-64-64] (note that it now uses cuda:1) fails with:

    bin = self.cache[device][key]
    if not warmup:
>       bin.c_wrapper(
            grid_0,
            grid_1,
            grid_2,
            bin.num_warps,
            bin.num_ctas,
            bin.clusterDims[0],
            bin.clusterDims[1],
            bin.clusterDims[2],
            bin.shared,
            stream,
            bin.cu_function,
            CompiledKernel.launch_enter_hook,
            CompiledKernel.launch_exit_hook,
            bin,
            *bin.assemble_tensormap_to_arg(non_constexpr_arg_values),
        )
E       ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)

/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py:550: ValueError

I'm not sure of the root cause, but I found the same issue reported in flash-attention, along with this fix: Dao-AILab/flash-attention#523 (comment). I confirmed that it works here.
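The thread leaves the root cause open. One plausible reading, offered here only as an assumption, is that the kernel is launched on the *current* CUDA device (which stays cuda:0 unless changed), so pointers to tensors on cuda:1 fail Triton's pointer check. Under that assumption, a hypothetical pre-launch guard such as `check_same_device` below (not part of Triton or vLLM) would turn the opaque `cannot be accessed from Triton (cpu tensor?)` error into an actionable message.

```python
from typing import Any, Iterable


def check_same_device(tensors: Iterable[Any], current_index: int) -> None:
    """Hypothetical pre-launch guard: every tensor must live on the current CUDA device.

    Only a torch-style `.device` attribute (with `.type` and `.index`) is used,
    so real torch.Tensor objects can be passed directly.
    """
    for pos, tensor in enumerate(tensors):
        dev = tensor.device
        if dev.type != "cuda":
            raise ValueError(f"argument {pos} is on '{dev.type}', not a CUDA device")
        if dev.index != current_index:
            # Assumed failure mode: tensor on one GPU, launch on another.
            raise ValueError(
                f"argument {pos} is on cuda:{dev.index}, but the current device is "
                f"cuda:{current_index}; call torch.cuda.set_device({dev.index}) first"
            )
```

With PyTorch this would be called as `check_same_device([q, k, v], torch.cuda.current_device())` right before the kernel launch, where `q`, `k` and `v` are placeholder names for the kernel's tensor arguments.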

sighingnow · 2024-03-11T02:35:25Z

Hi @zhuohan123, any further comments on this patch?

Thanks!

sighingnow · 2024-03-16T03:52:45Z

Hi @zhuohan123 @simon-mo, could you please take another look at this PR?

Thanks!

Fixes the incorrect argument in the prefix-prefill test cases

bfcf4e0

Signed-off-by: Tao He <sighingnow@gmail.com>

sighingnow mentioned this pull request Mar 7, 2024

Enables GQA support in the prefix prefill kernels #3007

Merged

zhuohan123 approved these changes Mar 7, 2024

simon-mo merged commit 3123f15 into vllm-project:main Mar 16, 2024

simon-mo merged 1 commit into vllm-project:main from sighingnow:ht/fixes-prefix-prefill-tests

3 participants