
UPSTREAM PR #18785: CUDA: Factor out and re-use two_stage_warp_reduce function#897

Open
loci-dev wants to merge 2 commits into main from upstream-PR18785-branch_ORippler-osimons/factor_out_two_stage_warp_reductions

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18785

This was an open TODO from #17004 on the CUDA side.

Moving smem out of the `__device__` function into the `__global__` function allows for explicit smem reuse, since neither the compiler nor the CUDA runtime appears to free it afterwards (`cudaFuncSetAttribute` fails unless the smem is accounted for once per call to `two_stage_warp_reduce`).
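The pattern described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the PR's actual code: the kernel name, template parameter, and buffer names are assumptions. The key point is that the shared-memory buffer is declared once in the `__global__` kernel and passed into the `__device__` reduction helper, so the kernel alone decides how much smem it owns and can reuse the same buffer across multiple reduction calls.

```cuda
#include <cuda_runtime.h>

// Two-stage warp reduction: smem buffer is supplied by the caller
// instead of being declared inside this __device__ function.
template <int block_size>
__device__ float two_stage_warp_reduce(float val, float * s_buf) {
    constexpr int warp_size = 32;
    const int lane = threadIdx.x % warp_size;
    const int warp = threadIdx.x / warp_size;

    // Stage 1: butterfly reduction within each warp.
    for (int offset = warp_size / 2; offset > 0; offset >>= 1) {
        val += __shfl_xor_sync(0xFFFFFFFF, val, offset);
    }

    // One partial sum per warp is written to shared memory.
    if (lane == 0) {
        s_buf[warp] = val;
    }
    __syncthreads();

    // Stage 2: the first warp reduces the per-warp partials.
    val = threadIdx.x < block_size / warp_size ? s_buf[lane] : 0.0f;
    for (int offset = warp_size / 2; offset > 0; offset >>= 1) {
        val += __shfl_xor_sync(0xFFFFFFFF, val, offset);
    }
    return val;
}

// Hypothetical single-block kernel summing n floats into *out.
template <int block_size>
__global__ void sum_kernel(const float * x, float * out, const int n) {
    // smem lives in the __global__ function: declared once and
    // reusable across any number of two_stage_warp_reduce calls,
    // so it is accounted for exactly once per kernel.
    __shared__ float s_buf[block_size / 32];

    float val = 0.0f;
    for (int i = threadIdx.x; i < n; i += block_size) {
        val += x[i];
    }
    val = two_stage_warp_reduce<block_size>(val, s_buf);

    if (threadIdx.x == 0) {
        *out = val;
    }
}
```

With smem hoisted into the kernel this way, a kernel that needs extra dynamic shared memory only has to reserve it once via `cudaFuncSetAttribute`, rather than once per reduction call.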
@loci-review

loci-review bot commented Jan 12, 2026

Explore the complete analysis inside the Version Insights


@loci-dev loci-dev force-pushed the main branch 26 times, most recently from bbbac3d to 5194aba Compare January 15, 2026 20:10
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from a54730b to 4f9b49b Compare January 23, 2026 13:22
