Conversation
Moving the `__shared__` memory declaration out of the `__device__` function and into the `__global__` function allows for explicit smem reuse: either the compiler or the CUDA runtime does not appear to free it after each call, so with the declaration inside the `__device__` function, `cudaFuncSetAttribute` fails unless the shared memory is accounted for once per call to `two_stage_warp_reduce`.
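A minimal sketch of the pattern, under assumptions: `block_reduce_sum` and `kernel_sum_and_sumsq` are hypothetical names standing in for the PR's actual `two_stage_warp_reduce` and its caller, and the reduction body is a generic two-stage warp reduction, not the PR's exact code. The point it illustrates is the hoisting: the `__shared__` buffer lives in the `__global__` kernel and is passed down, so two call sites explicitly share one allocation instead of each call site carrying its own copy of a buffer declared inside the `__device__` function.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Two-stage block reduction: shuffle-reduce within each warp, stash one
// partial per warp in smem, then warp 0 reduces the partials. The smem
// buffer is owned by the caller so it can be reused across calls.
// Assumes blockDim.x is a multiple of 32 and at most 1024.
__device__ float block_reduce_sum(float v, float * smem) {
    const int lane = threadIdx.x % 32;
    const int warp = threadIdx.x / 32;
    // stage 1: butterfly reduction within each warp
    for (int off = 16; off > 0; off >>= 1)
        v += __shfl_xor_sync(0xffffffff, v, off);
    if (lane == 0) smem[warp] = v;          // one partial per warp
    __syncthreads();
    // stage 2: warp 0 reduces the per-warp partials
    v = (threadIdx.x < blockDim.x / 32) ? smem[lane] : 0.0f;
    for (int off = 16; off > 0; off >>= 1)
        v += __shfl_xor_sync(0xffffffff, v, off);
    return v;                               // sum valid in all lanes of warp 0
}

// Two reductions over one block of data (e.g. sum and sum of squares):
// the kernel declares the buffer once and both calls reuse it.
__global__ void kernel_sum_and_sumsq(const float * x, float * out) {
    __shared__ float smem[32];              // single buffer, reused by both calls
    const float v = x[threadIdx.x];
    const float sum = block_reduce_sum(v, smem);
    __syncthreads();                        // all reads of smem done before reuse
    const float sumsq = block_reduce_sum(v * v, smem);
    if (threadIdx.x == 0) { out[0] = sum; out[1] = sumsq; }
}

int main() {
    const int n = 256;
    float hx[n];
    for (int i = 0; i < n; ++i) hx[i] = 1.0f;
    float *dx, *dout;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dout, 2 * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    kernel_sum_and_sumsq<<<1, n>>>(dx, dout);
    float hout[2];
    cudaMemcpy(hout, dout, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f, sumsq = %f\n", hout[0], hout[1]);  // expect 256, 256
    cudaFree(dx);
    cudaFree(dout);
    return 0;
}
```

This also explains the `cudaFuncSetAttribute` symptom: static `__shared__` declared inside the `__device__` function ends up allocated once per inlined call site, inflating the kernel's static smem footprint and shrinking the headroom available for dynamic smem, so an attribute call sized for a single buffer would fail unless the per-call-site copies are counted.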
Force-pushed from bbbac3d to 5194aba
Force-pushed from a54730b to 4f9b49b
Mirrored from ggml-org/llama.cpp#18785
This was an open TODO from #17004 on the CUDA side.