Use `AcceleratedKernels.mapreduce` in `max_scaled_speed` and `integrate_via_indices` by vchuravy · Pull Request #2882 · trixi-framework/Trixi.jl

vchuravy · 2026-03-25T09:53:55Z

Demonstrate how to use AcceleratedKernels.

#2590 (comment)

There are several more places where we would need to do this,
but this is the crux of it.

fixes: #2823

codecov · 2026-03-25T11:00:27Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.08%. Comparing base (ce719f3) to head (5a13214).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2882      +/-   ##
==========================================
+ Coverage   96.76%   97.08%   +0.32%     
==========================================
  Files         610      610              
  Lines       47500    47515      +15     
==========================================
+ Hits        45960    46128     +168     
+ Misses       1540     1387     -153

Flag	Coverage Δ
unittests	`97.08% <100.00%> (+0.32%)`	⬆️

sloede · 2026-03-25T12:58:32Z

Thanks for this PR with a proof of concept. I'd hold off on merging this into #2590 and do it at a later stage, so as not to further delay the merge of #2590 into main.

ranocha

Thanks! Do you have some benchmark results showing the impact of this?

vchuravy · 2026-03-29T16:03:16Z

One of the "annoying" things here is that mapreduce implies a contraction to a scalar; thus, at the end of the computation, we have to move data from the device to the host. There has been some discussion before about using an AsyncNumber type, but that wouldn't help here since we use the values immediately.

ranocha · 2026-03-30T15:02:12Z

What is the status of this PR? Is it basically ready except for formatting issues and failing tests due to the OrdinaryDiffEqSDIRK problem?

vchuravy · 2026-04-02T09:23:18Z

> Is it basically ready

Yeah, ready from my side.

ranocha · 2026-04-02T11:22:50Z

Can you please show some benchmarks comparing this to the current version on main?

vchuravy · 2026-04-02T13:57:11Z

Using the profiler to look at how time is being spent: the previous implementation used a KernelAbstractions kernel (2.244 ms) plus a GPUArrays mapreduce (5 µs). The AcceleratedKernels implementation takes 2.274 ms (only one kernel launch).
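To make the comparison concrete, here is a minimal CPU-only sketch of the two patterns, using `Base.mapreduce` as a stand-in. The names `scaled_speed`, `speeds`, and `inv_jacobian` are hypothetical and not taken from the PR. On a GPU, the first pattern would roughly correspond to a kernel that fills a temporary array followed by a separate reduction, and the second to a single fused call; `AcceleratedKernels.mapreduce` is assumed here to follow the same `f, op, collection; init` shape.

```julia
# Hypothetical per-element quantity; stands in for the scaled wave speed of one element.
scaled_speed(speed, inv_jacobian) = abs(speed) * inv_jacobian

# Hypothetical input data for four elements.
speeds = [1.0, -3.0, 2.0, 0.5]
inv_jacobian = [2.0, 1.0, 4.0, 8.0]

# Pattern 1: a first pass writes per-element values to a temporary array,
# a second pass reduces that array to a scalar.
tmp = similar(speeds)
for element in eachindex(speeds)
    tmp[element] = scaled_speed(speeds[element], inv_jacobian[element])
end
two_pass = maximum(tmp)

# Pattern 2: map and reduce fused into one call over the element indices;
# no temporary array is needed.
fused = mapreduce(max, eachindex(speeds); init = 0.0) do element
    scaled_speed(speeds[element], inv_jacobian[element])
end
```

Both variants yield the same scalar; the fused form just avoids the intermediate array and the second pass.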

Using the NVTX ranges from #2908
Before:

│    1.16% │   10.71 ms │     5 │   2.14 ms ± 0.01   (  2.13 ‥ 2.16)    │ Trixi.calculate dt       │

After:

│    1.09% │   10.74 ms │     5 │   2.15 ms ± 0.04   (  2.09 ‥ 2.2)     │ Trixi.calculate dt       │

I think I mistakenly said during our weekly meeting that the previous calculate_dt was host-based, but that is only true for the other code I rewrote as part of this PR.

ranocha · 2026-04-02T14:29:26Z

So it's within the error bars of having the same performance, right? Is it worth the additional dependency and expected to be better in the long term?

vchuravy · 2026-04-02T14:37:59Z

> Is it worth the additional dependency and expected to be better in the long term?

It allows us to use one consistent code pattern for reductions, so I would say the additional dependency is worth it.
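As an illustration of what "one consistent code pattern" can look like, here is a small sketch in which a maximum-type reduction and a sum-type reduction share the same shape. It uses only `Base.mapreduce` on the CPU; the helper `reduce_over_indices` and the data `nodal_values` and `weights` are hypothetical and do not appear in the PR.

```julia
# Hypothetical nodal data: values of a quantity and quadrature weights on 4 nodes.
nodal_values = [1.0, 2.0, 3.0, 4.0]
weights = [0.25, 0.25, 0.25, 0.25]

# Hypothetical helper: map each index through `f`, then combine the results with `op`.
reduce_over_indices(f, op, indices; init) = mapreduce(f, op, indices; init = init)

# Maximum-type reduction (the kind needed for a step size estimate).
max_value = reduce_over_indices(i -> abs(nodal_values[i]), max,
                                eachindex(nodal_values); init = 0.0)

# Sum-type reduction (the kind needed for a weighted integral).
integral = reduce_over_indices(i -> weights[i] * nodal_values[i], +,
                               eachindex(nodal_values); init = 0.0)
```

Only the mapped function and the reduction operator differ between the two calls, which is the appeal of expressing both through a single reduction primitive.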

ranocha

Thanks!

vchuravy force-pushed the vc/ak branch 2 times, most recently from f39512c to 203e311 Compare March 25, 2026 10:08

vchuravy force-pushed the vc/ak branch from 203e311 to 181f405 Compare March 25, 2026 10:31

vchuravy force-pushed the vc/ak branch from 08c1d03 to 79206de Compare March 25, 2026 12:49

vchuravy marked this pull request as ready for review March 25, 2026 12:49

vchuravy changed the title ~~Use AcceleratedKernels.mapreduce max_scaled_speed~~ Use AcceleratedKernels.mapreduce in max_scaled_speed Mar 25, 2026

vchuravy changed the title ~~Use AcceleratedKernels.mapreduce in max_scaled_speed~~ Use AcceleratedKernels.mapreduce in max_scaled_speed and integrate_via_indices Mar 25, 2026

vchuravy force-pushed the vc/ak branch from cfd09d5 to 0c7ff40 Compare March 26, 2026 09:44

Base automatically changed from feature-gpu-offloading to main March 26, 2026 16:48

vchuravy force-pushed the vc/ak branch from 0c7ff40 to 340ad71 Compare March 26, 2026 16:54

vchuravy requested a review from benegee March 26, 2026 16:55

vchuravy and others added 6 commits March 27, 2026 08:29

use mapreduce from AcceleratedKernels in calc_max_scaled_speed

b7039db

convince Julia to not store MeshT in a closure

c095b47

Update src/callbacks_step/stepsize.jl

87c40b3

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

port integrate_via_indices to the GPU

93b5391

Apply suggestions from code review

8b61f81

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

remove allocate import

ebefe89

vchuravy force-pushed the vc/ak branch from 340ad71 to ebefe89 Compare March 27, 2026 07:30

vchuravy requested a review from sloede March 27, 2026 07:30

sloede mentioned this pull request Mar 27, 2026

Add synchronization statements to ensure timer output correctness on the GPU #2892

Merged

Merge branch 'main' into vc/ak

7f45694

ranocha requested changes Mar 29, 2026

cleanup

46bf944

Apply suggestions from code review

f2cfe81

Co-authored-by: Valentin Churavy <v.churavy@gmail.com>

github-actions bot reviewed Mar 29, 2026

benegee mentioned this pull request Mar 29, 2026

GPU porting #2822

Open

18 tasks

benegee added 2 commits March 31, 2026 15:59

Merge branch 'main' into vc/ak

f2b0871

fmt

c9c4898

DanielDoehring added the gpu label Mar 31, 2026

Merge branch 'main' into vc/ak

a0af8bc

vchuravy requested a review from ranocha April 2, 2026 09:22

Merge branch 'main' into vc/ak

6c6da0e

Merge branch 'main' into vc/ak

15d18ba

ranocha approved these changes Apr 2, 2026

ranocha enabled auto-merge (squash) April 2, 2026 15:05

ranocha disabled auto-merge April 3, 2026 07:24

Merge branch 'main' into vc/ak

5a13214

ranocha enabled auto-merge (squash) April 3, 2026 16:27

ranocha merged commit 9fc914f into main Apr 3, 2026
41 checks passed

ranocha deleted the vc/ak branch April 3, 2026 17:22
