Conversation
Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

- Purpose and scope
- Code quality
- Documentation
- Testing
- Performance
- Verification
Created with ❤️ by the Trixi.jl community.
Force-pushed from f39512c to 203e311.
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #2882      +/-   ##
==========================================
+ Coverage   96.76%   97.08%   +0.32%
==========================================
  Files         610      610
  Lines       47500    47515      +15
==========================================
+ Hits        45960    46128     +168
+ Misses       1540     1387     -153
AcceleratedKernels.mapreduce in max_scaled_speed and integrate_via_indices

(Title changed from "AcceleratedKernels.mapreduce in max_scaled_speed".)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
ranocha
left a comment
Thanks! Do you have some benchmark results showing the impact of this?
Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
One of the "annoying" things here is that mapreduce implies a contraction to a scalar; thus, at the end of the computation, we have to move data from the device to the host. There has been some discussion before about using an AsyncNumber type, but that wouldn't help here since we use the values immediately.
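To make the device-to-host point concrete, here is a hedged sketch of such a reduction (example data and the `init` keyword are assumptions for illustration, not code from this PR):

```julia
using AcceleratedKernels, CUDA

u = CUDA.rand(Float32, 10^6)  # hypothetical device array, not from this PR

# Map and reduce fused into a single kernel launch; the final scalar
# result must then be transferred from device to host, which synchronizes.
λ_max = AcceleratedKernels.mapreduce(abs, max, u; init = zero(eltype(u)))
```

Because `λ_max` is used immediately afterwards (e.g. in a CFL-based time step estimate), deferring the transfer with something like an AsyncNumber would not avoid the synchronization here.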
What is the status of this PR? Is it basically ready except for formatting issues and failing tests due to the OrdinaryDiffEqSDIRK problem?
Yeah, ready from my side.
Can you please show some benchmarks comparing this to the current version on
Using the profiler to look at how time is being spent: the previous implementation used a KernelAbstractions kernel (2.244 ms) plus a GPUArrays mapreduce (5 µs). The AcceleratedKernels implementation takes 2.274 ms, with only one kernel launch.

Using the NVTX ranges from #2908 (before/after profiler timelines).

I think I mistakenly said during our weekly meeting that the previous
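For reference, NVTX ranges of the kind referenced from #2908 can be attached to the reduction like this (a sketch using NVTX.jl; the range name and surrounding setup are made up for illustration):

```julia
using NVTX, AcceleratedKernels

# Annotate the reduction so it shows up as a named range in the
# Nsight Systems timeline; "max_scaled_speed" is a hypothetical label.
NVTX.@range "max_scaled_speed" begin
    λ_max = AcceleratedKernels.mapreduce(abs, max, u; init = zero(eltype(u)))
end
```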
So it's within the error bars of having the same performance, right? Is it worth the additional dependency, and is it expected to be better in the long term?
It allows us to use one consistent code pattern for reductions, so I would say the additional dependency is worth it.
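The appeal of the consistent pattern is that the same call can serve CPU and GPU arrays alike, instead of a hand-written kernel on one path and a library mapreduce on the other (a hedged sketch with made-up data; `max_abs` is a hypothetical helper, not code from this PR):

```julia
using AcceleratedKernels

# One reduction pattern, independent of where the data lives.
max_abs(u) = AcceleratedKernels.mapreduce(abs, max, u; init = zero(eltype(u)))

# Works on a plain CPU array ...
max_abs(randn(Float32, 1024))

# ... and, with e.g. CUDA.jl loaded, the same line works on the device:
# max_abs(CUDA.randn(Float32, 1024))
```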
Demonstrate how to use AcceleratedKernels.
#2590 (comment)
There are several more places where we would need to do this,
but this is the crux of it.
fixes: #2823