diff --git a/docs/literate/src/files/index.jl b/docs/literate/src/files/index.jl
index 7ad2b318d2..1fc025d84d 100644
--- a/docs/literate/src/files/index.jl
+++ b/docs/literate/src/files/index.jl
@@ -115,20 +115,26 @@
 # software in the Trixi.jl ecosystem, and then run a simulation using Trixi.jl on said mesh.
 # In the end, the tutorial briefly explains how to simulate an example using AMR via `P4estMesh`.
 
-# ### [16 Explicit time stepping](@ref time_stepping)
+# ### [16 P4est mesh from gmsh](@ref p4est_from_gmsh)
+#-
+# This tutorial describes how to obtain a [`P4estMesh`](@ref) from an existing mesh generated
+# by [`gmsh`](https://gmsh.info/) or any other meshing software that can export to the Abaqus
+# input `.inp` format. The tutorial demonstrates how edges/faces can be associated with boundary conditions based on the physical nodesets.
+
+# ### [17 Explicit time stepping](@ref time_stepping)
 #-
 # This tutorial is about time integration using [OrdinaryDiffEq.jl](https://github.com/SciML/OrdinaryDiffEq.jl).
 # It explains how to use their algorithms and presents two types of time step choices - with error-based
 # and CFL-based adaptive step size control.
 
-# ### [17 Differentiable programming](@ref differentiable_programming)
+# ### [18 Differentiable programming](@ref differentiable_programming)
 #-
 # This part deals with some basic differentiable programming topics. For example, a Jacobian, its
 # eigenvalues and a curve of total energy (through the simulation) are calculated and plotted for
 # a few semidiscretizations. Moreover, we calculate an example for propagating errors with Measurements.jl
 # at the end.
 
-# ### [18 Custom semidiscretization](@ref custom_semidiscretization)
+# ### [19 Custom semidiscretization](@ref custom_semidiscretization)
 #-
 # This tutorial describes the [semidiscretizations](@ref overview-semidiscretizations) of Trixi.jl
 # and explains how to extend them for custom tasks.
diff --git a/docs/src/performance.md b/docs/src/performance.md
index df66f451b7..82d7f501f6 100644
--- a/docs/src/performance.md
+++ b/docs/src/performance.md
@@ -267,3 +267,14 @@ requires. It can thus be seen as a proxy for "energy used" and, as an extension,
 timing result, you need to set the analysis interval such that the
 `AnalysisCallback` is invoked at least once during the course of the simulation and
 discard the first PID value.
+
+## Performance issues with multi-threaded reductions
+[False sharing](https://en.wikipedia.org/wiki/False_sharing) is a known performance issue
+for systems with distributed caches. It also occurred in the implementation of a thread-parallel
+bounds checking routine for the subcell IDP limiting
+in [PR #1736](https://github.com/trixi-framework/Trixi.jl/pull/1736).
+After some [testing and discussion](https://github.com/trixi-framework/Trixi.jl/pull/1736#discussion_r1423881895),
+it turned out that initializing a vector of length `n * Threads.nthreads()` and only using every
+n-th entry instead of a vector of length `Threads.nthreads()` fixes the problem.
+With cache lines of at most 128 bytes on common processors, `n = 128B / sizeof(uEltype)` places each thread's entry on its own cache line.
+Now, the bounds checking routine of the IDP limiting scales as hoped.
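
The padding idea described in the new performance section can be illustrated with a minimal sketch. It is not the code from PR #1736: the function name `padded_threaded_maximum`, the constant `CACHE_LINE_BYTES`, and the chunking scheme are hypothetical and chosen only for illustration, assuming a cache line of at most 128 bytes and a plain bits element type such as `Float64`.

```julia
using Base.Threads

# Assumed upper bound on the cache line size in bytes (hypothetical constant).
const CACHE_LINE_BYTES = 128

# Threaded maximum of `data` where every chunk accumulates into its own padded slot.
# Hypothetical helper for illustration only, not part of Trixi.jl.
function padded_threaded_maximum(data::AbstractVector{T}) where {T <: Real}
    # Number of entries that span one cache line (the `n` from the text)
    stride = cld(CACHE_LINE_BYTES, sizeof(T))
    nchunks = Threads.nthreads()

    # Vector of length `n * Threads.nthreads()`; only every `stride`-th entry is used
    partial = fill(typemin(T), stride * nchunks)

    chunk_length = cld(length(data), nchunks)
    @threads for chunk in 1:nchunks
        slot = (chunk - 1) * stride + 1
        first_index = (chunk - 1) * chunk_length + 1
        last_index = min(chunk * chunk_length, length(data))
        for i in first_index:last_index
            # Repeated writes go to slots that are at least 128 bytes apart,
            # so two chunks never write to the same cache line
            partial[slot] = max(partial[slot], data[i])
        end
    end

    # Serial reduction over the used slots only
    return maximum(@view partial[1:stride:end])
end
```

Calling `padded_threaded_maximum(rand(10^6))` after starting Julia with several threads (for example `julia --threads=4`) exercises the padded layout. Indexing the slots by chunk number rather than by `Threads.threadid()` is a design choice of this sketch, so that the result does not depend on how tasks are scheduled onto threads.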