
Use IController in implicit restart test for deterministic behavior #2839

Closed

ChrisRackauckas-Claude wants to merge 1 commit into trixi-framework:main from ChrisRackauckas-Claude:fix-implicit-restart-controller

Conversation

@ChrisRackauckas-Claude

Summary

The implicit sparse Jacobian restart test (elixir_advection_implicit_sparse_jacobian_restart.jl) broke with OrdinaryDiffEqCore v3.10 because of a new first-step acceleration feature (SciML/OrdinaryDiffEq.jl#3101).

The test was already fragile for two reasons:

  1. PIController has memory (qold) — the previous error estimate ratio isn't saved/restored in checkpoints, so the restarted controller state differs from the continuous run. It only worked because the problem is trivial enough that the error estimate was near zero, making qmax the binding constraint and qold irrelevant.
  2. First-step acceleration — OrdinaryDiffEqCore v3.10 added qmax_first_step=10000 (matching Sundials CVODE behavior), which allows 10000x step growth on the first step. Since a restarted integrator has success_iter=0, it gets this acceleration, diverging from the continuous run.
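The two failure modes above can be illustrated with a toy step-size controller. This is a hedged sketch, not the OrdinaryDiffEq implementation: the exponents follow the classic Gustafsson/Hairer PI form, and `beta1`, `beta2`, `k`, and the error values are made up for illustration.

```python
# Minimal sketch (NOT OrdinaryDiffEq code): why a PI controller's proposed
# step growth depends on state (qold) that a checkpoint may not save,
# while an I controller's does not.

def i_factor(err, k=2):
    # Memoryless I control: the growth factor is a pure function of the
    # current scaled error estimate.
    return err ** (-1.0 / (k + 1))

def pi_factor(err, err_old, beta1=0.7, beta2=-0.4, k=2):
    # PI control: the growth factor also involves the previous error
    # estimate, i.e. controller memory.
    return err ** (-beta1 / (k + 1)) * err_old ** (beta2 / (k + 1))

err_now = 0.5                  # same current error estimate in both runs
err_prev_continuous = 0.25     # memory carried by the uninterrupted run
err_prev_restarted = 1.0       # fresh integrator: memory reset to a default

# I control: restart and continuation propose identical step growth.
assert i_factor(err_now) == i_factor(err_now)

# PI control: the restarted run proposes a different factor than the
# continuous run, so the step-size sequences diverge after a restart.
assert pi_factor(err_now, err_prev_restarted) != pi_factor(err_now, err_prev_continuous)
```

Because the I controller's factor depends only on the current error estimate, a restarted integrator with the same state and dt reproduces the continuous run's step sizes; a PI controller cannot, unless qold is also checkpointed.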

Fix

Switch the restart elixir's solver to use NewIController(alg, qmax=10, qmax_first_step=10):

  • IController is memoryless — no dependency on previous error estimates, so restart produces deterministic step sizes
  • qmax_first_step = qmax — disables first-step acceleration so the restart behaves identically to a continuation

Reference values updated to match the new controller.
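The effect of qmax_first_step can likewise be sketched as a simple clamp on the proposed growth factor. This is illustrative only; the parameter names mirror the PR description, not actual OrdinaryDiffEqCore internals.

```python
# Illustrative sketch: the proposed step-growth factor q is clamped by
# qmax, except on the first step (success_iter == 0), where
# qmax_first_step applies instead.

def clamp_growth(q, success_iter, qmax=10.0, qmax_first_step=10.0):
    limit = qmax_first_step if success_iter == 0 else qmax
    return min(q, limit)

q_proposed = 500.0  # a near-zero error estimate proposes huge growth

# With a Sundials-style qmax_first_step=10000, a restarted integrator
# (success_iter == 0) accepts the full growth, while the continuous run,
# already past its first step, is clamped to qmax:
assert clamp_growth(q_proposed, success_iter=0, qmax_first_step=10000.0) == 500.0
assert clamp_growth(q_proposed, success_iter=7) == 10.0

# Setting qmax_first_step == qmax makes restart and continuation agree:
assert clamp_growth(q_proposed, success_iter=0) == clamp_growth(q_proposed, success_iter=7)
```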

Test plan

  • Both restart test variants (with and without colorvec) pass locally with updated reference values
  • CI passes

🤖 Generated with Claude Code

Co-Authored-By: Chris Rackauckas accounts@chrisrackauckas.com

The implicit sparse Jacobian restart test was fragile because:
1. The default PIController has memory (qold) that isn't saved/restored
   in checkpoints, so restart step sizes depend on controller state
2. OrdinaryDiffEqCore v3.10 added first-step acceleration (qmax=10000)
   which changed the restart trajectory since fresh integrators have
   success_iter=0

Switch to NewIController with qmax_first_step=qmax=10 which:
- Has no memory of previous error estimates (memoryless I-control)
- Disables first-step acceleration so restarts behave identically
  to continuation

Reference values updated accordingly.

Fixes the test regression from SciML/OrdinaryDiffEq.jl#3101.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@github-actions
Contributor

github-actions bot commented Mar 1, 2026

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@codecov

codecov bot commented Mar 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.13%. Comparing base (8ce8682) to head (a54d28c).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2839   +/-   ##
=======================================
  Coverage   97.13%   97.13%           
=======================================
  Files         596      596           
  Lines       45962    45962           
=======================================
  Hits        44642    44642           
  Misses       1320     1320           
| Flag | Coverage | Δ |
| --- | --- | --- |
| unittests | 97.13% <ø> | (ø) |


@JoshuaLampert
Member

Thanks! So this fixes the failing test from the restart example, but there is also another failing test, which does not use the restart functionality, see https://github.com/trixi-framework/Trixi.jl/actions/runs/22542181717/job/65300052174?pr=2839#step:7:8507. This test runs this example file: https://github.com/trixi-framework/Trixi.jl/blob/8ce86828611783942b2a4518df908f1cb434a218/examples/p4est_2d_dgsem/elixir_euler_supersonic_cylinder_scO2.jl.

@ChrisRackauckas-Claude
Author

Investigation: elixir_euler_supersonic_cylinder_scO2.jl test failure

This failure is not caused by this PR — it's a pre-existing issue on main caused by OrdinaryDiffEqCore v3.10 introducing qmax_first_step=10000 (matching Sundials CVODE behavior), which allows 10000x step growth on the first step. This changes the adaptive stepping trajectory, which triggers different AMR decisions, producing a different (but equally valid) numerical solution.

Root cause confirmation

Running the elixir at tspan=(0.0, 0.001) with the test's default tolerances (abstol=1e-6, reltol=1e-3):

| OrdinaryDiffEqCore | L2[rho] | Matches test reference? |
| --- | --- | --- |
| v3.9 (old, no first-step acceleration) | 0.02931403129205... | Yes, to ~1e-16 |
| v3.11 (new, with qmax_first_step=10000) | 0.02922... | No, ~0.3% different |

The old ODECore reproduces the hardcoded reference values to machine precision. The new ODECore gives a different trajectory through the AMR decisions, producing ~0.3–8% differences in L2/Linf norms.

The real problem: the test tolerance is absurdly tight

The test asserts rtol=1.5e-8 on the solution norms, but the time integration error at the default tolerances is ~1–3%. I ran a convergence study sweeping abstol=reltol from 1e-3 to 1e-8 to measure actual solution accuracy. Using the tightest tolerance (1e-8) as a reference:

Old OrdinaryDiffEqCore (v3.9):

| tol | L2[rho] | Error vs ref | Steps |
| --- | --- | --- | --- |
| 1e-3 | 0.02228 | 21.5% | 15 |
| 1e-4 | 0.02765 | 2.6% | 54 |
| 1e-5 | 0.02926 | 3.1% | 141 |
| 1e-6 (test default) | 0.02875 | 1.3% | 399 |
| 1e-7 | 0.02840 | 0.06% | 940 |
| 1e-8 (ref) | 0.02838 | n/a | 2204 |

New OrdinaryDiffEqCore (v3.11):

| tol | L2[rho] | Error vs ref | Steps |
| --- | --- | --- | --- |
| 1e-3 | 0.02291 | 19.3% | 14 |
| 1e-4 | 0.02747 | 3.2% | 49 |
| 1e-5 | 0.02901 | 2.2% | 131 |
| 1e-6 (test default) | 0.02922 | 3.0% | 385 |
| 1e-7 | 0.02838 | 0.03% | 910 |
| 1e-8 (ref) | 0.02838 | n/a | 2268 |

Both old and new ODECore converge to the same solution (L2[rho] ≈ 0.02838) at tight tolerances, confirming the new ODECore is correct. The ~1–3% time integration error at tol=1e-6 means the test is asserting 8+ digits of agreement (rtol=1.5e-8) on a solution that only has ~2 digits of accuracy. Any change to the stepping trajectory (like qmax_first_step) that stays within the time integration error is equally valid.
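As a sanity check on the magnitudes involved, the drift between the two L2[rho] values in the root-cause table can be compared directly against the test's rtol:

```python
# Quick arithmetic check of the claim above, using the L2[rho] values from
# the root-cause table (old v3.9 vs new v3.11 ODECore at the default 1e-6):
import math

l2_old = 0.02931403129205   # v3.9, matches the hardcoded reference
l2_new = 0.02922            # v3.11, truncated value from the table

rel_diff = abs(l2_old - l2_new) / l2_old   # observed drift, roughly 3e-3
test_rtol = 1.5e-8                         # what the test asserts

# The observed drift sits several orders of magnitude above the test rtol,
# yet within the ~1-3% time integration error at these solver tolerances.
orders_above = math.log10(rel_diff / test_rtol)
print(f"rel_diff = {rel_diff:.2e}, {orders_above:.1f} orders above test rtol")
```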

Recommendation

The test reference values should either be updated for the new ODECore, or (better) the test rtol should be relaxed to something consistent with the actual solution accuracy — e.g., rtol=0.05 (5%).

Reproduction script

convergence_study.jl:
# Convergence study: run the supersonic cylinder scO2 elixir at different tolerances

using Pkg
Pkg.activate(mktempdir())
Pkg.develop(path="Trixi.jl")
Pkg.add("OrdinaryDiffEqCore")  # latest (with first-step acceleration)
Pkg.add("OrdinaryDiffEqSSPRK")

using Trixi
using OrdinaryDiffEqSSPRK
using Printf

elixir = joinpath(@__DIR__, "elixir_scO2_tol.jl")

tolerances = [1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8]

# Store results to a global that survives world age changes
const RESULTS = Dict{Float64, Any}()

for tol in tolerances
    @printf("\n\n>>> Running with abstol=%.0e, reltol=%.0e\n", tol, tol)

    @eval begin
        global MY_ABSTOL = $tol
        global MY_RELTOL = $tol
        trixi_include($elixir, tspan=(0.0, 0.001))
        l2_vals = analysis_callback(sol).l2
        linf_vals = analysis_callback(sol).linf
        RESULTS[$tol] = (l2=copy(l2_vals), linf=copy(linf_vals),
                         nsteps=sol.stats.naccept, ndofs=length(sol.u[end]) ÷ 4)
    end
end

println("\n\n" * "="^120)
println("CONVERGENCE STUDY RESULTS")
println("="^120)
println()
@printf("%-10s  %-10s  %-6s  %-22s  %-22s  %-22s  %-22s\n",
        "tol", "#DOFs", "steps", "L2[rho]", "L2[rho_v1]", "L2[rho_v2]", "L2[rho_e]")
println("-"^120)
for tol in tolerances
    r = RESULTS[tol]
    @printf("%.0e    %-10d  %-6d  %.15e  %.15e  %.15e  %.15e\n",
            tol, r.ndofs, r.nsteps, r.l2[1], r.l2[2], r.l2[3], r.l2[4])
end
elixir_scO2_tol.jl — modified elixir that reads MY_ABSTOL/MY_RELTOL globals:
# Channel flow around a cylinder at Mach 3
# (Modified version that uses global MY_ABSTOL, MY_RELTOL for convergence studies)

using OrdinaryDiffEqSSPRK
using Trixi

equations = CompressibleEulerEquations2D(1.4)

@inline function initial_condition_mach3_flow(x, t, equations::CompressibleEulerEquations2D)
    rho_freestream = 1.4
    v1 = 3.0
    v2 = 0.0
    p_freestream = 1.0
    prim = SVector(rho_freestream, v1, v2, p_freestream)
    return prim2cons(prim, equations)
end

initial_condition = initial_condition_mach3_flow

@inline function boundary_condition_supersonic_inflow(u_inner,
                                                      normal_direction::AbstractVector,
                                                      x, t, surface_flux_function,
                                                      equations::CompressibleEulerEquations2D)
    u_boundary = initial_condition_mach3_flow(x, t, equations)
    return flux(u_boundary, normal_direction, equations)
end

@inline function boundary_condition_outflow(u_inner, normal_direction::AbstractVector, x, t,
                                            surface_flux_function,
                                            equations::CompressibleEulerEquations2D)
    return flux(u_inner, normal_direction, equations)
end

boundary_conditions = (; Bottom = boundary_condition_slip_wall,
                       Circle = boundary_condition_slip_wall,
                       Top = boundary_condition_slip_wall,
                       Right = boundary_condition_outflow,
                       Left = boundary_condition_supersonic_inflow)

volume_flux = flux_ranocha_turbo
surface_flux = flux_lax_friedrichs

polydeg = 3
basis = LobattoLegendreBasis(polydeg)
shock_indicator = IndicatorHennemannGassner(equations, basis,
                                            alpha_max = 0.5,
                                            alpha_min = 0.001,
                                            alpha_smooth = true,
                                            variable = density_pressure)
volume_integral = VolumeIntegralShockCapturingRRG(basis, shock_indicator;
                                                  volume_flux_dg = volume_flux,
                                                  volume_flux_fv = surface_flux,
                                                  slope_limiter = minmod)

solver = DGSEM(polydeg = polydeg, surface_flux = surface_flux,
               volume_integral = volume_integral)

mesh_file = Trixi.download("https://gist.githubusercontent.com/andrewwinters5000/a08f78f6b185b63c3baeff911a63f628/raw/addac716ea0541f588b9d2bd3f92f643eb27b88f/abaqus_cylinder_in_channel.inp",
                           joinpath(@__DIR__, "abaqus_cylinder_in_channel.inp"))

mesh = P4estMesh{2}(mesh_file)

semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver;
                                    boundary_conditions = boundary_conditions)

tspan = (0.0, 2.0)
ode = semidiscretize(semi, tspan)

summary_callback = SummaryCallback()
analysis_interval = 1000
analysis_callback = AnalysisCallback(semi, interval = analysis_interval)
alive_callback = AliveCallback(analysis_interval = analysis_interval)
save_solution = SaveSolutionCallback(interval = 5000,
                                     save_initial_solution = false,
                                     save_final_solution = true,
                                     solution_variables = cons2prim)

amr_indicator = IndicatorLöhner(semi, variable = Trixi.density)
amr_controller = ControllerThreeLevel(semi, amr_indicator,
                                      base_level = 0,
                                      med_level = 3, med_threshold = 0.05,
                                      max_level = 5, max_threshold = 0.1)
amr_callback = AMRCallback(semi, amr_controller,
                           interval = 2,
                           adapt_initial_condition = true,
                           adapt_initial_condition_only_refine = true)

callbacks = CallbackSet(summary_callback,
                        analysis_callback, alive_callback,
                        save_solution,
                        amr_callback)

stage_limiter! = PositivityPreservingLimiterZhangShu(thresholds = (5.0e-7, 1.0e-6),
                                                     variables = (pressure, Trixi.density))

dt0 = 1e-8
sol = solve(ode, SSPRK43(stage_limiter! = stage_limiter!, thread = Trixi.True());
            dt = dt0, save_everystep = false,
            abstol = MY_ABSTOL, reltol = MY_RELTOL,
            callback = callbacks, maxiters = 1_000_000);

@ChrisRackauckas

That result is pretty self-explanatory. It's very clear the test as written isn't very meaningful, because it expects 6 orders of magnitude lower error than what you're actually getting from the solution. Which of the following options would you prefer?

  1. Make the test tolerance something reasonable based on the true error?
  2. Decrease the solver tolerances so that the chosen test tolerance works?
  3. Update the reference solution to the new value (i.e. treat it as a regression test, not a correctness test)

It can be reasonable to have a test like this that asks "did anything change?", but the test as written isn't asking "are things correct?"; it's asking "did anything change?". Those are two very different questions, and here is a case where they give different answers. I think it's totally fine to have regression tests to check whether anything changed unexpectedly (OrdinaryDiffEq.jl has a few of them), but a drift in a non-correctness regression test is not an indicator that something is wrong; it is a sign that further analysis is needed. And here the further analysis was very straightforward and clear, because it's not borderline: the test is asking for 6 orders of magnitude tighter agreement than what you're actually getting against the true solution, so obviously that's going to drift even for small changes in CPU choice, SIMD, threading, etc.

I'm happy to help isolate these things, because yes, these kinds of regression tests are good for double-checking. But please do the double check before jumping to the conclusion that something is wrong and the world is on fire. What we can see from this is that, once it was looked into, there wasn't anything wrong: both failures were test issues, and test issues that led to better code in your repo!

@JoshuaLampert
Member

I never said the new version in OrdinaryDiffEqCore.jl is wrong and that our previous reference value is the "true" solution. As you say, we have these tests to see if something changes. If the changes were on the order of the time integration tolerances, we would not complain (in fact this is sometimes the case, and we did not complain), but the difference is not something like 1e-3 or smaller, but rather 0.5:

   Evaluated: isapprox(7.397166897133932, 6.8750812520355655; atol = 1.1102230246251565e-13, rtol = 1.4901161193847656e-8)
julia> 7.397166897133932 - 6.8750812520355655
0.5220856450983664

julia> 7.397166897133932/6.8750812520355655
1.0759388327147097

I have never before seen such a big change that was explained solely by solver tolerances. But if you say this is still within the tolerance, then it's fine and we will adapt our tests. I just didn't expect it when I saw this huge difference in the test result.
Regarding your three suggestions, we will probably go with 2. or 3., because 1. does not seem appropriate for this case. We will not set, e.g., an absolute tolerance of more than 0.5; that would only mean we miss differences we would like to know about.

@ChrisRackauckas-Claude
Author

Investigation: Can lowering solver tolerance achieve rtol=1.5e-8 reproducibility?

Following up on the suggestion to go with option 2 (decrease solver tolerances), I ran two experiments to test whether this is viable.

Experiment 1: Convergence study (single ODECore, sweeping tolerances)

Running the elixir at tspan=(0.0, 0.001) with abstol=reltol=tol on the current ODECore v3.11, comparing consecutive tolerance levels:

| tol | Steps | L2[rho] | Max rel_diff vs next tighter tol |
| --- | --- | --- | --- |
| 1e-6 | 385 | 0.029223159337780365 | 1.9e-2 |
| 1e-7 | 899 | 0.02866922858136196 | 2.9e-3 |
| 5e-8 | 1217 | 0.028587446274307562 | 3.6e-3 |
| 2e-8 | 1749 | 0.028485133731793903 | 4.6e-4 |
| 1e-8 | 2278 | 0.028471921125559973 | 1.7e-3 |
| 5e-9 | 2867 | 0.028423664475640067 | 7.6e-4 |
| 2e-9 | 4097 | 0.02844530272496351 | 8.5e-4 |
| 1e-9 | 5187 | 0.028421004274159126 | n/a |

None of the consecutive pairs meet rtol=1.5e-8. The norms oscillate rather than converge smoothly because AMR decisions are discrete (refine-or-not), so small changes in the solution can trigger different mesh topologies.

Experiment 2: Direct comparison of old vs new stepping behavior

This is the key test. At each solver tolerance, I compared qmax_first_step=6 (simulating old ODECore behavior) vs qmax_first_step=10000 (new behavior) using NewPIController on the same ODECore v3.11:

| Solver tol | Steps (old/new) | Max rel_diff (L2) | Max rel_diff (Linf) |
| --- | --- | --- | --- |
| 1e-6 | 402 / 392 | 3.6e-2 (rho_v2) | 8.0e-2 (rho) |
| 1e-7 | 904 / 897 | 1.3e-3 (rho_v2) | 1.2e-2 (rho) |
| 1e-8 | 2278 / 2269 | 4.5e-4 (rho_v2) | 4.8e-4 (rho_e) |
| 1e-9 | 5202 / 5215 | 7.3e-4 (rho_v1) | 5.8e-3 (rho) |

Even at tol=1e-9 (5200 steps, ~13x slower than the current 385), the max relative difference is 5.8e-3 — still 5 orders of magnitude above the test rtol of 1.5e-8.

Going from tol=1e-8 to tol=1e-9, several Linf components actually increased their disagreement (Linf[rho] went from 5.6e-6 to 5.8e-3). Different tolerances cause different cells to cross the AMR threshold, producing fundamentally different meshes.

Conclusion

Option 2 (lower solver tolerance to hit test rtol=1.5e-8) is not viable for this AMR problem. The discrete nature of AMR decisions creates O(1e-3) jumps in solution norms that cannot be eliminated by tightening the time integrator tolerance. No solver tolerance will make this test robust to stepping algorithm changes at rtol=1.5e-8.

Viable alternatives:

  • Option 3 (update reference values): Update the hardcoded L2/Linf values for the new ODECore. Future stepping changes may require another update.
  • Relax the test rtol: e.g. rtol=0.05 matches the actual solution accuracy at the default solver tolerances.
  • Hybrid: Tighten solver to abstol=reltol=1e-8 (6x slower, ~2278 steps), update reference values, and set test rtol=1e-3. This gives a meaningful correctness check (values within 0.1% of converged solution) while being robust to minor stepping changes.

@ranocha
Member

Let us please not use an IController (or NewIController), since it will not satisfy our requirements of step size control stability.
The tests here are regression tests. We do not want them to use huge tolerances, since we would otherwise not catch errors we would like to know about, as Joshua wrote.
We should likely update our reference values for the new implementation after checking whether the full simulation results are reasonable.

@DanielDoehring
Member

Xref #2838

@JoshuaLampert
Member

Superseded by #2838.

