Skip to content

Conversation

@tophmatthews
Copy link
Contributor

Ref #15915

@tophmatthews
Copy link
Contributor Author

I'm trying to consolidate ADMatDiffusion and MatDiffusion and got close, but am now stuck...I'm afraid it may be impossible becuase ADKernel retains a templated feature, while Kernel doesn't. This cascades all the way down through KernelGrad, and MatDiffusionBase. A little help, maybe @dschwen since you did some of the early work on KernelGrad?

@tophmatthews tophmatthews force-pushed the combine_ad_nonad_mat_diffusion_15915 branch 2 times, most recently from a66462f to 113bc53 Compare February 24, 2025 16:31
@tophmatthews
Copy link
Contributor Author

grrr this is close, but gets a dyld[37997]: symbol not found in flat namespace '__ZN21MatDiffusionBaseTemplIN7libMesh11TensorValueIdEELb0EE12initialSetupEv' due to something with MatAnisoDiffusionTempl in the phase_field module...

@dschwen
Copy link
Member

dschwen commented Feb 25, 2025

Check your explicit instantiations.

@tophmatthews tophmatthews force-pushed the combine_ad_nonad_mat_diffusion_15915 branch from e29ee32 to e668f01 Compare February 25, 2025 17:49
@tophmatthews
Copy link
Contributor Author

How do I deal with this @dschwen:

/tmp/build/marmot/build/header_symlinks/SwellingMatDiffusionBase.h:32:60: error: 'MatDiffusionBase' was not declared in this scope; did you mean 'MatDiffusionBaseTempl'?
   32 | class SwellingMatDiffusionBase : public SwellingGradKernel<MatDiffusionBase<T>>
      |                                                            ^~~~~~~~~~~~~~~~
      |                                                            MatDiffusionBaseTempl

I can't really figure out a way to do partial specialization with templates...Do I have to go modify Marmot directly after this is merged?

@tophmatthews
Copy link
Contributor Author

oh maybe:

template <typename T>
using MatDiffusionBase = MatDiffusionBaseTempl<T, false>;

@dschwen dschwen changed the title Combine ad nonad mat diffusion 15915 Combine ad non-AD mat diffusion Feb 25, 2025
@moosebuild
Copy link
Contributor

moosebuild commented Feb 25, 2025

Job Coverage, step Generate coverage on 7d3d0b5 wanted to post the following:

Framework coverage

236997 #29888 7d3d0b
Total Total +/- New
Rate 85.93% 85.94% +0.01% 100.00%
Hits 122919 122911 -8 50
Misses 20122 20108 -14 0

Diff coverage report

Full coverage report

Modules coverage

Phase field

236997 #29888 7d3d0b
Total Total +/- New
Rate 86.10% 86.10% +0.00% 100.00%
Hits 14018 14012 -6 6
Misses 2264 2263 -1 0

Diff coverage report

Full coverage report

Full coverage reports

Reports

This comment will be updated on new commits.

@moosebuild
Copy link
Contributor

moosebuild commented Feb 25, 2025

Job Documentation, step Docs: sync website on 7d3d0b5 wanted to post the following:

View the site here

This comment will be updated on new commits.

@moosebuild
Copy link
Contributor

Job Precheck, step Clang format on dc4ab13 wanted to post the following:

Your code requires style changes.

A patch was auto generated and copied here
You can directly apply the patch by running, in the top level of your repository:

curl -s https://mooseframework.inl.gov/docs/PRs/29888/clang_format/style.patch | git apply -v

Alternatively, with your repository up to date and in the top level of your repository:

git clang-format 12e3b024e9909d2df7bc9252a156be46f77c76ea

@tophmatthews tophmatthews force-pushed the combine_ad_nonad_mat_diffusion_15915 branch from dc4ab13 to 3591419 Compare February 26, 2025 21:23
@tophmatthews
Copy link
Contributor Author

tophmatthews commented Feb 27, 2025

_coupled_moose_vars is broken now...wtf...

no that's not it...

@tophmatthews tophmatthews force-pushed the combine_ad_nonad_mat_diffusion_15915 branch from 3591419 to c5dda9f Compare February 27, 2025 16:03
@tophmatthews
Copy link
Contributor Author

I'm missing something @dschwen , the overwritten non-AD precomputeQpJacobian in the new class MatDiffusionBase is not being called at all. It's going through the qp and the j's in KernelGrad, but virtual RealGradient precomputeQpJacobian() override; is not respected. Help!

@tophmatthews
Copy link
Contributor Author

Oh wait, it's because MatDiffusion inherits from MatDiffusionBaseTempl<Real, is_ad> not MatDiffusionBase...

@tophmatthews
Copy link
Contributor Author

tophmatthews commented Feb 27, 2025

Cheese and crackers, I'm doubting this being a worthwhile exercise, I'm out of ideas...

dyld[57678]: symbol not found in flat namespace '__ZN21MatDiffusionBaseTemplIN7libMesh11TensorValueIdEELb0EE20precomputeQpResidualEv'

@tophmatthews tophmatthews force-pushed the combine_ad_nonad_mat_diffusion_15915 branch from 9a4c958 to 6be38a9 Compare February 27, 2025 17:54
@tophmatthews
Copy link
Contributor Author

Holy crapoli this PR will be the end of me...

First issue: There is a intermittent failure in the optimization module:

optimizationreporter/function_misfit.point_loads/volume: Running csvdiff: /opt/moose/share/moose/python/mooseutils/csvdiff.py /tmp/moosetest.TCpJSo/optimization/moose/optimization/optimizationreporter/function_misfit/gold/main_auto_out_OptimizationReporter_0001.csv /tmp/moosetest.TCpJSo/optimization/moose/optimization/optimizationreporter/function_misfit/main_auto_out_OptimizationReporter_0001.csv --relative-tolerance 0.0001 --abs-zero 0.0001
optimizationreporter/function_misfit.point_loads/volume: ERROR: In file main_auto_out_OptimizationReporter_0001.csv: The values in column "parameter_results" don't match. 
optimizationreporter/function_misfit.point_loads/volume: 	relative diff:   -8.997e+02 ~ -8.987e+02 = 1.105e-03 (1.105e-03)

I can't reproduce it locally, and you can see in the civet history it hits often, but somewhat randomly. I'm not sure if has something to do with #29951 or not.

Second issue:
There are Marmot failures due to

    1. renaming of member variables (like _D to _diffusivity), and
    1. redefinition of return type of MatDiffusionBase<T>::computeQpCJacobian() due to moving to precomputeQpJacobian from KernelGrad.

For (1), I can just use the less (personally) favorable variables names. For (2), I can make a temporary precomputeQpCJacobian, and cleanup Marmot after the MOOSE fixes. OR I can just cleanup MARMOT after. What do you think @dschwen

@tophmatthews tophmatthews force-pushed the combine_ad_nonad_mat_diffusion_15915 branch 2 times, most recently from 72f5964 to a1a7578 Compare March 7, 2025 13:45
@tophmatthews tophmatthews force-pushed the combine_ad_nonad_mat_diffusion_15915 branch from a1a7578 to 90050aa Compare March 21, 2025 14:42
@tophmatthews
Copy link
Contributor Author

Hi @lynnmunday @maxnezdyur , @dschwen recommended reaching out to you guys on this:
There is a intermittent failure in the optimization module:

optimizationreporter/function_misfit.point_loads/volume: Running csvdiff: /opt/moose/share/moose/python/mooseutils/csvdiff.py /tmp/moosetest.TCpJSo/optimization/moose/optimization/optimizationreporter/function_misfit/gold/main_auto_out_OptimizationReporter_0001.csv /tmp/moosetest.TCpJSo/optimization/moose/optimization/optimizationreporter/function_misfit/main_auto_out_OptimizationReporter_0001.csv --relative-tolerance 0.0001 --abs-zero 0.0001
optimizationreporter/function_misfit.point_loads/volume: ERROR: In file main_auto_out_OptimizationReporter_0001.csv: The values in column "parameter_results" don't match. 
optimizationreporter/function_misfit.point_loads/volume: 	relative diff:   -8.997e+02 ~ -8.987e+02 = 1.105e-03 (1.105e-03)

I can't reproduce it locally, and you can see in the civet history it hits often, but somewhat randomly. I'm not sure if has something to do with #29951 or not.

This happens in moose/modules/optimization/test/tests/optimizationreporter/function_misfit/forward_and_adjoint.i which is run from main_auto.i

Any thoughts?

@maxnezdyur
Copy link
Contributor

maxnezdyur commented Mar 24, 2025

In the test file, for the test failing you can add cli args that the test will be run with. Can you add cli_args = "-pc_type lu", if we still see intermittent failures then it's not the issue you referenced.

@tophmatthews tophmatthews marked this pull request as draft October 3, 2025 16:27
@tophmatthews
Copy link
Contributor Author

Any luck @travismui ?

@tophmatthews
Copy link
Contributor Author

Looks like it was due to SAM using computeQpJacobian, but the new KernelGrad version for MatDiffusion uses precomputeQpJacobian. I'll add a final declaration to computeQpResidual and computeQpJacobian, which exists for ADKernelGrad, but not KernelGrad. That will prevent this slipping through the cracks. That means that Marmot and SAM will require modifications with this new PR. I'm not sure the next steps there, but I have a fix for Marmot, and @travismui has a fix for SAM ready to go.

@tophmatthews tophmatthews force-pushed the combine_ad_nonad_mat_diffusion_15915 branch from 3298154 to e70544b Compare October 10, 2025 00:50
@travismui
Copy link
Contributor

Looks like it was due to SAM using computeQpJacobian, but the new KernelGrad version for MatDiffusion uses precomputeQpJacobian. I'll add a final declaration to computeQpResidual and computeQpJacobian, which exists for ADKernelGrad, but not KernelGrad. That will prevent this slipping through the cracks. That means that Marmot and SAM will require modifications with this new PR. I'm not sure the next steps there, but I have a fix for Marmot, and @travismui has a fix for SAM ready to go.

SAM patch prepared at SAM/SAM!1124, will be merged after this goes into next (I think that's how this works?)

@tophmatthews
Copy link
Contributor Author

Marmot patch is ready at https://github.inl.gov/ncrc/marmot/pull/948.

@dschwen this is ready for re-review!

@tophmatthews tophmatthews marked this pull request as ready for review October 13, 2025 14:53
@moosebuild
Copy link
Contributor

Job Documentation public apps on 7d3d0b5 : invalidated by @tophmatthews

Updated TMAP8 to remove ADMatDiffusion.md reference

@tophmatthews
Copy link
Contributor Author

Ready to go, with patches for MARMOT and SAM ready.

@tophmatthews
Copy link
Contributor Author

This is ready to go in right? What else can I do to help?

@GiudGiud
Copy link
Contributor

GiudGiud commented Oct 27, 2025

Unfortunately not. It needs both a marmot and a SAM patch

EDIT: ok you made those

@tophmatthews
Copy link
Contributor Author

Unfortunately not. It needs both a marmot and a SAM patch

soooo.........

@loganharbour
Copy link
Member

SAM devel is broken at the moment

@tophmatthews
Copy link
Contributor Author

Is sam broken still? This PR should still be ready....

@GiudGiud
Copy link
Contributor

GiudGiud commented Oct 31, 2025

SAM should be fine now.

However this patch is breaking right, there is no point running tests again?

@tophmatthews
Copy link
Contributor Author

However this patch is breaking right, there is no point running tests again?

Yeah. This PR breaks MARMOT and SAM. There are patches for both of those codes ready once this is merged that will unbreak them. Although it's been so long now, something else could be breaking. That's why I'm anxious to get this done with.

@GiudGiud GiudGiud merged commit 4e5abd1 into idaholab:next Oct 31, 2025
65 of 69 checks passed
@tophmatthews
Copy link
Contributor Author

Thanks @GiudGiud ! For the MARMOT patch, do I update the moose submodule to next or devel or not at all?

@GiudGiud
Copy link
Contributor

to next please

@tophmatthews
Copy link
Contributor Author

@travismui can you do the sam patching? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants