Update Marianas variables and disable Newell CI, upgrade to C++14 and CUDA 11 #508

Merged
merged 2 commits into from
Jul 5, 2022

Conversation

cameronrutherford
Collaborator

@cameronrutherford cameronrutherford commented Jun 28, 2022

Updating the Marianas build to CUDA 11.4. One of the unit tests fails.

NlpSparse2_4 failure:
Start 25: NlpSparse2_4

25: Test command: /share/apps/openmpi/3.1.3/gcc/7.3.0/bin/mpirun "-n" "1" "/qfs/people/ruth521/projects/hiop/hiop-git/build/src/Drivers/Sparse/NlpSparseEx2.exe" "500" "-ginkgo" "-inertiafree" "-selfcheck"
25: Test timeout computed to be: 10000000
25: ===============
25: Hiop SOLVER
25: ===============
25: Using 1 MPI ranks.
25: ---------------
25: Problem Summary
25: ---------------
25: Total number of variables: 500
25: lower/upper/lower_and_upper bounds: 499 / 1 / 1
25: Total number of equality constraints: 2
25: Total number of inequality constraints: 499
25: lower/upper/lower_and_upper bounds: 498 / 498 / 497
25: iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
25: 0 6.4379656e+01 9.980e+00 1.010e+00 0.00 0.000e+00 0.000e+00 -(-)
25: Setting up Ginkgo solver ...
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 7.26408e+04)
25: [Warning] solve compressed high absolute resid norm (= 1.25761e+09)
25: [Warning] solve compressed high absolute resid norm (= 1.01166e+09)
25: [Warning] solve compressed high absolute resid norm (= 8.51145e+08)
25: [Warning] solve compressed high absolute resid norm (= 4.84826e+07)
25: [Warning] solve compressed high absolute resid norm (= 1.78755e+07)
25: [Warning] solve compressed high absolute resid norm (= 1.40315e+08)
25: [Warning] solve compressed high absolute resid norm (= 7.80505e+09)
25: [Warning] solve compressed high absolute resid norm (= 8.62754e+08)
25: [Warning] solve compressed high absolute resid norm (= 5.77189e+08)
25: [Warning] solve compressed high absolute resid norm (= 1.08956e+09)
25: [Warning] solve compressed high absolute resid norm (= 1.34512e+09)
25: [Warning] solve compressed high absolute resid norm (= 6.98415e+07)
25: [Warning] solve compressed high absolute resid norm (= 5.00960e+07)
25: [Warning] solve compressed high absolute resid norm (= 4.98722e+05)
25: [Warning] solve compressed high absolute resid norm (= 1.64079e+04)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 9 was returned.
25: - Error code 1
25: - Abs res=10311.1n - Rel res=0.463445
25: - ||rhs||_2=22248.7 ||sol||_2=6.78723e+20
25: 1 6.8469698e+01 9.842e+00 2.767e+02 0.00 3.826e-03 1.382e-02 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 4.74791e+06)
25: [Warning] solve compressed high absolute resid norm (= 1.96485e+10)
25: [Warning] BiCGStab did NOT converged after 1 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=22165.1n - Rel res=1
25: - ||rhs||_2=22165.1 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 4.13325e+06)
25: 2 6.4340819e+01 9.836e+00 2.035e+03 0.00 1.000e+00 2.915e-04 1(S)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 3.43130e+08)
25: [Warning] solve compressed high absolute resid norm (= 2.17503e+11)
25: [Warning] BiCGStab did NOT converged after 1 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=45518.6n - Rel res=1
25: - ||rhs||_2=45518.6 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 2.58205e+08)
25: 3 6.4341020e+01 9.842e+00 1.375e+08 0.00 1.000e+00 1.006e-05 1(S)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 4.01006e+08)
25: [Warning] solve compressed high absolute resid norm (= 1.70880e+15)
25: [Warning] BiCGStab did NOT converged after 1 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=1.37528e+08n - Rel res=1
25: - ||rhs||_2=1.37528e+08 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 7.59508e+08)
25: 4 6.4339359e+01 9.842e+00 1.400e+05 -0.70 1.000e+00 8.102e-07 1(S)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 2.63992e+05)
25: [Warning] solve compressed high absolute resid norm (= 3.15174e+06)
25: [Warning] solve compressed high absolute resid norm (= 4.37603e+05)
25: [Warning] solve compressed high absolute resid norm (= 9.10059e+05)
25: [Warning] solve compressed high absolute resid norm (= 1.69711e+07)
25: [Warning] solve compressed high absolute resid norm (= 5.06641e+06)
25: [Warning] solve compressed high absolute resid norm (= 4.29326e+06)
25: [Warning] solve compressed high absolute resid norm (= 2.54467e+04)
25: [Warning] solve compressed high absolute resid norm (= 3.57853e+06)
25: [Warning] solve compressed high absolute resid norm (= 2.12660e+06)
25: [Warning] solve compressed high absolute resid norm (= 1.91780e+06)
25: [Warning] solve compressed high absolute resid norm (= 2.47442e+06)
25: [Warning] solve compressed high absolute resid norm (= 1.22083e+09)
25: [Warning] solve compressed high absolute resid norm (= 3.21712e+06)
25: [Warning] solve compressed high absolute resid norm (= 3.17689e+06)
25: [Warning] solve compressed high absolute resid norm (= 5.43254e+06)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 9 was returned.
25: - Error code 1
25: - Abs res=103486n - Rel res=0.0331539
25: - ||rhs||_2=3.12137e+06 ||sol||_2=1.18232e+22
25: [Warning] solve compressed high absolute resid norm (= 3.11754e+04)
25: 5 6.4338319e+01 9.842e+00 1.392e+05 -0.70 6.109e-03 4.273e-05 1(S)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 1.02521e+07)
25: [Warning] solve compressed high absolute resid norm (= 1.50899e+11)
25: [Warning] BiCGStab did NOT converged after 1 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=3.10231e+06n - Rel res=1
25: - ||rhs||_2=3.10231e+06 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 1.01781e+07)
25: [Warning] Requesting additional accuracy and stability from the KKT linear system at iteration 5 (safe mode ON) [2]
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 1.02521e+07)
25: [Warning] solve compressed high absolute resid norm (= 1.50899e+11)
25: [Warning] BiCGStab did NOT converged after 1 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=3.10231e+06n - Rel res=1
25: - ||rhs||_2=3.10231e+06 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 1.01781e+07)
25: Minimum step size reached. The problem may be locally infeasible or the gradient inaccurate. Will try to restore feasibility.
25: Failed to read option file 'hiop_fr.options'. Hiop will use default options.
25: 6 6.4338319e+01 9.842e+00 1.418e+04 -0.70 1.000e+00 1.000e+00 0(R)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 9.96321e+03)
25: [Warning] solve compressed high absolute resid norm (= 8.82589e+06)
25: [Warning] solve compressed high absolute resid norm (= 5.50840e+07)
25: [Warning] solve compressed high absolute resid norm (= 2.14816e+09)
25: [Warning] solve compressed high absolute resid norm (= 4.83760e+09)
25: [Warning] solve compressed high absolute resid norm (= 1.91531e+11)
25: [Warning] BiCGStab did NOT converged after 3 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=34115.8n - Rel res=1
25: - ||rhs||_2=34115.8 ||sol||_2=0
25: 7 6.6349550e+01 2.025e+00 1.418e+04 -0.70 1.000e+00 1.000e+00 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 2.92405e+03)
25: [Warning] solve compressed high absolute resid norm (= 6.73113e+07)
25: [Warning] solve compressed high absolute resid norm (= 1.83259e+09)
25: [Warning] solve compressed high absolute resid norm (= 2.48062e+09)
25: [Warning] solve compressed high absolute resid norm (= 2.21556e+08)
25: [Warning] solve compressed high absolute resid norm (= 8.45047e+08)
25: [Warning] solve compressed high absolute resid norm (= 1.07987e+09)
25: [Warning] solve compressed high absolute resid norm (= 6.87395e+10)
25: [Warning] BiCGStab did NOT converged after 4 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=34115.6n - Rel res=1
25: - ||rhs||_2=34115.6 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 4.19431e+06)
25: 8 6.6346831e+01 2.010e+00 3.546e+03 -0.70 1.000e+00 4.553e-03 1(S)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 2.14832e+06)
25: [Warning] solve compressed high absolute resid norm (= 2.93772e+10)
25: [Warning] BiCGStab did NOT converged after 1 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=3547.4n - Rel res=1
25: - ||rhs||_2=3547.4 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 2.02920e+06)
25: 9 6.6347339e+01 2.010e+00 1.013e+06 -0.70 1.000e+00 3.515e-05 1(S)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 1.80589e+03)
25: [Warning] solve compressed high absolute resid norm (= 9.37740e+06)
25: [Warning] solve compressed high absolute resid norm (= 2.26247e+04)
25: [Warning] solve compressed high absolute resid norm (= 1.59627e+04)
25: [Warning] solve compressed high absolute resid norm (= 3.16749e+07)
25: [Warning] solve compressed high absolute resid norm (= 7.50428e+07)
25: [Warning] solve compressed high absolute resid norm (= 1.06101e+04)
25: [Warning] solve compressed high absolute resid norm (= 1.43433e+03)
25: [Warning] solve compressed high absolute resid norm (= 8.38862e+06)
25: [Warning] solve compressed high absolute resid norm (= 1.80811e+04)
25: [Warning] solve compressed high absolute resid norm (= 7.01206e+03)
25: [Warning] solve compressed high absolute resid norm (= 1.05051e+04)
25: [Warning] solve compressed high absolute resid norm (= 1.67773e+07)
25: [Warning] solve compressed high absolute resid norm (= 4.08707e+03)
25: [Warning] solve compressed high absolute resid norm (= 2.72927e+04)
25: [Warning] solve compressed high absolute resid norm (= 3.75092e+07)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 4 was returned.
25: - Error code 1
25: - Abs res=1141.36n - Rel res=0.00112704
25: - ||rhs||_2=1.0127e+06 ||sol||_2=3.22836e+21
25: iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
25: 10 9.3608249e+01 5.315e+00 9.968e+05 -0.70 1.660e-02 1.000e+00 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 4.00693e+03)
25: [Warning] solve compressed high absolute resid norm (= 3.10417e+05)
25: [Warning] solve compressed high absolute resid norm (= 3.60229e+03)
25: [Warning] solve compressed high absolute resid norm (= 1.19220e+03)
25: [Warning] solve compressed high absolute resid norm (= 5.81422e+05)
25: [Warning] solve compressed high absolute resid norm (= 2.12324e+03)
25: [Warning] solve compressed high absolute resid norm (= 1.62897e+03)
25: [Warning] solve compressed high absolute resid norm (= 1.80474e+03)
25: [Warning] solve compressed high absolute resid norm (= 2.92165e+05)
25: [Warning] solve compressed high absolute resid norm (= 6.96939e+03)
25: [Warning] solve compressed high absolute resid norm (= 5.73491e+03)
25: [Warning] solve compressed high absolute resid norm (= 6.33499e+03)
25: [Warning] solve compressed high absolute resid norm (= 1.37523e+03)
25: [Warning] solve compressed high absolute resid norm (= 1.34617e+04)
25: [Warning] solve compressed high absolute resid norm (= 5.86228e+05)
25: [Warning] solve compressed high absolute resid norm (= 1.03891e+04)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 5 was returned.
25: - Error code 1
25: - Abs res=245.268n - Rel res=0.00024606
25: - ||rhs||_2=996781 ||sol||_2=3.23415e+19
25: 11 9.7336540e+01 4.637e+00 9.607e+05 -0.70 3.615e-02 1.276e-01 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 3.48153e+05)
25: [Warning] solve compressed high absolute resid norm (= 2.18734e+08)
25: [Warning] solve compressed high absolute resid norm (= 9.11355e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.08045e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.65603e+03)
25: [Warning] solve compressed high absolute resid norm (= 2.18482e+01)
25: [Warning] solve compressed high absolute resid norm (= 9.77751e+00)
25: [Warning] solve compressed high absolute resid norm (= 6.92257e+00)
25: [Warning] solve compressed high absolute resid norm (= 1.85939e+03)
25: [Warning] solve compressed high absolute resid norm (= 4.14049e+03)
25: [Warning] solve compressed high absolute resid norm (= 6.52492e+02)
25: [Warning] solve compressed high absolute resid norm (= 4.98503e+02)
25: [Warning] solve compressed high absolute resid norm (= 1.47355e+01)
25: [Warning] solve compressed high absolute resid norm (= 4.67216e+02)
25: [Warning] solve compressed high absolute resid norm (= 5.96779e+02)
25: [Warning] solve compressed high absolute resid norm (= 4.55434e+02)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 9 was returned.
25: - Error code 1
25: - Abs res=2.44966n - Rel res=2.54993e-06
25: - ||rhs||_2=960678 ||sol||_2=1.00978e+22
25: 12 1.1111213e+02 3.913e-03 7.265e+05 -0.70 2.431e-01 1.000e+00 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 1.05498e+01)
25: [Warning] solve compressed high absolute resid norm (= 4.03044e+01)
25: 13 1.0624060e+02 1.931e-03 6.681e+05 -1.40 8.033e-02 5.137e-01 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 8.94078e+00)
25: [Warning] solve compressed high absolute resid norm (= 3.48187e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.01487e+01)
25: [Warning] solve compressed high absolute resid norm (= 7.30103e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.89024e+02)
25: [Warning] solve compressed high absolute resid norm (= 1.65310e+00)
25: [Warning] solve compressed high absolute resid norm (= 1.45050e+02)
25: [Warning] solve compressed high absolute resid norm (= 2.16874e+00)
25: [Warning] solve compressed high absolute resid norm (= 2.72088e+00)
25: [Warning] solve compressed high absolute resid norm (= 4.46230e-01)
25: [Warning] solve compressed high absolute resid norm (= 2.96016e+01)
25: [Warning] solve compressed high absolute resid norm (= 6.35927e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.09198e+01)
25: [Warning] solve compressed high absolute resid norm (= 2.34966e+01)
25: [Warning] solve compressed high absolute resid norm (= 3.18296e+02)
25: [Warning] solve compressed high absolute resid norm (= 4.48765e+02)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 4.5 was returned.
25: - Error code 1
25: - Abs res=5.03777n - Rel res=7.54018e-06
25: - ||rhs||_2=668124 ||sol||_2=7.18021e+14
25: 14 1.0612914e+02 1.905e-03 6.653e+03 -1.40 9.900e-01 1.355e-02 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 7.32728e+04)
25: [Warning] solve compressed high absolute resid norm (= 4.59559e+04)
25: [Warning] solve compressed high absolute resid norm (= 2.26197e+03)
25: [Warning] solve compressed high absolute resid norm (= 4.29497e+09)
25: [Warning] solve compressed high absolute resid norm (= 3.50743e+06)
25: [Warning] solve compressed high absolute resid norm (= 2.14748e+09)
25: [Warning] solve compressed high absolute resid norm (= 9.61774e+06)
25: [Warning] solve compressed high absolute resid norm (= 7.28146e+05)
25: [Warning] solve compressed high absolute resid norm (= 2.00046e+08)
25: [Warning] solve compressed high absolute resid norm (= 7.61522e+06)
25: [Warning] BiCGStab did NOT converged after 5 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=6652.68n - Rel res=1
25: - ||rhs||_2=6652.68 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 1.09845e+05)
25: 15 1.0741331e+02 2.947e-04 6.680e+03 -2.10 1.000e+00 9.183e-04 1(S)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 9.29346e-02)
25: [Warning] solve compressed high absolute resid norm (= 8.92393e+03)
25: [Warning] solve compressed high absolute resid norm (= 3.68008e-01)
25: [Warning] solve compressed high absolute resid norm (= 2.14292e-01)
25: [Warning] solve compressed high absolute resid norm (= 5.74440e-05)
25: [Warning] solve compressed high absolute resid norm (= 1.23604e+06)
25: [Warning] solve compressed high absolute resid norm (= 7.91983e+09)
25: [Warning] solve compressed high absolute resid norm (= 2.15076e+09)
25: [Warning] solve compressed high absolute resid norm (= 3.24027e-01)
25: [Warning] solve compressed high absolute resid norm (= 1.92077e+10)
25: [Warning] BiCGStab did NOT converged after 5 iters. The solution from iter 3 was returned.
25: - Error code 4
25: - Abs res=38.664n - Rel res=0.00578798
25: - ||rhs||_2=6680.06 ||sol||_2=3.57361e+21
25: 16 1.0735236e+02 3.344e-04 6.680e+03 -3.15 4.854e-05 1.000e+00 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 1.26530e+10)
25: [Warning] solve compressed high absolute resid norm (= 6.53524e+04)
25: [Warning] BiCGStab did NOT converged after 1 iters. The solution from iter 0 was returned.
25: - Error code 4
25: - Abs res=6679.73n - Rel res=1
25: - ||rhs||_2=6679.73 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 1.86916e-01)
25: 17 1.0735086e+02 3.344e-04 9.704e+08 -3.15 1.000e+00 6.879e-06 1(S)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 2.82531e+05)
25: 18 1.0737117e+02 3.339e-04 6.943e+05 -3.15 9.993e-01 1.000e+00 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 2.94391e-01)
25: [Warning] solve compressed high absolute resid norm (= 3.38015e-01)
25: [Warning] BiCGStab did NOT converged after 1 iters. The solution from iter 0.5 was returned.
25: - Error code 4
25: - Abs res=41.7766n - Rel res=6.01666e-05
25: - ||rhs||_2=694349 ||sol||_2=7.80703e+22
25: 19 1.0737117e+02 3.339e-04 2.789e+01 -4.72 1.000e+00 1.000e+00 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 3.94796e-01)
25: [Warning] solve compressed high absolute resid norm (= 3.89264e-01)
25: [Warning] solve compressed high absolute resid norm (= 2.56000e+02)
25: [Warning] solve compressed high absolute resid norm (= 3.94796e-01)
25: [Warning] solve compressed high absolute resid norm (= 7.79291e-01)
25: [Warning] solve compressed high absolute resid norm (= 1.92077e+10)
25: [Warning] BiCGStab did NOT converged after 3 iters. The solution from iter 1 was returned.
25: - Error code 4
25: - Abs res=41.7766n - Rel res=0.952942
25: - ||rhs||_2=43.8396 ||sol||_2=4.79017e+18
25: iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
25: 20 1.0737118e+02 3.339e-04 2.789e+01 -4.72 1.000e+00 1.000e+00 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 2.84897e+02)
25: [Warning] solve compressed high absolute resid norm (= 1.87578e+07)
25: [Warning] solve compressed high absolute resid norm (= 2.45893e+04)
25: [Warning] solve compressed high absolute resid norm (= 1.04858e+06)
25: [Warning] solve compressed high absolute resid norm (= 1.04858e+06)
25: [Warning] solve compressed high absolute resid norm (= 4.68911e+06)
25: [Warning] solve compressed high absolute resid norm (= 4.18452e+09)
25: [Warning] solve compressed high absolute resid norm (= 1.84413e+03)
25: [Warning] solve compressed high absolute resid norm (= 1.34218e+08)
25: [Warning] solve compressed high absolute resid norm (= 1.84012e+03)
25: [Warning] solve compressed high absolute resid norm (= 4.61247e+03)
25: [Warning] solve compressed high absolute resid norm (= 2.55446e+03)
25: [Warning] solve compressed high absolute resid norm (= 5.76860e+05)
25: [Warning] solve compressed high absolute resid norm (= 1.13565e+03)
25: [Warning] solve compressed high absolute resid norm (= 1.34234e+08)
25: [Warning] solve compressed high absolute resid norm (= 2.11583e+04)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 0 was returned.
25: - Error code 1
25: - Abs res=41.7766n - Rel res=1
25: - ||rhs||_2=41.7766 ||sol||_2=0
25: [Warning] solve compressed high absolute resid norm (= 1.63840e+04)
25: Minimum step size reached. The problem may be locally infeasible or the gradient inaccurate. Will try to restore feasibility.
25: Failed to read option file 'hiop_fr.options'. Hiop will use default options.
25: 21 1.0737118e+02 2.705e-03 2.357e+03 -4.72 1.000e+00 1.000e+00 0(R)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 5.91117e+03)
25: [Warning] solve compressed high absolute resid norm (= 1.20100e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.60462e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.20100e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.83126e+04)
25: [Warning] solve compressed high absolute resid norm (= 1.20099e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.96261e+04)
25: [Warning] solve compressed high absolute resid norm (= 1.20097e+01)
25: [Warning] solve compressed high absolute resid norm (= 2.80644e+04)
25: [Warning] solve compressed high absolute resid norm (= 1.20099e+01)
25: [Warning] solve compressed high absolute resid norm (= 7.50710e+07)
25: [Warning] solve compressed high absolute resid norm (= 1.20104e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.00469e+04)
25: [Warning] solve compressed high absolute resid norm (= 1.67772e+07)
25: [Warning] solve compressed high absolute resid norm (= 2.98033e+04)
25: [Warning] solve compressed high absolute resid norm (= 9.60384e+09)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 9 was returned.
25: - Error code 1
25: - Abs res=16.2669n - Rel res=0.00690145
25: - ||rhs||_2=2357.02 ||sol||_2=3.10819e+19
25: 22 6.4821432e+01 3.056e-04 2.470e+00 -4.72 1.000e+00 3.255e-02 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 3.19139e-01)
25: [Warning] solve compressed high absolute resid norm (= 6.43012e-02)
25: [Warning] solve compressed high absolute resid norm (= 4.63008e-01)
25: [Warning] solve compressed high absolute resid norm (= 1.60022e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.04592e+01)
25: [Warning] solve compressed high absolute resid norm (= 6.12779e+00)
25: [Warning] solve compressed high absolute resid norm (= 3.66378e+04)
25: [Warning] solve compressed high absolute resid norm (= 6.02930e+00)
25: [Warning] solve compressed high absolute resid norm (= 8.64359e+00)
25: [Warning] solve compressed high absolute resid norm (= 7.67246e+00)
25: [Warning] solve compressed high absolute resid norm (= 2.24646e+00)
25: [Warning] solve compressed high absolute resid norm (= 4.53621e+00)
25: [Warning] solve compressed high absolute resid norm (= 5.53193e-01)
25: [Warning] solve compressed high absolute resid norm (= 3.56171e+00)
25: [Warning] solve compressed high absolute resid norm (= 3.42027e+01)
25: [Warning] solve compressed high absolute resid norm (= 1.09307e+00)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 9 was returned.
25: - Error code 1
25: - Abs res=0.308549n - Rel res=0.0265739
25: - ||rhs||_2=11.611 ||sol||_2=2.83343e+19
25: 23 6.5578393e+01 4.369e-06 2.463e+00 -4.72 3.005e-03 1.000e+00 1(s)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 7.61221e-01)
25: [Warning] solve compressed high absolute resid norm (= 4.38865e-01)
25: [Warning] solve compressed high absolute resid norm (= 9.44751e-02)
25: [Warning] solve compressed high absolute resid norm (= 1.56763e-01)
25: [Warning] solve compressed high absolute resid norm (= 3.75909e-01)
25: [Warning] solve compressed high absolute resid norm (= 3.53205e-02)
25: [Warning] solve compressed high absolute resid norm (= 3.28839e-02)
25: [Warning] solve compressed high absolute resid norm (= 9.84950e-05)
25: [Warning] solve compressed high absolute resid norm (= 3.76047e-04)
25: [Warning] solve compressed high absolute resid norm (= 1.76804e-05)
25: [Warning] solve compressed high absolute resid norm (= 1.72262e-05)
25: [Warning] solve compressed high absolute resid norm (= 1.82923e-04)
25: [Warning] solve compressed high absolute resid norm (= 3.09932e-06)
25: [Warning] solve compressed high absolute resid norm (= 5.90452e-06)
25: [Warning] solve compressed high absolute resid norm (= 1.11113e-04)
25: [Warning] solve compressed high absolute resid norm (= 2.00453e-03)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 9 was returned.
25: - Error code 1
25: - Abs res=9.56126e-05n - Rel res=8.18745e-06
25: - ||rhs||_2=11.6779 ||sol||_2=1.44683e+16
25: 24 6.4472718e+01 4.287e-06 1.064e+00 -7.08 5.681e-01 1.866e-02 1(f)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 8.84296e-01)
25: [Warning] solve compressed high absolute resid norm (= 8.88296e-05)
25: [Warning] solve compressed high absolute resid norm (= 4.62546e-06)
25: [Warning] solve compressed high absolute resid norm (= 4.56549e-06)
25: [Warning] solve compressed high absolute resid norm (= 6.90818e-07)
25: [Warning] solve compressed high absolute resid norm (= 1.99151e-05)
25: [Warning] solve compressed high absolute resid norm (= 5.45761e-04)
25: [Warning] solve compressed high absolute resid norm (= 3.90625e-03)
25: [Warning] solve compressed high absolute resid norm (= 1.59443e-06)
25: [Warning] solve compressed high absolute resid norm (= 1.15213e-06)
25: [Warning] solve compressed high absolute resid norm (= 4.88285e-04)
25: [Warning] solve compressed high absolute resid norm (= 7.81250e-03)
25: [Warning] solve compressed high absolute resid norm (= 8.43591e-06)
25: [Warning] solve compressed high absolute resid norm (= 4.36665e-07)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 2 was returned.
25: - Error code 1
25: - Abs res=0.000286369n - Rel res=3.73132e-05
25: - ||rhs||_2=7.67473 ||sol||_2=1.47612e+16
25: 25 6.4820216e+01 5.397e-10 1.035e-01 -7.08 9.027e-01 1.000e+00 1(h)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 1.82298e-04)
25: [Warning] solve compressed high absolute resid norm (= 1.39626e-07)
25: [Warning] solve compressed high absolute resid norm (= 1.39350e-08)
25: [Warning] solve compressed high absolute resid norm (= 2.72958e-04)
25: [Warning] solve compressed high absolute resid norm (= 3.20501e-08)
25: [Warning] solve compressed high absolute resid norm (= 1.52482e-07)
25: [Warning] solve compressed high absolute resid norm (= 5.39567e-05)
25: [Warning] solve compressed high absolute resid norm (= 4.31641e-04)
25: [Warning] solve compressed high absolute resid norm (= 1.70157e-05)
25: [Warning] solve compressed high absolute resid norm (= 3.15235e-05)
25: [Warning] solve compressed high absolute resid norm (= 6.73754e-04)
25: [Warning] solve compressed high absolute resid norm (= 2.25657e-04)
25: [Warning] solve compressed high absolute resid norm (= 7.01213e-05)
25: [Warning] solve compressed high absolute resid norm (= 1.18319e-05)
25: [Warning] solve compressed high absolute resid norm (= 2.98392e-05)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 9 was returned.
25: - Error code 1
25: - Abs res=1.07971e-05n - Rel res=1.45302e-05
25: - ||rhs||_2=0.743079 ||sol||_2=3.05218e+15
25: [Warning] solve compressed high absolute resid norm (= 8.46207e-03)
25: 26 6.4819985e+01 1.127e-08 4.594e-05 -7.08 1.000e+00 1.000e+00 1(H)
25: cudaErrorNoKernelImageForDevice
25: no kernel image is available for execution on the device
25: [Warning] solve compressed high absolute resid norm (= 2.46792e-05)
25: [Warning] solve compressed high absolute resid norm (= 7.50014e-06)
25: [Warning] solve compressed high absolute resid norm (= 2.18019e-06)
25: [Warning] solve compressed high absolute resid norm (= 5.83501e-06)
25: [Warning] solve compressed high absolute resid norm (= 5.02202e-07)
25: [Warning] solve compressed high absolute resid norm (= 8.97634e-07)
25: [Warning] solve compressed high absolute resid norm (= 7.37249e-07)
25: [Warning] solve compressed high absolute resid norm (= 9.10248e-07)
25: [Warning] solve compressed high absolute resid norm (= 1.43733e-08)
25: [Warning] solve compressed high absolute resid norm (= 2.42868e-07)
25: [Warning] solve compressed high absolute resid norm (= 7.20617e-06)
25: [Warning] solve compressed high absolute resid norm (= 3.95745e-08)
25: [Warning] solve compressed high absolute resid norm (= 3.81472e-06)
25: [Warning] solve compressed high absolute resid norm (= 1.94067e-08)
25: [Warning] BiCGStab did NOT converged after 9 iters. The solution from iter 9 was returned.
25: - Error code 1
25: - Abs res=2.85923e-06n - Rel res=0.0571234
25: - ||rhs||_2=5.00536e-05 ||sol||_2=1.79209e+14
25: 27 6.4820052e+01 7.274e-13 2.976e-06 -9.00 1.000e+00 1.000e+00 1(h)
25: Successfull termination.
25: Total time 6.167 sec
25: Hiop internal time: total 6.733 sec avg iter 0.249 sec
25: internal total std dev across ranks 0.000 percent
25: Fcn/deriv time: total=0.014 sec ( obj=0.010 grad=0.001 cons=0.002 Jac=0.001 Hess=0.000)
25: Fcn/deriv total std dev across ranks 0.000 percent
25: Fcn/deriv #: obj 211 grad 29 eq cons 216 ineq cons 216 eq Jac 35 ineq Jac 35
25: Total KKT time 5.977 sec
25: update init 0.001 sec update linsys 0.002 sec fact 0.201 sec
25: solve rhs-manip 0.019 sec inner solve 6.255 sec resid 0.052 sec IR 112.500 iter
25:
25: selfcheck failure. Objective (6.482005192413e+01) does not agree (6 digits) with the saved value (6.432237100000e+01) for n=500.
25: --------------------------------------------------------------------------
25: Primary job terminated normally, but 1 process returned
25: a non-zero exit code. Per user-direction, the job has been aborted.
25: --------------------------------------------------------------------------
25: --------------------------------------------------------------------------
25: mpirun detected that one or more processes exited with non-zero status, thus causing
25: the job to be terminated. The first process to do so was:
25:
25: Process name: [[1788,1],0]
25: Exit code: 255
25: --------------------------------------------------------------------------
1/1 Test #25: NlpSparse2_4 .....................***Failed 9.54 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 9.60 sec

The following tests FAILED:
25 - NlpSparse2_4 (Failed)

There appears to be a kernel image error (cudaErrorNoKernelImageForDevice), along with many solver warnings in the output. I am not sure whether this is a known issue.

@cameronrutherford cameronrutherford changed the title Update marianas variables and disable Newell CI. Update Marianas variables and disable Newell CI, upgrade to C++14 and CUDA 14 Jul 1, 2022
@cameronrutherford
Collaborator Author

cameronrutherford commented Jul 1, 2022

It appears as though the unit test that I mentioned is still failing.

Additionally, the push-mirroring action to PNNL's GitLab does not appear to have been re-enabled, so I suggest doing that before this is merged.

As a side note, Newell is undergoing an update to allow building with CUDA 11, and we can re-enable that pipeline in the near future (I hope).

@cameronrutherford cameronrutherford requested a review from cnpetra July 1, 2022 15:55
@cnpetra
Collaborator

cnpetra commented Jul 1, 2022

How does this PR relate to #510?

@pelesh
Collaborator

pelesh commented Jul 1, 2022

> How does this PR relate to #510?

It addresses issues other than the one in #510; all of them were revealed after recent upgrades. If there is a simple fix to #510, we could fold it in here.

@cnpetra
Collaborator

cnpetra commented Jul 1, 2022

It is unlikely that #510 will fix the kernel image error, isn't it?

@pelesh
Collaborator

pelesh commented Jul 1, 2022

> It is unlikely that #510 will fix the kernel image error, isn't it?

We are dealing with a few different issues. This PR deals with the compile standard and the CUDA version upgrade. #510 is a separate issue and causes a compilation failure because the optimization classes are not built with vendor-specific compilers.

@cnpetra
Collaborator

cnpetra commented Jul 1, 2022

> > It is unlikely that #510 will fix the kernel image error, isn't it?
>
> We are dealing with a few different issues. This PR deals with the compile standard and the CUDA version upgrade. #510 is a separate issue and causes a compilation failure because the optimization classes are not built with vendor-specific compilers.

I take it that fixing #510 will not fix the runtime kernel image error.

Collaborator

@cnpetra cnpetra left a comment

We should not increase the CXX standard unless there are very good reasons to do so?

Also, I guess --expt-relaxed-constexpr is for the Eigen-related warnings. It is not a fix; I think it just hides them (they will be fixed by a different approach).

@cameronrutherford
Collaborator Author

I can remove --expt-relaxed-constexpr, since it was only there for the Eigen-related warnings, as you said.

The CXX standard and CUDA target upgrades are required by new versions of Umpire, RAJA, and Camp.

Umpire requires C++ 14 here https://github.com/LLNL/Umpire/blob/develop/CMakeLists.txt#L37

Certain RAJA features require C++14 https://github.com/LLNL/RAJA/blob/develop/CMakeLists.txt#L83

Camp is quite clear about this as well https://github.com/LLNL/camp/blob/main/CMakeLists.txt#L17

These changes were required to get to the root of LLNL/camp#110. We can certainly wait on upgrading the C++ standard and the CUDA version, but as soon as the camp issue is resolved, we have to make this change if we want to upgrade to the newest RAJA and Umpire.
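For illustration only, a minimal sketch of a compile-time check for the C++14 requirement discussed above. It relies on the standard `__cplusplus` macro; the names `meets_cpp14`, `kCpp14`, and `kBuiltWithCpp14` are hypothetical and not part of HiOp, RAJA, Umpire, or Camp.

```cpp
// Value that __cplusplus takes under ISO C++14.
constexpr long kCpp14 = 201402L;

// True if a compiler reporting this __cplusplus value is at least C++14.
constexpr bool meets_cpp14(long cplusplus_value) {
    return cplusplus_value >= kCpp14;
}

// Whether this translation unit is being compiled as C++14 or newer.
// Caveat: MSVC reports 199711L unless /Zc:__cplusplus is passed.
constexpr bool kBuiltWithCpp14 = meets_cpp14(__cplusplus);
```

A project could turn `kBuiltWithCpp14` into a `static_assert` in a common header so that a build with an older standard fails early with a clear message.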

@cameronrutherford
Collaborator Author

> > > It is unlikely that #510 will fix the kernel image error, isn't it?
> >
> > We are dealing with a few different issues. This PR deals with the compile standard and the CUDA version upgrade. #510 is a separate issue and causes a compilation failure because the optimization classes are not built with vendor-specific compilers.
>
> I take it that fixing #510 will not fix the runtime kernel image error.

Happy to re-build and try with this fix, but I am also pessimistic that this would fix the error.

@pelesh
Collaborator

pelesh commented Jul 1, 2022

> We should not increase the CXX standard unless there are very good reasons to do so?

Besides the RAJA/Umpire requirements, I think this is a fairly safe update: every compiler on our target platforms is C++14 compliant.

@pelesh
Collaborator

pelesh commented Jul 1, 2022

@cameronrutherford, a kernel image error usually happens when CUDA code is built for a compute capability that does not match your GPU or driver. In this case it looks as if Ginkgo was built for a different CUDA compute capability than the rest of the HiOp code. We can troubleshoot this offline.
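To illustrate the mismatch described above, here is a minimal C++14 sketch of the compatibility rules CUDA applies when it looks for a kernel image. It does not call the CUDA runtime or query a real GPU. The `Capability` and `Image` types, the function names, and the example capabilities are all hypothetical, and architecture-specific targets are ignored.

```cpp
#include <cstddef>

// A compute capability such as 6.0 or 7.0.
struct Capability {
    int major_rev;
    int minor_rev;
};

// One entry embedded in a binary: native code (e.g. sm_60) or
// PTX (e.g. compute_60) that the driver can JIT-compile at load time.
struct Image {
    Capability cc;
    bool is_ptx;
};

// True if this single image can run on the given device.
constexpr bool image_runs_on(const Image& image, const Capability& device) {
    const bool same_major = device.major_rev == image.cc.major_rev;
    const bool minor_ok = device.minor_rev >= image.cc.minor_rev;
    if (image.is_ptx) {
        // PTX is forward compatible with any equal or higher capability.
        return device.major_rev > image.cc.major_rev || (same_major && minor_ok);
    }
    // Native code only runs within the same major revision.
    return same_major && minor_ok;
}

// True if at least one embedded image is usable; when none is, a kernel
// launch fails with cudaErrorNoKernelImageForDevice.
constexpr bool has_kernel_image(const Image* images, std::size_t count,
                                const Capability& device) {
    for (std::size_t i = 0; i < count; ++i) {
        if (image_runs_on(images[i], device)) {
            return true;
        }
    }
    return false;
}

// Hypothetical builds: native sm_60 only, and sm_60 plus compute_60 PTX.
constexpr Image kPascalOnly[] = {{{6, 0}, false}};
constexpr Image kPascalWithPtx[] = {{{6, 0}, false}, {{6, 0}, true}};
```

Under these assumptions, `has_kernel_image(kPascalOnly, 1, Capability{7, 0})` is false, which is the situation where one library in the build targets a different architecture than the GPU it runs on. Adding PTX, as in `kPascalWithPtx`, makes the same call return true.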

Collaborator

@cnpetra cnpetra left a comment

Looks great, and it makes sense to require C++14 given the circumstances.

@cnpetra
Collaborator

cnpetra commented Jul 5, 2022

Also, I will re-enable the CI workflow for PNNL once the kernel image error is fixed.

@cnpetra cnpetra merged commit f445921 into develop Jul 5, 2022
@cnpetra cnpetra mentioned this pull request Jul 12, 2022
2 tasks
@cameronrutherford cameronrutherford changed the title Update Marianas variables and disable Newell CI, upgrade to C++14 and CUDA 14 Update Marianas variables and disable Newell CI, upgrade to C++14 and CUDA 11 Aug 25, 2022
nychiang pushed a commit that referenced this pull request Sep 30, 2022
… CUDA 14 (#508)

* Update marianas variables and disable Newell CI.

* Use C++14 and CUDA 14 & CMake fixes.
@cnpetra cnpetra deleted the pnnl-env-update branch June 30, 2023 04:50

3 participants