test suite failures on i686/armhf #196

Open
mbanck opened this issue Oct 13, 2020 · 7 comments


@mbanck
Contributor

mbanck commented Oct 13, 2020

As I mentioned elsewhere, the test suite fails on most 32-bit architectures, but not on alpha or powerpc. Those two have a 16-byte long double, as opposed to 8 or 12 bytes for the rest, which is the only apparent difference I could spot.
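
One way to check the long double observation on a given build machine is to query the type directly. The following is a minimal sketch and not part of libint; the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper type (not part of libint): storage size and
// mantissa width of long double on the machine compiling this code.
struct LongDoubleInfo {
  std::size_t bytes;   // sizeof(long double), padding included
  int mantissa_bits;   // equals the double value when both share one format
};

LongDoubleInfo long_double_info() {
  return {sizeof(long double), std::numeric_limits<long double>::digits};
}

// Sanity check: long double is never narrower than double.
void check_long_double() {
  const LongDoubleInfo info = long_double_info();
  assert(info.bytes >= sizeof(double));
  assert(info.mantissa_bits >= std::numeric_limits<double>::digits);
}
```

Printing both fields on each architecture would show whether a different storage size (8, 12 or 16 bytes) also means a different precision, since the storage size alone includes padding.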

I made the eri tests print their output even when nothing fails, so that I could compare x86-64 against x86-32. Even where the 32-bit reference and libint numbers agree with each other, they are way off the corresponding 64-bit values. Should those numbers in principle be identical, or are they machine-dependent? The failures could be due to a wrong fix I applied to make the eri test build (see #194), but the other tests (including the Hartree-Fock tests) fail as well; see e.g. https://buildd.debian.org/status/fetch.php?pkg=libint2&arch=i386&ver=2.6.0-4&stamp=1602502875&raw=0

One example:

32bit:

Testing (s|s)  deriv order = 1
Elem 0 di= 0 v=0 : ref = -130.531 libint = -130.531 relabs_error = 0
Elem 0 di= 1 v=0 : ref = -103.863 libint = -103.863 relabs_error = 2.73647e-16
Elem 0 di= 2 v=0 : ref = -77.9148 libint = -77.9148 relabs_error = 3.64779e-16
Elem 0 di= 3 v=0 : ref = 130.531 libint = -130.531 relabs_error = 2
Elem 0 di= 4 v=0 : ref = 103.863 libint = -103.863 relabs_error = 2
Elem 0 di= 5 v=0 : ref = 77.9148 libint = -77.9148 relabs_error = 2
failed
Testing (s|p)  deriv order = 1
Elem 0 di= 0 v=0 : ref = 36.4571 libint = 36.4571 relabs_error = 0
Elem 0 di= 1 v=0 : ref = -12.9406 libint = -12.9406 relabs_error = 0
Elem 0 di= 2 v=0 : ref = 2.53014 libint = 2.53014 relabs_error = 0
Elem 0 di= 3 v=0 : ref = -36.4571 libint = -36.4571 relabs_error = 1.55919e-15
Elem 0 di= 4 v=0 : ref = 12.9406 libint = 12.9406 relabs_error = 0
Elem 0 di= 5 v=0 : ref = -2.53014 libint = -2.53014 relabs_error = 1.7552e-16
Elem 1 di= 0 v=0 : ref = -12.9406 libint = -12.9406 relabs_error = 2.74541e-16
Elem 1 di= 1 v=0 : ref = -14.3173 libint = -14.3173 relabs_error = 6.20351e-16
Elem 1 di= 2 v=0 : ref = 10.5351 libint = 10.5351 relabs_error = 1.68614e-16
Elem 1 di= 3 v=0 : ref = 12.9406 libint = 12.9406 relabs_error = 2.74541e-16
Elem 1 di= 4 v=0 : ref = 14.3173 libint = 14.3173 relabs_error = 3.10176e-15
Elem 1 di= 5 v=0 : ref = -10.5351 libint = -10.5351 relabs_error = 0
Elem 2 di= 0 v=0 : ref = 2.53014 libint = 2.53014 relabs_error = 3.51039e-16
Elem 2 di= 1 v=0 : ref = 10.5351 libint = 10.5351 relabs_error = 3.37227e-16
Elem 2 di= 2 v=0 : ref = 37.5052 libint = 37.5052 relabs_error = 1.89452e-16
Elem 2 di= 3 v=0 : ref = -2.53014 libint = -2.53014 relabs_error = 1.7552e-16
Elem 2 di= 4 v=0 : ref = -10.5351 libint = -10.5351 relabs_error = 1.68614e-16
Elem 2 di= 5 v=0 : ref = -37.5052 libint = -37.5052 relabs_error = 9.4726e-16
ok

64bit:

Testing (s|s)  deriv order = 1
Elem 0 di= 0 v=0 : ref = -173.336 libint = -173.336 relabs_error = 1.63969e-16
Elem 0 di= 1 v=0 : ref = -81.8765 libint = -81.8765 relabs_error = 3.47129e-16
Elem 0 di= 2 v=0 : ref = 55.7602 libint = 55.7602 relabs_error = 5.09713e-16
Elem 0 di= 3 v=0 : ref = 173.336 libint = 173.336 relabs_error = 1.63969e-16
Elem 0 di= 4 v=0 : ref = 81.8765 libint = 81.8765 relabs_error = 1.73565e-16
Elem 0 di= 5 v=0 : ref = -55.7602 libint = -55.7602 relabs_error = 5.09713e-16
ok
Testing (s|p)  deriv order = 1
Elem 0 di= 0 v=0 : ref = 6.59281 libint = 6.59281 relabs_error = 1.34719e-16
Elem 0 di= 1 v=0 : ref = 0.190251 libint = 0.190251 relabs_error = 2.91778e-15
Elem 0 di= 2 v=0 : ref = -1.23046 libint = -1.23046 relabs_error = 2.16549e-15
Elem 0 di= 3 v=0 : ref = -6.59281 libint = -6.59281 relabs_error = 8.08315e-16
Elem 0 di= 4 v=0 : ref = -0.190251 libint = -0.190251 relabs_error = 2.91778e-15
Elem 0 di= 5 v=0 : ref = 1.23046 libint = 1.23046 relabs_error = 2.16549e-15
Elem 1 di= 0 v=0 : ref = 0.190251 libint = 0.190251 relabs_error = 9.19102e-15
Elem 1 di= 1 v=0 : ref = 6.33288 libint = 6.33288 relabs_error = 1.40249e-16
Elem 1 di= 2 v=0 : ref = 2.33071 libint = 2.33071 relabs_error = 5.71614e-16
Elem 1 di= 3 v=0 : ref = -0.190251 libint = -0.190251 relabs_error = 7.29446e-16
Elem 1 di= 4 v=0 : ref = -6.33288 libint = -6.33288 relabs_error = 1.40249e-15
Elem 1 di= 5 v=0 : ref = -2.33071 libint = -2.33071 relabs_error = 3.81076e-16
Elem 2 di= 0 v=0 : ref = -1.23046 libint = -1.23046 relabs_error = 9.74469e-15
Elem 2 di= 1 v=0 : ref = 2.33071 libint = 2.33071 relabs_error = 7.62151e-16
Elem 2 di= 2 v=0 : ref = -8.38073 libint = -8.38073 relabs_error = 2.11957e-16
Elem 2 di= 3 v=0 : ref = 1.23046 libint = 1.23046 relabs_error = 1.80457e-16
Elem 2 di= 4 v=0 : ref = -2.33071 libint = -2.33071 relabs_error = 9.52689e-16
Elem 2 di= 5 v=0 : ref = 8.38073 libint = 8.38073 relabs_error = 2.11957e-16
ok

The three 32-bit failures in the (s|s) test have the right magnitude but the wrong sign:

Elem 0 di= 3 v=0 : ref = 130.531 libint = -130.531 relabs_error = 2
Elem 0 di= 4 v=0 : ref = 103.863 libint = -103.863 relabs_error = 2
Elem 0 di= 5 v=0 : ref = 77.9148 libint = -77.9148 relabs_error = 2
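
A relabs_error of exactly 2 is what a pure sign flip produces if the error is measured as |ref - libint| / |ref|. The sketch below assumes that definition (the test program's actual formula is not shown in this thread) and uses hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Assumed error measure: |ref - value| / |ref|. The real test code may
// differ; this form is consistent with the numbers printed above.
double relabs_error(double ref, double value) {
  assert(ref != 0.0);
  return std::fabs(ref - value) / std::fabs(ref);
}

// True when value has the magnitude of ref but the opposite sign,
// up to a relative tolerance.
bool is_sign_flip(double ref, double value, double rel_tol = 1e-12) {
  return ref != 0.0 && std::fabs(ref + value) <= rel_tol * std::fabs(ref);
}
```

With the values above, `relabs_error(130.531, -130.531)` evaluates to 2 and `is_sign_flip` returns true. A mismatch such as ref = 92.7058, libint = 112.767 from the (d|p) test is not a plain sign flip under this check.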

For other parts of the test (probably more operations are done there), the sign is correct but the values are off.

32bit:

Testing (d|p) 
Elem 0 di= 0 v=0 : ref = 92.7058 libint = 112.767 relabs_error = 0.216392
Elem 1 di= 0 v=0 : ref = -227.271 libint = -227.271 relabs_error = 1.25056e-16
Elem 2 di= 0 v=0 : ref = 288.784 libint = 288.784 relabs_error = 1.96837e-16
Elem 3 di= 0 v=0 : ref = 31.1654 libint = 31.1654 relabs_error = 2.27991e-16
Elem 4 di= 0 v=0 : ref = -14.6554 libint = -14.6554 relabs_error = 0
Elem 5 di= 0 v=0 : ref = -6.86838 libint = -6.86838 relabs_error = 2.58628e-16
Elem 6 di= 0 v=0 : ref = -39.6006 libint = -39.6006 relabs_error = 3.58855e-16
Elem 7 di= 0 v=0 : ref = -6.86838 libint = -6.86838 relabs_error = 3.87942e-16
Elem 8 di= 0 v=0 : ref = -11.3334 libint = -11.3334 relabs_error = 1.56737e-16
Elem 9 di= 0 v=0 : ref = 136.386 libint = 136.386 relabs_error = 0
Elem 10 di= 0 v=0 : ref = -164.712 libint = -199.036 relabs_error = 0.208391
Elem 11 di= 0 v=0 : ref = 296.522 libint = 296.522 relabs_error = 0
Elem 12 di= 0 v=0 : ref = -6.86838 libint = -6.86838 relabs_error = 0
Elem 13 di= 0 v=0 : ref = -31.8628 libint = -31.8628 relabs_error = 0
Elem 14 di= 0 v=0 : ref = 19.3917 libint = 19.3917 relabs_error = 0
Elem 15 di= 0 v=0 : ref = 139.708 libint = 139.708 relabs_error = 0
Elem 16 di= 0 v=0 : ref = -239.045 libint = -239.045 relabs_error = 2.37794e-16
Elem 17 di= 0 v=0 : ref = 216.515 libint = 260.13 relabs_error = 0.20144
failed

64bit:

Testing (d|p) 
Elem 0 di= 0 v=0 : ref = -0.708763 libint = -0.708763 relabs_error = 9.39854e-16
Elem 1 di= 0 v=0 : ref = 3.88409 libint = 3.88409 relabs_error = 6.86013e-16
Elem 2 di= 0 v=0 : ref = 2.94575 libint = 2.94575 relabs_error = 1.05529e-15
Elem 3 di= 0 v=0 : ref = -0.851277 libint = -0.851277 relabs_error = 2.60837e-16
Elem 4 di= 0 v=0 : ref = -2.51181 libint = -2.51181 relabs_error = 1.2376e-15
Elem 5 di= 0 v=0 : ref = 0.645867 libint = 0.645867 relabs_error = 1.20328e-15
Elem 6 di= 0 v=0 : ref = -0.645621 libint = -0.645621 relabs_error = 3.43924e-16
Elem 7 di= 0 v=0 : ref = 0.645867 libint = 0.645867 relabs_error = 1.54707e-15
Elem 8 di= 0 v=0 : ref = -2.87358 libint = -2.87358 relabs_error = 9.27252e-16
Elem 9 di= 0 v=0 : ref = 4.82524 libint = 4.82524 relabs_error = 1.84069e-16
Elem 10 di= 0 v=0 : ref = -1.2273 libint = -1.2273 relabs_error = 1.99014e-15
Elem 11 di= 0 v=0 : ref = 2.36188 libint = 2.36188 relabs_error = 1.31617e-15
Elem 12 di= 0 v=0 : ref = 0.645867 libint = 0.645867 relabs_error = 5.1569e-16
Elem 13 di= 0 v=0 : ref = -1.22949 libint = -1.22949 relabs_error = 0
Elem 14 di= 0 v=0 : ref = -1.85462 libint = -1.85462 relabs_error = 7.1835e-16
Elem 15 di= 0 v=0 : ref = 4.46347 libint = 4.46347 relabs_error = 0
Elem 16 di= 0 v=0 : ref = 2.88074 libint = 2.88074 relabs_error = 1.07911e-15
Elem 17 di= 0 v=0 : ref = -1.10788 libint = -1.10788 relabs_error = 6.0127e-16
ok
@mbanck
Contributor Author

mbanck commented Jan 24, 2021

Some observations:

  1. Downloading the generated libint-cp2k tarball (https://github.com/cp2k/libint-cp2k/releases/download/v2.6.0/libint-v2.6.0-cp2k-lmax-4.tgz) and just running ./configure && make && make check in a minimal Debian unstable i586 chroot passes the tests, so it does not seem to be a general toolchain issue.
  2. The non-deriv eri test (the first ./test 0 2 test) seems to pass if --with-opt-am is lowered to 0 or 1. However, the deriv 1 tests (./test 1 1) already fail at the s test, so this is independent of possibly lowering --with-eri-opt-am et al.:
Testing  (ss|ss)  deriv order = 1: Elem 0 di= 6 v=0 : ref = -14.7577 libint = -185.892 relabs_error = 11.5962
  3. When I try to reproduce the above libint-cp2k tarball, I get some possibly relevant diffs even if I use the same configure flags for the compiler, and the tests still fail. It might be due to the environment (they use alpine/musl, not Debian GNU/Linux). For example:
--- ../libint-v2.6.0-cp2k-lmax-4/src/CR_DerivGaussP0InBra_aB_d001__0__s__1___TwoPRep_s__0__s__1___Ab__up_0.cc   2019-08-05 13:36:39.000000000 +0000
+++ libint-2.6.0/src/CR_DerivGaussP0InBra_aB_d001__0__s__1___TwoPRep_s__0__s__1___Ab__up_0.cc   2021-01-24 16:17:49.000000000 +0000
@@ -33,42 +33,40 @@
 {
 const int vi = 0;
 LIBINT2_REALTYPE fp1;
-fp1 = 2.0000000000000000e+00 * src1[((hsi*3+2)*1+lsi)*1];
-LIBINT2_REALTYPE fp2;
-fp2 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+9)*1+lsi)*1];
+fp1 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+9)*1+lsi)*1];
 LIBINT2_REALTYPE fp0;
-fp0 = fp2 - fp1;
+fp0 = fp1 - src1[((hsi*3+2)*1+lsi)*1];
 target[((hsi*6+5)*1+lsi)*1] = fp0;
+LIBINT2_REALTYPE fp3;
+fp3 = 1.0000000000000000e+00 * src1[((hsi*3+1)*1+lsi)*1];
 LIBINT2_REALTYPE fp4;
-fp4 = 1.0000000000000000e+00 * src1[((hsi*3+1)*1+lsi)*1];
+fp4 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+8)*1+lsi)*1];
+LIBINT2_REALTYPE fp2;
+fp2 = fp4 - fp3;
+target[((hsi*6+4)*1+lsi)*1] = fp2;
 LIBINT2_REALTYPE fp5;
-fp5 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+8)*1+lsi)*1];
-LIBINT2_REALTYPE fp3;
-fp3 = fp5 - fp4;
-target[((hsi*6+4)*1+lsi)*1] = fp3;
-LIBINT2_REALTYPE fp6;
-fp6 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+7)*1+lsi)*1];
-target[((hsi*6+3)*1+lsi)*1] = fp6;
+fp5 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+7)*1+lsi)*1];
+target[((hsi*6+3)*1+lsi)*1] = fp5;
+LIBINT2_REALTYPE fp7;
+fp7 = 1.0000000000000000e+00 * src1[((hsi*3+0)*1+lsi)*1];
 LIBINT2_REALTYPE fp8;
-fp8 = 1.0000000000000000e+00 * src1[((hsi*3+0)*1+lsi)*1];
+fp8 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+5)*1+lsi)*1];
+LIBINT2_REALTYPE fp6;
+fp6 = fp8 - fp7;
+target[((hsi*6+2)*1+lsi)*1] = fp6;
 LIBINT2_REALTYPE fp9;
-fp9 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+5)*1+lsi)*1];
-LIBINT2_REALTYPE fp7;
-fp7 = fp9 - fp8;
-target[((hsi*6+2)*1+lsi)*1] = fp7;
+fp9 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+4)*1+lsi)*1];
+target[((hsi*6+1)*1+lsi)*1] = fp9;
 LIBINT2_REALTYPE fp10;
-fp10 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+4)*1+lsi)*1];
-target[((hsi*6+1)*1+lsi)*1] = fp10;
-LIBINT2_REALTYPE fp11;
-fp11 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+2)*1+lsi)*1];
-target[((hsi*6+0)*1+lsi)*1] = fp11;
+fp10 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+2)*1+lsi)*1];
+target[((hsi*6+0)*1+lsi)*1] = fp10;
 }
 }
 }
 const int hsi = 0;
 const int lsi = 0;
 const int vi = 0;
-/** Number of flops = 12 */
+/** Number of flops = 11 */
 }

 #ifdef __cplusplus

  4. The full diff for lmax=4 is here: https://people.debian.org/~mbanck/libint2.diff.gz
  5. Once I apply that diff and build, the eri test suite passes.

@susilehtola
Contributor

The interesting bit is that the failing values appear to be correct; they just have the wrong sign!

@mbanck
Contributor Author

mbanck commented Jan 25, 2021

> The interesting bit is that the failing values appear to be correct; they just have the wrong sign!

Only for some of the failures, but it could be that the others are just multiple sign flips adding up.

@mbanck
Contributor Author

mbanck commented Jan 25, 2021

> when I try to reproduce the above libint-cp2k tarball, I get some possibly relevant diffs even if I use the same configure flags for the compiler (and the tests still fail), it might be due to the environment (they use alpine/musl, not Debian GNU/Linux), like:

Not sure why I didn't try this earlier, but I get the same generated code as the tarball (that is, no diff) if I build the libint compiler under x86-64. So something on 32-bit leads to different code generation and, subsequently, to the test suite failures. Again, it is not an environment/toolchain issue.

@mbanck
Contributor Author

mbanck commented Jan 25, 2021

By the way, only 478 files out of almost 4000 at lmax=4 are generated differently. They look like these:

CR_DerivGaussP[01]InBra_aB_[...]
CR_aB_[XYZ][01234]_0__Overlap_[..]
CR_aB_[spdf]__0__Kinetic_[..]
OSVRRElecPotIn{Bra,Ket}_[..]
OSVRRP[01]InBra_aB_[...]
OSVRRSMultipole_aB_[...]
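
To pick such files out of a directory listing, the prefixes above can be turned into one regular expression. This is only a sketch with a made-up function name; it matches the listed prefixes and deliberately ignores the elided parts of each name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of libint): true if a generated file name
// starts with one of the prefixes listed above. The elided "[...]" tails
// of the patterns are not matched against anything.
bool matches_affected_prefix(const std::string& file_name) {
  static const std::regex prefixes(
      R"(^(CR_DerivGaussP[01]InBra_aB_|CR_aB_[XYZ][01234]_0__Overlap_|)"
      R"(CR_aB_[spdf]__0__Kinetic_|OSVRRElecPotIn(Bra|Ket)_|)"
      R"(OSVRRP[01]InBra_aB_|OSVRRSMultipole_aB_))");
  return std::regex_search(file_name, prefixes);
}

// The file from the diff earlier in this thread falls under the first prefix.
void check_example_file() {
  assert(matches_affected_prefix(
      "CR_DerivGaussP0InBra_aB_d001__0__s__1___TwoPRep_s__0__s__1___Ab__up_0.cc"));
}
```

Matching a prefix does not by itself mean a file was generated differently; the function only narrows down which names belong to the affected families.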

@mbanck
Contributor Author

mbanck commented Jan 26, 2021

Using a libint that is generated on x86-64 but built for 32-bit makes the CP2K libint-related regtests pass.

@StefanBruens

There is exactly one semantic difference in the diff above:

 fp2 = inteval->two_alpha0_bra[vi] * src0[((hsi*10+9)*1+lsi)*1];
-target[((hsi*6+5)*1+lsi)*1] = fp2 - 2.0e+0 * src1[((hsi*3+2)*1+lsi)*1];
+target[((hsi*6+5)*1+lsi)*1] = fp2 - 1.0e+0 * src1[((hsi*3+2)*1+lsi)*1];
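
To see how this single changed prefactor shows up in the result, the two variants can be written as plain functions. This is an illustrative sketch with made-up function names and input values, not libint code, and it does not settle which prefactor is the right one.

```cpp
#include <cassert>

// Variant with the 2.0 prefactor (the '-' line above).
double target_prefactor_two(double two_alpha, double src0, double src1) {
  return two_alpha * src0 - 2.0 * src1;
}

// Variant with the 1.0 prefactor (the '+' line above).
double target_prefactor_one(double two_alpha, double src0, double src1) {
  return two_alpha * src0 - 1.0 * src1;
}

// With arbitrary inputs the two variants differ by exactly src1.
void check_variants() {
  const double two_alpha = 3.0, src0 = 2.0, src1 = 0.5;  // made-up values
  assert(target_prefactor_two(two_alpha, src0, src1) == 5.0);
  assert(target_prefactor_one(two_alpha, src0, src1) == 5.5);
}
```

Because the two results differ by one multiple of `src1`, the size of the error depends on how large that term is relative to the first one.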
