./out/demo_socp_gpu fails to solve its problem #180

Closed
kalmarek opened this issue Oct 15, 2021 · 30 comments

@kalmarek
Contributor

kalmarek commented Oct 15, 2021

Specifications

  • OS: Arch Linux
  • SCS Version: master at 5be0e16
  • Compiler: gcc

Description

scs fails at solving ./out/demo_socp_gpu 1000 0.5 0.5 1

How to reproduce

linking against julia openblas:

JULIA_HOME="/opt/julias/julia-1.6"
JULIA_LD_PATH="$JULIA_HOME/lib/julia"
BLASLDFLAGS="-L$JULIA_LD_PATH -lopenblas64_"
SCSFLAGS="USE_OPENMP=1 BLAS64=1 BLASSUFFIX=_64_"
make -j4 CFLAGS="-march=native" DLONG=0 ${SCSFLAGS} BLASLDFLAGS="${BLASLDFLAGS}" gpu

then running it via

LD_LIBRARY_PATH=$JULIA_LD_PATH:$LD_LIBRARY_PATH ./out/demo_socp_gpu 1000 0.5 0.5 1

Additional information

similarly compiled direct and indirect solvers (cpu) work just fine

Output

seed : 1

A is 4000 by 1000, with 32 nonzeros per column.
A has 32000 nonzeros (0.800000% dense).
Nonzeros of A take 0.000238 GB of storage.
Row idxs of A take 0.000119 GB of storage.
Col ptrs of A take 0.000004 GB of storage.

ScsCone information:
Zero cone rows: 2000
LP cone rows: 2000
Number of second-order cones: 0, covering 0 rows, with sizes
[]
Number of rows covered is 4000 out of 4000.

true pri opt = 2022.070521
true dua opt = 2022.070521
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 1000, constraints m: 4000
cones:    z: primal zero / dual free vars: 2000
          l: linear vars: 2000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, warm_start: 0
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 32000, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 6.90e+00  9.46e+01  3.33e+04 -1.66e+04  1.00e-01  1.03e-03 
   250| 1.76e+04  4.31e+01  1.23e+04 -6.15e+03  1.00e-01  1.65e-01 
   500| 2.74e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.29e-01 
   750| 1.57e+04  4.26e+01  1.23e+04 -6.16e+03  1.00e-01  4.94e-01 
  1000| 1.64e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  6.85e-01 
  1250| 4.30e+21  2.67e+22  6.54e+22 -3.27e+22  1.00e-01  8.48e-01 
  1500| 1.90e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  9.48e-01 
  1750| 2.14e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.04e+00 
  2000| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.13e+00 
  2250| 6.45e+20  2.19e+22  4.21e+22  2.11e+22  1.00e-01  1.22e+00 
  2500| 2.07e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.30e+00 
  2750| 2.53e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.39e+00 
  3000| 2.02e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.48e+00 
  3250| 5.72e+20  3.01e+22  3.73e+22  1.87e+22  1.00e-01  1.57e+00 
  3500| 2.09e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.66e+00 
  3750| 2.43e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.75e+00 
  4000| 2.31e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  1.84e+00 
 [ ... ]
 99500| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.65e+01 
 99750| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.67e+01 
100000| 2.48e+04  4.29e+01  1.23e+04 -6.16e+03  1.00e-01  3.68e+01 
------------------------------------------------------------------
status:  solved (inaccurate - reached max_iters)
timings: total: 3.68e+01s = setup: 5.47e-02s + solve: 3.68e+01s
         lin-sys: 3.16e+01s, cones: 7.88e-01s, accel: 4.77e-01s
------------------------------------------------------------------
objective = -6159.028853 (inaccurate)
------------------------------------------------------------------
true pri opt = 2022.070521
true dua opt = 2022.070521
scs pri obj= 0.000000
scs dua obj = -12318.057707
@bodono
Member

bodono commented Oct 16, 2021

Thanks for posting. I am unable to reproduce this, when I run the command I get:

2021-10-16 14:47:37 (base) 0 bodonoghue@bodonoghue-[]-~/git/scs:
└──[ins] => out/demo_socp_gpu_indirect 1000 0.5 0.5 1
seed : 1

A is 4000 by 1000, with 32 nonzeros per column.
A has 32000 nonzeros (0.800000% dense).
Nonzeros of A take 0.000238 GB of storage.
Row idxs of A take 0.000119 GB of storage.
Col ptrs of A take 0.000004 GB of storage.

ScsCone information:
Zero cone rows: 2000
LP cone rows: 2000
Number of second-order cones: 0, covering 0 rows, with sizes
[]
Number of rows covered is 4000 out of 4000.

true pri opt = 2022.070521
true dua opt = 2022.070521
------------------------------------------------------------------
	       SCS v3.0.0 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 1000, constraints m: 4000
cones: 	  z: primal zero / dual free vars: 2000
	  l: linear vars: 2000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, warm_start: 0
	  acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
	  nnz(A): 32000, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 6.90e+00  7.44e+00  2.65e+02  3.90e+03  1.00e-01  2.11e-02
    25| 3.80e-06  3.17e-04  3.36e-03  2.02e+03  1.00e-01  1.08e-01
------------------------------------------------------------------
status:  solved
timings: total: 6.66e-01s = setup: 5.58e-01s + solve: 1.08e-01s
	 lin-sys: 8.57e-02s, cones: 2.84e-04s, accel: 6.22e-05s
------------------------------------------------------------------
objective = 2022.072100
------------------------------------------------------------------
true pri opt = 2022.070521
true dua opt = 2022.070521
scs pri obj= 2022.070419
scs dua obj = 2022.073782

It might be the case that you are missing the gpu fixes I submitted here: 13e675d.

I did not cut a new release / tag with those fixes. Is that the issue?

By the way, you can better test the gpu using:

make purge
make test_gpu
out/run_tests_gpu_indirect

@kalmarek
Contributor Author

I'm on master as of 5be0e16
I have CUDA_PATH=/opt/cuda in my env pointing to cuda-11.4.2.
I compiled scs with

make purge
make test_gpu

as advised and then tested it with ./out/run_tests_gpu_indirect. Here is what I get:

cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK -DINDIRECT=1 -c src/scs.c -o src/scs_indir.o
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/util.o src/util.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/cones.o src/cones.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/aa.o src/aa.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/rw.o src/rw.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/linalg.o src/linalg.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/ctrlc.o src/ctrlc.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/scs_version.o src/scs_version.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o src/normalize.o src/normalize.c
cc  -c -o linsys/gpu/indirect/private.o linsys/gpu/indirect/private.c -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK -I/opt/cuda/include -Ilinsys/gpu -Wno-c++11-long-long  -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o linsys/scs_matrix.o linsys/scs_matrix.c
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK   -c -o linsys/csparse.o linsys/csparse.c
mkdir -p out
ar rv out/libscsgpuindir.a src/scs_indir.o src/util.o src/cones.o src/aa.o src/rw.o src/linalg.o src/ctrlc.o src/scs_version.o src/normalize.o linsys/gpu/indirect/private.o linsys/scs_matrix.o linsys/csparse.o linsys/gpu/gpu.o
ar: creating out/libscsgpuindir.a
a - src/scs_indir.o
a - src/util.o
a - src/cones.o
a - src/aa.o
a - src/rw.o
a - src/linalg.o
a - src/ctrlc.o
a - src/scs_version.o
a - src/normalize.o
a - linsys/gpu/indirect/private.o
a - linsys/scs_matrix.o
a - linsys/csparse.o
a - linsys/gpu/gpu.o
ranlib out/libscsgpuindir.a
cc -g -Wall -Wwrite-strings -pedantic -funroll-loops -Wstrict-prototypes -I. -Iinclude -Ilinsys -O3 -fPIC -DCTRLC=1  -DCOPYAMATRIX=1  -DGPU_TRANSPOSE_MAT=1  -DUSE_LAPACK -o out/run_tests_gpu_indirect test/run_tests.c out/libscsgpuindir.a -lm -lrt -lblas -llapack  -L/opt/cuda/lib -L/opt/cuda/lib64 -lcudart -lcublas -lcusparse -Itest
test_fails
Testing that SCS handles bad inputs correctly:eps_abs tolerance must be positive
ERROR: Validation returned failure
Failure:could not initialize work
degenerate
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 2, constraints m: 4
cones:    l: linear vars: 4
settings: eps_abs: 1.0e-06, eps_rel: 1.0e-06, eps_infeas: 1.0e-09
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, warm_start: 0
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 4, nnz(P): 2
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 2.10e+01  2.00e+00  7.90e+00 -3.95e+00  1.00e-01  1.47e-04 
   250| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  2.53e-02 
   500| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  5.54e-02 
   750| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  7.65e-02 
  1000| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  9.70e-02 
  1250| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  1.18e-01 
  1500| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  1.39e-01 
  1750| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  1.60e-01 
  2000| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  1.81e-01 
  2250| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  2.02e-01
[...]
 99750| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  7.39e+00 
100000| 5.69e+11  2.00e+00  0.00e+00  0.00e+00  1.00e+06  7.41e+00 
------------------------------------------------------------------
status:  solved (inaccurate - reached max_iters)
timings: total: 7.45e+00s = setup: 4.52e-02s + solve: 7.41e+00s
         lin-sys: 7.25e+00s, cones: 2.01e-02s, accel: 8.37e-02s
------------------------------------------------------------------
objective = 0.000000 (inaccurate)
------------------------------------------------------------------
INVALID STATUS
Tests run: 2

no fancy options, no julia-shipped blas ;)

~/local/src/scs (master)$ ldd ./out/run_tests_gpu_indirect
        linux-vdso.so.1 (0x00007ffcff3ba000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f12b0400000)
        librt.so.1 => /usr/lib/librt.so.1 (0x00007f12b03f5000)
        libopenblas.so.3 => /usr/lib/libopenblas.so.3 (0x00007f12aefd5000)
        liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f12ae90b000)
        libcudart.so.11.0 => /opt/cuda/lib64/libcudart.so.11.0 (0x00007f12ae669000)
        libcublas.so.11 => /opt/cuda/lib64/libcublas.so.11 (0x00007f12a52b5000)
        libcusparse.so.11 => /opt/cuda/lib64/libcusparse.so.11 (0x00007f1296ec8000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f1296cfc000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f12b0597000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f1296cdb000)
        libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f1296c97000)
        libgfortran.so.5 => /usr/lib/libgfortran.so.5 (0x00007f12969db000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f12969c0000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f12969b7000)
        libcublasLt.so.11 => /opt/cuda/lib64/libcublasLt.so.11 (0x00007f1282fbb000)
        libquadmath.so.0 => /usr/lib/../lib/libquadmath.so.0 (0x00007f1282f70000)

@bodono
Member

bodono commented Oct 27, 2021

That's strange, I cannot reproduce this on the only GPU machine I have access to. Can you try disabling AA (Anderson acceleration)? You can do so by changing ACCELERATION_LOOKBACK to 0 in include/glbopts.h, which disables it for every test that does not set it explicitly; that should make it clear whether AA is the issue.

Here's what my ldd looks like, I don't see any major differences to yours:

└──[ins] => ldd out/run_tests_gpu_indirect
	linux-vdso.so.1 (0x00007ffc11d05000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7c3fcf9000)
	libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007f7c3fc97000)
	liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007f7c3f5fa000)
	libcudart.so.11.0 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007f7c3f375000)
	libcublas.so.11 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcublas.so.11 (0x00007f7c37e9a000)
	libcusparse.so.11 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcusparse.so.11 (0x00007f7c29e1c000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7c29c55000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7c3fe94000)
	libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007f7c2781e000)
	libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f7c27574000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7c2756e000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7c2754d000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7c27542000)
	libcublasLt.so.11 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007f7c19776000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7c1956a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7c19550000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f7c19507000)

Can you try running

valgrind --leak-check=full out/run_tests_gpu_indirect

it likely won't help (and is very noisy for gpus) but just in case.

@kalmarek
Contributor Author

I disabled AA but it changed just the numerical values in the log, not the behaviour;
here's valgrind log: https://gist.github.com/kalmarek/adb225c93de2bb8d9a7032caec42eea9

I think the problem is somewhere in problem generation (before scs), since the header looks like this:

test_fails
Testing that SCS handles bad inputs correctly:eps_abs tolerance must be positive
ERROR: Validation returned failure
Failure:could not initialize work
degenerate
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 2, constraints m: 4
cones:    l: linear vars: 4
settings: eps_abs: 1.0e-06, eps_rel: 1.0e-06, eps_infeas: 1.0e-09
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, warm_start: 0
lin-sys:  sparse-indirect GPU
          nnz(A): 4, nnz(P): 2

i.e. first non positive eps_abs and then a problem with 2 variables and 4 constraints?

@bodono
Member

bodono commented Oct 27, 2021

That's just the output of the first test, which is testing data validation and is working correctly. You will see the same if you run the non-GPU tests with out/run_tests_direct. The first real problem is a tiny LP with 2 vars and 4 constraints.

@duyipai

duyipai commented Oct 28, 2021

I have the same problem as @kalmarek.

@kalmarek
Contributor Author

That's just the output of the first test, which is testing data validation and is working correctly. You will see the same if you run the non-GPU tests with out/run_tests_direct. The first real problem is a tiny LP with 2 vars and 4 constraints.

yeah, maybe I should try to compare with run_tests_direct first ;)

@kalmarek
Contributor Author

kalmarek commented Jan 7, 2022

@bodono: so I set VERBOSITY=2 and it seems that cg is never run successfully. those cuda errors

linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument

seem to go away if I replace the macro-expanded CUBLAS(name) with the appropriate one, but the end result is the same. I literally have no idea what I am doing ;), but if you could suggest how to diagnose it next, I'd be glad!

**********************************************************
Running test: test_validation
Testing that SCS handles bad inputs correctly:
eps_abs tolerance must be positive
ERROR: Validation returned failure
size of scs_int = 4, size of scs_float = 8
Failure:could not initialize work
**********************************************************
**********************************************************
Running test: degenerate
------------------------------------------------------------------
               SCS v3.0.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 2, constraints m: 4
cones:    l: linear vars: 4
settings: eps_abs: 1.0e-06, eps_rel: 1.0e-06, eps_infeas: 1.0e-09
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 50, normalize: 1, warm_start: 0
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 4, nnz(P): 2
getting pre-conditioner
finished getting pre-conditioner
size of scs_int = 4, size of scs_float = 8
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 2.10e+01  2.00e+00  7.90e+00 -3.95e+00  1.00e-01  3.27e-04 
Norm u = 2.306122, Norm u_t = 1.492570, Norm v = 1.939709, Norm x = 0.000000, Norm y = 4.450789, Norm s = 22.360680, Norm |Ax + s| = 2.24e+01, tau = 1.000000, kappa = 0.000000, |u - u_t| = 1.11e+00, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 7.90e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     1| 3.68e+01  2.00e+00  0.00e+00  0.00e+00  1.00e-01  6.66e-04 
Norm u = 17.210439, Norm u_t = 18.766100, Norm v = 29.666025, Norm x = 0.000000, Norm y = 0.000000, Norm s = 877.991704, Norm |Ax + s| = 8.78e+02, tau = 17.210439, kappa = 0.000000, |u - u_t| = 1.81e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     2| 9.46e+01  2.00e+00  0.00e+00  0.00e+00  1.00e-01  1.37e-03 
Norm u = 10.600861, Norm u_t = 22.294830, Norm v = 35.509350, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1226.504583, Norm |Ax + s| = 1.23e+03, tau = 10.600861, kappa = 0.000000, |u - u_t| = 2.20e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     3| 2.28e+02  2.00e+00  0.00e+00  0.00e+00  1.00e-01  2.07e-03 
Norm u = 5.455154, Norm u_t = 25.405974, Norm v = 40.611483, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1472.679019, Norm |Ax + s| = 1.47e+03, tau = 5.455154, kappa = 0.000000, |u - u_t| = 2.53e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     4| 5.39e+02  2.00e+00  0.00e+00  0.00e+00  1.00e-01  2.34e-03 
Norm u = 2.454521, Norm u_t = 26.247918, Norm v = 41.989207, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1544.434977, Norm |Ax + s| = 1.54e+03, tau = 2.454521, kappa = 0.000000, |u - u_t| = 2.62e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
     5| 1.26e+03  2.00e+00  0.00e+00  0.00e+00  1.00e-01  2.62e-03 
[...]
    48| 1.05e+18  2.00e+00  0.00e+00  0.00e+00  1.00e-01  1.60e-02 
Norm u = 0.000000, Norm u_t = 26.457513, Norm v = 42.332021, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1569.004030, Norm |Ax + s| = 1.57e+03, tau = 0.000000, kappa = 0.000000, |u - u_t| = 2.65e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
linsys/gpu/indirect/private.c:506:scs_solve_lin_sys
 ERROR_CUDA (#): invalid argument
tol 1.000e-12
cg_its 0
    49| 5.29e+17  2.00e+00  0.00e+00  0.00e+00  1.00e-01  1.63e-02 
Norm u = 0.000000, Norm u_t = 26.457513, Norm v = 42.332021, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1569.004030, Norm |Ax + s| = 1.57e+03, tau = 0.000000, kappa = 0.000000, |u - u_t| = 2.65e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
    50| 5.29e+17  2.00e+00  0.00e+00  0.00e+00  1.00e-01  1.63e-02 
Norm u = 0.000000, Norm u_t = 26.457513, Norm v = 42.332021, Norm x = 0.000000, Norm y = 0.000000, Norm s = 1569.004030, Norm |Ax + s| = 1.57e+03, tau = 0.000000, kappa = 0.000000, |u - u_t| = 2.65e+01, res_infeas = nan, res_unbdd_a = nan, res_unbdd_p = nan, ctx_tau = 0.00e+00, bty_tau = 0.00e+00
------------------------------------------------------------------
status:  solved (inaccurate - reached max_iters)
timings: total: 5.82e-02s = setup: 4.19e-02s + solve: 1.63e-02s
         lin-sys: 1.51e-02s, cones: 1.97e-05s, accel: 3.52e-06s
------------------------------------------------------------------
objective = 0.000000 (inaccurate)
------------------------------------------------------------------
**********************************************************
INVALID STATUS
Tests run: 2

@bodono
Member

bodono commented Jan 7, 2022

Ok, can you try with VERBOSITY=4? That should print out some info on whether pcg is running correctly. The fact that you're seeing cg_its 0 is worrying.

The macro itself has an error check when VERBOSITY>0 (see here), which is why the error goes away when you replace it (although it does suggest that only that line is broken, which is strange).
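
The point about the macro can be illustrated with a small host-only sketch. It does not use CUDA at all: mock_last_error, CHECK_LAST_ERR, mock_norm, norm_checked and norm_unchecked are hypothetical names invented for this illustration, and the real SCS macro may be structured differently. The idea is only that, when verbosity is on, a wrapper queries a global "last error" state and prints the call site, so removing the wrapper silences the message without changing what the underlying call returns.

```c
#include <stdio.h>

#ifndef VERBOSITY
#define VERBOSITY 1 /* assumed value, for this sketch only */
#endif

/* Hypothetical stand-in for a runtime's pending "last error" state. */
static int mock_error_state = 0; /* 0 means success */
static int reported_errors = 0;  /* how many times the check printed */

/* Returns the pending error code and clears it. */
static int mock_last_error(void) {
  int err = mock_error_state;
  mock_error_state = 0;
  return err;
}

#if VERBOSITY > 0
/* Looks for a pending error and reports the call site. */
#define CHECK_LAST_ERR                                              \
  do {                                                              \
    int err_ = mock_last_error();                                   \
    if (err_ != 0) {                                                \
      reported_errors++;                                            \
      printf("%s:%d:%s\n ERROR (#): code %d\n", __FILE__, __LINE__, \
             __func__, err_);                                       \
    }                                                               \
  } while (0)
#else
#define CHECK_LAST_ERR do { } while (0)
#endif

/* Hypothetical library routine that fails and leaves 0 in *out. */
static void mock_norm(const double *x, int n, double *out) {
  (void)x;
  (void)n;
  mock_error_state = 22; /* pretend "invalid argument" */
  *out = 0.0;
}

/* Wrapped call: run the routine, then check for an error. */
static double norm_checked(const double *x, int n) {
  double nrm;
  mock_norm(x, n, &nrm);
  CHECK_LAST_ERR;
  return nrm;
}

/* Direct call: same routine, no check, so nothing is printed. */
static double norm_unchecked(const double *x, int n) {
  double nrm;
  mock_norm(x, n, &nrm);
  return nrm;
}
```

In this sketch both norm_checked and norm_unchecked return 0.0 for any input. The only difference is whether the failure is reported, which matches the observation that the message disappears while the end result stays the same.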

@bodono
Member

bodono commented Jan 7, 2022

I just pushed c10b3fe. Pull that down and see if it fixes it.

Sorry, false alarm.

@kalmarek
Contributor Author

kalmarek commented Jan 8, 2022

Even with VERBOSITY=4 I don't see other output, since cg_gpu_norm(cublas_handle, r, n) < tol is satisfied in https://github.com/cvxgrp/scs/blob/77c86c89bc8d75dce0e8475c364f805fdb62cef0/linsys/gpu/indirect/private.c#L399
If I put the printf statement above I get the old

linsys/gpu/indirect/private.c:16:cg_gpu_norm
 ERROR_CUDA (#): invalid argument

I'm not sure how to test that my CUDA/cublas is installed properly?

@bodono
Member

bodono commented Jan 9, 2022

Can you try setting USE_L2_NORM to 1?

@kalmarek
Contributor Author

kalmarek commented Jan 9, 2022

I set it to 1 but I get a similar behavior (though no errors). I also checked that nrm is always 0 in cg_gpu_norm, though &r[1] prints as 1.000000...
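
One way to narrow this down is to copy the residual vector back to the host and compare the GPU result against a plain CPU reference. The sketch below is host-only C and rests on assumptions: host_sq_l2_norm, host_amax_index and host_inf_norm are hypothetical helper names, and it assumes (not checked against the SCS source) that the two settings of USE_L2_NORM correspond to a 2-norm and a max-absolute-value norm. The squared 2-norm is returned so the sketch needs no math library.

```c
#include <stddef.h>

/* Squared Euclidean norm of x[0..n-1], computed on the host. */
static double host_sq_l2_norm(const double *x, size_t n) {
  double sum = 0.0;
  size_t i;
  for (i = 0; i < n; ++i) {
    sum += x[i] * x[i];
  }
  return sum;
}

/* 1-based index of the entry with the largest magnitude (first one on
   ties), following the BLAS amax convention; returns 0 when n == 0. */
static size_t host_amax_index(const double *x, size_t n) {
  size_t i, best = 0;
  double best_abs = -1.0;
  for (i = 0; i < n; ++i) {
    double a = x[i] < 0.0 ? -x[i] : x[i];
    if (a > best_abs) {
      best_abs = a;
      best = i + 1;
    }
  }
  return best;
}

/* Max-absolute-value norm built on the 1-based index above. An index
   of 0 is treated as "no entry" rather than used to read x[-1]. */
static double host_inf_norm(const double *x, size_t n) {
  size_t idx = host_amax_index(x, n);
  double v;
  if (idx == 0) {
    return 0.0;
  }
  v = x[idx - 1];
  return v < 0.0 ? -v : v;
}
```

If the host reference reports a nonzero norm for the copied-back vector while the GPU routine returns 0, the data itself is fine and the fault is more likely in the GPU call or its arguments.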

@bodono
Member

bodono commented Jan 14, 2022

This is so strange, I don't understand what's happening here at all and I can't reproduce this behavior on my gpu machine. If you really want to get to the bottom of this then I'm happy to get on a call and we can debug together manually on your machine.

@kalmarek
Contributor Author

Thanks! I asked for access to an NVIDIA GPU at my institution; if I can reproduce it there, I'll get back to you!

@kalmarek
Contributor Author

Dear @bodono
I managed to get access to a GPU-enabled node and ran some tests there;

  • a simple make test_gpu which results in
~/local/scs$ ldd ./out/run_tests_gpu_indirect 
        linux-vdso.so.1 (0x00007fff935d2000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbb17291000)
        liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007fbb16bed000)
        libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007fbb16b80000)
        libcudart.so.10.1 => /usr/lib/x86_64-linux-gnu/libcudart.so.10.1 (0x00007fbb16904000)
        libcublas.so.10 => /usr/lib/x86_64-linux-gnu/libcublas.so.10 (0x00007fbb12b69000)
        libcusparse.so.10 => /usr/lib/x86_64-linux-gnu/libcusparse.so.10 (0x00007fbb0b8e0000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbb0b6ee000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fbb17459000)
        libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007fbb0b426000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fbb0b40b000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fbb0b405000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fbb0b3e2000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fbb0b3d6000)
        libcublasLt.so.10 => /usr/lib/x86_64-linux-gnu/libcublasLt.so.10 (0x00007fbb09532000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fbb09350000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fbb09306000)

runs just fine (11 out of 11 tests passed).

  • This works just fine even when I replace the system's CUDA with the one shipped with Julia:
~/local/scs$ LD_LIBRARY_PATH="${CUDA_PATH}/lib" ldd out/run_tests_gpu_indirect
        linux-vdso.so.1 (0x00007ffd8ec76000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f472fbad000)
        liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007f472f509000)
        libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007f472f49c000)
        libcudart.so.10.1 => /local/data/zz1594/.julia/artifacts/f049c2824a217dc29dbf657e5cdf0f8adafca77a/lib/libcudart.so.10.1 (0x00007f472f220000)
        libcublas.so.10 => /local/data/zz1594/.julia/artifacts/f049c2824a217dc29dbf657e5cdf0f8adafca77a/lib/libcublas.so.10 (0x00007f472b47e000)
        libcusparse.so.10 => /local/data/zz1594/.julia/artifacts/f049c2824a217dc29dbf657e5cdf0f8adafca77a/lib/libcusparse.so.10 (0x00007f47241f5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4724003000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f472fd75000)
        libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f4723d3b000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4723d20000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4723d1a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4723cf7000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4723ceb000)
        libcublasLt.so.10 => /local/data/zz1594/.julia/artifacts/f049c2824a217dc29dbf657e5cdf0f8adafca77a/lib/libcublasLt.so.10 (0x00007f4721e47000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f4721c65000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f4721c1b000)
  • however, if I try to link against the Julia-provided OpenBLAS with
BLASLDFLAGS="-L${JULIA_BLAS_PATH} -lopenblas64_"

make purge
make -j4 $SCSFLAGS BLASSUFFIX="_64_" BLAS64=1 DLONG=0 BLASLDFLAGS="${BLASLDFLAGS}" test_gpu

which results in

LD_LIBRARY_PATH="${JULIA_BLAS_PATH}" ldd out/run_tests_gpu_indirect
        linux-vdso.so.1 (0x00007ffd2f1bb000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0dd6654000)
        libopenblas64_.so => /local/data/zz1594/julia-1.7.2/lib/julia/libopenblas64_.so (0x00007f0dd48fc000)
        libcudart.so.10.1 => /usr/lib/x86_64-linux-gnu/libcudart.so.10.1 (0x00007f0dd4680000)
        libcublas.so.10 => /usr/lib/x86_64-linux-gnu/libcublas.so.10 (0x00007f0dd08e5000)
        libcusparse.so.10 => /usr/lib/x86_64-linux-gnu/libcusparse.so.10 (0x00007f0dc965e000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0dc946a000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0dd681c000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0dc9447000)
        libgfortran.so.5 => /local/data/zz1594/julia-1.7.2/lib/julia/libgfortran.so.5 (0x00007f0dc918c000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0dc9186000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0dc917c000)
        libcublasLt.so.10 => /usr/lib/x86_64-linux-gnu/libcublasLt.so.10 (0x00007f0dc72d8000)
        libstdc++.so.6 => /local/data/zz1594/julia-1.7.2/lib/julia/libstdc++.so.6 (0x00007f0dc70c2000)
        libgcc_s.so.1 => /local/data/zz1594/julia-1.7.2/lib/julia/libgcc_s.so.1 (0x00007f0dc70a7000)
        libquadmath.so.0 => /local/data/zz1594/julia-1.7.2/lib/julia/libquadmath.so.0 (0x00007f0dc705e000)

I get a failure:

*********************************************************
Running test: hs21_tiny_qp
------------------------------------------------------------------
               SCS v3.2.1 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 2, constraints m: 4
cones:    b: box cone vars: 4
settings: eps_abs: 1.0e-06, eps_rel: 1.0e-06, eps_infeas: 1.0e-09
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 100000, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 4, nnz(P): 2
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 9.61e-01  1.17e-01  1.96e-01  9.80e-02  1.00e-01  4.95e-04 
    25| 4.08e-04  4.78e-02  1.14e-01  6.94e-18  1.00e-01  4.21e-03 
------------------------------------------------------------------
status:  infeasible
timings: total: 4.22e-03s = setup: 4.24e-04s + solve: 3.79e-03s
         lin-sys: 3.70e-03s, cones: 3.82e-06s, accel: 1.08e-06s
------------------------------------------------------------------
objective = inf
------------------------------------------------------------------
primal obj error  inf
dual obj error  inf
hs21_tiny_qp: SCS failed to produce outputflag SCS_SOLVED
Tests run: 6
  • similarly built run_tests_[in]direct pass all tests just fine

@bodono
Member

bodono commented Apr 20, 2022

Hmmm, if the BLAS you're using is 64-bit, it might be tricky to get everything to work with a GPU, which (usually) expects 32-bit integers.
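
A minimal sketch of why mixed integer widths are a problem (this is not SCS code, and read_as_int64 is a hypothetical helper): if one side stores sizes or indices as 32-bit integers and the other side reads them as 64-bit integers, two adjacent entries get fused into a single value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the first 8 bytes of a 32-bit index array as a
 * single 64-bit integer. This is effectively what a routine built for 64-bit
 * integers would do if it were handed 32-bit data by mistake. memcpy is used
 * to avoid undefined behaviour from pointer type punning. */
int64_t read_as_int64(const int32_t *idx) {
  int64_t v;
  assert(idx != NULL);
  memcpy(&v, idx, sizeof v);
  return v;
}
```

With int32_t idx[2] = {1, 2}, the value returned is neither 1 nor 2 on either byte order, so any dimension or index passed across such a boundary comes out wrong.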

@kalmarek
Contributor Author

hmm, precisely the same problem happens if I compile with

BLASLDFLAGS="-L${JULIA_BLAS_PATH} -lopenblas"
SCSFLAGS="USE_OPENMP=0 BLAS32=1 DLONG=0"

make purge
CUDA_PATH="${CUDA_PATH}" make -j4 $SCSFLAGS BLASLDFLAGS="${BLASLDFLAGS}" test_gpu

Here is a gist with the build log, the test output, and the ldd output:
https://gist.github.com/kalmarek/0bb320b84871351bff1bb796e516c4a7

OpenBLAS here is the LP64 version (integers are 32-bit ints).

@bodono
Member

bodono commented Apr 25, 2022

Looks like the tests are passing except for hs21, which is probably just because the numerics are slightly different on the GPU and it's producing a bad flag.

@kalmarek
Contributor Author

kalmarek commented Nov 3, 2022

@bodono could you have a look at this problem:
https://cloud.impan.pl/s/MX5oBX0lHb5LJl2

It's the same problem that you obtain through this code:

using SCS, Test

let T = SCS.GpuIndirectSolver
    A = [
        1.0 1.0 0.0 0.0 0.0
        0.0 1.0 0.0 0.0 1.0
        0.0 0.0 1.0 1.0 1.0
        -1.0 0.0 0.0 0.0 0.0
        0.0 -1.0 0.0 0.0 0.0
        0.0 0.0 -1.0 0.0 0.0
        0.0 0.0 0.0 -1.0 0.0
        0.0 0.0 0.0 0.0 -1.0
    ]
    m, n = Int32.(size(A))
    args = (
        m = m,
        n = n,
        A = A,
        P = zeros(n, n),
        b = [5.0, 3.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        c = -[3.0, 4.0, 4.0, 9.0, 5.0],
        z = 0,
        l = 8,
        bu = Float64[],
        bl = Float64[],
        q = Int32[],
        s = Int32[],
        ep = 0,
        ed = 0,
        p = Float64[],
    )
    solution = SCS.scs_solve(T, args..., max_iters=200, write_data_filename="simple_problem.scs")
    @test isapprox(solution.x' * args.c, -99.0; rtol = 1e-4)
end

This is easily solved by the direct and indirect (CPU) solvers, but it fails with our Julia bindings to the GPU solver.
Maybe by inspecting the data file by hand (it's a binary which I have no idea how to digest) we can learn what goes wrong?

this is what I get here:

writing data to simple_problem.scs
------------------------------------------------------------------
               SCS v3.2.0 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 5, constraints m: 8
cones:    l: linear vars: 8
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
          alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
          max_iters: 200, normalize: 1, rho_x: 1.00e-06
          acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-indirect GPU
          nnz(A): 12, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.26e+02  3.95e+00  1.22e+03 -6.94e+02  1.00e-01  7.87e-04 
Warning: tol = -1.000000 <= 0, likely compiled without setting INDIRECT flag.
[...]
Warning: tol = -1.000000 <= 0, likely compiled without setting INDIRECT flag.
   200|      nan       nan      -nan      -nan  1.00e-01  8.29e-01 
------------------------------------------------------------------
status:  unbounded (inaccurate - reached max_iters)
timings: total: 8.81e-01s = setup: 5.27e-02s + solve: 8.29e-01s
         lin-sys: 8.26e-01s, cones: 2.52e-05s, accel: 6.92e-04s
------------------------------------------------------------------
objective = -inf (inaccurate)
------------------------------------------------------------------

@bodono
Member

bodono commented Nov 3, 2022

Did you compile with the INDIRECT flag?

@kalmarek
Contributor Author

kalmarek commented Nov 3, 2022

this is the script I use to compile scs

script = raw"""
cd $WORKSPACE/srcdir/scs*
flags="DLONG=0 BLAS32=1 USE_OPENMP=0 INDIRECT=1"
blasldflags="-L${libdir} -lopenblas"

CUDA_PATH=$prefix/cuda make BLASLDFLAGS="${blasldflags}" ${flags} out/libscsgpuindir.${dlext}

mkdir -p ${libdir}
cp out/libscs*.${dlext} ${libdir}
"""

@kalmarek
Contributor Author

kalmarek commented Nov 3, 2022

DINDIRECT=1 results in the same log

@bodono
Member

bodono commented Nov 3, 2022

The warning "Warning: tol = -1.000000 <= 0, likely compiled without setting INDIRECT flag." should only appear if the INDIRECT flag is not set during compilation.

When the INDIRECT flag is set, SCS does the additional computation needed to generate a good warm-start and a sensible tolerance for the indirect system:

scs/src/scs.c

Line 366 in f2da64d

#if INDIRECT > 0

Otherwise the tolerance is set to -1.0, which is an invalid tolerance:

scs/src/scs.c

Line 361 in f2da64d

scs_float tol = -1.0; /* only used for indirect methods, overridden later */

And that trips a warning from the indirect system solvers (should probably error out):

scs_printf("Warning: tol = %4f <= 0, likely compiled without setting "

When that flag is not set SCS skips that computation for speed.
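
The mechanism described above can be sketched roughly as follows. This is a simplified illustration, not the actual code in scs.c: linsys_tol and tol_is_valid are hypothetical names, the tolerance formula is a placeholder, and the sketch assumes the flag reaches the compiler as a preprocessor define.

```c
#include <assert.h>

/* Assumption: the build defines INDIRECT on the compiler command line
 * (e.g. -DINDIRECT=1) for indirect solvers; default to 0 otherwise. */
#ifndef INDIRECT
#define INDIRECT 0
#endif

/* Returns the tolerance that would be handed to the linear-system solver.
 * The formula in the INDIRECT branch is a placeholder, not SCS's real one. */
double linsys_tol(double residual_norm) {
  double tol = -1.0; /* invalid sentinel, only overridden for indirect methods */
  assert(residual_norm >= 0.0);
#if INDIRECT > 0
  tol = 0.1 * residual_norm + 1e-12; /* hypothetical: always strictly positive */
#endif
  return tol;
}

/* Mirrors the solver-side check that produces the warning in the log. */
int tol_is_valid(double tol) { return tol > 0.0; }
```

Compiled without the define, linsys_tol always returns the -1.0 sentinel and tol_is_valid rejects it, which matches the repeated warning in the log. Compiled with the define, the sentinel is overwritten before it reaches the solver.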

@bodono
Member

bodono commented Nov 3, 2022

Hmmm, actually this is likely something to do with the GPU solver specifically. There is some issue in there that only trips on some GPUs, which I have run into before. It's probably something to do with type sizes that I have not been able to figure out. I would probably recommend shelving the GPU solver for now; the MKL one is typically faster anyway.

@syockit

syockit commented Apr 3, 2023

Try the following patch. I got all the tests to pass with this fix.

--- a/linsys/gpu/gpu.c
+++ b/linsys/gpu/gpu.c
@@ -19,13 +19,13 @@ void SCS(accum_by_atrans_gpu)(const ScsGpuMatrix *Ag,
     if (*buffer != SCS_NULL) {
       cudaFree(*buffer);
     }
-    cudaMalloc(buffer, *buffer_size);
+    cudaMalloc(buffer, new_buffer_size);
     *buffer_size = new_buffer_size;
   }

   CUSPARSE_GEN(SpMV)
   (cusparse_handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &onef, Ag->descr, x,
-   &onef, y, SCS_CUDA_FLOAT, SCS_CSRMV_ALG, buffer);
+   &onef, y, SCS_CUDA_FLOAT, SCS_CSRMV_ALG, *buffer);
 }

 /* this is slow, use trans routine if possible */
@@ -48,13 +48,13 @@ void SCS(accum_by_a_gpu)(const ScsGpuMatrix *Ag, const cusparseDnVecDescr_t x,
     if (*buffer != SCS_NULL) {
       cudaFree(*buffer);
     }
-    cudaMalloc(buffer, *buffer_size);
+    cudaMalloc(buffer, new_buffer_size);
     *buffer_size = new_buffer_size;
   }

   CUSPARSE_GEN(SpMV)
   (cusparse_handle, CUSPARSE_OPERATION_TRANSPOSE, &onef, Ag->descr, x, &onef, y,
-   SCS_CUDA_FLOAT, SCS_CSRMV_ALG, buffer);
+   SCS_CUDA_FLOAT, SCS_CSRMV_ALG, *buffer);
 }

 /* This assumes that P has been made full (ie not triangular) and uses the
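
As far as the diff shows, the patch fixes two things: the scratch buffer was allocated with the old size (*buffer_size is only updated after the cudaMalloc call), and SpMV was handed the address of the pointer variable (buffer) rather than the device buffer itself (*buffer). A host-only sketch of the corrected pattern, with malloc/free standing in for cudaMalloc/cudaFree; ensure_buffer is a hypothetical helper, and the growth condition is an assumption since it lies outside the hunks shown:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Grow-on-demand scratch buffer. malloc/free stand in for cudaMalloc/cudaFree
 * so the sketch runs on the host; ensure_buffer is a hypothetical helper. */
void *ensure_buffer(void **buffer, size_t *buffer_size, size_t new_buffer_size) {
  assert(buffer != NULL && buffer_size != NULL);
  if (new_buffer_size > *buffer_size) { /* assumed growth condition */
    free(*buffer); /* free(NULL) is a no-op */
    /* Fix 1: allocate the NEW size; *buffer_size still holds the old one. */
    *buffer = malloc(new_buffer_size);
    *buffer_size = (*buffer != NULL) ? new_buffer_size : 0;
  }
  /* Fix 2: return the buffer itself (*buffer), not the address of the
   * pointer variable (buffer), which is what the callee must receive. */
  return *buffer;
}
```

Starting from a NULL buffer with size 0, the unpatched logic would request a zero-byte allocation on the first call and then pass the wrong pointer on to the multiply, which would be consistent with the NaN residuals reported earlier in the thread.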

@bodono
Member

bodono commented Apr 3, 2023

@syockit Thanks for this! I applied the patch and it worked! Do you want to turn this into a PR?

The only problem I had was an erroneous 'infeasible' certificate on the hs21_tiny_qp and hs21_tiny_qp_rw tests. Do you get that too? I was able to get them to pass by tightening the eps_infeas tolerance in those files, so if you have that problem too we can just do that.

@syockit

syockit commented Apr 3, 2023

@bodono It's a hassle for me to set up a fork right now, so please apply the commit on your side.

You're right, I got the same infeasible certificate on the tests you mentioned. I missed that yesterday. And tightening eps_infeas did make it feasible.

@bodono
Member

bodono commented Apr 4, 2023

Sure, no problem @syockit, thanks for sending in the patch!

@kalmarek
Contributor Author

I presume this issue can be closed after #251 is merged

@bodono bodono closed this as completed Apr 13, 2023