
MPI support #302

Open
GongSiqiu opened this issue Dec 26, 2024 · 14 comments

@GongSiqiu

Is it possible to support MPI?
SCS uses OpenMP for parallelism. However, when solving large-scale problems we need MPI support to run them on HPC clusters.
Right now SCS runs on a single node; with the indirect linear system solver it can solve SDPs with a 4000×4000 variable matrix (about 16,000,000 variables and 32,000,000 constraints). If I want to tackle larger problems, with roughly a 16000×16000 variable matrix, MPI becomes necessary.

@kalmarek
Contributor

afaik openmp is only used in two pragma calls (the exp cone projection and the accumulation over rows of y += A'*x). With these sizes you'll probably encounter numerical issues in the eigendecomposition; MPI is not going to change that.

Do your problems have some additional structure (symmetry, sparsity, some algebraic structure)? You'll be much better off exploiting that than trying to brute-force a large problem.

@GongSiqiu
Author

This is the code for the problem.

import numpy as np
import cvxpy as cp

cov = np.load("cov.npy")  # covariance matrix of a Gaussian state
omega = np.array([[0, 1], [-1, 0]])
M = len(cov) // 2
Omega = np.kron(omega, np.eye(M))

n = 2 * M
X = cp.Variable((n, n), symmetric=True)
constraints = [cp.bmat([[X, Omega], [-Omega, X]]) >> 0]  # equivalent to X - 1j*Omega >> 0
constraints += [cov - X >> 0]
prob = cp.Problem(cp.Minimize(cp.trace(X)), constraints)
prob.solve(solver='SCS', use_indirect=True, verbose=True)

I can compute a reasonably good analytical solution for this problem, but I don't know how to use it to speed up the optimization.
Although SCS supports warm starts, knowing only an initial value of X seems insufficient to use one.

@kalmarek
Contributor

I've never used SCS through Python, but I recall that for a warm start you need to provide both primal and dual variables.

What is in cov.npy? Does it have any structure (sparse, symmetric, chordal, etc.)?
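
If I'm reading the scs Python docs correctly, the direct Python interface accepts warm-start vectors roughly like this. A minimal sketch on a toy LP in SCS's standard form (min c'x s.t. Ax + s = b, s in K), assuming scs >= 3.x; for your SDP the data/cone dicts would come from CVXPY's conic reformulation rather than being written by hand:

import numpy as np
import scipy.sparse as sp
import scs

# Toy LP: minimize x subject to x >= 1, written as -x + s = -1 with s >= 0.
data = dict(
    A=sp.csc_matrix(np.array([[-1.0]])),
    b=np.array([-1.0]),
    c=np.array([1.0]),
)
cone = dict(l=1)  # one nonnegative-cone slack

solver = scs.SCS(data, cone, verbose=False)
# Warm start with guesses for the primal x, dual y and slack s.
sol = solver.solve(warm_start=True, x=np.array([1.0]), y=np.array([1.0]), s=np.array([0.0]))
print(sol["info"]["status"], sol["x"])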

@GongSiqiu
Author

cov.npy is a positive semidefinite matrix and satisfies the constraint that cov - 1j*Omega is positive semidefinite.
I also have a Julia version, but it seems that its parallel efficiency is worse than the Python version's.

using LinearAlgebra
using JuMP
using SCS

# cov is the same (2M x 2M) matrix stored in cov.npy, loaded beforehand (e.g. via NPZ.jl)
omega = [0 1; -1 0]
M = div(size(cov, 1), 2)
Omega = kron(omega, I(M))

n = 2 * M
model = Model(SCS.Optimizer)
set_attribute(model, "linear_solver", SCS.IndirectSolver)
@variable(model, X[1:n, 1:n], PSD)
@constraint(model, [X Omega; -Omega X] in PSDCone())
# @constraint(model, LinearAlgebra.Hermitian(X - Omega * 1im) in HermitianPSDCone())
@constraint(model, cov - X in PSDCone())
@objective(model, Min, tr(X))
optimize!(model)

@kalmarek
Contributor

yes, this I understand, but does cov.npy have any additional structure? :D

But the best speedup will be achieved by just not solving such a large optimization problem...

@GongSiqiu
Author

I don't think cov.npy has any additional structure. It is the covariance matrix of a Gaussian state.

@bodono
Member

bodono commented Jan 2, 2025

You can warm-start just some of the variables, I think; SCS will set the others to zero. But SCS is expecting the variables corresponding to the conic reformulation that CVXPY does under the hood, so you would need access to those. If you have one of the conic variables as a warm start, it should be possible to get reasonable (better-than-zero) warm starts for the others too, using the KKT equations.

As for MPI, as @kalmarek says, we only have OpenMP pragmas over a couple of minor routines, and your problem will likely be dominated by the eigendecomposition of the semidefinite cone variable. That being said, we do support MKL as the BLAS/LAPACK backend, and that definitely uses multiple threads for the eigendecomposition; it may even support MPI for clusters. That's probably your best bet.
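
To get at that conic reformulation from the CVXPY side, something along these lines should work (a sketch assuming a recent CVXPY; prob is the problem defined earlier in the thread, and mapping your analytical X onto these conic variables is where the KKT reasoning above comes in):

import cvxpy as cp

# prob is the cp.Problem built earlier in this thread
data, chain, inverse_data = prob.get_problem_data(cp.SCS)
print(data["A"].shape, data["b"].shape, data["c"].shape)  # conic-form problem data
print(data["dims"])  # cone sizes of the reformulated problem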

@bodono
Member

bodono commented Jan 2, 2025

Oh, also: the MKL Pardiso direct solver is much faster than the indirect solver; I would recommend using the direct solver if you can.
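
For the CVXPY script above that just means dropping use_indirect=True (the direct linear solver is SCS's default); as far as I know, the MKL Pardiso backend itself has to be available in your SCS build/installation rather than being picked by a solve-time flag, so treat this as a sketch:

# same prob as before, now with the direct linear system solver
prob.solve(solver='SCS', use_indirect=False, verbose=True)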

@GongSiqiu
Author

Thanks for your reply. I've tried the MKL Pardiso direct solver; the time it takes is similar to the indirect solver.
Maybe I need to try other solvers or just use the analytical solution.

@bodono
Member

bodono commented Jan 3, 2025

I am very surprised that the MKL Pardiso solver takes about the same time as the indirect solver! It might be worth playing with the MKL settings (more threads or whatever) or the SCS solver settings to see if there is anything to squeeze out.

@GongSiqiu
Author

G.cov is a (992, 992) PSD matrix. The indirect solver is faster, and the CPU usage of the two solvers is ~90% in both cases.
(two screenshots of solver output attached)

@bodono
Member

bodono commented Jan 3, 2025

Can you post the entire output trace (or the very end)? I would be interested in seeing the total times / iterations for each.

Are you controlling the number of threads? I think you can do that with the environment variable OMP_NUM_THREADS. It might be worth tinkering with that a bit.
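
One way to experiment with that from the Python side; the thread counts here are illustrative, and setting the variables in the shell before launching works just as well (they are read when the libraries initialize, so set them before the heavy imports):

import os

os.environ["OMP_NUM_THREADS"] = "16"  # OpenMP threads (SCS pragmas, OpenMP-based BLAS builds)
os.environ["MKL_NUM_THREADS"] = "16"  # threads used by MKL (eigendecomposition, Pardiso)

import cvxpy as cp  # import after setting the environment, then build/solve the problem as before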

@GongSiqiu
Author

GongSiqiu commented Jan 3, 2025

It seems that the preprocessing step of Pardiso uses only one thread and takes a lot of time.
(screenshots attached: pardiso run, indirect run)

@bodono
Member

bodono commented Jan 3, 2025

Yes, it's doing a factorization; I didn't realize that was done single-threaded, which seems like a waste! That probably explains the slowdown for MKL then. In principle we could write an MKL indirect solver, which would be faster, but that would take a little bit of work.
