
compileandrun v1.x

George Bosilca edited this page Feb 22, 2022 · 2 revisions


How to Compile and Run DPLASMA

Version 2.0, December 22, 2012, tested with release 1.0 (from the public repository https://bitbucket.org/bosilca/parsec.public/commits/e0fef3754b8fffb2d6fe48c6670f22836fdbc47b).

Tool Dependencies

To compile DPLASMA on a new platform, you will need:

  • cmake version 2.8.0 or above. cmake can be found in the Debian package cmake, or as sources at the cmake download page
  • PLASMA version 2.5 or above.
  • a BLAS library optimized for your platform: MKL, ACML, Goto, Atlas, VecLib (Mac OS X), or, in the worst case, the default BLAS.

Configuring DPLASMA for a new platform

CMake is comparable to configure, but it's subtly different. For one thing, CMake displays the commands in color, which is its most prominent feature.

CMake keeps everything it has found so far in a cache file named CMakeCache.txt. Until you have successfully configured DPLASMA, remove the CMakeCache.txt file each time you run cmake.

First, there are example invocations of cmake in the DPLASMA trunk (in DPLASMA/contrib/platforms/: config.dancer targets a typical Linux system, config.jaguar is for, you got it, the XT5, ...). We advise you to start from one of these files and tweak it for your system according to the following guidelines.

The PLASMA library is required to enable the linear algebra package. If you have a fairly recent, correctly configured version, this should be a breeze, as PLASMA provides a pkg-config file that our configuration scripts understand. Thus, if your version of PLASMA is recent enough, providing -DPLASMA_DIR=<my path to the PLASMA lib> should be enough for a straightforward configuration process.
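You can check beforehand that pkg-config sees your PLASMA installation (the pkgconfig subdirectory and the module name plasma are assumptions; check the .pc file shipped with your installation):

```shell
# Point pkg-config at the PLASMA installation, then query it.
# If this prints compile and link flags, the DPLASMA configuration
# scripts should be able to detect PLASMA automatically.
export PKG_CONFIG_PATH=/opt/plasma/lib/pkgconfig:$PKG_CONFIG_PATH
pkg-config --cflags --libs plasma
```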

If not, the steps are slightly more complicated: in addition to correctly configuring your PLASMA installation, you will need to provide our configuration scripts with a few hints to help the detection process. Assume that on your architecture the BLAS is MKL in /opt/mkl/lib/em64t, and that you need to link with mkl_gf_lp64, mkl_sequential and mkl_core to have a complete BLAS (NB: with DPLASMA, always use the sequential version of the BLAS; using the threaded version will decrease performance, even with OMP_NUM_THREADS=1). Assume also that the PLASMA library was installed in /opt/plasma. You'll want to run the following script in the DPLASMA directory.
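A sketch of such a script, using the MKL and PLASMA paths from the example above (the BLAS_LIBRARIES variable name is an assumption; the exact hint variables may differ in your DPLASMA version):

```shell
#!/bin/sh
# Hypothetical configuration script: adapt paths and variable names
# to your platform and DPLASMA version.
rm -f CMakeCache.txt
cmake . \
  -DCMAKE_BUILD_TYPE=Release \
  -DPLASMA_DIR=/opt/plasma \
  -DBLAS_LIBRARIES="-L/opt/mkl/lib/em64t -lmkl_gf_lp64 -lmkl_sequential -lmkl_core"
```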

Hopefully, once the expected arguments are provided, the configuration output will report that all the dependencies were found.

If this is done, congratulations: DPLASMA is configured and you're ready to build and test the system.

In the unlikely case something goes wrong, read the error message carefully. We spent a significant amount of time trying to output something meaningful for you and for us (in case you need help debugging or understanding the problem). If the output is not helpful enough to fix the problem, contact us via the DPLASMA user mailing list and provide the cmake command with its flags, the output, and the files CMakeFiles/CMakeError.log and CMakeFiles/CMakeOutput.log.

Troubleshooting

  • Issues we have encountered with BLAS libraries are out of the scope of this README. Please refer to your own experience to obtain a working BLAS library and header files. These are expected to be in the BLAS_LIBRARIES/lib and BLAS_LIBRARIES/include directories (create a phony directory with symbolic links to include/ and lib/ if needed).
  • When using the plasma-installer, some have reported that it was necessary, after make and make install, to copy all .h files found in src/ to include/.
  • We use quite a few optional packages; don't panic if some are not found during the configuration. However, some of them, such as HWLOC, are critical for performance.
  • Check that you have a working MPI accessible (mpicc and mpirun should be in your PATH)
  • If you observe strange behavior, check that the configuration output contains one of the following (if not, the atomic operations will not work, which is detrimental to the correct operation of DPLASMA):
    • Found target X86_64
    • Found target gcc
    • Found target MACOSX
    • Found target x86_32
  • You can tune the compiler using variables (see also ccmake section):
    • CC to choose your C compiler
    • FC to choose your Fortran compiler
    • MPI_COMPILER to choose your mpicc compiler
    • MPIFC to choose your MPI Fortran compiler
    • CFLAGS to change your C compilation flags
    • LDFLAGS to change your C linking flags
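For instance, the compilers and flags can be selected at configuration time by setting these variables in the environment of the cmake invocation (compiler names here are illustrative):

```shell
# CMake picks up CC, FC, CFLAGS, etc. on the first configuration run;
# any of the variables listed above can be set the same way.
CC=gcc FC=gfortran CFLAGS="-O2" cmake .
```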

Tuning the configuration: ccmake

When the configuration is successful, you can tune it using ccmake (notice the double c of ccmake). This is an interactive tool that lets you choose the compilation parameters: navigate with the arrow keys to the parameter you want to change, and hit Enter to edit it. Recommended parameters are:

  • PARSEC_DEBUG OFF (and all other PARSEC_DEBUG options)
  • PARSEC_DIST_COLLECTIVES ON
  • PARSEC_DIST_WITH_MPI ON
  • PARSEC_GPU_WITH_CUDA ON
  • PARSEC_OMEGA_DIR OFF
  • PARSEC_PROF_* OFF (all PARSEC_PROF_ flags off)
  • DPLASMA_CALL_TRACE OFF
  • DPLASMA_GPU_WITH_MAGMA OFF

Using the 'expert' mode (hit 't' to toggle expert mode), you can change other useful options, such as:

  • CMAKE_C_FLAGS_RELEASE
  • CMAKE_EXE_LINKER_FLAGS_RELEASE
  • CMAKE_Fortran_FLAGS_RELEASE
  • CMAKE_VERBOSE_MAKEFILE
  • And others, for example to change the path to some compilers.
The CMAKE_VERBOSE_MAKEFILE option, when turned ON, displays the commands run during compilation, which can help debug configuration mistakes.

When you have set all the options you want in ccmake, type 'c' to configure again, and 'g' to generate the files. If you entered wrong values in some fields, ccmake will complain at 'c' time.

Building DPLASMA

If the configuration was good, compilation should be as simple and fancy as 'make'. To debug issues, turn the CMAKE_VERBOSE_MAKEFILE option ON using ccmake, check the compilation lines, and adapt your configuration options accordingly.
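Independently of that option, Makefiles generated by CMake also honor the VERBOSE variable for a single invocation, which avoids reconfiguring:

```shell
# Print the full compiler and linker command lines for this build only
make VERBOSE=1
```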

Running DPLASMA

The dplasma library is compiled into dplasma/library. All testing programs are compiled in dplasma/testing. Examples are:

  • dplasma/testing/testing_?getrf -> LU factorization (single or double precision)
  • dplasma/testing/testing_?geqrf -> QR factorization (single or double precision)
  • dplasma/testing/testing_?potrf -> Cholesky factorization (single or double precision)

All the binaries should accept as input:

  • -c <n>: the number of threads used for kernel execution on each node. This should be set to the number of cores. Remember that in the MPI version one additional thread is spawned to handle the communications, but in a normal run this thread shares the least busy core with another thread.
  • -N <size>: a mandatory argument defining the size of the matrix
  • -g <number>: the number of GPUs to use, if the operation is GPU-enabled
  • -t <blocksize>: the number of columns in a tile
  • -T <blocksize>: the number of rows in a tile (WARNING: currently, every algorithm included in DPLASMA requires square tiles)
  • -p <number>: to request a 2-D block-cyclic distribution with p rows
  • -q <number>: to request a 2-D block-cyclic distribution with q columns

A typical DPLASMA run using MPI looks like:
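Using the options listed above (and assuming the single-precision Cholesky tester is named testing_spotrf, consistent with the testing_?potrf pattern), such a command could look like:

```shell
# 8 MPI ranks with 8 compute threads each, processes arranged in a
# 4x2 grid, on a 1000x1000 single-precision matrix with 120x120 tiles
mpirun -np 8 ./testing/testing_spotrf -c 8 -p 4 -q 2 -N 1000 -t 120 -T 120
```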

Meaning that we'll run a Cholesky factorization on 8 nodes with 8 computing threads per node, the nodes being arranged in a 4x2 grid, with a distributed generation of a 1000x1000 single-precision matrix, using 120x120 tiles.

Each test can dump the list of options with -h. Some tests have specific options (like -I to tune the inner block size in QR and LU, and -M in LU or QR to have non-square matrices).

Modular Component Architecture

In addition to the parameters usually accepted by DPLASMA (see the options above for a full list), the PaRSEC runtime engine can be tuned through its MCA. MCA parameters can be passed to the runtime engine after the DPLASMA arguments, by separating the DPLASMA arguments from the PaRSEC arguments (for example, telling DPLASMA to use 8 cores and PaRSEC to use the AP (Absolute Priority) scheduling heuristic).
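As a sketch, assuming the two argument sets are split with a standalone -- marker and that the scheduler is selected through an MCA parameter named sched (both assumptions; check the documentation of your PaRSEC version), such an invocation could look like:

```shell
# Hypothetical: the '--' separator and the 'sched' parameter name are
# assumptions. DPLASMA arguments come first, PaRSEC MCA parameters after.
./testing/testing_spotrf -c 8 -N 1000 -- --mca sched ap
```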

A complete list of MCA parameters can be obtained by passing the appropriate help option to the PaRSEC runtime engine.