Skip to content

[WIP] Reduce BLAS dependency#1496

Closed
hhorii wants to merge 2 commits into
Qiskit:mainfrom
hhorii:use_single_thread_blas
Closed

[WIP] Reduce BLAS dependency#1496
hhorii wants to merge 2 commits into
Qiskit:mainfrom
hhorii:use_single_thread_blas

Conversation

@hhorii
Copy link
Copy Markdown
Collaborator

@hhorii hhorii commented Apr 1, 2022

Summary

Work around memory allocation errors in OpenBLAS when multi-threads run

Details and comments

OpenBLAS sometimes fail test cases of Aer when multiple threads run.

$ python -m unittest test.terra.backends.aer_simulator.test_noise.TestNoise 
....BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Segmentation fault

This error can be worked around by setting OPENBLAS_NUM_THREADS=1 :

$ OPENBLAS_NUM_THREADS=1 python -m unittest test.terra.backends.aer_simulator.test_noise.TestNoise 
.......................................................
----------------------------------------------------------------------
Ran 55 tests in 4.844s

OK

In CI, stestr runs multiple threads or multiple processes and then encounters unsuitability:

{0} test.terra.backends.aer_simulator.test_noise.TestNoise.test_kraus_gate_noise_on_QFT_cache_blocking_2___statevector____Thrust__ [] ... inprogress

This PR sets OPENBLAS_NUM_THREADS=1 when AerBackend is loaded.

@jakelishman
Copy link
Copy Markdown
Member

Is this something we are capable of detecting within the library? It feels like some bug in our defaults that this can happen.

@hhorii
Copy link
Copy Markdown
Collaborator Author

hhorii commented Apr 1, 2022

The official FAQ said:

If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading.

Maybe openblas_set_num_threads(1) is necessary if Aer uses OpenBLAS.

Another solution may be not to use BLAS. Currently, only matrix.hpp uses it to support matrix operators. Aer uses them only for small size of matrices. I guess that few performance impact from reduction of BLAS.

BTW, the current CI errors are from recent terra release. CI does not download binary from pypi, and then fails to compile qiskit-terra because of lack of rust compiler.

@mtreinish
Copy link
Copy Markdown
Member

We need to update our wheel jobs to use manylinux2014 so the terra binaries are compatible with the build env

@jakelishman
Copy link
Copy Markdown
Member

Maybe openblas_set_num_threads(1) is necessary if Aer uses OpenBLAS

Yeah, I was thinking something along these lines - we know if we're spawning threads/processes already, and we should know whether we're linked against OpenBLAS or something else, so we should make sure we're not conflicting with their assumptions.

We should set up our defaults to "just work" for the most part - if we're having trouble with a simple python -munittest TestNoise, it looks like a behaviour problem on our side.

@hhorii hhorii force-pushed the use_single_thread_blas branch 4 times, most recently from 8d67d9a to 1d2eba3 Compare April 4, 2022 06:50
@hhorii hhorii changed the title [WIP] set OPENBLAS_NUM_THREADS=1 in stestr of CI set OPENBLAS_NUM_THREADS=1 for OpenBLAS to use a single thread if not configured Apr 4, 2022
@hhorii hhorii added this to the Aer 0.10.4 milestone Apr 4, 2022
@hhorii hhorii mentioned this pull request Apr 4, 2022
1 task
Comment thread pyproject.toml Outdated
[[tool.cibuildwheel.overrides]]
select = "cp3{6,7,8,9}-manylinux*"
manylinux-x86_64-image = "manylinux2010"
# manylinux-x86_64-image = "manylinux2010"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this for real in #1498 (and a corresponding fix for the stable branch in #1499)

Suggested change
# manylinux-x86_64-image = "manylinux2010"

@hhorii hhorii added the stable-backport-potential The issue or PR might be minimal and/or import enough to backport to stable label Apr 6, 2022
@mtreinish
Copy link
Copy Markdown
Member

I'm not sure this fixes the underlying issue. For me locally when I run with main I'm able to still get a segfault when I run with OPENBLAS_NUM_THREADS=1:

OPENBLAS_NUM_THREADS=1 .tox/py310/bin/python -m unittest test.terra.backends.aer_simulator.test_noise.TestNoise 
.............OpenBLAS : Program will terminate because you tried to allocate too many memory regions.
...OpenBLAS : Program will terminate because you tried to allocate too many memory regions.
...<frozen importlib._bootstrap>:914: ImportWarning: _SixMetaPathImporter.find_spec() not found; falling back to find_module()
....................................
----------------------------------------------------------------------
Ran 55 tests in 2.141s

OK
zsh: segmentation fault (core dumped)  OPENBLAS_NUM_THREADS=1 .tox/py310/bin/python -m unittest 

I also tried with this PR applied and I also get a segfault in openblas although a without the allocation error message. It might be something with my local openblas install though. But I think we should avoid setting things via env variables via python like this if there is a bug in how we're using openblas I think we should fix that in the source.

@hhorii
Copy link
Copy Markdown
Collaborator Author

hhorii commented Apr 6, 2022

OK. I believe that we need more time to resolve OpenBLAS issue and this PR should not block 0.10.4 release.

@mtreinish mtreinish removed the stable-backport-potential The issue or PR might be minimal and/or import enough to backport to stable label Apr 6, 2022
@mtreinish mtreinish removed this from the Aer 0.10.4 milestone Apr 6, 2022
@hhorii hhorii force-pushed the use_single_thread_blas branch 4 times, most recently from dba8990 to b555807 Compare April 12, 2022 13:23
@hhorii
Copy link
Copy Markdown
Collaborator Author

hhorii commented Apr 19, 2022

matrix multiplication is frequently used in matrix_product_state method. Here are performance comparisons (QV) of matrix_product_state with 0.10.4 and 168f7b7 (Intel(R) Xeon(R) Gold 6140 CPU, 18cores x 2HT, Ubuntu 18.0.4, gcc 8.4.0). Probably, for low-qubit circuits, library calls to OpenBLAS library were bottlenecks.

image
image

Another comparisons on Macbook Pro(Intel Core i7, 6core x 2HT, macOS BigSur).
image
image

Aer does not use matrix multiplication heavily, in general.

@hhorii
Copy link
Copy Markdown
Collaborator Author

hhorii commented Apr 19, 2022

In statevector method, matrix multiplication is used in fusion, mainly. Here are performance comparisons (QV) of statevector with 0.10.4 and 168f7b7 (Intel(R) Xeon(R) Gold 6140 CPU, 18cores x 2HT, Ubuntu 18.0.4, gcc 8.4.0).
image
image

@hhorii hhorii force-pushed the use_single_thread_blas branch 2 times, most recently from fec92f6 to 168f7b7 Compare April 19, 2022 12:34
@hhorii
Copy link
Copy Markdown
Collaborator Author

hhorii commented Apr 19, 2022

#pragma omp parallel if (BaseState::threads_ > 1) num_threads(BaseState::threads_)
{
MPS temp;
#pragma omp for
for (int_t i=0; i<static_cast<int_t>(shots); i++) {
temp.initialize(qreg_);
auto single_result = temp.apply_measure_internal(sorted_qubits, rnds_list[i]);
all_samples[i] = single_result;
}
}

The above OMP loop produces SEGV in test.terra.backends.aer_simulator.test_noise.TestNoise.test_readout_noise_5___matrix_product_state____CPU__. By taking OMP pragma, no error is thrown.
If a bug exists in it, we need to fix it instead of work-around by reducing use of BLAS.

@hhorii hhorii force-pushed the use_single_thread_blas branch from 168f7b7 to f0e4782 Compare April 19, 2022 15:39
@hhorii hhorii changed the title set OPENBLAS_NUM_THREADS=1 for OpenBLAS to use a single thread if not configured Eliminate BLAS dependency May 16, 2022
@hhorii hhorii force-pushed the use_single_thread_blas branch from e604353 to c62caf8 Compare May 19, 2022 05:32
@hhorii hhorii changed the title Eliminate BLAS dependency [WIP] Reduce BLAS dependency May 19, 2022
@hhorii hhorii force-pushed the use_single_thread_blas branch 3 times, most recently from 94ca954 to d7e6357 Compare May 19, 2022 16:05
@hhorii hhorii force-pushed the use_single_thread_blas branch from d7e6357 to 224525b Compare May 20, 2022 07:46
@hhorii
Copy link
Copy Markdown
Collaborator Author

hhorii commented Jul 14, 2022

#1555 will resolve this BLAS issue

@hhorii hhorii closed this Jul 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants