Add experimental support of cuQuantum #1400

Merged
hhorii merged 33 commits into Qiskit:main from doichanj:cuStatevec
Mar 1, 2022

Conversation

@doichanj
Collaborator

@doichanj doichanj commented Dec 13, 2021

Summary

This is the experimental support for NVIDIA's cuQuantum Beta 2 (ver 0.1.0).

Details and comments

cuStateVec APIs can be used instead of Aer's own GPU implementation by setting an option at runtime (see CONTRIBUTING.md for details). cuStateVec support is compiled in when Aer is built with CUSTATEVEC_ROOT set to the cuQuantum path.
With cuStateVec, simulation is about 2x faster for large circuits (more than 22 qubits), but Aer's implementation is still faster for smaller circuits.
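
As a rough illustration of the size trade-off described above, the choice of runtime options could be sketched as follows. The helper `choose_sim_options` and the constant `CUSTATEVEC_MIN_QUBITS` are hypothetical names introduced here, not part of Aer; only the option names `method`, `device` and `cuStateVec_enable` come from this PR, and the 22-qubit threshold is simply the figure quoted above, which may differ on other hardware.

```python
# Hypothetical helper (not part of Aer): pick simulator options from circuit width.
# The threshold is the figure quoted in the PR description; tune it per machine.
CUSTATEVEC_MIN_QUBITS = 22


def choose_sim_options(num_qubits, gpu_available=True):
    """Return keyword options for a statevector simulation."""
    options = {
        "method": "statevector",
        "device": "GPU" if gpu_available else "CPU",
    }
    # cuStateVec is a GPU-only option and is reported to pay off
    # only for circuits wider than the threshold.
    options["cuStateVec_enable"] = (
        gpu_available and num_qubits > CUSTATEVEC_MIN_QUBITS
    )
    return options


small = choose_sim_options(16)
large = choose_sim_options(28)
print(small)
print(large)
```

The resulting dictionary is meant to be passed as keyword options when configuring or running the simulator; that wiring is left out so the sketch runs without a GPU build of Aer.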

Since cuQuantum is a beta version, there are some limitations:

  • cuStateVec is not thread safe, so multi-chunk parallelization (cache blocking) runs on a single thread (slow)
  • Multi-shot parallelization is disabled (single thread, slow)
  • Multi-shot batched optimization is not supported with cuStateVec

@chriseclectic chriseclectic added the on hold Can not fix yet label Dec 13, 2021
@chriseclectic
Member

I added the on-hold label until the 0.10 release is out.

@chriseclectic chriseclectic removed the on hold Can not fix yet label Dec 14, 2021
Comment thread CONTRIBUTING.md
Comment thread qiskit/providers/aer/backends/aer_simulator.py Outdated
Collaborator

@hhorii hhorii left a comment

I think a new namespace under AER::QV is needed to distinguish chunk-based code from the rest. For example, GateFuncBase sounds too general; it should live in AER::QV::CHUNK or something similar.

Comment thread src/controllers/aer_controller.hpp Outdated
Comment thread src/controllers/aer_controller.hpp
Comment thread src/simulators/density_matrix/densitymatrix_state.hpp
Comment thread src/simulators/state.hpp Outdated
Comment thread src/simulators/statevector/chunk/chunk_container.hpp Outdated
Comment thread src/simulators/statevector/chunk/chunk_manager.hpp
@hhorii
Collaborator

hhorii commented Feb 1, 2022

@doichanj a release note is necessary.

@hhorii hhorii added this to the Aer 0.10.3 milestone Feb 2, 2022
@jakelishman
Member

This probably should also wait for Aer 0.11 - it's a big new feature, and patch releases are usually for bugfixes.

@hhorii hhorii modified the milestones: Aer 0.10.3, Aer 0.11.0 Feb 2, 2022
Collaborator

@hhorii hhorii left a comment

I have now found that OpenMP does not work well on any device. Let me investigate whether this is specific to my configuration or a common problem.

if(cuStateVec_enable_){
enable_batch_multi_shots_ = false; //cuStateVec does not support batch execution of multi-shots
parallel_shots_ = 1; //cuStateVec is currently not thread safe
return;
Collaborator

If cuStateVec_enable=True is configured in AerSimulator.run(), parallel_state_update_ is not set. This will cause a performance regression if an application accidentally sets cuStateVec_enable with device='CPU'.
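
One way to avoid the regression described here is to apply the cuStateVec restrictions only when the GPU device is actually selected. The sketch below is a deliberately simplified Python model of that decision, not the controller code in this PR: the function `set_parallelization` and the returned keys are hypothetical, and the non-cuStateVec branch assumes a plain "use all threads for state update" policy for illustration.

```python
def set_parallelization(device, custatevec_enable, max_threads):
    """Simplified, hypothetical model of the controller's parallelization choice."""
    # Honor the cuStateVec restrictions only when the GPU is really in use,
    # so a stray cuStateVec_enable=True with device='CPU' changes nothing.
    if custatevec_enable and device == "GPU":
        return {
            "batch_multi_shots": False,   # not supported by cuStateVec
            "parallel_shots": 1,          # cuStateVec is not thread safe
            "parallel_state_update": 1,
        }
    # Assumed default policy for this sketch: all threads update the state.
    return {
        "batch_multi_shots": False,
        "parallel_shots": 1,
        "parallel_state_update": max_threads,
    }


cpu_with_flag = set_parallelization("CPU", True, max_threads=8)
cpu_without_flag = set_parallelization("CPU", False, max_threads=8)
gpu_with_flag = set_parallelization("GPU", True, max_threads=8)
print(cpu_with_flag == cpu_without_flag)
```

With this guard, the CPU configuration is identical whether or not the flag was set by accident.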

Question: when enable_batch_multi_shots_=true, would you create nShots copies of the statevector for parallelization? If so, and IIUC, I think a proper "workaround" is to create multiple cuStateVec handles (or create a pool of handles at init time and reuse them, to reduce overhead) and use them in parallel.

IMHO, though, it's more than a "workaround": even after we fix the thread safety issue, it is generally still challenging for library handles to be shared by multiple host threads. For example, although cuBLAS supports this usage pattern, they explicitly recommend not doing so. Thus the handle pool approach is commonly seen in ML/DL frameworks.
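
The handle pool idea can be sketched in plain Python with a thread-safe queue. Everything here is illustrative: `HandlePool`, `FakeHandle` and `run_shot` are hypothetical names, and `FakeHandle` merely stands in for a real library handle so the pattern can be run without a GPU. The point is that each worker borrows a handle exclusively and returns it afterwards, so no handle is ever shared by two host threads at once.

```python
import queue
import threading
from contextlib import contextmanager


class HandlePool:
    """Fixed-size pool; each handle is held by at most one thread at a time."""

    def __init__(self, create_handle, size):
        self._handles = queue.Queue()
        for _ in range(size):
            # Create all handles up front to avoid per-shot setup cost.
            self._handles.put(create_handle())

    @contextmanager
    def acquire(self):
        handle = self._handles.get()  # blocks until a handle is free
        try:
            yield handle
        finally:
            self._handles.put(handle)  # always return it to the pool


class FakeHandle:
    """Stand-in for a library handle (hypothetical, for illustration only)."""

    def __init__(self):
        self.in_use = False
        self.shots_run = 0


def run_shot(pool, violations):
    with pool.acquire() as handle:
        if handle.in_use:  # would indicate two threads sharing a handle
            violations.append(handle)
        handle.in_use = True
        handle.shots_run += 1
        handle.in_use = False


pool = HandlePool(FakeHandle, size=4)
violations = []
threads = [
    threading.Thread(target=run_shot, args=(pool, violations)) for _ in range(32)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("shared-handle violations:", len(violations))
```

A blocking queue keeps the pool size bounded, which matters when each handle owns device-side workspace.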

Collaborator Author

enable_batch_multi_shots_=true is currently not applicable to cuStateVec: in that mode multiple state vectors are calculated in a single CUDA kernel, and each state vector refers to classical registers to handle branch operations, which is not implemented in cuStateVec.
Multiple cuStateVec handles are required when enable_batch_multi_shots_=false and shot-level parallelization is requested. In that case, state vectors are calculated independently by OpenMP threads. (Currently cuStateVec is not thread safe, so we disable OpenMP parallelization.)


Thanks for the explanation, @doichanj. I understand better now. So once we fix thread safety, we can unblock you for shot-level parallelization.

# 'GPU_cuStateVec' is used only inside the tests and is not available in Aer;
# it is converted to device='GPU' with the option cuStateVec_enable=True added
if cuStateVec:
data_args.append((method, 'GPU_cuStateVec'))
Collaborator

@chriseclectic could you review this change? This is a hack to minimize the test changes needed to cover the cuStateVec option. cuStateVec is an option only for device=GPU. The current annotator supported_methods() requires tests to take two arguments, method and device. Adding a new cuStateVec_enable option to every test would not be productive, I believe.
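
For readers unfamiliar with the hack, the idea can be sketched as below: a pseudo-device label is added to the test parameters and translated back into a real device plus an extra option just before the simulator is configured. The functions `build_data_args` and `resolve_device` are hypothetical simplifications written for this illustration, not the test-suite code; only the label 'GPU_cuStateVec' and the option `cuStateVec_enable` come from the PR.

```python
def build_data_args(methods, devices, cuStateVec=False):
    """Build (method, device) test parameters, optionally adding the pseudo-device."""
    data_args = []
    for method in methods:
        for device in devices:
            data_args.append((method, device))
            # The pseudo-device is only meaningful alongside the real GPU device.
            if cuStateVec and device == "GPU":
                data_args.append((method, "GPU_cuStateVec"))
    return data_args


def resolve_device(device):
    """Translate a test-only device label into (real device, extra options)."""
    if device == "GPU_cuStateVec":
        return "GPU", {"cuStateVec_enable": True}
    return device, {}


args = build_data_args(["statevector"], ["CPU", "GPU"], cuStateVec=True)
for method, device in args:
    real_device, extra = resolve_device(device)
    print(method, real_device, extra)
```

Tests keep their two-argument signature of method and device, and only the translation step needs to know about cuStateVec.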

@chriseclectic chriseclectic self-assigned this Feb 15, 2022
@hhorii
Collaborator

hhorii commented Feb 28, 2022

I confirmed that this PR causes no regressions.

@hhorii hhorii merged commit db91e7d into Qiskit:main Mar 1, 2022
@hhorii hhorii mentioned this pull request Mar 29, 2022
1 task