diff --git a/docs/source/index.rst b/docs/source/index.rst
index 2cd735b0d..39ecab9f2 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -3,18 +3,24 @@
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.
 
-Numba CUDA
+Numba-CUDA
 ==========
 
-This is the documentation for Numba's CUDA target.
+Numba-CUDA provides a CUDA target for the Numba Python JIT compiler. It is
+used for writing SIMT kernels in Python, for providing Python bindings for
+accelerated device libraries, and as a compiler for user-defined functions in
+accelerated libraries like `RAPIDS <https://rapids.ai/>`_.
 
-This is presently a work-in-progress - the user guide and reference
-documentation are presently direct copies of the `upstream Numba CUDA
-documentation `_.
+* To install Numba-CUDA, see :ref:`numba-cuda-installation`.
+* To get started writing CUDA kernels in Python with Numba, see
+  :ref:`writing-cuda-kernels`.
+* Browse the :ref:`numba-cuda-examples` for a variety of Numba-CUDA use cases.
+
+Contents
+========
 
 .. toctree::
    :maxdepth: 2
-   :hidden:
 
    user/index.rst
    reference/index.rst
diff --git a/docs/source/user/examples.rst b/docs/source/user/examples.rst
index 5633c336b..ac833810a 100644
--- a/docs/source/user/examples.rst
+++ b/docs/source/user/examples.rst
@@ -1,3 +1,4 @@
+.. _numba-cuda-examples:
 
 ========
 Examples
diff --git a/docs/source/user/index.rst b/docs/source/user/index.rst
index b31732820..21088c492 100644
--- a/docs/source/user/index.rst
+++ b/docs/source/user/index.rst
@@ -6,7 +6,7 @@ User guide
 
 .. toctree::
 
-   overview.rst
+   installation.rst
    kernels.rst
    memory.rst
    device-functions.rst
diff --git a/docs/source/user/installation.rst b/docs/source/user/installation.rst
new file mode 100644
index 000000000..031f56f78
--- /dev/null
+++ b/docs/source/user/installation.rst
@@ -0,0 +1,105 @@
+.. _numba-cuda-installation:
+
+============
+Installation
+============
+
+Requirements
+============
+
+Supported GPUs
+--------------
+
+Numba-CUDA supports all NVIDIA GPUs that are supported by the version of the
+CUDA Toolkit in use. For CUDA 11 this ranges from Compute Capability 3.5 to
+9.0, and for CUDA 12 from 5.0 to 12.1, depending on the exact toolkit version
+installed.
+
+
+Supported CUDA Toolkits
+-----------------------
+
+Numba-CUDA aims to support all minor versions of the two most recent major
+CUDA Toolkit releases. Presently CUDA 11 and 12 are supported; CUDA 11.2 is
+the minimum required version, because the older 11.0 and 11.1 releases ship a
+version of NVVM based on an earlier, incompatible LLVM version.
+
+For further information about version compatibility between toolkit and driver
+versions, refer to :ref:`minor-version-compatibility`.
+
+
+Installation with a Python package manager
+==========================================
+
+Conda users can install Numba-CUDA and the CUDA Toolkit into a conda
+environment.
+
+For CUDA 12::
+
+   $ conda install -c conda-forge numba-cuda "cuda-version>=12.0"
+
+Alternatively, you can install all CUDA 12 dependencies from PyPI via ``pip``::
+
+   $ pip install numba-cuda[cu12]
+
+For CUDA 11, ``cudatoolkit`` is required::
+
+   $ conda install -c conda-forge numba-cuda "cuda-version>=11.2,<12.0"
+
+or::
+
+   $ pip install numba-cuda[cu11]
+
+If you are not using conda or pip, or if you want to use a different version
+of the CUDA Toolkit, :ref:`cudatoolkit-lookup` describes how Numba-CUDA
+searches for a CUDA Toolkit installation.
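+
+After installing by any of these methods, a quick sanity check is to ask
+Numba-CUDA to report the devices it can detect. This is a suggested check
+rather than a required installation step, and the output depends on your GPU
+and driver::
+
+   from numba import cuda
+
+   # Prints details of each device found, and returns True if a supported
+   # CUDA-capable device is present.
+   cuda.detect()
+
+For use in scripts, ``numba.cuda.is_available()`` returns a boolean without
+printing a report.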
+
+
+Configuration
+=============
+
+.. _cuda-bindings:
+
+CUDA Bindings
+-------------
+
+Numba supports interacting with the CUDA Driver API via either the `NVIDIA
+CUDA Python bindings <https://nvidia.github.io/cuda-python/>`_ or its own
+ctypes-based bindings. The two sets of bindings are functionally equivalent.
+The NVIDIA bindings are the default, and the ctypes bindings are now
+deprecated.
+
+If you do not want to use the NVIDIA bindings, the (deprecated) ctypes
+bindings can be enabled by setting the environment variable
+:envvar:`NUMBA_CUDA_USE_NVIDIA_BINDING` to ``"0"``.
+
+
+.. _cudatoolkit-lookup:
+
+CUDA Driver and Toolkit search paths
+------------------------------------
+
+Default behavior
+~~~~~~~~~~~~~~~~
+
+When using the NVIDIA bindings, the CUDA driver and toolkit libraries are
+located using the bindings' `built-in path-finding logic `_.
+
+Ctypes bindings (deprecated) behavior
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using the ctypes bindings, Numba searches for a CUDA toolkit installation
+in the following order:
+
+1. Conda-installed CUDA Toolkit packages
+2. Pip-installed CUDA Toolkit packages
+3. The environment variable ``CUDA_HOME``, which points to the directory of the
+   installed CUDA toolkit (e.g. ``/home/user/cuda-12``)
+4. System-wide installation at exactly ``/usr/local/cuda`` on Linux platforms.
+   Versioned installation paths (e.g. ``/usr/local/cuda-12.0``) are
+   intentionally ignored. Users can use ``CUDA_HOME`` to select specific
+   versions.
+
+In addition to the CUDA toolkit libraries, which can be installed by conda into
+an environment or installed system-wide by the `CUDA SDK installer
+<https://developer.nvidia.com/cuda-downloads>`_, the CUDA target in Numba also
+requires an up-to-date NVIDIA driver. Updated NVIDIA drivers are also installed
+by the CUDA SDK installer, so a separate driver installation is not needed when
+the SDK installer is used. If the ``libcuda`` library is in a non-standard
+location, users can set the environment variable :envvar:`NUMBA_CUDA_DRIVER` to
+the file path (not the directory path) of the shared library file.
diff --git a/docs/source/user/kernels.rst b/docs/source/user/kernels.rst
index b4af2ccf8..a45e49c86 100644
--- a/docs/source/user/kernels.rst
+++ b/docs/source/user/kernels.rst
@@ -1,8 +1,14 @@
+.. _writing-cuda-kernels:
 
 ====================
 Writing CUDA Kernels
 ====================
 
+Numba-CUDA supports programming NVIDIA CUDA GPUs by directly compiling a
+restricted subset of Python code into CUDA kernels and device functions
+following the CUDA execution model.
+
+
 Introduction
 ============
 
@@ -24,6 +30,28 @@ consider how to use and access memory in order to minimize bandwidth
 requirements and contention.
 
 
+Terminology
+===========
+
+Several terms that are important in CUDA programming are listed here:
+
+- *host*: the CPU
+- *device*: the GPU
+- *host memory*: the system main memory
+- *device memory*: onboard memory on a GPU card
+- *kernel*: a GPU function launched by the host and executed on the device
+- *device function*: a GPU function executed on the device which can only be
+  called from the device (i.e. from a kernel or another device function)
+
+
+Programming model
+=================
+
+Most CUDA programming facilities exposed by Numba map directly to the CUDA
+C language offered by NVIDIA. Therefore, it is recommended that you read the
+official `CUDA C programming guide
+<https://docs.nvidia.com/cuda/cuda-c-programming-guide/>`_.
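+
+As a brief illustration of these facilities, the following is a minimal but
+complete sketch of defining and launching a kernel. The array size and launch
+configuration are arbitrary illustrative choices; kernel declaration,
+invocation, and thread positioning are explained in the sections below::
+
+   import numpy as np
+
+   from numba import cuda
+
+   @cuda.jit
+   def increment_by_one(an_array):
+       # cuda.grid(1) gives this thread's absolute position in the 1D grid.
+       pos = cuda.grid(1)
+       if pos < an_array.size:  # Guard: the grid may be larger than the array.
+           an_array[pos] += 1
+
+   arr = np.arange(4096, dtype=np.float32)
+   threads_per_block = 256
+   # Round up so that every element is covered by at least one thread.
+   blocks_per_grid = (arr.size + threads_per_block - 1) // threads_per_block
+   increment_by_one[blocks_per_grid, threads_per_block](arr)
+
+Launching the kernel on a NumPy array transfers the data to the device and
+back automatically; later sections of this guide describe how to manage
+transfers explicitly.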
+
+
 Kernel declaration
 ==================
 
diff --git a/docs/source/user/overview.rst b/docs/source/user/overview.rst
deleted file mode 100644
index a740fb352..000000000
--- a/docs/source/user/overview.rst
+++ /dev/null
@@ -1,142 +0,0 @@
-========
-Overview
-========
-
-Numba supports CUDA GPU programming by directly compiling a restricted subset
-of Python code into CUDA kernels and device functions following the CUDA
-execution model. Kernels written in Numba appear to have direct access
-to NumPy arrays. NumPy arrays are transferred between the CPU and the
-GPU automatically.
-
-
-Terminology
-===========
-
-Several important terms in the topic of CUDA programming are listed here:
-
-- *host*: the CPU
-- *device*: the GPU
-- *host memory*: the system main memory
-- *device memory*: onboard memory on a GPU card
-- *kernels*: a GPU function launched by the host and executed on the device
-- *device function*: a GPU function executed on the device which can only be
-  called from the device (i.e. from a kernel or another device function)
-
-
-Programming model
-=================
-
-Most CUDA programming facilities exposed by Numba map directly to the CUDA
-C language offered by NVidia. Therefore, it is recommended you read the
-official `CUDA C programming guide `_.
-
-
-Requirements
-============
-
-Supported GPUs
---------------
-
-Numba supports CUDA-enabled GPUs with Compute Capability 3.5 or greater.
-Support for devices with Compute Capability less than 5.0 is deprecated, and
-will be removed in a future Numba release.
-
-Devices with Compute Capability 5.0 or greater include (but are not limited to):
-
-- Embedded platforms: NVIDIA Jetson Nano, Jetson Orin Nano, TX1, TX2, Xavier
-  NX, AGX Xavier, AGX Orin.
-- Desktop / Server GPUs: All GPUs with Maxwell microarchitecture or later. E.g.
-  GTX 9 / 10 / 16 series, RTX 20 / 30 / 40 / 50 series, Quadro / Tesla L / M /
-  P / V / RTX series, RTX A series, RTX Ada / SFF, A / L series, H100, B100.
-- Laptop GPUs: All GPUs with Maxwell microarchitecture or later. E.g. MX series,
-  Quadro M / P / T series (mobile), RTX 20 / 30 series (mobile), RTX A / Ada
-  series (mobile).
-
-Software
---------
-
-Numba aims to support CUDA Toolkit versions released within the last 3 years.
-Presently 11.2 is the minimum required toolkit version. An NVIDIA driver
-sufficient for the toolkit version is also required (see also
-:ref:`minor-version-compatibility`).
-
-Conda users can install the CUDA Toolkit into a conda environment.
-
-For CUDA 12, ``cuda-nvcc`` and ``cuda-nvrtc`` are required::
-
-   $ conda install -c conda-forge numba-cuda cuda-nvcc cuda-nvrtc "cuda-version>=12.0"
-
-Alternatively, you can install all CUDA 12 dependencies from PyPI via ``pip``::
-
-   $ pip install numba-cuda[cu12]
-
-For CUDA 11, ``cudatoolkit`` is required::
-
-   $ conda install -c conda-forge numba-cuda cudatoolkit "cuda-version>=11.2,<12.0"
-
-or::
-
-   $ pip install numba-cuda[cu11]
-
-If you are not using Conda or if you want to use a different version of CUDA
-toolkit, the following describes how Numba searches for a CUDA toolkit
-installation.
-
-.. _cuda-bindings:
-
-CUDA Bindings
-~~~~~~~~~~~~~
-
-Numba supports interacting with the CUDA Driver API via either the `NVIDIA CUDA
-Python bindings `_ or its own ctypes-based
-bindings. Functionality is equivalent between the two binding choices. The
-ctypes-based bindings are presently the default, but the NVIDIA bindings will
-be used by default (if they are available in the environment) in a future Numba
-release.
-
-You can install the NVIDIA bindings with::
-
-   $ conda install -c conda-forge cuda-python
-
-if you are using Conda, or::
-
-   $ pip install cuda-python
-
-if you are using pip. Note that the bracket notation ``[cuXX]`` introduced above
-will bring in this dependency for you.
-
-The use of the NVIDIA bindings is enabled by setting the environment variable
-:envvar:`NUMBA_CUDA_USE_NVIDIA_BINDING` to ``"1"``.
-
-.. _cudatoolkit-lookup:
-
-Setting CUDA Installation Path
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Numba searches for a CUDA toolkit installation in the following order:
-
-1. Conda installed CUDA Toolkit packages
-2. Environment variable ``CUDA_HOME``, which points to the directory of the
-   installed CUDA toolkit (i.e. ``/home/user/cuda-12``)
-3. System-wide installation at exactly ``/usr/local/cuda`` on Linux platforms.
-   Versioned installation paths (i.e. ``/usr/local/cuda-12.0``) are intentionally
-   ignored. Users can use ``CUDA_HOME`` to select specific versions.
-
-In addition to the CUDA toolkit libraries, which can be installed by conda into
-an environment or installed system-wide by the `CUDA SDK installer
-`_, the CUDA target in Numba
-also requires an up-to-date NVIDIA graphics driver. Updated graphics drivers
-are also installed by the CUDA SDK installer, so there is no need to do both.
-If the ``libcuda`` library is in a non-standard location, users can set
-environment variable ``NUMBA_CUDA_DRIVER`` to the file path (not the directory
-path) of the shared library file.
-
-
-Missing CUDA Features
-=====================
-
-Numba does not implement all features of CUDA, yet. Some missing features
-are listed below:
-
-* dynamic parallelism
-* texture memory