Merged
18 changes: 12 additions & 6 deletions docs/source/index.rst
@@ -3,18 +3,24 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Numba CUDA
Numba-CUDA
==========

This is the documentation for Numba's CUDA target.
Numba-CUDA provides a CUDA target for the Numba Python JIT Compiler. It is used
for writing SIMT kernels in Python, for providing Python bindings for
accelerated device libraries, and as a compiler for user-defined functions in
accelerated libraries like `RAPIDS <https://rapids.ai>`_.

This is presently a work-in-progress - the user guide and reference
documentation are presently direct copies of the `upstream Numba CUDA
documentation <https://numba.readthedocs.io/en/0.60.0/>`_.
* To install Numba-CUDA, see: :ref:`numba-cuda-installation`.
* To get started writing CUDA kernels in Python with Numba, see
:ref:`writing-cuda-kernels`.
* Browse the :ref:`numba-cuda-examples` to see a variety of use cases of Numba-CUDA.

Contents
========

.. toctree::
:maxdepth: 2
:hidden:

user/index.rst
reference/index.rst
1 change: 1 addition & 0 deletions docs/source/user/examples.rst
@@ -1,3 +1,4 @@
.. _numba-cuda-examples:

========
Examples
2 changes: 1 addition & 1 deletion docs/source/user/index.rst
@@ -6,7 +6,7 @@ User guide

.. toctree::

overview.rst
installation.rst
kernels.rst
memory.rst
device-functions.rst
105 changes: 105 additions & 0 deletions docs/source/user/installation.rst
@@ -0,0 +1,105 @@
.. _numba-cuda-installation:

============
Installation
============

Requirements
============

Supported GPUs
--------------

Numba supports all NVIDIA GPUs that are supported by the CUDA Toolkit it uses.
For CUDA 11 this presently spans compute capabilities 3.5 to 9.0, and for CUDA
12 it spans 5.0 to 12.1, depending on the exact toolkit version installed.


Supported CUDA Toolkits
-----------------------

Numba-CUDA aims to support all minor versions of the two most recent CUDA
Toolkit releases. Presently 11 and 12 are supported; CUDA 11.2 is the minimum
required, because older releases (11.0 and 11.1) have a version of NVVM based on
a previous and incompatible LLVM version.

For further information about version compatibility between toolkit and driver
versions, refer to :ref:`minor-version-compatibility`.


Installation with a Python package manager
==========================================

Conda users can install the CUDA Toolkit into a conda environment.

For CUDA 12::

$ conda install -c conda-forge numba-cuda "cuda-version>=12.0"

Alternatively, you can install all CUDA 12 dependencies from PyPI via ``pip``::

$ pip install numba-cuda[cu12]

For CUDA 11, ``cudatoolkit`` is required::

$ conda install -c conda-forge numba-cuda "cuda-version>=11.2,<12.0"

or::

$ pip install numba-cuda[cu11]

If you are not using conda or pip, or if you want to use a different version of
the CUDA Toolkit, :ref:`cudatoolkit-lookup` describes how Numba searches for a
CUDA toolkit.


Configuration
=============

.. _cuda-bindings:

CUDA Bindings
-------------

Numba supports interacting with the CUDA Driver API via either the `NVIDIA CUDA
Python bindings <https://nvidia.github.io/cuda-python/>`_ or its own ctypes-based
bindings. Functionality is equivalent between the two binding choices. The
NVIDIA bindings are the default, and the ctypes bindings are now deprecated.

If you do not want to use the NVIDIA bindings, the (deprecated) ctypes bindings
can be enabled by setting the environment variable
:envvar:`NUMBA_CUDA_USE_NVIDIA_BINDING` to ``"0"``.


.. _cudatoolkit-lookup:

CUDA Driver and Toolkit search paths
------------------------------------

Default behavior
~~~~~~~~~~~~~~~~

When using the NVIDIA bindings, the CUDA driver and toolkit libraries are
located using the bindings' `built-in path-finding logic <https://github.com/NVIDIA/cuda-python/tree/main/cuda_bindings/cuda/bindings/_path_finder>`_.

Ctypes bindings (deprecated) behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using the ctypes bindings, Numba searches for a CUDA toolkit installation
in the following order:

1. Conda-installed CUDA Toolkit packages
2. Pip-installed CUDA Toolkit packages
3. The environment variable ``CUDA_HOME``, which points to the directory of the
   installed CUDA toolkit (e.g. ``/home/user/cuda-12``)
4. System-wide installation at exactly ``/usr/local/cuda`` on Linux platforms.
   Versioned installation paths (e.g. ``/usr/local/cuda-12.0``) are
   intentionally ignored. Users can use ``CUDA_HOME`` to select specific
   versions.
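As an illustration, a specific toolkit installation could be selected for the ctypes bindings like this (the path is hypothetical; adjust it to match your system):

```shell
# Point the ctypes bindings at one particular CUDA Toolkit installation.
# CUDA_HOME takes the toolkit's directory path.
export CUDA_HOME=/usr/local/cuda-12.0
```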

In addition to the CUDA toolkit libraries, which can be installed by conda into
an environment or installed system-wide by the `CUDA SDK installer
<https://developer.nvidia.com/cuda-downloads>`_, the CUDA target in Numba also
requires an up-to-date NVIDIA driver. Updated NVIDIA drivers are also installed
by the CUDA SDK installer, so there is no need to do both. If the ``libcuda``
library is in a non-standard location, users can set environment variable
:envvar:`NUMBA_CUDA_DRIVER` to the file path (not the directory path) of the
shared library file.
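For instance, if ``libcuda`` lives outside the standard search locations, it could be selected as follows (the path shown is hypothetical; note the variable takes the shared library's file path, not its directory):

```shell
# Point Numba at a non-standard libcuda (file path, not directory path).
export NUMBA_CUDA_DRIVER=/opt/nvidia/lib64/libcuda.so
```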
28 changes: 28 additions & 0 deletions docs/source/user/kernels.rst
@@ -1,8 +1,14 @@
.. _writing-cuda-kernels:

====================
Writing CUDA Kernels
====================

Numba-CUDA supports programming NVIDIA CUDA GPUs by directly compiling a
restricted subset of Python code into CUDA kernels and device functions
following the CUDA execution model.


Introduction
============

@@ -24,6 +30,28 @@ consider how to use and access memory in order to minimize bandwidth
requirements and contention.


Terminology
===========

Several important terms used in CUDA programming are listed here:

- *host*: the CPU
- *device*: the GPU
- *host memory*: the system main memory
- *device memory*: onboard memory on a GPU card
- *kernel*: a GPU function launched by the host and executed on the device
- *device function*: a GPU function executed on the device which can only be
called from the device (i.e. from a kernel or another device function)


Programming model
=================

Most CUDA programming facilities exposed by Numba map directly to the CUDA
C language offered by NVIDIA. Therefore, it is recommended that you read the
official `CUDA C programming guide <http://docs.nvidia.com/cuda/cuda-c-programming-guide>`_.


Kernel declaration
==================

142 changes: 0 additions & 142 deletions docs/source/user/overview.rst

This file was deleted.