diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index 63a7ecaa27ca6..13c6f8f3a28e2 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -83,46 +83,42 @@ Note that this installation way of PySpark with/without a specific Hadoop versio
 Using Conda
 -----------
 
-Conda is an open-source package management and environment management system which is a part of
-the `Anaconda <https://www.anaconda.com/>`_ distribution. It is both cross-platform and
-language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/stable/>`_ and
-`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+Conda is an open-source package management and environment management system (developed by
+`Anaconda <https://www.anaconda.com/>`_), which is best installed through
+`Miniconda <https://docs.conda.io/en/latest/miniconda.html>`_ or `Miniforge <https://github.com/conda-forge/miniforge/>`_.
+The tool is both cross-platform and language agnostic, and in practice, conda can replace both
+`pip <https://pip.pypa.io/en/stable/>`_ and `virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
 
-Create new virtual environment from your terminal as shown below:
+Conda uses so-called channels to distribute packages, and alongside the default channels managed
+by Anaconda itself, the most important channel is `conda-forge <https://conda-forge.org/>`_,
+the community-driven packaging effort that is the most extensive and the most current (and also
+serves as the upstream for the Anaconda channels in most cases).
 
-.. code-block:: bash
-
-    conda create -n pyspark_env
-
-After the virtual environment is created, it should be visible under the list of Conda environments
-which can be seen using the following command:
-
-.. code-block:: bash
-
-    conda env list
-
-Now activate the newly created environment with the following command:
+To create a new conda environment from your terminal and activate it, proceed as shown below:
 
 .. code-block:: bash
 
+    conda create -n pyspark_env
     conda activate pyspark_env
 
-You can install pyspark by `Using PyPI <#using-pypi>`_ to install PySpark in the newly created
-environment, for example as below. It will install PySpark under the new virtual environment
-``pyspark_env`` created above.
+After activating the environment, use the following command to install pyspark,
+a Python version of your choice, as well as other packages you want to use in
+the same session as pyspark (you can also install everything in several steps).
 
 .. code-block:: bash
 
-    pip install pyspark
-
-Alternatively, you can install PySpark from Conda itself as below:
+    conda install -c conda-forge pyspark  # can also add "python=3.8 some_package [etc.]" here
 
-.. code-block:: bash
+Note that `PySpark for conda <https://anaconda.org/conda-forge/pyspark>`_ is maintained
+separately by the community; while new versions generally get packaged quickly, the
+availability through conda(-forge) is not directly in sync with the PySpark release cycle.
 
-    conda install pyspark
+While using pip in a conda environment is technically feasible (with the same command as
+`above <#using-pypi>`_), this approach is `discouraged <https://www.anaconda.com/blog/using-pip-in-a-conda-environment>`_,
+because pip does not interoperate with conda.
 
-However, note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is not necessarily
-synced with PySpark release cycle because it is maintained by the community separately.
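+If you do not want to pass ``-c conda-forge`` to every install command, the channel can
+optionally be registered in your conda configuration once (a convenience step, shown here
+only as an example; adjust to your own setup):
+
+.. code-block:: bash
+
+    # optional: make conda-forge the preferred channel for this machine
+    conda config --add channels conda-forge
+    conda config --set channel_priority strict
+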
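+To verify that the environment works, you can check the installed version from within the
+activated environment, for example as below (a minimal sanity check; any other PySpark entry
+point works just as well):
+
+.. code-block:: bash
+
+    python -c "import pyspark; print(pyspark.__version__)"
+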
+For a short summary about useful conda commands, see their
+`cheat sheet <https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html>`_.
 
 Manually Downloading