diff --git a/docs/img/pyspark-components.png b/docs/img/pyspark-components.png
new file mode 100644
index 0000000000000..a0979d3465a92
Binary files /dev/null and b/docs/img/pyspark-components.png differ
diff --git a/docs/img/pyspark-components.pptx b/docs/img/pyspark-components.pptx
new file mode 100644
index 0000000000000..e0111a44e186e
Binary files /dev/null and b/docs/img/pyspark-components.pptx differ
diff --git a/python/docs/source/index.rst b/python/docs/source/index.rst
index 34011ec7c5573..b9180cefe5dcc 100644
--- a/python/docs/source/index.rst
+++ b/python/docs/source/index.rst
@@ -21,8 +21,44 @@
 PySpark Documentation
 =====================
 
+.. TODO(SPARK-32204): Add Binder integration at Live Notebook.
+
+PySpark is an interface for Apache Spark in Python. It not only allows you to write
+Spark applications using Python APIs, but also provides the PySpark shell for
+interactively analyzing your data in a distributed environment. PySpark supports most
+of Spark's features such as Spark SQL, DataFrame, Streaming, MLlib
+(Machine Learning) and Spark Core.
+
+.. image:: ../../../docs/img/pyspark-components.png
+    :alt: PySpark Components
+
+**Spark SQL and DataFrame**
+
+Spark SQL is a Spark module for structured data processing. It provides
+a programming abstraction called DataFrame and can also act as a distributed
+SQL query engine.
+
+**Streaming**
+
+Running on top of Spark, the streaming feature in Apache Spark enables powerful
+interactive and analytical applications across both streaming and historical data,
+while inheriting Spark’s ease of use and fault tolerance characteristics.
+
+**MLlib**
+
+Built on top of Spark, MLlib is a scalable machine learning library that provides
+a uniform set of high-level APIs that help users create and tune practical machine
+learning pipelines.
+
+**Spark Core**
+
+Spark Core is the underlying general execution engine for the Spark platform that all
+other functionality is built on top of. It provides an RDD (Resilient Distributed Dataset)
+and in-memory computing capabilities.
+
 .. toctree::
     :maxdepth: 2
+    :hidden:
 
     getting_started/index
     user_guide/index