From e198cd5217c5e8e0d413b18f91e7175454954b41 Mon Sep 17 00:00:00 2001
From: gayle
Date: Fri, 5 Apr 2024 12:46:50 +0800
Subject: [PATCH] update readme

---
 README.md | 131 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 108 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index 68e55a6..5b34c1b 100644
--- a/README.md
+++ b/README.md
@@ -2,26 +2,102 @@
 
-We're delighted to welcome you to Helm multi-chart installer for Spark. Here, deploying essential Spark components on Kubernetes is made as simple as possible. With just Nginx and a ReadWriteMany (RWX) Persistent Volume ready, you're merely one Helm install command away from having your Spark components up and running!
-This repository includes indispensable components like Hive Metastore, Spark Thrift Server, Lighter, Jupyter Lab, and the Spark History Server. You have the freedom to select required components for installation from the config file installer/values.yaml.
+# Overview
+Welcome to our Helm Chart Installer for Spark. It enables you to easily deploy the Spark ecosystem components on a Kubernetes cluster.
+
+
+The components below enable the following features:
+1. Running Spark notebooks using Spark and Spark SQL
+2. Creating Spark jobs using Python
+3. Tracking Spark jobs using a UI
+
+
+Components:
+- Hive Metastore
+- Spark Thrift Server
+- Spark History Server
+- Lighter Server
+- Jupyter Lab with SparkMagic Kernel
+
+
+
+We invite you to try this out and let us know about any issues or feedback via GitHub Issues. Do let us know what adaptations you have made for your setup via GitHub Discussions.
+
+
+
+## Customisation of the Helm Chart
+
+This Helm chart supports several methods of customisation:
+1. Modifying `values.yaml`
+2. Providing a new `values.yaml` file
+3. Using Kustomize
+
+### Customising values.yaml
+You may customise your installation of the above components by editing the file at [installer/values.yaml](installer/values.yaml).
+
+### Alternative Values File
+Alternatively, you can create a copy of the values file and run the following command with your copy:
+```bash
+helm install spark-bundle installer --values new_values.yaml --namespace kapitanspark --create-namespace
+```
+
+### Using Kustomize
+This approach avoids modifying the original source code while still allowing you to customise the installation to your needs.
+
+See the [Advanced Installation](#advanced-installation-and-customisation) section for the detailed steps.
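As a concrete illustration of the alternative-values approach, you could derive a small override file and pass it to `helm install`. The component names and keys below are illustrative only, not the chart's actual schema; copy the real keys from `installer/values.yaml`.

```bash
# Sketch: write a minimal override file (keys are illustrative --
# copy real keys from installer/values.yaml).
cat > new_values.yaml <<'EOF'
jupyterlab:
  enabled: true
lighter:
  enabled: false
EOF

# Then install with the override file:
#   helm install spark-bundle installer --values new_values.yaml \
#     --namespace kapitanspark --create-namespace
cat new_values.yaml
```

Keeping overrides in a separate file means upstream updates to `installer/values.yaml` never conflict with your local changes.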
+
+
+### Installing Components Separately
+
+If you want to install a component on its own, you can also navigate to its individual chart folder and run `helm install` there as needed.
+
+### Creating Multiple Instances
+
+You may create multiple instances of this Helm chart by specifying a different release name for each, for example for separate production, staging, and testing environments.
+
+```bash
+helm install spark-production installer --namespace kapitanspark-prod --create-namespace
+```
+
+```bash
+helm install spark-testing installer --namespace kapitanspark-test --create-namespace
+```
+
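The two commands above differ only in the release and namespace names, so per-environment installs can be scripted. The sketch below only prints the commands instead of running them, so it is safe without a cluster; the environment suffixes are illustrative.

```bash
# Build the install command per environment; echo rather than execute.
# Suffixes ("prod", "test") are illustrative -- adapt to your naming scheme.
cmds=""
for env in prod test; do
  cmd="helm install spark-${env} installer --namespace kapitanspark-${env} --create-namespace"
  cmds="${cmds}${cmd}
"
  echo "${cmd}"
done
```

Dropping the `echo` (running `${cmd}` directly) would perform the actual installs once you are happy with the generated commands.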
+
+
+
-To meet the diverse needs of different organisations, our Helm installer supports the creation of multiple instances. This feature accommodates environments which require distinct setups for scenarios such as production, staging, and testing. You simply need to provide a different name to the Helm installer.
-For installing standalone components, navigate to the desired individual charts/ folder and execute the Helm install command. This allows you to add individual components as per your necessity.
-The customization of Helm values.yaml can be done by passing ---values new_values.yaml file during installation. Similarly, if you want to modify any *.yaml file in the template/ folder, you can do this by passing --post-renderer ./kustomize.sh. Refer to the example command in the sections below for practical guidance. This approach prevents you from modifying the original source code and enables you to customize as per your organisation's needs.
-We encourage you to explore, adapt and utilize this repository to its fullest potential. It's time you experienced a remarkably effortless Spark component installation.
 
 ## Usage
 
-### Basic Installation
-Suitable for starters with little knowledge on Kubernetes and Helm. Can also install on Microk8s.
+### Quick Start
+Suitable for users with basic knowledge of Kubernetes and Helm. It can also be installed on MicroK8s.
+
+Requirements:
+- Ingress
+- Storage that supports `ReadWriteMany`
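Whether your storage can provide `ReadWriteMany` volumes depends on the provisioner behind the storage class. The helper below is a rough heuristic only; the provisioner name list is an assumption and not exhaustive, so treat a match as a hint rather than a guarantee.

```bash
# Heuristic only: these provisioner names commonly offer RWX volumes.
# Pipe `kubectl get storageclass` output through this function.
has_rwx_candidate() {
  grep -qiE 'nfs|longhorn|cephfs|azurefile|efs'
}

# Example usage against a live cluster:
#   kubectl get storageclass | has_rwx_candidate \
#     && echo "RWX-capable storage likely present"
```

If nothing matches, consult your provisioner's documentation before installing components that mount a shared volume.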
+
+
+#### (Optional) Setup of a Local Kubernetes Cluster
+You may skip the local setup if you already have an existing Kubernetes cluster you would like to use.
+
+
+At the moment, we have only tested this locally using `microk8s` and `minikube`.
+
 1. If you are using Microk8s, below are the steps to install Nginx and PV with RWX support:
 ```sh
@@ -29,6 +105,9 @@ Suitable for starters with little knowledge on Kubernetes and Helm. Can also ins
 microk8s enable ingress
 ```
+
+
+#### Installation of the Helm Chart
 2. Choose which components you need by enabling/disabling them at `installer/values.yaml`.
 3. Run the following install command, where `spark-bundle` is the name you prefer:
@@ -48,30 +127,33 @@ Suitable for starters with little knowledge on Kubernetes and Helm. Can also ins
 
-### Advanced Installation
-This method is ideal for individuals who possess some expertise in Kubernetes and Helm. This approach enables you to extend existing configurations efficiently, be it setting up HTTPS, changing secret credentials, or passing Google service account credentials. Most importantly, you can achieve all these without having to make any modifications to the existing source code— a significant advantage that empowers you to maintain system integrity whilst customising to your needs.
+### Advanced Installation and Customisation
+This method is ideal for users with more expertise in Kubernetes and Helm.
+This approach lets you extend the existing configuration to your needs, for example to set up HTTPS or change secret credentials, without modifying the existing source code.
+
+
-1. Pre-installation step for existing Kubernetes with Nginx and Persistence Volume having RWX storage class supported (Example NFS or Longhorn). 
+Requirements:
+- Ingress (Nginx)
+- Storage that supports `ReadWriteMany`, e.g. NFS or Longhorn
+
-2. Customize your components by enabling or disabling them in installer/values.yaml.
+1. Customize your components by enabling or disabling them in `installer/values.yaml`.
-3. Navigate to the directory `kcustomize/example/prod/`, and modify `google-secret.yaml` and `values.yaml` files.
+2. Navigate to the directory `kcustomize/example/prod/` and modify the `google-secret.yaml` and `values.yaml` files.
-4. Modify `jupyterlab/requirements.txt` according to your project before installation
+3. Modify `jupyterlab/requirements.txt` according to your project before installation.
-5. Execute the install command stated below in the folder kcustomize/example/prod/, replacing `spark-bundle` with your preferred name. You can add `--dry-run=server` to test any error in helm files before installation:
+4. Run the install command below from the folder `kcustomize/example/prod/`, replacing `spark-bundle` with your preferred name. You can add `--dry-run=server` to surface any errors in the Helm files before installation:
 ```sh
 cd kcustomize/example/prod/
 helm install spark-bundle ../../../installer --namespace kapitanspark --post-renderer ./kustomize.sh --values ./values.yaml --create-namespace
 ```
-6. If any errors occur during the installation step, run the command below to uninstall it. The `--wait` flag will ensure all pods are removed.
+5. If any errors occur during installation, run the command below to uninstall the release. The `--wait` flag ensures all pods are removed.
 ```sh
 helm uninstall spark-bundle --namespace kapitanspark --wait
 ```
-7. After successful installation, you should be able to access the Jupyter Lab, Spark History Server and Lighter UI based on your configuration of the Ingress section in `values.yaml`.
+6. After a successful installation, you should be able to access Jupyter Lab, the Spark History Server and the Lighter UI, depending on your configuration of the Ingress section in `values.yaml`.
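If Ingress is not yet configured, port-forwarding is a quick way to reach a UI. The service name and port below are placeholders, not the chart's actual values; list the real ones with `kubectl get svc --namespace kapitanspark` first.

```bash
# Placeholders: substitute the actual service name and port reported by
# `kubectl get svc --namespace kapitanspark`.
svc=spark-history-server
port=18080
cmd="kubectl port-forward svc/${svc} ${port}:${port} --namespace kapitanspark"
echo "${cmd}"   # run this once the release is installed
```

The UI is then reachable at `http://localhost:<port>` for as long as the port-forward is running.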
@@ -83,13 +165,15 @@ This method is ideal for individuals who possess some expertise in Kubernetes an | Helm | 3 | -### Component +### Component Details and Defaults
 - Hive metastore
-  - `hive-metastore/Dockerfile` is available for rebuilding. Post rebuilding, modify `image.repository`, `image.tag` in `values.yaml`.
+  - You may rebuild the image using the Dockerfile at `hive-metastore/Dockerfile`.
+  - After rebuilding, modify the following keys in `values.yaml`: `image.repository` and `image.tag`.
 
 - Spark Thrift Server
-  - Use `spark_docker_image/Dockerfile` for a rebuild. Later, adjust `image.repository`, `image.tag` in `values.yaml`.
+  - You may rebuild the image using the Dockerfile at `spark_docker_image/Dockerfile`.
+  - After rebuilding, modify the following keys in `values.yaml`: `image.repository` and `image.tag`.
   - Spark UI has been intentionally disabled at `spark-thrift-server/templates/service.yaml`.
   - Dependency: `hive-metastore` component.
@@ -98,13 +182,14 @@ This method is ideal for individuals who possess some expertise in Kubernetes an
   - Default password: `spark ecosystem`
 
 - Lighter
-  - Utilize `spark_docker_image/Dockerfile` for rebuilding. After rebuilding, modify `image.spark.repository`, `image.spark.tag` in `values.yaml`.
-  - If Spark history uses Persistence Volume to save event log instead of Blob storage S3a, ensure to install it with `spark-history-server` component on the same Kubernetes namespace.
+  - You may rebuild the image using the Dockerfile at `spark_docker_image/Dockerfile`.
+  - After rebuilding, modify the following keys in `values.yaml`: `image.spark.repository` and `image.spark.tag`.
+  - If the Spark History Server uses a Persistent Volume to save event logs instead of S3a blob storage, ensure it is installed together with the `spark-history-server` component in the same Kubernetes namespace.
   - Dependencies: `hive-metastore` and `spark-history-server` components. The latter can be turned off in `values.yaml`.
 - Default user: `dataOps` password: `5Wmi95w4`
 
 - Spark History Server
-  - By default, Persitence volume is used to read event log, to change update the `dir` key in `values.yaml` and in the `lighter` component, update `spark.history.eventLog.dir` key.
+  - By default, a Persistent Volume is used to read event logs. You may change this by updating the `dir` key in [`spark-history-server/values.yaml`](installer/charts/spark-history-server/values.yaml) and, in the `lighter` component, the `spark.history.eventLog.dir` key in [`lighter/values.yaml`](installer/charts/lighter/values.yaml).
   - If using Persistence volume instead of Blob storage S3a, ensure it is installed on the same namespace as other components.
   - Default user: `dataOps` password: `5Wmi95w4`
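To illustrate keeping the two charts in sync, here is a hedged sketch of the matching keys. The path value is a placeholder, and the nested structure is assumed from the `dir` and `spark.history.eventLog.dir` key names above; verify both against each chart's actual `values.yaml` before use.

```yaml
# installer/charts/spark-history-server/values.yaml (sketch; path is a placeholder)
dir: /data/spark-events

# installer/charts/lighter/values.yaml (sketch; nesting assumed from the
# `spark.history.eventLog.dir` key name)
spark:
  history:
    eventLog:
      dir: /data/spark-events
```

The important point is that both components must point at the same event-log location, otherwise jobs submitted through Lighter will not appear in the Spark History Server.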