diff --git a/docs/modules/hive/pages/getting_started/first_steps.adoc b/docs/modules/hive/pages/getting_started/first_steps.adoc
index cc8e89a1..5a18f979 100644
--- a/docs/modules/hive/pages/getting_started/first_steps.adoc
+++ b/docs/modules/hive/pages/getting_started/first_steps.adoc
@@ -1,8 +1,8 @@
 = First steps
-:description: Deploy and verify a Hive metastore cluster with PostgreSQL and MinIO. Follow our setup guide and ensure all pods are ready for operation.
+:description: Deploy and verify a Hive metastore cluster with PostgreSQL and MinIO. Follow the setup guide and ensure all pods are ready for operation.
 
-After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you will now deploy a Hive metastore cluster and it's dependencies.
-Afterwards you can <<_verify_that_it_works, verify that it works>>.
+After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, deploy a Hive metastore cluster and its dependencies.
+Afterward you can <<_verify_that_it_works, verify that it works>>.
 
 == Setup
 
diff --git a/docs/modules/hive/pages/getting_started/index.adoc b/docs/modules/hive/pages/getting_started/index.adoc
index 633e9c8c..3e36c20f 100644
--- a/docs/modules/hive/pages/getting_started/index.adoc
+++ b/docs/modules/hive/pages/getting_started/index.adoc
@@ -1,12 +1,12 @@
 = Getting started
 :description: Learn to set up Apache Hive with the Stackable Operator. Includes installation, dependencies, and creating a Hive metastore on Kubernetes.
 
-This guide will get you started with Apache Hive using the Stackable Operator.
-It will guide you through the installation of the operator, its dependencies and setting up your first Hive metastore instance.
+This guide gets you started with Apache Hive using the Stackable Operator.
+It guides you through the installation of the operator, its dependencies and setting up your first Hive metastore instance.
 
 == Prerequisites
 
-You will need:
+You need:
 
 * a Kubernetes cluster
 * kubectl
diff --git a/docs/modules/hive/pages/getting_started/installation.adoc b/docs/modules/hive/pages/getting_started/installation.adoc
index ab29a3d8..118e74e6 100644
--- a/docs/modules/hive/pages/getting_started/installation.adoc
+++ b/docs/modules/hive/pages/getting_started/installation.adoc
@@ -1,7 +1,7 @@
 = Installation
-:description: Install Stackable Operator for Apache Hive with MinIO and PostgreSQL using stackablectl or Helm. Follow our guide for easy setup and configuration.
+:description: Install Stackable Operator for Apache Hive with MinIO and PostgreSQL using stackablectl or Helm. Follow the guide for easy setup and configuration.
 
-On this page you will install the Stackable Operator for Apache Hive and all required dependencies.
+On this page you install the Stackable operator for Apache Hive and all required dependencies.
 For the installation of the dependencies and operators you can use Helm or `stackablectl`.
 
 The `stackablectl` command line tool is the recommended way to interact with operators and dependencies.
 Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.
@@ -10,7 +10,7 @@
 == Dependencies
 
 First you need to install MinIO and PostgreSQL instances for the Hive metastore.
-PostgreSQL is required as a database for Hive's metadata, and MinIO will be used as a data store, which the Hive metastore also needs access to.
+PostgreSQL is required as a database for Hive's metadata, and MinIO is used as a data store, which the Hive metastore also needs access to.
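For illustration, a minimal sketch of the S3Connection resource that the Hive metastore can later reference for MinIO access (the host, port and SecretClass name are assumptions for a default MinIO install; adjust them to the actual deployment):

[source,yaml]
----
# Sketch only: an S3Connection pointing at an in-cluster MinIO instance.
# Host, port and secretClass are illustrative assumptions.
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: minio
spec:
  host: minio                         # Kubernetes Service name of the MinIO instance
  port: 9000                          # default MinIO S3 API port
  accessStyle: Path                   # MinIO is typically addressed path-style
  credentials:
    secretClass: s3-credentials-class # SecretClass providing the access and secret key
----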
 
 There are two ways to install the dependencies:
 
@@ -66,7 +66,7 @@ Now call `stackablectl` and reference those two files:
 include::example$getting_started/getting_started.sh[tag=stackablectl-install-minio-postgres-stack]
 ----
 
-This will install MinIO and PostgreSQL as defined in the Stacks, as well as the Operators.
+This installs MinIO and PostgreSQL as defined in the Stacks, as well as the operators.
 You can now skip the <> step that follows next.
 
 TIP: Consult the xref:management:stackablectl:quickstart.adoc[Quickstart] to learn more about how to use `stackablectl`.
@@ -107,7 +107,7 @@ Run the following command to install all operators necessary for Apache Hive:
 include::example$getting_started/getting_started.sh[tag=stackablectl-install-operators]
 ----
 
-The tool will show
+The tool prints
 
 [source]
 ----
@@ -132,8 +132,7 @@ Then install the Stackable operators:
 include::example$getting_started/getting_started.sh[tag=helm-install-operators]
 ----
 
-Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the Apache Hive service (as well as the CRDs for the required operators).
-You are now ready to deploy the Apache Hive metastore in Kubernetes.
+Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the Apache Hive service (as well as the CRDs for the required operators).
 
 == What's next
 
diff --git a/docs/modules/hive/pages/index.adoc b/docs/modules/hive/pages/index.adoc
index 2e38dfca..3a77c1a4 100644
--- a/docs/modules/hive/pages/index.adoc
+++ b/docs/modules/hive/pages/index.adoc
@@ -19,7 +19,7 @@ This operator does not support deploying Hive itself, but xref:trino:index.adoc[
 
 == Getting started
 
-Follow the xref:getting_started/index.adoc[Getting started guide] which will guide you through installing the Stackable Hive operator and its dependencies.
+Follow the xref:getting_started/index.adoc[Getting started guide], which guides you through the installation of the Stackable Hive operator and its dependencies.
 It walks you through setting up a Hive metastore and connecting it to a demo Postgres database and a Minio instance to store data in.
 
 Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your Hive metastore configuration to your needs, or have a look at the <> for some example setups with either xref:trino:index.adoc[Trino] or xref:spark-k8s:index.adoc[Spark].
 
diff --git a/docs/modules/hive/pages/reference/commandline-parameters.adoc b/docs/modules/hive/pages/reference/commandline-parameters.adoc
index e3e0abec..053f93b7 100644
--- a/docs/modules/hive/pages/reference/commandline-parameters.adoc
+++ b/docs/modules/hive/pages/reference/commandline-parameters.adoc
@@ -23,8 +23,8 @@ stackable-hive-operator run --product-config /foo/bar/properties.yaml
 
 *Multiple values:* false
 
-If provided the operator will **only** watch for resources in the provided namespace.
-If not provided it will watch in **all** namespaces.
+If provided, the operator **only** watches for resources in the provided namespace.
+If not provided, it watches in **all** namespaces.
 
 .Example: Only watch the `test` namespace
 [source,bash]
diff --git a/docs/modules/hive/pages/reference/environment-variables.adoc b/docs/modules/hive/pages/reference/environment-variables.adoc
index 22dda6b8..7c79e6af 100644
--- a/docs/modules/hive/pages/reference/environment-variables.adoc
+++ b/docs/modules/hive/pages/reference/environment-variables.adoc
@@ -36,7 +36,7 @@ docker run \
 
 *Multiple values:* false
 
-The operator will **only** watch for resources in the provided namespace `test`:
+The operator **only** watches for resources in the provided namespace `test`:
 
 [source]
 ----
diff --git a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc
index 95b67825..39b0a505 100644
--- a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc
+++ b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc
@@ -40,15 +40,19 @@ metastore:
         replicas: 1
 ----
 
-All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.
+All override property values must be strings.
+The properties are formatted and escaped correctly into the XML file.
 
 For a full list of configuration options we refer to the Hive https://cwiki.apache.org/confluence/display/hive/configuration+properties[Configuration Reference].
 
 == The security.properties file
 
-The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
+The `security.properties` file is used to configure JVM security properties.
+It is very seldom that users need to tweak any of these, but there is one use case that stands out, and that users need to be aware of: the JVM DNS cache.
 
-The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 3.1.3 Apache Hive performs poorly if the positive cache is disabled. To cache resolved host names, you can configure the TTL of entries in the positive cache like this:
+The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensitive to the contents of these caches and their performance is heavily affected by them.
+As of version 3.1.3 Apache Hive performs poorly if the positive cache is disabled.
+To cache resolved host names, you can configure the TTL of entries in the positive cache like this:
 
 [source,yaml]
 ----
@@ -64,9 +68,10 @@ NOTE: The operator configures DNS caching by default as shown in the example abo
 
 For details on the JVM security see https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html
 
-== Environment Variables
+== Environment variables
 
-In a similar fashion, environment variables can be (over)written. For example per role group:
+In a similar fashion, environment variables can be (over)written.
+For example per role group:
 
 [source,yaml]
 ----
@@ -91,3 +96,8 @@ metastore:
         config: {}
         replicas: 1
 ----
+
+== Pod overrides
+
+The Hive operator also supports Pod overrides, allowing you to override any property that you can set on a Kubernetes Pod.
+Read the xref:concepts:overrides.adoc#pod-overrides[Pod overrides documentation] to learn more about this feature.
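For illustration, a brief sketch of what such a Pod override could look like per role group (the container name `hive` and the resource values are assumptions, not prescriptions):

[source,yaml]
----
# Sketch only: override Pod properties for the default metastore role group.
metastore:
  roleGroups:
    default:
      podOverrides:
        spec:
          containers:
            - name: hive          # assumed name of the main metastore container
              resources:
                limits:
                  cpu: "2"
                  memory: 3Gi
----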
diff --git a/docs/modules/hive/pages/usage-guide/database-driver.adoc b/docs/modules/hive/pages/usage-guide/database-driver.adoc
index 45e38b24..7ffb1015 100644
--- a/docs/modules/hive/pages/usage-guide/database-driver.adoc
+++ b/docs/modules/hive/pages/usage-guide/database-driver.adoc
@@ -88,7 +88,7 @@ spec:
       mountPath: /stackable/externals
 ----
 
-This will make the driver available at `/stackable/external-drivers/mysql-connector-j-8.0.31.jar` when the volume `external-drivers` is mounted at `/stackable/external-drivers`.
+This makes the driver available at `/stackable/external-drivers/mysql-connector-j-8.0.31.jar` when the volume `external-drivers` is mounted at `/stackable/external-drivers`.
 
 Once the above has completed successfully, you can confirm that the driver is in the expected location by running another job:
 
diff --git a/docs/modules/hive/pages/usage-guide/derby-example.adoc b/docs/modules/hive/pages/usage-guide/derby-example.adoc
index b0b358b2..7ad5acea 100644
--- a/docs/modules/hive/pages/usage-guide/derby-example.adoc
+++ b/docs/modules/hive/pages/usage-guide/derby-example.adoc
@@ -1,9 +1,9 @@
 = Derby example
 :description: Deploy a single-node Apache Hive Metastore with Derby or PostgreSQL. Includes setup for S3 integration and tips for database configuration.
 
-Please note that the version you need to specify is not only the version of Apache Hive which you want to roll out, but has to be amended with a Stackable version as shown.
+The version you need to specify is not only the version of Apache Hive you want to roll out; it also has to be amended with a Stackable version as shown.
 This Stackable version is the version of the underlying container image which is used to execute the processes.
-For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhive%2Ftags[image registry].
+For a list of available versions check the https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhive%2Ftags[image registry].
 It should generally be safe to simply use the latest image version that is available.
 
 .Create a single node Apache Hive Metastore cluster using Derby:
@@ -123,7 +123,7 @@ This is called `scram-sha-256` and has been the default as of PostgreSQL 14.
 Unfortunately, Hive up until the latest 3.3.x version ships with JDBC drivers that do https://wiki.postgresql.org/wiki/List_of_drivers[_not_ support] this method.
 You might see an error message like this: `The authentication type 10 is not supported.`
-If this is the case please either use an older PostgreSQL version or change its https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-PASSWORD-ENCRYPTION[`password_encryption`] setting to `md5`.
+If this is the case, either use an older PostgreSQL version or change its https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-PASSWORD-ENCRYPTION[`password_encryption`] setting to `md5`.
 
 This installs PostgreSQL in version 10 to work around the issue mentioned above:
 [source,bash]
 ----
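As a rough sketch of the second workaround, the `password_encryption` setting could be applied through chart values when PostgreSQL is installed with the Bitnami Helm chart; the value key below is an assumption and should be checked against the chart version in use:

[source,yaml]
----
# Sketch only: Helm values for the Bitnami PostgreSQL chart (assumed key names).
# Forces md5 password hashing so that older Hive JDBC drivers can authenticate.
primary:
  extendedConfiguration: |
    password_encryption = md5
----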
diff --git a/docs/modules/hive/pages/usage-guide/index.adoc b/docs/modules/hive/pages/usage-guide/index.adoc
index d00f5384..d0ca8ca0 100644
--- a/docs/modules/hive/pages/usage-guide/index.adoc
+++ b/docs/modules/hive/pages/usage-guide/index.adoc
@@ -1,6 +1,6 @@
 = Usage guide
 :page-aliases: usage.adoc
 
-This Section will help you to use and configure the Stackable Operator for Apache Hive in various ways.
+This section helps you to use and configure the Stackable operator for Apache Hive in various ways.
 You should already be familiar with how to set up a basic instance.
 Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies.
 
diff --git a/docs/modules/hive/pages/usage-guide/operations/graceful-shutdown.adoc b/docs/modules/hive/pages/usage-guide/operations/graceful-shutdown.adoc
index 36f5f129..2fe54de3 100644
--- a/docs/modules/hive/pages/usage-guide/operations/graceful-shutdown.adoc
+++ b/docs/modules/hive/pages/usage-guide/operations/graceful-shutdown.adoc
@@ -6,7 +6,7 @@ You can configure the graceful shutdown as described in xref:concepts:operations
 
 As a default, Hive metastores have `5 minutes` to shut down gracefully.
 
-The Hive metastore process will receive a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
-After the graceful shutdown timeout runs out, and the process still didn't exit, Kubernetes will issue a `SIGKILL` signal.
+The Hive metastore process receives a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
+If the process is still running after the graceful shutdown timeout runs out, Kubernetes issues a `SIGKILL` signal.
 However, there is no acknowledge message in the log indicating a graceful shutdown.
 
diff --git a/docs/modules/hive/pages/usage-guide/operations/index.adoc b/docs/modules/hive/pages/usage-guide/operations/index.adoc
index 21d170f9..b6a1309b 100644
--- a/docs/modules/hive/pages/usage-guide/operations/index.adoc
+++ b/docs/modules/hive/pages/usage-guide/operations/index.adoc
@@ -2,4 +2,4 @@
 
 This section of the documentation is intended for the operations teams that maintain a Stackable Data Platform installation.
 
-Please read the xref:concepts:operations/index.adoc[Concepts page on Operations] that contains the necessary details to operate the platform in a production environment.
+Read the xref:concepts:operations/index.adoc[Concepts page on Operations] that contains the necessary details to operate the platform in a production environment.
diff --git a/docs/modules/hive/pages/usage-guide/operations/pod-disruptions.adoc b/docs/modules/hive/pages/usage-guide/operations/pod-disruptions.adoc
index 0031a92c..0de0024b 100644
--- a/docs/modules/hive/pages/usage-guide/operations/pod-disruptions.adoc
+++ b/docs/modules/hive/pages/usage-guide/operations/pod-disruptions.adoc
@@ -2,7 +2,7 @@
 
 You can configure the permitted Pod disruptions for Hive nodes as described in xref:concepts:operations/pod_disruptions.adoc[].
 
-Unless you configure something else or disable our PodDisruptionBudgets (PDBs), we write the following PDBs:
+Unless you configure something else or disable the default PodDisruptionBudgets (PDBs), the operator writes the following PDBs:
 
 == Metastores
-We only allow a single metastore to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only a single metastore is allowed to be offline at any given time, regardless of the number of replicas or `roleGroups`.
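For illustration, a sketch of how this default could be changed on the HiveCluster itself; the field names follow the generic pattern from the Pod disruptions concepts page and are assumptions to verify against the CRD reference:

[source,yaml]
----
# Sketch only: adjust or disable the PodDisruptionBudget written for the metastore role.
spec:
  metastore:
    roleConfig:
      podDisruptionBudget:
        enabled: true       # set to false to skip writing a PDB entirely
        maxUnavailable: 1   # at most one metastore Pod may be voluntarily disrupted
----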
diff --git a/docs/modules/hive/pages/usage-guide/security.adoc b/docs/modules/hive/pages/usage-guide/security.adoc
index 69fe7654..2c500038 100644
--- a/docs/modules/hive/pages/usage-guide/security.adoc
+++ b/docs/modules/hive/pages/usage-guide/security.adoc
@@ -14,15 +14,17 @@ Additionally, you need a service-user which the secret-operator uses to create p
 The next step is to enter all the necessary information into a SecretClass, as described in xref:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation].
 The following guide assumes you have named your SecretClass `kerberos`.
 
 === 3. Configure HDFS to use SecretClass
-The next step is to configure your HdfsCluster to use the newly created SecretClass. Please follow the xref:hdfs:usage-guide/security.adoc[HDFS security guide] to set up and test this.
-Please make sure to use the SecretClass named `kerberos`. It is also necessary to configure 2 additional things in HDFS:
+The next step is to configure your HdfsCluster to use the newly created SecretClass.
+Follow the xref:hdfs:usage-guide/security.adoc[HDFS security guide] to set up and test this.
+Make sure to use the SecretClass named `kerberos`.
+It is also necessary to configure two additional things in HDFS:
 
 * Define group mappings for users with `hadoop.user.group.static.mapping.overrides`
 * Tell HDFS that Hive is allowed to impersonate other users, i.e. Hive does not need any _direct_ access permissions for itself, but should be able to impersonate Hive users when accessing HDFS.
 This can be done by e.g. setting `hadoop.proxyuser.hive.users=*` and `hadoop.proxyuser.hive.hosts=*` to allow the user `hive` to impersonate all other users.
 
 An example of the above can be found in this https://github.com/stackabletech/hive-operator/blob/main/tests/templates/kuttl/kerberos-hdfs/30-install-hdfs.yaml.j2[integration test].
 
-NOTE: This is only relevant if HDFS is used with the Hive metastore (many installations will use the metastore with an S3 backend instead of HDFS).
+NOTE: This is only relevant if HDFS is used with the Hive metastore (many installations use the metastore with an S3 backend instead of HDFS).
 
 === 4. Configure Hive to use SecretClass
 The last step is to configure the same SecretClass for Hive, which is done similarly to HDFS.
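For illustration, a minimal sketch of how the `kerberos` SecretClass could be referenced from the HiveCluster; the `authentication` field layout mirrors other Stackable operators and is an assumption to verify against the Hive CRD reference:

[source,yaml]
----
# Sketch only: point the Hive metastore at the SecretClass created above.
# The authentication field layout is an assumption based on other Stackable operators.
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: hive
spec:
  clusterConfig:
    authentication:
      kerberos:
        secretClass: kerberos   # SecretClass set up in step 2
----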