Add descriptions (#511)
fhennig authored Sep 18, 2024
1 parent e5b30ab commit b392ffa
Showing 15 changed files with 45 additions and 34 deletions.
6 changes: 3 additions & 3 deletions docs/modules/hive/pages/getting_started/first_steps.adoc
@@ -1,8 +1,8 @@
= First steps
:description: Deploy and verify a Hive metastore cluster with PostgreSQL and MinIO. Follow our setup guide and ensure all pods are ready for operation.

After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you
will now deploy a Hive metastore cluster and it's dependencies. Afterwards you can
<<_verify_that_it_works, verify that it works>>.
After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you will now deploy a Hive metastore cluster and it's dependencies.
Afterwards you can <<_verify_that_it_works, verify that it works>>.

== Setup

4 changes: 3 additions & 1 deletion docs/modules/hive/pages/getting_started/index.adoc
@@ -1,6 +1,8 @@
= Getting started
:description: Learn to set up Apache Hive with the Stackable Operator. Includes installation, dependencies, and creating a Hive metastore on Kubernetes.

This guide will get you started with Apache Hive using the Stackable Operator. It will guide you through the installation of the operator, its dependencies and setting up your first Hive metastore instance.
This guide will get you started with Apache Hive using the Stackable Operator.
It will guide you through the installation of the operator, its dependencies and setting up your first Hive metastore instance.

== Prerequisites

27 changes: 13 additions & 14 deletions docs/modules/hive/pages/getting_started/installation.adoc
@@ -1,16 +1,16 @@
= Installation
:description: Install Stackable Operator for Apache Hive with MinIO and PostgreSQL using stackablectl or Helm. Follow our guide for easy setup and configuration.

On this page you will install the Stackable Operator for Apache Hive and all required dependencies. For the installation
of the dependencies and operators you can use Helm or `stackablectl`.
On this page you will install the Stackable Operator for Apache Hive and all required dependencies.
For the installation of the dependencies and operators you can use Helm or `stackablectl`.

The `stackablectl` command line tool is the recommended way to interact with operators and dependencies. Follow the
xref:management:stackablectl:installation.adoc[installation steps] for your platform if you choose to work with
`stackablectl`.
The `stackablectl` command line tool is the recommended way to interact with operators and dependencies.
Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform if you choose to work with `stackablectl`.

== Dependencies

First you need to install MinIO and PostgreSQL instances for the Hive metastore. PostgreSQL is required as a database
for Hive's metadata, and MinIO will be used as a data store, which the Hive metastore also needs access to.
First you need to install MinIO and PostgreSQL instances for the Hive metastore.
PostgreSQL is required as a database for Hive's metadata, and MinIO will be used as a data store, which the Hive metastore also needs access to.

There are two ways to install the dependencies:

@@ -21,9 +21,8 @@ WARNING: The dependency installations in this guide are only intended for testin

=== stackablectl

`stackablectl` was designed to install Stackable components, but its xref:management:stackablectl:commands/stack.adoc[Stacks]
feature can also be used to install arbitrary Helm Charts. You can install MinIO and PostgreSQL using the Stacks feature
as follows, but a simpler method via Helm is shown <<Helm, below>>.
`stackablectl` was designed to install Stackable components, but its xref:management:stackablectl:commands/stack.adoc[Stacks] feature can also be used to install arbitrary Helm Charts.
You can install MinIO and PostgreSQL using the Stacks feature as follows, but a simpler method via Helm is shown <<Helm, below>>.

[source,bash]
----
@@ -67,8 +66,8 @@ Now call `stackablectl` and reference those two files:
include::example$getting_started/getting_started.sh[tag=stackablectl-install-minio-postgres-stack]
----

This will install MinIO and PostgreSQL as defined in the Stacks, as well as the Operators. You can now skip the
<<Stackable Operators>> step that follows next.
This will install MinIO and PostgreSQL as defined in the Stacks, as well as the Operators.
You can now skip the <<Stackable Operators>> step that follows next.

TIP: Consult the xref:management:stackablectl:quickstart.adoc[Quickstart] to learn more about how to use `stackablectl`.

@@ -133,8 +132,8 @@ Then install the Stackable operators:
include::example$getting_started/getting_started.sh[tag=helm-install-operators]
----

Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the Apache Hive service (as well as the
CRDs for the required operators). You are now ready to deploy the Apache Hive metastore in Kubernetes.
Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the Apache Hive service (as well as the CRDs for the required operators).
You are now ready to deploy the Apache Hive metastore in Kubernetes.

== What's next

2 changes: 1 addition & 1 deletion docs/modules/hive/pages/index.adoc
@@ -1,5 +1,5 @@
= Stackable Operator for Apache Hive
:description: The Stackable Operator for Apache Hive is a Kubernetes operator that can manage Apache Hive metastores. Learn about its features, resources, dependencies and demos, and see the list of supported Hive versions.
:description: Manage Apache Hive metastores on Kubernetes with the Stackable Operator. Integrates with Trino and Spark.
:keywords: Stackable Operator, Hadoop, Apache Hive, Kubernetes, k8s, operator, engineer, big data, metadata, storage, query
:hive: https://hive.apache.org
:github: https://github.com/stackabletech/hive-operator/
4 changes: 3 additions & 1 deletion docs/modules/hive/pages/required-external-components.adoc
@@ -1,6 +1,8 @@
= Required external components
:description: Hive Metastore requires a SQL database. Supported options include MySQL, Postgres, Oracle, and MS SQL Server. Stackable Hive supports PostgreSQL by default.

The Hive Metastore requires a backend SQL database. Supported databases and versions are:
The Hive Metastore requires a backend SQL database.
Supported databases and versions are:

* MySQL 5.6.17 and above
* Postgres 9.1.13 and above
@@ -1,4 +1,5 @@
= Configuration & environment overrides
:description: Override Hive config properties and environment variables at role or role group levels. Customize hive-site.xml, security.properties, and environment vars.

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

@@ -8,8 +9,8 @@ IMPORTANT: Overriding certain properties, which are set by the operator (such as

For a role or role group, at the same level of `config`, you can specify: `configOverrides` for the following files:

- `hive-site.xml`
- `security.properties`
* `hive-site.xml`
* `security.properties`

For example, if you want to set the `datanucleus.connectionPool.maxPoolSize` for the metastore to 20 adapt the `metastore` section of the cluster resource like so:

1 change: 1 addition & 0 deletions docs/modules/hive/pages/usage-guide/data-storage.adoc
@@ -1,4 +1,5 @@
= Data storage backends
:description: Hive supports metadata storage on S3 and HDFS. Configure S3 with S3Connection and HDFS with configMap in clusterConfig.

Hive does not store data, only metadata. It can store metadata about data stored in various places. The Stackable Operator currently supports S3 and HFS.

3 changes: 2 additions & 1 deletion docs/modules/hive/pages/usage-guide/database-driver.adoc
@@ -1,7 +1,8 @@
= Database drivers
:description: Learn to configure Apache Hive with MySQL using Helm, PVCs, and custom images. Includes steps for driver setup and Hive cluster creation.

The Stackable product images for Apache Hive come with built-in support for using PostgreSQL as the metastore database.
The MySQL driver is not shipped in our images due to licensing issues.
The MySQL driver is not shipped in Stackable images due to licensing issues.
To use another supported database it is necessary to make the relevant drivers available to Hive: this tutorial shows how this is done for MySQL.

== Install the MySQL helm chart
2 changes: 1 addition & 1 deletion docs/modules/hive/pages/usage-guide/derby-example.adoc
@@ -1,5 +1,5 @@

= Derby example
:description: Deploy a single-node Apache Hive Metastore with Derby or PostgreSQL. Includes setup for S3 integration and tips for database configuration.

Please note that the version you need to specify is not only the version of Apache Hive which you want to roll out, but has to be amended with a Stackable version as shown.
This Stackable version is the version of the underlying container image which is used to execute the processes.
4 changes: 3 additions & 1 deletion docs/modules/hive/pages/usage-guide/index.adoc
@@ -1,4 +1,6 @@
= Usage guide
:page-aliases: usage.adoc

This Section will help you to use and configure the Stackable Operator for Apache Hive in various ways. You should already be familiar with how to set up a basic instance. Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies.
This Section will help you to use and configure the Stackable Operator for Apache Hive in various ways.
You should already be familiar with how to set up a basic instance.
Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies.
3 changes: 2 additions & 1 deletion docs/modules/hive/pages/usage-guide/listenerclass.adoc
@@ -1,6 +1,7 @@
= Service exposition with ListenerClasses

Apache Hive offers an API. The Operator deploys a service called `<name>` (where `<name>` is the name of the HiveCluster) through which Hive can be reached.
Apache Hive offers an API.
The Operator deploys a service called `<name>` (where `<name>` is the name of the HiveCluster) through which Hive can be reached.

This service can have three different types: `cluster-internal`, `external-unstable` and `external-stable`. Read more about the types in the xref:concepts:service-exposition.adoc[service exposition] documentation at platform level.

7 changes: 3 additions & 4 deletions docs/modules/hive/pages/usage-guide/logging.adoc
@@ -1,7 +1,7 @@
= Log aggregation
:description: The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent.

The logs can be forwarded to a Vector log aggregator by providing a discovery
ConfigMap for the aggregator and by enabling the log agent:
The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
@@ -14,5 +14,4 @@ spec:
enableVectorAgent: true
----

Further information on how to configure logging, can be found in
xref:concepts:logging.adoc[].
Further information on how to configure logging, can be found in xref:concepts:logging.adoc[].
5 changes: 3 additions & 2 deletions docs/modules/hive/pages/usage-guide/monitoring.adoc
@@ -1,4 +1,5 @@
= Monitoring
:description: The managed Hive instances are automatically configured to export Prometheus metrics.

The managed Hive instances are automatically configured to export Prometheus metrics. See
xref:operators:monitoring.adoc[] for more details.
The managed Hive instances are automatically configured to export Prometheus metrics.
See xref:operators:monitoring.adoc[] for more details.
3 changes: 2 additions & 1 deletion docs/modules/hive/pages/usage-guide/resources.adoc
@@ -1,4 +1,5 @@
= Resource requests
:description: Set CPU and memory requests for Hive metastore in Kubernetes. Default values and customization options are provided for optimal resource management.

include::home:concepts:stackable_resource_requests.adoc[]

@@ -27,7 +28,7 @@ metastore:
memory: "512Mi"
----

The operator may configure an additional container for log aggregation. This is done when log aggregation is configured as described in xref:concepts:logging.adoc[]. The resources for this container cannot be configured using the mechanism described above. Use xref:nightly@home:concepts:overrides.adoc#_pod_overrides[podOverrides] for this purpose.
The operator may configure an additional container for log aggregation. This is done when log aggregation is configured as described in xref:concepts:logging.adoc[]. The resources for this container cannot be configured using the mechanism described above. Use xref:home:concepts:overrides.adoc#_pod_overrides[podOverrides] for this purpose.

You can configure your own resource requests and limits by following the example above.

3 changes: 2 additions & 1 deletion docs/modules/hive/pages/usage-guide/security.adoc
@@ -1,4 +1,5 @@
= Security
:description: Secure Apache Hive with Kerberos authentication in Kubernetes. Configure Kerberos server, SecretClass, and access Hive securely with provided guides.

== Authentication
Currently, the only supported authentication mechanism is Kerberos, which is disabled by default.
@@ -17,7 +18,7 @@ The next step is to configure your HdfsCluster to use the newly created SecretCl
Please make sure to use the SecretClass named `kerberos`. It is also necessary to configure 2 additional things in HDFS:

* Define group mappings for users with `hadoop.user.group.static.mapping.overrides`
* Tell HDFS that Hive is allowed to impersonate other users, i.e. Hive does not need any _direct_ access permissions for itself, but should be able to impersonate Hive users when accessing HDFS. This can be done by e.g. setting `hadoop.proxyuser.hive.users=*` and `hadoop.proxyuser.hive.hosts=*` to allow the user `hive`´to impersonate all other users.
* Tell HDFS that Hive is allowed to impersonate other users, i.e. Hive does not need any _direct_ access permissions for itself, but should be able to impersonate Hive users when accessing HDFS. This can be done by e.g. setting `hadoop.proxyuser.hive.users=*` and `hadoop.proxyuser.hive.hosts=*` to allow the user `hive` to impersonate all other users.

An example of the above can be found in this https://github.com/stackabletech/hive-operator/blob/main/tests/templates/kuttl/kerberos-hdfs/30-install-hdfs.yaml.j2[integration test].
