@@ -292,7 +292,7 @@ private[spark] class SparkSubmit extends Logging {
         error("Cluster deploy mode is not applicable to Spark shells.")
       case (_, CLUSTER) if isSqlShell(args.mainClass) =>
         error("Cluster deploy mode is not applicable to Spark SQL shell.")
-      case (_, CLUSTER) if isThriftServer(args.mainClass) =>
+      case (_, CLUSTER) if (clusterManager != KUBERNETES) && isThriftServer(args.mainClass) =>
         error("Cluster deploy mode is not applicable to Spark Thrift server.")
       case _ =>
     }
docs/running-on-kubernetes.md (33 additions, 0 deletions)
@@ -340,6 +340,39 @@
RBAC authorization and how to configure Kubernetes service accounts for pods, please refer to
[Using RBAC Authorization](https://kubernetes.io/docs/admin/authorization/rbac/) and
[Configure Service Accounts for Pods](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/).

## Running Spark Thrift Server

The Thrift JDBC/ODBC Server (also known as the Spark Thrift Server, or STS) is Spark SQL's port of Apache Hive's
HiveServer2. It allows JDBC and ODBC clients to execute SQL queries against Apache Spark.

### Client Deploy Mode

To start STS in client mode, execute the following command:

    $ sbin/start-thriftserver.sh \
      --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>
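
Since the executors still run in Kubernetes pods in client mode, an executor container image normally has to be
supplied as well. A minimal sketch, where `<spark-image>` is a placeholder for a published Spark container image,
not a default:

    $ sbin/start-thriftserver.sh \
      --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
      --conf spark.kubernetes.container.image=<spark-image>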

### Cluster Deploy Mode

To start STS in cluster mode, execute the following command:

    $ sbin/start-thriftserver.sh \
      --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
      --deploy-mode cluster
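
In cluster mode the driver pod also needs credentials to create executor pods. A hedged sketch, assuming a
service account named `spark` that has been granted the RBAC permissions described earlier on this page:

    $ sbin/start-thriftserver.sh \
      --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
      --deploy-mode cluster \
      --conf spark.kubernetes.container.image=<spark-image> \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark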

The most basic workflow is to take the pod name (the driver pod name in cluster mode, or the Thrift server's own
pod name in client mode), which can be found with `kubectl get pods`, and run
`kubectl port-forward spark-app-podname 31416:10000`
(see [Forward a local port to a port on the pod](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/#forward-a-local-port-to-a-port-on-the-pod)),
which will automatically forward `localhost:31416` to the pod's port 10000.
Any JDBC client can then be used to query `jdbc:hive2://localhost:31416`.
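
Putting the pieces together, a session might look like the following; the pod name `spark-thrift-server-driver` is
hypothetical and should be replaced with the name reported by `kubectl get pods`:

    # find the pod running the Thrift server
    $ kubectl get pods
    # forward local port 31416 to the pod's Thrift port 10000
    $ kubectl port-forward spark-thrift-server-driver 31416:10000
    # connect with Beeline, the JDBC client shipped with Spark
    $ bin/beeline -u jdbc:hive2://localhost:31416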

Alternatively, any other application on the cluster can simply use `spark-app-podname:10000`, which will be resolved
by kube-dns. For persistent external access, one can run `kubectl expose pod spark-app-podname --type=NodePort --port 10000`
to create a Kubernetes Service that accepts connections on a particular port of every node in the cluster and forwards
them to the pod's port 10000.
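
As a sketch of that NodePort flow, again using the hypothetical pod name:

    # expose the pod's port 10000 on a port of every node
    $ kubectl expose pod spark-thrift-server-driver --type=NodePort --port 10000
    # look up the node port Kubernetes assigned to the new Service
    $ kubectl get service spark-thrift-server-driver
    # external clients can then connect through any node
    $ bin/beeline -u jdbc:hive2://<node-host>:<assigned-node-port>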

Note that STS will not work with Spark dynamic allocation (`spark.dynamicAllocation.enabled`), as Spark Shuffle
Service support is not yet available on Kubernetes.

## Future Work

There are several Spark on Kubernetes features that are currently being worked on or planned. Those features are expected to eventually make it into future versions of the spark-kubernetes integration.