CHClientException: Unknown cluster: {cluster} #362

Open

maver1ck opened this issue Oct 4, 2024 · 0 comments
maver1ck commented Oct 4, 2024

Hi,
I'm trying to query data on my ClickHouse cluster.
Executing the query returns a CHClientException.

Py4JJavaError: An error occurred while calling o43.sql.
: com.clickhouse.spark.exception.CHClientException:  [-1] Unknown cluster: {cluster}
	at com.clickhouse.spark.spec.TableEngineUtils$.$anonfun$resolveTableCluster$2(TableEngineUtils.scala:34)
	at scala.Option.getOrElse(Option.scala:189)
	at com.clickhouse.spark.spec.TableEngineUtils$.resolveTableCluster(TableEngineUtils.scala:34)
	at com.clickhouse.spark.ClickHouseCatalog.loadTable(ClickHouseCatalog.scala:145)
	at com.clickhouse.spark.ClickHouseCatalog.loadTable(ClickHouseCatalog.scala:44)
	at org.apache.spark.sql.connector.catalog.CatalogV2Util$.getTable(CatalogV2Util.scala:363)
	at org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:337)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$resolveRelation$5(Analyzer.scala:1315)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$resolveRelation$1(Analyzer.scala:1311)
	at scala.Option.orElse(Option.scala:447)
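
From the trace, TableEngineUtils.resolveTableCluster appears to take the cluster name stored in the Distributed engine definition and look it up among the clusters the connector knows about; because the DDL stores the literal macro '{cluster}', that lookup fails. To confirm which cluster names the server actually exposes and what the macro expands to, a direct check against ClickHouse can help. A minimal sketch using the clickhouse-connect Python client (not part of the setup below; host and credentials copied from the Spark config):

import clickhouse_connect

# Connect straight to ClickHouse over HTTP, bypassing Spark.
client = clickhouse_connect.get_client(
    host="clickhouse-kim.clickhouse.svc",
    port=8123,
    username="admin",
    password="admin",
)

# Cluster names the server actually knows about.
print(client.query("SELECT DISTINCT cluster FROM system.clusters").result_rows)

# What the {cluster} macro expands to on this node.
print(client.query("SELECT macro, substitution FROM system.macros").result_rows)

# Stored DDL of the Distributed table; note whether the first Distributed()
# argument is the unexpanded '{cluster}' macro.
print(client.query("SHOW CREATE TABLE telemetry.reference_peaks").result_rows)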

These are the table definitions:

CREATE TABLE telemetry.reference_peaks_shard ON CLUSTER '{cluster}'
(
    hash_1 String,
    hash_2 String,
    ts DateTime,
    offset Int32,
    channel Int32,
    station_id String,
    tts DateTime,
    batchid Int32,
    org_ts DateTime
)
ENGINE = MergeTree
ORDER BY ts;

CREATE TABLE telemetry.reference_peaks ON CLUSTER '{cluster}' AS telemetry.reference_peaks_shard
ENGINE = Distributed('{cluster}', 'telemetry', 'reference_peaks_shard', channel);
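
If the connector needs a concrete cluster name in the Distributed() definition, one possible workaround (a sketch only, not verified here) is to recreate the Distributed wrapper with the macro already expanded, so that SHOW CREATE TABLE exposes a real cluster name. "main_cluster" below is a placeholder for whatever '{cluster}' expands to on this deployment (see system.macros); the shard table is left untouched. Using the clickhouse-connect client as above:

import clickhouse_connect

client = clickhouse_connect.get_client(
    host="clickhouse-kim.clickhouse.svc",
    port=8123,
    username="admin",
    password="admin",
)

# Placeholder: replace with the actual expansion of the {cluster} macro.
cluster_name = "main_cluster"

# Recreate only the Distributed wrapper, with a literal cluster name in the engine.
client.command("DROP TABLE IF EXISTS telemetry.reference_peaks ON CLUSTER '{cluster}' SYNC")
client.command(
    "CREATE TABLE telemetry.reference_peaks ON CLUSTER '{cluster}' "
    "AS telemetry.reference_peaks_shard "
    f"ENGINE = Distributed('{cluster_name}', 'telemetry', 'reference_peaks_shard', channel)"
)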

Spark Code:

from pyspark.sql import SparkSession

# Set up the SparkSession to include ClickHouse as a custom catalog
spark = SparkSession.builder \
    .appName("ClickHouse Catalog Example") \
    .config("spark.jars.packages", "com.clickhouse.spark:clickhouse-spark-runtime-3.5_2.12:0.8.0,com.clickhouse:clickhouse-jdbc:0.6.5,org.apache.httpcomponents.client5:httpclient5:5.3.1") \
    .config("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog") \
    .config("spark.sql.catalog.clickhouse.host", "clickhouse-kim.clickhouse.svc") \
    .config("spark.sql.catalog.clickhouse.http_port", "8123") \
    .config("spark.sql.catalog.clickhouse.database", "telemetry") \
    .config("spark.sql.catalog.clickhouse.driver", "com.clickhouse.jdbc.ClickHouseDriver") \
    .config("spark.sql.catalog.clickhouse.user", "admin") \
    .config("spark.sql.catalog.clickhouse.password", "admin") \
    .getOrCreate()

spark.sql("select * from clickhouse.telemetry.reference_peaks where channel = 1").count()