
Support DataFrameWriter & DataFrameReader? #229

Closed
melin opened this issue Apr 19, 2023 · 10 comments

Comments

@melin

melin commented Apr 19, 2023

The examples provided all register a catalog and read/write ClickHouse through SQL. Is reading and writing data via the DataFrameWriter & DataFrameReader API supported? @pan3793

@pan3793
Collaborator

pan3793 commented Apr 19, 2023

No, because this connector does not implement TableProvider

@melin
Author

melin commented Apr 20, 2023

Is there a plan to support it?

My project, https://github.com/melin/datatunnel, would like to integrate via the DataFrame API.

@pan3793
Collaborator

pan3793 commented Apr 20, 2023

It's low priority.

@durgeksh

Sorry, this question is off topic. What is the purpose of this connector? I want to connect ClickHouse with Spark 3.4 in Python. Do I need to use this connector, and if so, what is the driver class?
Thank you for the wonderful work!

@pan3793
Collaborator

pan3793 commented May 25, 2023

@durgeksh there are quick start demos for spark-sql and spark-shell, pyspark should be similar.
https://housepower.github.io/spark-clickhouse-connector/quick_start/03_play_with_spark_shell/

@durgeksh

durgeksh commented May 25, 2023

Thanks for the reply, but I see there is no documentation for using this connector in PySpark. When I checked how to use "clickhouse-jdbc-0.4.6-shaded.jar", they use the driver attribute with the driver class as given below. How can I use this connector in PySpark?
driver = "com.clickhouse.jdbc.ClickHouseDriver"

@pan3793
Collaborator

pan3793 commented May 25, 2023

If you run and compare the outputs of pyspark --help and spark-shell --help, they are similar; it should be easy to translate the spark-shell demo into a pyspark one.

I'm not a Python developer, so the documentation I wrote does not include a PySpark demo.

This is a connector (a kind of plugin) for Spark, and a plugin is not responsible for teaching how to use the main framework. Please read the Spark docs, which provide snippets in several languages for each case.

@durgeksh

You are right, but I am using PySpark in a Python program, not in the shell. How shall I use this connector in the Python program? Thank you.

@camper42
Contributor

camper42 commented Aug 8, 2023

@durgeksh just add a clickhouse catalog, and use spark.sql()

example spark-defaults.conf in our cluster

spark.sql.catalog.ck-1 xenon.clickhouse.ClickHouseCatalog
spark.sql.catalog.ck-1.host HOST
spark.sql.catalog.ck-1.protocol http

read & write

df = spark.sql("select * from `ck-1`.db.table")
df.writeTo("`ck-1`.db.table2").append()

@durgeksh
Thank you.
