There are currently three ways to use TiSpark with Python:

This is the simplest way: a working Spark environment is all you need.

- Make sure you have the latest version of TiSpark and a jar with all of TiSpark's dependencies.
- Run this command in your `SPARK_HOME` directory:

```
./bin/pyspark --jars /where-ever-it-is/tispark-${name_with_version}.jar
```

- To use TiSpark, run these commands:
```python
# First, import the py4j java_import module along with Spark
from py4j.java_gateway import java_import
from pyspark.context import SparkContext

# Get a reference to the py4j Java Gateway
gw = SparkContext._gateway
java_import(gw.jvm, "org.apache.spark.sql.TiContext")

# Create a TiContext
ti = gw.jvm.TiContext(spark._jsparkSession)

# Map the database
ti.tidbMapDatabase("tpch_test", False, True)

# Query as usual
spark.sql("select count(*) from customer").show()

# Result
# +--------+
# |count(1)|
# +--------+
# |     150|
# +--------+
```
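The boilerplate above can be wrapped in a small helper so it only has to be typed once per session. This is a sketch under the assumption that `TiContext` and `tidbMapDatabase` keep the call signatures shown above; the helper name is my own:

```python
def create_ticontext(spark, dbname):
    """Create a TiContext and map a TiDB database (sketch; assumes the
    TiContext API shown above). `spark` is an active SparkSession."""
    from py4j.java_gateway import java_import
    from pyspark.context import SparkContext

    # Import the TiContext class into the py4j gateway and instantiate it
    gw = SparkContext._gateway
    java_import(gw.jvm, "org.apache.spark.sql.TiContext")
    ti = gw.jvm.TiContext(spark._jsparkSession)

    # Map the database with the same flags used in the example above
    ti.tidbMapDatabase(dbname, False, True)
    return ti
```

In the shell, `ti = create_ticontext(spark, "tpch_test")` then replaces the lines above.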
This way is generally the same as the first, but more readable.

- Run `pip install pytispark` in your console to install `pytispark`.
- Make sure you have the latest version of TiSpark and a jar with all of TiSpark's dependencies.
- Run this command in your `SPARK_HOME` directory:

```
./bin/pyspark --jars /where-ever-it-is/tispark-${name_with_version}.jar
```

- Use it as follows:
```python
import pytispark.pytispark as pti

# Create a TiContext from the existing SparkSession
ti = pti.TiContext(spark)

# Map the database
ti.tidbMapDatabase("tpch_test")

# Query as usual
spark.sql("select count(*) from customer").show()

# Result
# +--------+
# |count(1)|
# +--------+
# |     150|
# +--------+
```
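Because queries go through the regular `spark.sql` entry point, results can also be consumed programmatically rather than only printed with `.show()`. A minimal sketch (the helper name is my own; it assumes an active session with the database already mapped):

```python
def table_count(spark, table):
    """Return the row count of `table` as a Python int.
    Note: `table` is interpolated directly into the SQL string,
    so it must come from a trusted source."""
    row = spark.sql("select count(*) as cnt from {}".format(table)).collect()[0]
    return row["cnt"]
```

For the dataset in the examples above, `table_count(spark, "customer")` would return the same count that `.show()` prints.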
This way is useful when you want to execute your own Python scripts.

- Create a Python file named `test.py` as below:

```python
from pyspark.sql import SparkSession
import pytispark.pytispark as pti

# Replace the master URL and app name with your own
spark = SparkSession.builder.master("Your master url").appName("Your app name").getOrCreate()

ti = pti.TiContext(spark)
ti.tidbMapDatabase("tpch_test")

spark.sql("select count(*) from customer").show()
```
- Prepare your TiSpark environment as above and execute:

```
./bin/spark-submit --jars /where-ever-it-is/tispark-${name_with_version}.jar test.py
```

- Result:

```
+--------+
|count(1)|
+--------+
|     150|
+--------+
```
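To make `test.py` reusable across databases and tables, the hard-coded names can be read from the command line. A sketch using the standard `argparse` module (the `--db`/`--table` flag names are my own invention, not part of TiSpark):

```python
import argparse

def parse_args(argv=None):
    # Parse the target database and table; defaults follow the examples
    # in this document
    p = argparse.ArgumentParser(description="Run a TiSpark count query")
    p.add_argument("--db", default="tpch_test")
    p.add_argument("--table", default="customer")
    return p.parse_args(argv)

# In test.py you would then pass args.db to ti.tidbMapDatabase(...)
# and build the query from args.table before calling spark.sql(...).
```

Running `./bin/spark-submit ... test.py --db tpch_test --table customer` would then select the target at launch time.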
See `pytispark` for more information.