TiSpark (version < 2.0) on PySpark:

Usage

There are currently three ways to use TiSpark from Python:

Directly via pyspark

This is the simplest way; a working Spark environment is all you need.

  1. Make sure you have the latest version of TiSpark and a jar with all TiSpark's dependencies.

  2. Run this command in your SPARK_HOME directory:

./bin/pyspark --jars /where-ever-it-is/tispark-${name_with_version}.jar
  3. To use TiSpark, run these commands:
# Import py4j's java_import helper along with SparkContext
from py4j.java_gateway import java_import
from pyspark.context import SparkContext

# We get a reference to py4j Java Gateway
gw = SparkContext._gateway

java_import(gw.jvm, "org.apache.spark.sql.TiContext")
 
# Create a TiContext
ti = gw.jvm.TiContext(spark._jsparkSession)

# Map database
ti.tidbMapDatabase("tpch_test", False, True)

# Query as usual
sql("select count(*) from customer").show()

# Result
# +--------+
# |count(1)|
# +--------+
# |     150|
# +--------+
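
The launch command in step 2 can also be assembled from a script, which makes it easy to catch a wrong jar path before Spark starts. The following is a minimal sketch, not part of TiSpark: the helper name pyspark_command is hypothetical, and it assumes SPARK_HOME points at your Spark installation when no directory is passed in.

```python
import os


def pyspark_command(jar_path, spark_home=None):
    """Return the argv list that starts pyspark with the TiSpark jar attached.

    Hypothetical helper for illustration; it only builds the command and
    does not launch anything.
    """
    # Assumption: fall back to the SPARK_HOME environment variable,
    # then to the current directory (as in "run this in your SPARK_HOME").
    spark_home = spark_home or os.environ.get("SPARK_HOME", ".")
    if not os.path.isfile(jar_path):
        raise FileNotFoundError("TiSpark jar not found: {}".format(jar_path))
    return [os.path.join(spark_home, "bin", "pyspark"), "--jars", jar_path]


# Usage (the jar path is a placeholder, exactly as in the command above):
#   import subprocess
#   subprocess.call(pyspark_command("/where-ever-it-is/tispark-${name_with_version}.jar"))
```

The helper returns a list rather than a string so it can be handed to subprocess without shell quoting.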

Via pip

This approach works the same way as the first one, but the code is more readable.

  1. Install pytispark by running pip install pytispark in your console.

  2. Make sure you have the latest version of TiSpark and a jar with all TiSpark's dependencies.

  3. Run this command in your SPARK_HOME directory:

./bin/pyspark --jars /where-ever-it-is/tispark-${name_with_version}.jar
  4. Use it as shown below:
import pytispark.pytispark as pti

ti = pti.TiContext(spark)

ti.tidbMapDatabase("tpch_test")

sql("select count(*) from customer").show()

# Result
# +--------+
# |count(1)|
# +--------+
# |     150|
# +--------+
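
If the import in step 4 fails, the usual cause is that pytispark was installed into a different Python environment than the one pyspark uses. A minimal sketch of a check you could run first, using only the standard library (the function name is_installed is hypothetical):

```python
import importlib.util


def is_installed(package_name):
    """Return True if the top-level package can be found by the current interpreter."""
    return importlib.util.find_spec(package_name) is not None


# Report a missing package before trying to import it.
if not is_installed("pytispark"):
    print("pytispark is not installed for this interpreter; run: pip install pytispark")
```

Run it inside the same pyspark shell so that it checks the interpreter Spark is actually using.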

Via spark-submit

This approach is useful when you want to run your own Python scripts.

  1. Create a Python file named test.py as below:
from pyspark.sql import SparkSession
import pytispark.pytispark as pti

spark = SparkSession.builder.master("Your master url").appName("Your app name").getOrCreate()

ti = pti.TiContext(spark)

ti.tidbMapDatabase("tpch")

spark.sql("select count(*) from customer").show()
  2. Prepare your TiSpark environment as above and run:
./bin/spark-submit --jars /where-ever-it-is/tispark-${name_with_version}.jar test.py
  3. Result:
+--------+
|count(1)|
+--------+
|     150|
+--------+
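
The script in step 1 hardcodes the master URL, application name, and database. One possible variation, sketched here under assumptions, reads them from the command line instead. The option names (--spark-master, --app-name, --database, --table) and their defaults are hypothetical; the Spark and pytispark calls are the same ones used in test.py above.

```python
import argparse


def parse_args(argv=None):
    """Parse the script's own options (hypothetical names, not TiSpark flags)."""
    parser = argparse.ArgumentParser(
        description="Count the rows of a TiDB table through TiSpark")
    parser.add_argument("--spark-master", required=True, help="Spark master URL")
    parser.add_argument("--app-name", default="tispark-example")
    parser.add_argument("--database", default="tpch")
    parser.add_argument("--table", default="customer")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Imported here so that bad arguments are reported before Spark starts.
    from pyspark.sql import SparkSession
    import pytispark.pytispark as pti

    spark = (SparkSession.builder
             .master(args.spark_master)
             .appName(args.app_name)
             .getOrCreate())
    ti = pti.TiContext(spark)
    ti.tidbMapDatabase(args.database)
    spark.sql("select count(*) from {}".format(args.table)).show()


# In test.py, finish the file with a call to main().
```

Arguments placed after the script name on the spark-submit command line are passed through to the script, for example: test.py --spark-master "Your master url" --database tpch.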

See pytispark for more information.