-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc #12601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6be5046
db639a5
69f7c7b
88d181e
c44271e
0a98e45
cb9889e
d18efef
754b360
1d0d61c
c8e4143
f7f2615
95431e3
e8c2d7d
ae6ad8b
c387c17
57ac87e
de53734
379e00f
ea9d2fe
c686b0e
7ef7a48
c9dcdc4
447ab82
4a02c82
a238156
06c1cba
8fb86b4
724bbe2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,6 +22,7 @@ | |
| import java.util.Arrays; | ||
| import java.util.List; | ||
| // $example off:schema_merging$ | ||
| import java.util.Properties; | ||
|
|
||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, maybe, my previous comment was not clear. I meant import java.util.List;
// $example off:schema_merging$
import java.util.Properties;I haven't tried to build the doc against the current state of this PR but I guess we won't need this import for Parquet`s schema mering example.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @HyukjinKwon Yes, that is what I was talking about...just fixed it back |
||
| // $example on:basic_parquet_example$ | ||
| import org.apache.spark.api.java.JavaRDD; | ||
|
|
@@ -235,13 +236,33 @@ private static void runJsonDatasetExample(SparkSession spark) { | |
|
|
||
| private static void runJdbcDatasetExample(SparkSession spark) { | ||
| // $example on:jdbc_dataset$ | ||
| // Note: JDBC loading and saving can be achieved via either the load/save or jdbc methods | ||
| // Loading data from a JDBC source | ||
| Dataset<Row> jdbcDF = spark.read() | ||
| .format("jdbc") | ||
| .option("url", "jdbc:postgresql:dbserver") | ||
| .option("dbtable", "schema.tablename") | ||
| .option("user", "username") | ||
| .option("password", "password") | ||
| .load(); | ||
|
|
||
| Properties connectionProperties = new Properties(); | ||
| connectionProperties.put("user", "username"); | ||
| connectionProperties.put("password", "password"); | ||
| Dataset<Row> jdbcDF2 = spark.read() | ||
| .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties); | ||
|
|
||
| // Saving data to a JDBC source | ||
| jdbcDF.write() | ||
| .format("jdbc") | ||
| .option("url", "jdbc:postgresql:dbserver") | ||
| .option("dbtable", "schema.tablename") | ||
| .option("user", "username") | ||
| .option("password", "password") | ||
| .save(); | ||
|
|
||
| jdbcDF2.write() | ||
| .jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties); | ||
| // $example off:jdbc_dataset$ | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,37 +19,102 @@ package org.apache.spark.sql.execution.datasources.jdbc | |
|
|
||
| import java.util.Properties | ||
|
|
||
| import org.apache.spark.sql.SQLContext | ||
| import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider} | ||
| import scala.collection.JavaConverters.mapAsJavaMapConverter | ||
|
|
||
| class JdbcRelationProvider extends RelationProvider with DataSourceRegister { | ||
| import org.apache.spark.sql.{AnalysisException, DataFrame, SaveMode, SQLContext} | ||
| import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider} | ||
|
|
||
| class JdbcRelationProvider extends CreatableRelationProvider | ||
| with RelationProvider with DataSourceRegister { | ||
|
|
||
| override def shortName(): String = "jdbc" | ||
|
|
||
| /** Returns a new base relation with the given parameters. */ | ||
| override def createRelation( | ||
| sqlContext: SQLContext, | ||
| parameters: Map[String, String]): BaseRelation = { | ||
| val jdbcOptions = new JDBCOptions(parameters) | ||
| if (jdbcOptions.partitionColumn != null | ||
| && (jdbcOptions.lowerBound == null | ||
| || jdbcOptions.upperBound == null | ||
| || jdbcOptions.numPartitions == null)) { | ||
| sys.error("Partitioning incompletely specified") | ||
| } | ||
| val partitionColumn = jdbcOptions.partitionColumn | ||
| val lowerBound = jdbcOptions.lowerBound | ||
| val upperBound = jdbcOptions.upperBound | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any reason why this is removed?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was moved into the JDBCOptions as had been previously discussed.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. uh, we do not have a test case to cover that. Since you made a change, could you add such a test case? Thanks!
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added. |
||
| val numPartitions = jdbcOptions.numPartitions | ||
|
|
||
| val partitionInfo = if (jdbcOptions.partitionColumn == null) { | ||
| val partitionInfo = if (partitionColumn == null) { | ||
| null | ||
| } else { | ||
| JDBCPartitioningInfo( | ||
| jdbcOptions.partitionColumn, | ||
| jdbcOptions.lowerBound.toLong, | ||
| jdbcOptions.upperBound.toLong, | ||
| jdbcOptions.numPartitions.toInt) | ||
| partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt) | ||
| } | ||
| val parts = JDBCRelation.columnPartition(partitionInfo) | ||
| val properties = new Properties() // Additional properties that we will pass to getConnection | ||
| parameters.foreach(kv => properties.setProperty(kv._1, kv._2)) | ||
| JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession) | ||
| } | ||
|
|
||
| /* | ||
| * The following structure applies to this code: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what does this table mean? what is
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now, at least, three of reviewers are confused of this bit. Do you mind if I submit a PR to clean up this part?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the table does not exist and the mode is
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also took a look at @gatorsmile 's approach, I think it's easier to understand, why it's rejected? We can also get rid of the |
||
| * | tableExists | !tableExists | ||
| *------------------------------------------------------------------------------------ | ||
| * Ignore | BaseRelation | CreateTable, saveTable, BaseRelation | ||
| * ErrorIfExists | ERROR | CreateTable, saveTable, BaseRelation | ||
| * Overwrite* | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation | ||
| * | saveTable, BaseRelation | | ||
| * Append | saveTable, BaseRelation | CreateTable, saveTable, BaseRelation | ||
| * | ||
| * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate | ||
| */ | ||
| override def createRelation( | ||
| sqlContext: SQLContext, | ||
| mode: SaveMode, | ||
| parameters: Map[String, String], | ||
| data: DataFrame): BaseRelation = { | ||
| val jdbcOptions = new JDBCOptions(parameters) | ||
| val url = jdbcOptions.url | ||
| val table = jdbcOptions.table | ||
|
|
||
| val props = new Properties() | ||
| props.putAll(parameters.asJava) | ||
| val conn = JdbcUtils.createConnectionFactory(url, props)() | ||
|
|
||
| try { | ||
| val tableExists = JdbcUtils.tableExists(conn, url, table) | ||
|
|
||
| val (doCreate, doSave) = (mode, tableExists) match { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Initially, I meant to correct this as @gatorsmile did in here. I am not saying this is wrong or inappropriate but just personally I'd prefer this way.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also prefer to my way, which looks cleaner and easier to understand.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Your way results in the need for a
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok. I am fine, if the other are ok about it. Let me review your version.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Then would it make sense if we add some comments for each case? In a quick look, it seems really confusing what each case means to me.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did add a comment in the method signature. That and the variable naming conventions should cover this. |
||
| case (SaveMode.Ignore, true) => (false, false) | ||
| case (SaveMode.ErrorIfExists, true) => throw new AnalysisException( | ||
| s"Table or view '$table' already exists, and SaveMode is set to ErrorIfExists.") | ||
| case (SaveMode.Overwrite, true) => | ||
| if (jdbcOptions.isTruncate && JdbcUtils.isCascadingTruncateTable(url) == Some(false)) { | ||
| JdbcUtils.truncateTable(conn, table) | ||
| (false, true) | ||
| } else { | ||
| JdbcUtils.dropTable(conn, table) | ||
| (true, true) | ||
| } | ||
| case (SaveMode.Append, true) => (false, true) | ||
| case (_, true) => throw new IllegalArgumentException(s"Unexpected SaveMode, '$mode'," + | ||
| " for handling existing tables.") | ||
| case (_, false) => (true, true) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally, I think this combinations of booleans might be a bit confusing. It might be better if they have some variables so that we can understand what each
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not 100% sure I get what you mean, as the booleans map directly to a variable. The only thing I can think of beyond using a var and setting the variables directly (ugly) is to create a |
||
| } | ||
|
|
||
| if (doCreate) { | ||
| val schema = JdbcUtils.schemaString(data, url) | ||
| // To allow certain options to append when create a new table, which can be | ||
| // table_options or partition_options. | ||
| // E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8" | ||
| val createtblOptions = jdbcOptions.createTableOptions | ||
| val sql = s"CREATE TABLE $table ($schema) $createtblOptions" | ||
| val statement = conn.createStatement | ||
| try { | ||
| statement.executeUpdate(sql) | ||
| } finally { | ||
| statement.close() | ||
| } | ||
| } | ||
| if (doSave) JdbcUtils.saveTable(data, url, table, props) | ||
| } finally { | ||
| conn.close() | ||
| } | ||
|
|
||
| createRelation(sqlContext, parameters) | ||
| } | ||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should put
java.util.Listandjava.util.Propertiesimports together without additional newline. It seems you already know but just in case - imports.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this really be added to the example, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason to not follow the guildline?