[SPARK-21519][SQL] Add an option to the JDBC data source to initialize the target DB environment #18724

Closed
LucaCanali wants to merge 4 commits into apache:master from LucaCanali:JDBC_datasource_sessionInitStatement

@LucaCanali
Contributor

Add an option to the JDBC data source to initialize the environment of the remote database session

What changes were proposed in this pull request?

This proposes an option for the JDBC data source, tentatively called "sessionInitStatement", to implement the session initialization functionality present, for example, in the Sqoop connector for Oracle (see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements). After each database session is opened to the remote DB, and before starting to read data, this option executes a custom SQL statement (or a PL/SQL block in the case of Oracle).
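
As a rough illustration of how the proposed option might be supplied, the sketch below builds reader options for an Oracle source. The URL, table name, and ALTER SESSION text are placeholders invented for this example and do not come from the PR; the option key uses the tentative name proposed here, and the object name `SessionInitOptions` is hypothetical.

```scala
// Hypothetical reader options for a JDBC source; every value below is a placeholder.
object SessionInitOptions {
  // On Oracle, a PL/SQL block lets a single option value carry several commands.
  val initSql: String =
    """BEGIN
      |  EXECUTE IMMEDIATE 'ALTER SESSION SET TIME_ZONE = ''UTC''';
      |END;""".stripMargin

  val opts: Map[String, String] = Map(
    "url" -> "jdbc:oracle:thin:@//dbhost.example.com:1521/service", // placeholder URL
    "dbtable" -> "MYSCHEMA.MYTABLE",                                 // placeholder table
    "sessionInitStatement" -> initSql                                // tentative option name from this PR
  )

  // With a SparkSession named `spark` in scope, the map would be passed as:
  //   spark.read.format("jdbc").options(opts).load()
}
```

The init text is treated as an opaque string, so whatever the target database accepts in a single execute call is what can go in the option value.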

See also https://issues.apache.org/jira/browse/SPARK-21519

How was this patch tested?

Manually tested using Spark SQL data source and Oracle JDBC

@gatorsmile
Member

test this please

case "SERIALIZABLE" => Connection.TRANSACTION_SERIALIZABLE
}
// An option to execute custom SQL before fetching data from the remote DB
val sessionInitStatement = parameters.getOrElse(JDBC_SESSION_INIT_STATEMENT, "")
Member

Please use parameters.get
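
One possible reading of this suggestion, sketched with a plain `Map` standing in for the data source's parameter map: `parameters.get` returns an `Option[String]`, so an absent option is represented as `None` rather than as an empty-string sentinel. The object name and the constant's value are assumptions for illustration, not the merged code.

```scala
object InitStatementOption {
  // Stand-in for the option key constant referenced in the diff (value assumed).
  val JDBC_SESSION_INIT_STATEMENT = "sessionInitStatement"

  // Sentinel style from the diff under review: a missing key becomes "".
  def withDefault(parameters: Map[String, String]): String =
    parameters.getOrElse(JDBC_SESSION_INIT_STATEMENT, "")

  // Option style: a missing key becomes None, so callers handle absence explicitly.
  def asOption(parameters: Map[String, String]): Option[String] =
    parameters.get(JDBC_SESSION_INIT_STATEMENT)
}
```

With the `Option` form, the code that opens the session can pattern match or call `foreach` on the value instead of comparing against `""`.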

@gatorsmile
Member

Could you add a test case to JDBCSuite?

@SparkQA

SparkQA commented Jul 31, 2017

Test build #80065 has finished for PR 18724 at commit 92d082a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

ping @LucaCanali

@LucaCanali
Contributor Author

Thank you very much @gatorsmile for the review. I plan to make the required changes and add a test case; however, it will probably take one more week before I can do that.

@gatorsmile
Member

Thanks for your time!

@gatorsmile
Member

ok to test

}

test("SPARK-21519: option sessionInitStatement, run SQL to initialize the database session.") {
val initSQL = "SET @MYTESTVAR 21519"
Member

Could we add a test case for more than one statement?

@SparkQA

SparkQA commented Aug 10, 2017

Test build #80505 has finished for PR 18724 at commit 0a0ff0a.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.load()
assert(df1.collect() === Array(Row(21519)))

val initSQL2 = "SET SCHEMA DUMMY"
Member

Sorry, I might not have explained it clearly. Is it possible to have a test case that sends more than one statement in a single session initialization? Right now these two examples each have only one statement.

Contributor Author

Thanks for the clarification. I have now added a test that runs 2 SQL statements.
For future reference, I'd like to stress that the code executed by the option "sessionInitStatement" is just the user-provided string fed through the execute method of the JDBC connection, so it can use the features of the target database language/syntax. In the test I wrote for the H2 database, I have simply put together two commands separated by ";". When using sessionInitStatement for querying Oracle, for example, the user-provided command can be a SQL statement or a PL/SQL block grouping multiple commands and logic.
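
A minimal sketch of the behaviour described above, assuming the init text arrives as an `Option[String]` and is run once on a freshly opened connection. It uses only standard `java.sql` interfaces; the object and method names are hypothetical, and this is not claimed to match the merged implementation.

```scala
import java.sql.Connection

object SessionInit {
  // Runs the user-supplied text as-is, once per connection, before any data is read.
  // The string is not parsed or split here; interpreting it is left to the database.
  def run(conn: Connection, initSql: Option[String]): Unit =
    initSql.foreach { sql =>
      val stmt = conn.prepareStatement(sql)
      try stmt.execute()
      finally stmt.close() // release the statement even if execution fails
    }
}
```

Because the text is handed over unchanged, a value such as `"SET @A 1; SET @B 2"` (hypothetical variable names) works only when the target database and driver accept several commands in one execute call, as described above for H2.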

@SparkQA

SparkQA commented Aug 11, 2017

Test build #80507 has finished for PR 18724 at commit 55e63a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM

@SparkQA

SparkQA commented Aug 11, 2017

Test build #80539 has finished for PR 18724 at commit 5792fd6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 0377338 Aug 11, 2017