[SPARK-28552][SQL] Case-insensitive database URLs in JdbcDialect #25287

teeyog · 2019-07-29T11:24:10Z

What changes were proposed in this pull request?

This PR proposes making dialect matching by JDBC URL prefix case-insensitive.

When I query data through Spark SQL with a JDBC URL such as jdbc:MySQL://localhost/db, the result is wrong, even though MySQL accepts URLs written this way.

The cause is that Spark SQL selects MySQLDialect by the prefix jdbc:mysql, so jdbc:MySQL is not matched to the correct dialect. Dialect lookup by JDBC URL should therefore be case-insensitive.
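A minimal sketch of the mismatch, assuming the dialect is chosen by a plain `startsWith` check on the URL. `matchesCaseSensitive` and `matchesCaseInsensitive` are hypothetical helper names used only for illustration; they are not Spark APIs.

```scala
import java.util.Locale

object PrefixMatchSketch {
  // Hypothetical helper: case-sensitive prefix check on the raw URL.
  def matchesCaseSensitive(url: String, prefix: String): Boolean =
    url.startsWith(prefix)

  // Hypothetical helper: lower-case the URL before comparing, so the
  // prefix check ignores case. Locale.ROOT avoids locale-specific case rules.
  def matchesCaseInsensitive(url: String, prefix: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith(prefix)

  def main(args: Array[String]): Unit = {
    val url = "jdbc:MySQL://localhost/db"
    println(matchesCaseSensitive(url, "jdbc:mysql"))   // false
    println(matchesCaseInsensitive(url, "jdbc:mysql")) // true
  }
}
```

Only the copy used for the comparison is lower-cased; the URL itself is left untouched, so case-sensitive parts such as a database name are not altered.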

https://issues.apache.org/jira/browse/SPARK-28552

How was this patch tested?

Unit tests.

gatorsmile · 2019-07-31T00:25:31Z

ok to test

gatorsmile · 2019-07-31T00:26:18Z

cc @maropu

maropu · 2019-07-31T01:47:17Z

@teeyog Hi, thanks for your first contribution! By the way, can you fix this in a more general way? Could we lowercase the prefix on the JdbcDialect side? I think it's a bit troublesome to add lowercasing logic to each dialect...

Also, please add tests in JDBCSuite.

maropu · 2019-07-31T01:56:43Z

Also, could you please make the PR description as clear as possible so that other reviewers can understand it easily? We don't have a rigid format for that, but I'd like you to describe clearly what you propose in this PR, e.g., "This PR proposes to ..."

If this PR is merged, the description will be included in the commit log, so your cooperation makes the commit logs more readable.

SparkQA · 2019-07-31T05:09:13Z

Test build #108426 has finished for PR 25287 at commit 02f5cf9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

teeyog · 2019-07-31T06:44:49Z

@maropu Thank you very much for your guidance. I will improve the PR description. As for your suggestion to lowercase the URL only in JdbcDialect, without modifying the logic of each dialect: I have not thought of a good way to implement it. The only alternative I can think of is to lowercase the incoming URL up front, but I don't think that's very good.

SparkQA · 2019-07-31T11:28:31Z

Test build #108454 has finished for PR 25287 at commit fa6b8a4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-08-01T01:41:33Z

For example,

@DeveloperApi
@Evolving
abstract class JdbcDialect extends Serializable {

  def urlName: String

  /**
   * Check if this dialect instance can handle a certain jdbc url.
   * @param url the jdbc url.
   * @return True if the dialect can be applied on the given jdbc url.
   * @throws NullPointerException if the url is null.
   */
  def canHandle(url : String): Boolean = {
    url.toLowerCase(Locale.ROOT).startsWith(s"jdbc:$urlName")
  }
...
}

private object PostgresDialect extends JdbcDialect {
  override val urlName: String = "postgresql"
...
}

?
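The idea in the example above can be tried without Spark on the classpath. In this stand-alone sketch, `SketchDialect`, `SketchPostgresDialect`, `SketchMySQLDialect`, and `findDialect` are hypothetical names; only `urlName` and `canHandle` are taken from the example, and the real `JdbcDialect` API may differ.

```scala
import java.util.Locale

// Hypothetical stand-in for JdbcDialect: the base class owns the
// case-insensitive check, and each dialect only declares its URL name.
abstract class SketchDialect extends Serializable {
  def urlName: String

  def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith(s"jdbc:$urlName")
}

object SketchPostgresDialect extends SketchDialect {
  override val urlName: String = "postgresql"
}

object SketchMySQLDialect extends SketchDialect {
  override val urlName: String = "mysql"
}

object DialectLookupSketch {
  private val dialects: List[SketchDialect] =
    List(SketchMySQLDialect, SketchPostgresDialect)

  // Returns the first registered dialect that accepts the URL, if any.
  def findDialect(url: String): Option[SketchDialect] =
    dialects.find(_.canHandle(url))

  def main(args: Array[String]): Unit = {
    println(findDialect("jdbc:postGresql://localhost/db").map(_.urlName)) // Some(postgresql)
    println(findDialect("jdbc:h2:mem:test").map(_.urlName))               // None
  }
}
```

One trade-off of this shape: adding an abstract member such as `urlName` to a public class is a binary-incompatible change for subclasses compiled against the previous version.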

teeyog · 2019-08-01T07:26:09Z

@maropu Thank you for the example. I misunderstood what you meant; I thought you were telling me not to modify the logic of the other dialects.

SparkQA · 2019-08-01T07:34:33Z

Test build #108507 has finished for PR 25287 at commit c89c859.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-08-01T07:58:19Z

Test build #108508 has finished for PR 25287 at commit 0db9f75.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-08-01T08:14:53Z

Test build #108510 has finished for PR 25287 at commit 7fc974e.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-08-01T08:40:56Z

Test build #108511 has finished for PR 25287 at commit 3b7a689.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

teeyog · 2019-08-01T09:44:15Z

@maropu Hi, can you help me find out what caused this error?

[info] Packaging /home/jenkins/workspace/SparkPullRequestBuilder@2/examples/target/scala-2.12/spark-examples_2.12-3.0.0-SNAPSHOT.jar ...
[info] Done packaging.
[error]  * abstract method dbTag()java.lang.String in class org.apache.spark.sql.jdbc.JdbcDialect is present only in current version
[error]    filter with: ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.jdbc.JdbcDialect.dbTag")
java.lang.RuntimeException: spark-sql: Binary compatibility check failed!
	at scala.sys.package$.error(package.scala:27)
	at com.typesafe.tools.mima.plugin.SbtMima$.reportModuleErrors(SbtMima.scala:83)
	at com.typesafe.tools.mima.plugin.MimaPlugin$$anonfun$mimaReportSettings$7$$anonfun$apply$2.apply(MimaPlugin.scala:68)
	at com.typesafe.tools.mima.plugin.MimaPlugin$$anonfun$mimaReportSettings$7$$anonfun$apply$2.apply(MimaPlugin.scala:59)
	at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
	at com.typesafe.tools.mima.plugin.MimaPlugin$$anonfun$mimaReportSettings$7.apply(MimaPlugin.scala:59)
	at com.typesafe.tools.mima.plugin.MimaPlugin$$anonfun$mimaReportSettings$7.apply(MimaPlugin.scala:44)
	at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
	at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
	at sbt.std.Transform$$anon$4.work(System.scala:63)
	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
	at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
	at sbt.Execute.work(Execute.scala:237)
	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
	at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
	at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[error] (sql/*:mimaReportBinaryIssues) spark-sql: Binary compatibility check failed!
[error] Total time: 32 s, completed Aug 1, 2019 1:40:55 AM
[error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/dev/mima -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos ; received return code 1

maropu · 2019-08-01T12:32:19Z

Can you update project/MimaExcludes.scala, too?

SparkQA · 2019-08-02T02:03:24Z

Test build #108537 has finished for PR 25287 at commit 530741c.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-08-02T16:20:12Z

jdbc: MySQL:/... isn't a valid URI, is it?

maropu · 2019-10-01T08:09:08Z

ping @teeyog

SparkQA · 2019-11-01T07:13:25Z

Test build #113078 has finished for PR 25287 at commit 3e58b81.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-11-01T11:00:55Z

Test build #113079 has finished for PR 25287 at commit ac70da4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

I'm OK with it. I looked it up and technically URI schemes are meant to be case insensitive, so that's a good argument for this change.

maropu · 2019-11-04T02:09:59Z

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala


  val testH2DialectTinyInt = new JdbcDialect {
-    override def canHandle(url: String): Boolean = url.startsWith("jdbc:h2")
+    override def canHandle(url: String) : Boolean = url.startsWith("jdbc:h2")


nit: revert this.

maropu · 2019-11-04T02:10:23Z

sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala

      "`NONE`, `READ_UNCOMMITTED`, `READ_COMMITTED`, `REPEATABLE_READ` or `SERIALIZABLE`."))
  }
+
+  test("SPARK-28552: Check whether a dialect instance can be applied on the given jdbc url") {


You can simplify the tests as below:

test("SPARK-28552: Case-insensitive database URLs in JdbcDialect") {
  assert(JdbcDialects.get("jdbc:mysql://localhost/db") === MySQLDialect)
  assert(JdbcDialects.get("jdbc:MySQL://localhost/db") === MySQLDialect)
  assert(JdbcDialects.get("jdbc:postgresql://localhost/db") === PostgresDialect)
  assert(JdbcDialects.get("jdbc:postGresql://localhost/db") === PostgresDialect)
  assert(JdbcDialects.get("jdbc:db2://localhost/db") === DB2Dialect)
  assert(JdbcDialects.get("jdbc:DB2://localhost/db") === DB2Dialect)
  assert(JdbcDialects.get("jdbc:sqlserver://localhost/db") === MsSqlServerDialect)
  assert(JdbcDialects.get("jdbc:sqlServer://localhost/db") === MsSqlServerDialect)
  assert(JdbcDialects.get("jdbc:derby://localhost/db") === DerbyDialect)
  assert(JdbcDialects.get("jdbc:derBy://localhost/db") === DerbyDialect)
  assert(JdbcDialects.get("jdbc:oracle://localhost/db") === OracleDialect)
  assert(JdbcDialects.get("jdbc:Oracle://localhost/db") === OracleDialect)
  assert(JdbcDialects.get("jdbc:teradata://localhost/db") === TeradataDialect)
  assert(JdbcDialects.get("jdbc:Teradata://localhost/db") === TeradataDialect)
}

maropu

Could you check my comments before merging?

teeyog · 2019-11-04T03:39:58Z

> Could you check my comments before merging?

OK, thanks.

SparkQA · 2019-11-04T08:05:02Z

Test build #113195 has finished for PR 25287 at commit 3e1585b.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-11-04T12:33:53Z

Test build #113196 has finished for PR 25287 at commit 1883847.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-11-04T23:16:12Z

Thanks for your first contribution, @teeyog! Merged to master.

maropu · 2019-11-04T23:19:42Z

FYI: Added @teeyog in the Spark contributor list.

teeyog · 2019-11-05T01:30:32Z

Thanks! @maropu @srowen

jdbc dialect can match non-lowercase URL prefixes

02f5cf9

dongjoon-hyun added the SQL label Jul 29, 2019

jdbc dialect matching is not case sensitive

fa6b8a4

yong.tian1 added 2 commits August 1, 2019 15:14

Merge branch 'master' of https://github.com/apache/spark into sql_dia…

2d52578

…lect

jdbc dialect matching is not case sensitive

c89c859

jdbc dialect matching is not case sensitive

0db9f75

yong.tian1 added 2 commits August 1, 2019 16:05

jdbc dialect matching is not case sensitive

7591113

jdbc dialect matching is not case sensitive

7fc974e

jdbc dialect matching is not case sensitive

3b7a689

yong.tian1 added 2 commits August 2, 2019 09:45

jdbc dialect matching is not case sensitive

0334680

Merge branch 'master' of https://github.com/apache/spark into sql_dia…

530741c

…lect

Merge branch 'master' of https://github.com/apache/spark into sql_dia…

5913568

…lect

yong.tian1 added 2 commits November 1, 2019 15:02

fix conflicts

14814ef

fix conflict

3e58b81

fix conflict

ac70da4

teeyog requested a review from srowen November 4, 2019 01:38

srowen approved these changes Nov 4, 2019

maropu reviewed Nov 4, 2019

maropu changed the title ~~[SPARK-28552][SQL]Identification of different dialects insensitive to case by JDBC URL prefix~~ [SPARK-28552][SQL] Case-insensitive database URLs in JdbcDialect Nov 4, 2019

maropu requested changes Nov 4, 2019

yong.tian1 added 4 commits November 4, 2019 14:17

Merge branch 'master' of https://github.com/apache/spark

e08c95f

update

1f8b514

update

68cdff8

update

3e1585b

maropu approved these changes Nov 4, 2019

yong.tian1 added 4 commits November 4, 2019 16:31

Merge branch 'master' of https://github.com/apache/spark

2dba4b1

update

0f5f776

update

6f855b1

Merge branch 'sql_dialect' of https://github.com/teeyog/spark into sq…

1883847

…l_dialect

teeyog requested review from maropu and srowen November 4, 2019 13:05

maropu closed this in 04536b2 Nov 4, 2019
