Skip to content

Conversation

@teeyog
Copy link
Contributor

@teeyog teeyog commented Jul 29, 2019

What changes were proposed in this pull request?

This pr proposes to be case insensitive when matching dialects via jdbc url prefix.

When I use jdbc url such as: jdbc: MySQL://localhost/db to query data through sparksql, the result is wrong, but MySQL supports such url writing.

because sparksql matches MySQLDialect by prefix jdbc:mysql, so jdbc: MySQL is not matched with the correct dialect. Therefore, it should be case insensitive when identifying the corresponding dialect through jdbc url

https://issues.apache.org/jira/browse/SPARK-28552

How was this patch tested?

UT.

@gatorsmile
Copy link
Member

ok to test

@gatorsmile
Copy link
Member

cc @maropu

@maropu
Copy link
Member

maropu commented Jul 31, 2019

@teeyog hi, thanks for your first contribution! btw, can you fix this in a more general way? Probably, can we lowercase a prefix in the JdbcDialect side? I think its a bit troublesome to add the logic for lowercases in each dialect...

Also, plz add tests in JDBCSuite.

@maropu
Copy link
Member

maropu commented Jul 31, 2019

And, could you please make the PR description clearer as much as possible to make the other reviewers understood easily... we don't have a rigid format for that, but I'ld like you to clearly describe what you propose in this pr? e.g., "This pr proposes to ..."

If this pr merged, the description will be included in a commit log, so your kind cooperation makes the commit logs more readable.

@SparkQA
Copy link

SparkQA commented Jul 31, 2019

Test build #108426 has finished for PR 25287 at commit 02f5cf9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@teeyog
Copy link
Contributor Author

teeyog commented Jul 31, 2019

@maropu Thank you very much for your guidance. I will improve this PR description. In addition, you suggest that URL be converted to lowercase only through JdbcDialect without modifying the logic of dialects. I have not thought of a better way to implement it, or to control the incoming url, which is already lowercase, but I don't think that's very good.

@SparkQA
Copy link

SparkQA commented Jul 31, 2019

Test build #108454 has finished for PR 25287 at commit fa6b8a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Aug 1, 2019

For example,

@DeveloperApi
@Evolving
abstract class JdbcDialect extends Serializable {

  def urlName: String

  /**
   * Check if this dialect instance can handle a certain jdbc url.
   * @param url the jdbc url.
   * @return True if the dialect can be applied on the given jdbc url.
   * @throws NullPointerException if the url is null.
   */
  def canHandle(url : String): Boolean = {
    url.toLowerCase(Locale.ROOT).startsWith(s"jdbc:$urlName")
  }
...
}

private object PostgresDialect extends JdbcDialect {
  override val urlName: String = "postgresql"
...
}

?

@teeyog
Copy link
Contributor Author

teeyog commented Aug 1, 2019

@maropu Thank you for your example, I misunderstood what you mean, I thought you told me not to modify the logic of other dialects.

@SparkQA
Copy link

SparkQA commented Aug 1, 2019

Test build #108507 has finished for PR 25287 at commit c89c859.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2019

Test build #108508 has finished for PR 25287 at commit 0db9f75.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2019

Test build #108510 has finished for PR 25287 at commit 7fc974e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2019

Test build #108511 has finished for PR 25287 at commit 3b7a689.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@teeyog
Copy link
Contributor Author

teeyog commented Aug 1, 2019

@maropu Hi, can you help me find out what caused this error?

[info] Packaging /home/jenkins/workspace/SparkPullRequestBuilder@2/examples/target/scala-2.12/spark-examples_2.12-3.0.0-SNAPSHOT.jar ...
[info] Done packaging.
[error]  * abstract method dbTag()java.lang.String in class org.apache.spark.sql.jdbc.JdbcDialect is present only in current version
[error]    filter with: ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.jdbc.JdbcDialect.dbTag")
java.lang.RuntimeException: spark-sql: Binary compatibility check failed!
	at scala.sys.package$.error(package.scala:27)
	at com.typesafe.tools.mima.plugin.SbtMima$.reportModuleErrors(SbtMima.scala:83)
	at com.typesafe.tools.mima.plugin.MimaPlugin$$anonfun$mimaReportSettings$7$$anonfun$apply$2.apply(MimaPlugin.scala:68)
	at com.typesafe.tools.mima.plugin.MimaPlugin$$anonfun$mimaReportSettings$7$$anonfun$apply$2.apply(MimaPlugin.scala:59)
	at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
	at com.typesafe.tools.mima.plugin.MimaPlugin$$anonfun$mimaReportSettings$7.apply(MimaPlugin.scala:59)
	at com.typesafe.tools.mima.plugin.MimaPlugin$$anonfun$mimaReportSettings$7.apply(MimaPlugin.scala:44)
	at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
	at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
	at sbt.std.Transform$$anon$4.work(System.scala:63)
	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
	at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
	at sbt.Execute.work(Execute.scala:237)
	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
	at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
	at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[error] (sql/*:mimaReportBinaryIssues) spark-sql: Binary compatibility check failed!
[error] Total time: 32 s, completed Aug 1, 2019 1:40:55 AM
[error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/dev/mima -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos ; received return code 1

@maropu
Copy link
Member

maropu commented Aug 1, 2019

Can you update project/MimaExcludes.scala, too?

@SparkQA
Copy link

SparkQA commented Aug 2, 2019

Test build #108537 has finished for PR 25287 at commit 530741c.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Aug 2, 2019

jdbc: MySQL:/... isn't a valid URI, is it?

@maropu
Copy link
Member

maropu commented Oct 1, 2019

ping @teeyog

@SparkQA
Copy link

SparkQA commented Nov 1, 2019

Test build #113078 has finished for PR 25287 at commit 3e58b81.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 1, 2019

Test build #113079 has finished for PR 25287 at commit ac70da4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@teeyog teeyog requested a review from srowen November 4, 2019 01:38
Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with it. I looked it up and technically URI schemes are meant to be case insensitive, so that's a good argument for this change.


val testH2DialectTinyInt = new JdbcDialect {
override def canHandle(url: String): Boolean = url.startsWith("jdbc:h2")
override def canHandle(url: String) : Boolean = url.startsWith("jdbc:h2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: revert this.

"`NONE`, `READ_UNCOMMITTED`, `READ_COMMITTED`, `REPEATABLE_READ` or `SERIALIZABLE`."))
}

test("SPARK-28552: Check whether a dialect instance can be applied on the given jdbc url") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify the tests below;

  test("SPARK-28552: Case-insensitive database URLs in JdbcDialect") {
    assert(JdbcDialects.get("jdbc:mysql://localhost/db") === MySQLDialect)
    assert(JdbcDialects.get("jdbc:MySQL://localhost/db") === MySQLDialect)
    assert(JdbcDialects.get("jdbc:postgresql://localhost/db") === PostgresDialect)
    assert(JdbcDialects.get("jdbc:postGresql://localhost/db") === PostgresDialect)
    assert(JdbcDialects.get("jdbc:db2://localhost/db") === DB2Dialect)
    assert(JdbcDialects.get("jdbc:DB2://localhost/db") === DB2Dialect)
    assert(JdbcDialects.get("jdbc:sqlserver://localhost/db") === MsSqlServerDialect)
    assert(JdbcDialects.get("jdbc:sqlServer://localhost/db") === MsSqlServerDialect)
    assert(JdbcDialects.get("jdbc:derby://localhost/db") === DerbyDialect)
    assert(JdbcDialects.get("jdbc:derBy://localhost/db") === DerbyDialect)
    assert(JdbcDialects.get("jdbc:oracle://localhost/db") === OracleDialect)
    assert(JdbcDialects.get("jdbc:Oracle://localhost/db") === OracleDialect)
    assert(JdbcDialects.get("jdbc:teradata://localhost/db") === TeradataDialect)
    assert(JdbcDialects.get("jdbc:Teradata://localhost/db") === TeradataDialect)
  }

@maropu maropu changed the title [SPARK-28552][SQL]Identification of different dialects insensitive to case by JDBC URL prefix [SPARK-28552][SQL] Case-insensitive database URLs in JdbcDialect Nov 4, 2019
Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check my comments before merging?

@teeyog
Copy link
Contributor Author

teeyog commented Nov 4, 2019

Could you check my comments before merging?

ok,thanks

@SparkQA
Copy link

SparkQA commented Nov 4, 2019

Test build #113195 has finished for PR 25287 at commit 3e1585b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 4, 2019

Test build #113196 has finished for PR 25287 at commit 1883847.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@teeyog teeyog requested review from maropu and srowen November 4, 2019 13:05
@maropu maropu closed this in 04536b2 Nov 4, 2019
@maropu
Copy link
Member

maropu commented Nov 4, 2019

Thanks for your first contribution, @teeyog! Merged to master.

@maropu
Copy link
Member

maropu commented Nov 4, 2019

FYI: Added @teeyog in the Spark contributor list.

@teeyog
Copy link
Contributor Author

teeyog commented Nov 5, 2019

Thanks! @maropu @srowen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.