Conversation

MaxGekk (Member) commented Nov 11, 2019

What changes were proposed in this pull request?

In the PR, I propose a new implementation of fromDayTimeString which strictly parses strings in day-time formats into intervals. The new implementation accepts only strings that match the pattern defined by the from and to bounds. Here is the mapping of user-specified bounds to patterns (a minimal parsing sketch follows the list):

  • [+|-]D+ H[H]:m[m]:s[s][.SSSSSSSSS] for DAY TO SECOND
  • [+|-]D+ H[H]:m[m] for DAY TO MINUTE
  • [+|-]D+ H[H] for DAY TO HOUR
  • [+|-]H[H]:m[m]:s[s][.SSSSSSSSS] for HOUR TO SECOND
  • [+|-]H[H]:m[m] for HOUR TO MINUTE
  • [+|-]m[m]:s[s][.SSSSSSSSS] for MINUTE TO SECOND
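
For illustration, here is a minimal, self-contained sketch of strict, bound-driven parsing for the HOUR TO MINUTE case. The object and method names are hypothetical and the regex simply mirrors the pattern above; this is not Spark's actual IntervalUtils implementation.

import java.util.regex.Pattern

object DayTimeParsingSketch {
  // Pattern for the HOUR TO MINUTE bounds: [+|-]H[H]:m[m]
  private val hourToMinute =
    Pattern.compile("^(?<sign>[+|-])?(?<hour>\\d{1,2}):(?<minute>\\d{1,2})$")

  // Returns the signed interval length in minutes, or fails if the input
  // does not match the pattern dictated by the HOUR TO MINUTE bounds.
  def parseHourToMinute(s: String): Long = {
    val m = hourToMinute.matcher(s.trim)
    require(m.matches(),
      s"Interval string must match day-time format of '$hourToMinute': $s")
    val sign = if (m.group("sign") == "-") -1L else 1L
    sign * (m.group("hour").toLong * 60L + m.group("minute").toLong)
  }

  def main(args: Array[String]): Unit = {
    println(parseHourToMinute("-12:40"))      // -760 (minutes)
    // parseHourToMinute("10 11:12:13.123")   // fails: not an HOUR TO MINUTE string
  }
}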

Closes #26327
Closes #26358

Why are the changes needed?

  • Improve the user experience with Spark SQL, and respect the bounds specified by users.
  • Behave the same as other broadly used DBMSs such as Oracle and MySQL.

Does this PR introduce any user-facing change?

Yes, before:

spark-sql> SELECT INTERVAL '10 11:12:13.123' HOUR TO MINUTE;
interval 1 weeks 3 days 11 hours 12 minutes

After:

spark-sql> SELECT INTERVAL '10 11:12:13.123' HOUR TO MINUTE;
Error in query:
requirement failed: Interval string must match day-time format of '^(?<sign>[+|-])?(?<hour>\d{1,2}):(?<minute>\d{1,2})$': 10 11:12:13.123(line 1, pos 16)

== SQL ==
SELECT INTERVAL '10 11:12:13.123' HOUR TO MINUTE
----------------^^^

How was this patch tested?

  • Added tests to IntervalUtilsSuite
  • By ExpressionParserSuite
  • Updated literals.sql

MaxGekk (Member, Author) commented Nov 11, 2019

@cloud-fan I cannot easily integrate the old implementation into the new one because of #26327 (comment). What I am considering is to just keep the old implementation of fromDayTimeString, rename it to, let's say, legacyFromDayTimeString, and introduce a SQL flag to fall back to the old implementation. WDYT?
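
As a rough, self-contained sketch of the fallback idea (the flag and function names below are placeholders for illustration; the real Spark config name and parser signatures may differ):

object IntervalParserFallbackSketch {
  // Stand-in for a SQL config such as a legacy "fromDayTimeString" switch.
  @volatile var legacyFromDayTimeStringEnabled: Boolean = false

  private def legacyFromDayTimeString(s: String): String = s"lenient parse of '$s'"
  private def strictFromDayTimeString(s: String): String = s"strict parse of '$s'"

  // The public entry point dispatches on the flag, so existing workloads
  // can opt back into the old, lenient behavior.
  def fromDayTimeString(s: String): String =
    if (legacyFromDayTimeStringEnabled) legacyFromDayTimeString(s)
    else strictFromDayTimeString(s)

  def main(args: Array[String]): Unit = {
    println(fromDayTimeString("10 11:12:13.123"))   // strict by default
    legacyFromDayTimeStringEnabled = true
    println(fromDayTimeString("10 11:12:13.123"))   // legacy fallback
  }
}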

SparkQA commented Nov 12, 2019

Test build #113600 has finished for PR 26473 at commit e88c8e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

check("-12:40", HOUR, MINUTE, "-12 hours -40 minutes")
checkFail("5 12:40", HOUR, MINUTE, "must match day-time format")

check("12:40:30.999999999", HOUR, SECOND, "12 hours 40 minutes 30.999999 seconds")
Contributor commented on the diff above:

can we test 12:40:30.0123456789 as well?

cloud-fan (Contributor) commented:

+1 to keep the old implementation and add a legacy config to fall back

SparkQA commented Nov 12, 2019

Test build #113624 has finished for PR 26473 at commit 988d532.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 12, 2019

Test build #113628 has finished for PR 26473 at commit 58ba7b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk (Member, Author) commented Nov 22, 2019

@cloud-fan I ran ThriftServerQueryTestSuite locally. It failed for the same reasons.

SparkQA commented Nov 22, 2019

Test build #114278 has finished for PR 26473 at commit 73ef32f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk (Member, Author) commented Nov 22, 2019

jenkins, retest this, please

cloud-fan (Contributor) commented:

@MaxGekk do we have a simple example that demonstrates the ThriftServerQueryTestSuite bug? We can create a JIRA to look into it.

In the meantime, let's not add tests to pgsql/interval.sql, to avoid the bug.

SparkQA commented Nov 22, 2019

Test build #114282 has finished for PR 26473 at commit 73ef32f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 22, 2019

Test build #114286 has finished for PR 26473 at commit 73ef32f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk (Member, Author) commented Nov 22, 2019

> @MaxGekk do we have a simple example that demonstrates the ThriftServerQueryTestSuite bug?

If this PR is merged to master, the bug can be reproduced pretty easily by removing set ... in ansi/interval.sql. Otherwise I don't know of an easier way to reproduce it.

> We can create a JIRA to look into it.

It has already been created: https://issues.apache.org/jira/browse/SPARK-29933

> In the meantime, let's not add tests to pgsql/interval.sql, to avoid the bug.

Actually, that is not enough: the wrong settings cause a failure in ansi/interval.sql due to https://issues.apache.org/jira/browse/SPARK-29920, and in postgresql/interval.sql because ThriftServerQueryTestSuite uses the new implementation, which is much stricter.

I could remove the tests from *interval.sql, but I would prefer to leave them there so we have reproducible examples for SPARK-29933 and don't forget about the tests in the future.

MaxGekk (Member, Author) commented Nov 22, 2019

For some reason, the last two builds finished successfully (#26473 (comment) and #26473 (comment)). Probably the order of tests changed.

Commit: merge into …-daytime-string (conflicts: docs/sql-migration-guide.md)
MaxGekk (Member, Author) commented Nov 22, 2019

@cloud-fan Hmm, when I run all tests via build/sbt "hive-thriftserver/test-only *ThriftServerQueryTestSuite" -Phive-thriftserver, they all pass. When I run only interval.sql via build/sbt "hive-thriftserver/test-only *ThriftServerQueryTestSuite -- -z interval.sql" -Phive-thriftserver, it fails.

SparkQA commented Nov 22, 2019

Test build #114304 has finished for PR 26473 at commit d27d434.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented Nov 26, 2019

I've figured out the problem: the Spark thrift server creates many SparkSessions to serve requests, and it serves those requests using a single thread. One thread can only have one active SparkSession, so SQLConf.get can't get the proper conf from the session that runs the query.

For this particular problem, a simple fix in SparkSession.sql is:

def sql(sqlText: String): DataFrame = {
     val tracker = new QueryPlanningTracker
+    SparkSession.setActiveSession(this)
     val plan = tracker.measurePhase(QueryPlanningTracker.PARSING) {
       sessionState.sqlParser.parsePlan(sqlText)
     }

But we should really think about how to set the active session correctly.
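
For context, here is a simplified, self-contained model of why a global accessor like SQLConf.get depends on which session is active on the current thread. It only mimics the thread-local lookup; it is not Spark's actual code.

object ActiveSessionSketch {
  // Simplified model: each session carries its own conf, and a thread-local
  // "active session" is what a global accessor consults.
  final case class Session(name: String, conf: Map[String, String])

  private val active = new InheritableThreadLocal[Option[Session]] {
    override def initialValue(): Option[Session] = None
  }

  def setActiveSession(s: Session): Unit = active.set(Some(s))

  // Stand-in for SQLConf.get: it consults whatever session is active on the
  // current thread, which may not be the session that submitted the query.
  def currentConf: Map[String, String] =
    active.get().map(_.conf).getOrElse(Map.empty[String, String])

  def main(args: Array[String]): Unit = {
    val s1 = Session("session-1", Map("spark.sql.ansi.enabled" -> "true"))
    val s2 = Session("session-2", Map("spark.sql.ansi.enabled" -> "false"))

    setActiveSession(s1)
    // If a single server thread never switches the active session before
    // running s2's query, that query silently sees s1's settings:
    println(currentConf.get("spark.sql.ansi.enabled"))   // Some(true)

    setActiveSession(s2)
    println(currentConf.get("spark.sql.ansi.enabled"))   // Some(false)
  }
}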

MaxGekk (Member, Author) commented Dec 9, 2019

I don't know when this bug (#26473 (comment)) will be fixed, but can we merge this PR with the workaround?

I just want to remind you that this PR fixes the bug in #26473 (comment).

cloud-fan (Contributor) commented Dec 10, 2019

It has been a while; can you remind me what your workaround is? BTW, I feel it's also OK to use my one-line fix: just add SparkSession.setActiveSession(this) in SparkSession.sql.

MaxGekk (Member, Author) commented Dec 10, 2019

> can you remind me what your workaround is?

I mean the explicit set in ca46f44.

cloud-fan (Contributor) commented:

The dialect config has been removed. Can you try and see if the test still fails? If it does, then maybe just use my one-line fix.

Commit: merge into …-daytime-string (conflicts: sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out, sql/core/src/test/resources/sql-tests/results/interval.sql.out)
SparkQA commented Dec 10, 2019

Test build #115122 has finished for PR 26473 at commit f9510e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


check("12:40:30.999999999", HOUR, SECOND, "12 hours 40 minutes 30.999999 seconds")
check("+12:40:30.999999999", HOUR, SECOND, "12 hours 40 minutes 30.999999 seconds")
check("-12:40:30.999999999", HOUR, SECOND, "-12 hours -40 minutes -30.999999 seconds")
Contributor commented on the diff above:

Let's avoid using 99999; it's hard for reviewers to count the number of digits. Let's use 123456 to make the digits easier to count.

MaxGekk (Member, Author) replied:

I just wanted to test the maximum possible value in the fractional part. I think we need at least one test for 999999999.

MaxGekk (Member, Author) replied:

I replaced 999999999 with 123456789 everywhere except one check.

cloud-fan (Contributor) left a review:

LGTM except one minor comment

SparkQA commented Dec 11, 2019

Test build #115177 has finished for PR 26473 at commit c16f2a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented:

thanks, merging to master!

cloud-fan closed this in e933539 on Dec 11, 2019
MaxGekk deleted the strict-from-daytime-string branch on June 5, 2020