Skip to content

[SPARK-20910][SQL] Add build-in SQL function - UUID#18136

Closed
wangyum wants to merge 6 commits intoapache:masterfrom
wangyum:SPARK-20910
Closed

[SPARK-20910][SQL] Add build-in SQL function - UUID#18136
wangyum wants to merge 6 commits intoapache:masterfrom
wangyum:SPARK-20910

Conversation

@wangyum
Copy link
Copy Markdown
Member

@wangyum wangyum commented May 28, 2017

What changes were proposed in this pull request?

Add build-int SQL function - UUID.

How was this patch tested?

unit tests

@SparkQA
Copy link
Copy Markdown

SparkQA commented May 28, 2017

Test build #77484 has finished for PR 18136 at commit e82296a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Uuid() extends LeafExpression with CodegenFallback

46707d92-02f4-4817-8116-a4c3b23e6266
""")
// scalastyle:on line.size.limit
case class Uuid() extends LeafExpression with CodegenFallback {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you implement codegen for this?

""")
// scalastyle:on line.size.limit
case class Uuid() extends LeafExpression with CodegenFallback {
override def foldable: Boolean = true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this function is foldable. If this is foldable, it means we might use the same UUID generated by optimizer for all rows.

wangyum added 2 commits May 31, 2017 20:39
Conflicts:
	sql/core/src/test/resources/sql-tests/inputs/string-functions.sql
	sql/core/src/test/resources/sql-tests/results/string-functions.sql.out
@SparkQA
Copy link
Copy Markdown

SparkQA commented May 31, 2017

Test build #77593 has finished for PR 18136 at commit 953b40d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Uuid() extends LeafExpression

override def eval(input: InternalRow): Any = UTF8String.fromString(UUID.randomUUID().toString)

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
ev.copy(code = s"final ${ctx.javaType(dataType)} ${ev.value} = " +
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ${ctx.javaType(dataType)} -> UTF8String

select replace('abc', 'b');

-- uuid
select length(uuid()), (uuid() <> uuid());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should work if Uuid is deterministic = false.

Copy link
Copy Markdown
Member

@viirya viirya Jun 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't misunderstand it, I think what @ueshin means is we should set deterministic = false for Uuid expression?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, thank you for supporting.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 1, 2017

Test build #77613 has finished for PR 18136 at commit 46cd181.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// scalastyle:off line.size.limit
@ExpressionDescription(
usage = "_FUNC_() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ... an universally ...

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 1, 2017

Test build #77620 has finished for PR 18136 at commit d19ccca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Copy Markdown
Member

ueshin commented Jun 1, 2017

LGTM, pending tests.

@viirya
Copy link
Copy Markdown
Member

viirya commented Jun 1, 2017

LGTM

@wangyum wangyum changed the title [SPARK-20910][SQL] Add build-int SQL function - UUID [SPARK-20910][SQL] Add build-in SQL function - UUID Jun 1, 2017
@SparkQA
Copy link
Copy Markdown

SparkQA commented Jun 1, 2017

Test build #77626 has finished for PR 18136 at commit fe18a1d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Copy Markdown
Member

ueshin commented Jun 1, 2017

Thanks! Merging to master.

@asfgit asfgit closed this in 6d05c1c Jun 1, 2017
@hvanhovell
Copy link
Copy Markdown
Contributor

I just came across this expression and I have a few concerns:

  1. A row will not get the same UUID assigned when a task fails. This might cause some really weird problems when the UUID column is used later-on.
  2. UUID.randomString() uses a lot of synchronization in the background. This might make it pretty slow.

Shall I file a ticket?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants