-
Notifications
You must be signed in to change notification settings - Fork 29.3k
[SPARK-15988] [SQL] Implement DDL commands: Create/Drop temporary macro #13706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 26 commits
f2433c2
0b93636
301e950
808a5fa
f4ed3bc
af0136d
5550496
9fe1881
b8ffdc9
fb8b57a
e895a9c
8d520eb
651b485
277ba9f
314913d
3d05e4f
22d8b1a
d91f633
1eb23c7
ad85109
b52698f
3eacebc
fce1121
eaff4e9
97632a9
4ee32e9
b539e94
1563f12
4d8e843
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -60,6 +60,9 @@ trait FunctionRegistry { | |
| /** Drop a function and return whether the function existed. */ | ||
| def dropFunction(name: String): Boolean | ||
|
|
||
| /** Drop a macro and return whether the macro existed. */ | ||
| def dropMacro(name: String): Boolean | ||
|
|
||
| /** Checks if a function with a given name exists. */ | ||
| def functionExists(name: String): Boolean = lookupFunction(name).isDefined | ||
|
|
||
|
|
@@ -107,6 +110,14 @@ class SimpleFunctionRegistry extends FunctionRegistry { | |
| functionBuilders.remove(name).isDefined | ||
| } | ||
|
|
||
| override def dropMacro(name: String): Boolean = synchronized { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A drop function can currently also drop a macro. Can you make sure that this cannot happen? Maybe we should consolidate this into a single drop function with a
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hive can drop temporary function using command 'drop Macro'. And it also can drop temporary macro using command 'drop temporary function'. |
||
| if (functionBuilders.get(name).map(_._1).filter(_.isMacro).isDefined) { | ||
| functionBuilders.remove(name).isDefined | ||
| } else { | ||
| false | ||
| } | ||
| } | ||
|
|
||
| override def clear(): Unit = synchronized { | ||
| functionBuilders.clear() | ||
| } | ||
|
|
@@ -146,6 +157,10 @@ object EmptyFunctionRegistry extends FunctionRegistry { | |
| throw new UnsupportedOperationException | ||
| } | ||
|
|
||
| override def dropMacro(name: String): Boolean = { | ||
| throw new UnsupportedOperationException | ||
| } | ||
|
|
||
| override def dropFunction(name: String): Boolean = { | ||
| throw new UnsupportedOperationException | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -52,3 +52,6 @@ class NoSuchPartitionsException(db: String, table: String, specs: Seq[TableParti | |
|
|
||
| class NoSuchTempFunctionException(func: String) | ||
| extends AnalysisException(s"Temporary function '$func' not found") | ||
|
|
||
| class NoSuchTempMacroException(func: String) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please remove it. For reasons, please see the PR #17716.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, Thanks. |
||
| extends AnalysisException(s"Temporary macro '$func' not found") | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1090,6 +1090,24 @@ class SessionCatalog( | |
| } | ||
| } | ||
|
|
||
| /** Create a temporary macro. */ | ||
| def createTempMacro( | ||
| name: String, | ||
| info: ExpressionInfo, | ||
| functionBuilder: FunctionBuilder): Unit = { | ||
| if (functionRegistry.functionExists(name)) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not entirely sure if we should throw an exception here. It unfortunately depends on the semantics you follow, SQL will throw an exception, whereas the Dataframe API will just overwrite the function. Let's follow Hive for now.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hive overwrites the temporary function without issuing an exception. |
||
| throw new AnalysisException(s"Function $name already exists") | ||
| } | ||
| functionRegistry.registerFunction(name, info, functionBuilder) | ||
| } | ||
|
|
||
| /** Drop a temporary macro. */ | ||
| def dropTempMacro(name: String, ignoreIfNotExists: Boolean): Unit = { | ||
| if (!functionRegistry.dropMacro(name) && !ignoreIfNotExists) { | ||
| throw new NoSuchTempMacroException(name) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After we drop the macro, the existing function works well. That means, we did not delete the original built-in functions. The built-in function will not be dropped by
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I have update it with this case. |
||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Returns whether it is a temporary function. If not existed, returns false. | ||
| */ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -34,7 +34,7 @@ import org.apache.spark.sql.catalyst.plans.logical._ | |
| import org.apache.spark.sql.execution.command._ | ||
| import org.apache.spark.sql.execution.datasources.{CreateTable, _} | ||
| import org.apache.spark.sql.internal.{HiveSerDe, SQLConf, VariableSubstitution} | ||
| import org.apache.spark.sql.types.StructType | ||
| import org.apache.spark.sql.types.{StructField, StructType} | ||
|
|
||
| /** | ||
| * Concrete parser for Spark SQL statements. | ||
|
|
@@ -715,6 +715,37 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { | |
| ctx.TEMPORARY != null) | ||
| } | ||
|
|
||
| /** | ||
| * Create a [[CreateMacroCommand]] command. | ||
| * | ||
| * For example: | ||
| * {{{ | ||
| * CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; | ||
| * }}} | ||
| */ | ||
| override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) { | ||
| val arguments = Option(ctx.colTypeList).map(visitColTypeList(_)) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: you can avoid |
||
| .getOrElse(Seq.empty[StructField]) | ||
| val e = expression(ctx.expression) | ||
| CreateMacroCommand( | ||
| ctx.macroName.getText, | ||
| MacroFunctionWrapper(arguments, e)) | ||
| } | ||
|
|
||
| /** | ||
| * Create a [[DropMacroCommand]] command. | ||
| * | ||
| * For example: | ||
| * {{{ | ||
| * DROP TEMPORARY MACRO [IF EXISTS] macro_name; | ||
| * }}} | ||
| */ | ||
| override def visitDropMacro(ctx: DropMacroContext): LogicalPlan = withOrigin(ctx) { | ||
| DropMacroCommand( | ||
| ctx.macroName.getText, | ||
| ctx.EXISTS != null) | ||
| } | ||
|
|
||
| /** | ||
| * Create a [[DropTableCommand]] command. | ||
| */ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. | ||
| * The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| * (the "License"); you may not use this file except in compliance with | ||
| * the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.spark.sql.execution.command | ||
|
|
||
| import org.apache.spark.sql.{AnalysisException, Row, SparkSession} | ||
| import org.apache.spark.sql.catalyst.analysis._ | ||
| import org.apache.spark.sql.catalyst.expressions._ | ||
| import org.apache.spark.sql.types.StructField | ||
|
|
||
| /** | ||
| * This class provides arguments and body expression of the macro function. | ||
| */ | ||
| case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need this?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. because Analyzer will check macroFunction that is invalid if I donot use MacroFunctionWrapper. |
||
|
|
||
| /** | ||
| * The DDL command that creates a macro. | ||
| * To create a temporary macro, the syntax of using this command in SQL is: | ||
| * {{{ | ||
| * CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; | ||
| * }}} | ||
| */ | ||
| case class CreateMacroCommand( | ||
| macroName: String, | ||
| funcWrapper: MacroFunctionWrapper) | ||
| extends RunnableCommand { | ||
|
|
||
| override def run(sparkSession: SparkSession): Seq[Row] = { | ||
| val catalog = sparkSession.sessionState.catalog | ||
| val columns = funcWrapper.columns.map { col => | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might easier to use
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, i will do it, thanks. |
||
| AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: put |
||
| val colToIndex: Map[String, Int] = columns.map(_.name).zipWithIndex.toMap | ||
| if (colToIndex.size != columns.size) { | ||
| throw new AnalysisException(s"Cannot support duplicate colNames " + | ||
| s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}") | ||
| } | ||
| val macroFunction = funcWrapper.macroFunction.transform { | ||
| case u: UnresolvedAttribute => | ||
| val index = colToIndex.get(u.name).getOrElse( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should respect the case-sensitivity settings here. So a lookup might not be the best idea.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, i will do it, thanks. |
||
| throw new AnalysisException(s"Cannot find colName: ${u} " + | ||
| s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}")) | ||
| BoundReference(index, columns(index).dataType, columns(index).nullable) | ||
| case u: UnresolvedFunction => | ||
| sparkSession.sessionState.catalog.lookupFunction(u.name, u.children) | ||
| case s: SubqueryExpression => | ||
| throw new AnalysisException(s"Cannot support Subquery: ${s} " + | ||
| s"for CREATE TEMPORARY MACRO $macroName") | ||
| case u: UnresolvedGenerator => | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this what Hive does? I really don't see why we should not support this. Please note that we cannot use generators if we decide that an expression has to be a fully resolved expression at creation time. |
||
| throw new AnalysisException(s"Cannot support Generator: ${u} " + | ||
| s"for CREATE TEMPORARY MACRO $macroName") | ||
| } | ||
|
|
||
| val macroInfo = columns.mkString(",") + " -> " + funcWrapper.macroFunction.toString | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you give an example of what this would look like? |
||
| val info = new ExpressionInfo(macroInfo, macroName, true) | ||
| val builder = (children: Seq[Expression]) => { | ||
| if (children.size != columns.size) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is slightly better to |
||
| throw new AnalysisException(s"Actual number of columns: ${children.size} != " + | ||
| s"expected number of columns: ${columns.size} for Macro $macroName") | ||
| } | ||
| macroFunction.transform { | ||
| // Skip to validate the input type because check it at runtime. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do we check at runtime? The current code does not seem to respect the types passed, and rely on the macro's expression to do some type validation, this means you can pass anything to the macro and the user can end up with an unexpected result: create macro plus(a int, b int) as a + b;
select plus(1.0, 1.0) as result -- This returns a decimal, and not an int as expectedSo I think we should at least validate the input expressions. The hacky way would be to add casts, and have the analyzer fail if the cast cannot be made (this is terrible UX). A better way to would be to create some sentinel expression that makes sure the analyzer will insert the correct cast, and throws a relevant exception (mentioning the macro) when this blows up...
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On a related note, we are currently not sure if the macro produces a valid expression. Maybe we should run analysis on the macro expression to make sure it does not fail every query later on, e.g.: val resolvedMacroFunction = try {
val plan = Project(Alias(macroFunction, "m")() :: Nil, OneRowRelation)
val analyzed @ Project(Seq(named), OneRowRelation) =
sparkSession.sessionState.analyzer.execute(plan)
sparkSession.sessionState.analyzer.checkAnalysis(analyzed)
named.children.head
} catch {
case a: AnalysisException =>
...
}Note that we cannot use generators if we use this approach...
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, Now i update it with you ideas. Thanks. |
||
| case b: BoundReference => children(b.ordinal) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We do not validate the input type here. This would be entirely fine if macro arguments were defined without a
What do you think?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @hvanhovell good points. Because Analyzer will check expression's checkInputDataTypes after ResolveFunctions, I think we do not validate input type here. Now i do not think it has benefits if we did casts, but it maybe cause unnecessary casts. I will add some comments for it. Thanks.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok that is perfect. |
||
| } | ||
| } | ||
| catalog.createTempMacro(macroName, info, builder) | ||
| Seq.empty[Row] | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * The DDL command that drops a macro. | ||
| * ifExists: returns an error if the macro doesn't exist, unless this is true. | ||
| * {{{ | ||
| * DROP TEMPORARY MACRO [IF EXISTS] macro_name; | ||
| * }}} | ||
| */ | ||
| case class DropMacroCommand(macroName: String, ifExists: Boolean) | ||
| extends RunnableCommand { | ||
|
|
||
| override def run(sparkSession: SparkSession): Seq[Row] = { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will drop any function... Can we make it Macro specific? |
||
| val catalog = sparkSession.sessionState.catalog | ||
| catalog.dropTempMacro(macroName, ifExists) | ||
| Seq.empty[Row] | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1516,6 +1516,35 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { | |
| ) | ||
| } | ||
|
|
||
| test("create/drop temporary macro") { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you use
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you also add a case for a macro without parameters? E.g.:
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we also test a combination of temporary macros/functions...? |
||
| intercept[AnalysisException] { | ||
| sql(s"CREATE TEMPORARY MACRO simple_add_error(x int) x + y") | ||
| } | ||
| intercept[AnalysisException] { | ||
| sql(s"CREATE TEMPORARY MACRO simple_add_error(x int, x int) x + y") | ||
| } | ||
| intercept[AnalysisException] { | ||
| sql(s"CREATE TEMPORARY MACRO simple_add_error(x int) x NOT IN (select c2 from t2) ") | ||
| } | ||
| sql("CREATE TEMPORARY MACRO fixed_number() 42") | ||
| checkAnswer(sql("SELECT fixed_number()"), Row(42)) | ||
| sql("CREATE TEMPORARY MACRO string_len_plus_two(x string) length(x) + 2") | ||
| checkAnswer(sql("SELECT string_len_plus_two('abc')"), Row(5)) | ||
| sql("CREATE TEMPORARY MACRO simple_add(x int, y int) x + y") | ||
| checkAnswer(sql("SELECT simple_add(1, 2)"), Row(3)) | ||
| intercept[AnalysisException] { | ||
| sql(s"SELECT simple_add(1)") | ||
| } | ||
| sql("DROP TEMPORARY MACRO fixed_number") | ||
| intercept[AnalysisException] { | ||
| sql(s"DROP TEMPORARY MACRO abs") | ||
| } | ||
| intercept[AnalysisException] { | ||
| sql("DROP TEMPORARY MACRO SOME_MACRO") | ||
| } | ||
| sql("DROP TEMPORARY MACRO IF EXISTS SOME_MACRO") | ||
| } | ||
|
|
||
| test("create a data source table without schema") { | ||
| import testImplicits._ | ||
| withTempPath { tempDir => | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Hive also support non-temporary macro's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Now Hive only support temporary macro's.