Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
adb1842
[SPARK-33474][SQL] Support TypeConstructed partition spec value
AngersZhuuuu Nov 19, 2020
ae59115
Update AstBuilder.scala
AngersZhuuuu Nov 19, 2020
c9a97d0
Update AstBuilder.scala
AngersZhuuuu Nov 19, 2020
bcdc7e5
Update SQLQuerySuite.scala
AngersZhuuuu Nov 19, 2020
d171377
FOLLOW COMMENT
AngersZhuuuu Nov 24, 2020
516c070
Update SQLQuerySuite.scala
AngersZhuuuu Nov 24, 2020
251c36b
Update DDLParserSuite.scala
AngersZhuuuu Nov 24, 2020
1590e8a
Merge branch 'master' into SPARK-33474
AngersZhuuuu Nov 24, 2020
6adefa7
Update DDLParserSuite.scala
AngersZhuuuu Nov 24, 2020
05f1962
Update SQLQuerySuite.scala
AngersZhuuuu Nov 24, 2020
e312697
follow comment
AngersZhuuuu Nov 24, 2020
edd270e
Update AstBuilder.scala
AngersZhuuuu Nov 24, 2020
a8f26a1
follow comment
AngersZhuuuu Nov 25, 2020
98986a0
Update AstBuilder.scala
AngersZhuuuu Nov 25, 2020
e2749c3
Update sql-migration-guide.md
AngersZhuuuu Nov 25, 2020
713da66
follow comment
AngersZhuuuu Nov 27, 2020
4898fb4
follow comment
AngersZhuuuu Nov 29, 2020
e875bcf
Update SQLQuerySuite.scala
AngersZhuuuu Nov 29, 2020
0228292
Update SQLQuerySuite.scala
AngersZhuuuu Nov 30, 2020
2db3f14
Merge branch 'master' into SPARK-33474
AngersZhuuuu Dec 1, 2020
055a903
Update
AngersZhuuuu Dec 4, 2020
e6092c1
Merge branch 'master' into SPARK-33474
AngersZhuuuu Dec 9, 2020
06a9321
Update DDLParserSuite.scala
AngersZhuuuu Dec 9, 2020
bc3e347
Merge branch 'master' into SPARK-33474
AngersZhuuuu Dec 21, 2020
0358274
Merge branch 'master' into SPARK-33474
AngersZhuuuu Dec 24, 2020
ca69533
Merge branch 'SPARK-33474' of https://github.com/AngersZhuuuu/spark i…
AngersZhuuuu Dec 24, 2020
894ac90
Merge branch 'master' into SPARK-33474
AngersZhuuuu Feb 20, 2021
0b4b211
update
AngersZhuuuu Feb 20, 2021
7264a3d
Update DDLParserSuite.scala
AngersZhuuuu Feb 20, 2021
b68ee81
Fix build issue
AngersZhuuuu Feb 20, 2021
dc0783a
Update sql-migration-guide.md
AngersZhuuuu Feb 20, 2021
d3b1960
Update AstBuilder.scala
AngersZhuuuu Feb 20, 2021
2e9058a
follow comment
AngersZhuuuu Feb 24, 2021
31c1092
Merge branch 'master' into SPARK-33474
AngersZhuuuu Feb 24, 2021
9f5d569
Update SQLQuerySuite.scala
AngersZhuuuu Feb 24, 2021
5e05adb
follow comment
AngersZhuuuu Feb 26, 2021
dd34027
follow comment
AngersZhuuuu Feb 26, 2021
4cdb4a7
Update AlterTableDropPartitionSuiteBase.scala
AngersZhuuuu Feb 26, 2021
093b062
Update SQLInsertTestSuite.scala
AngersZhuuuu Feb 27, 2021
adc3984
Merge branch 'master' into SPARK-33474
AngersZhuuuu Mar 1, 2021
f6bdbbe
follow comment
AngersZhuuuu Mar 1, 2021
2ef0fd0
Update SQLInsertTestSuite.scala
AngersZhuuuu Mar 1, 2021
4023041
Update SQLInsertTestSuite.scala
AngersZhuuuu Mar 1, 2021
fe5095a
Update SQLInsertTestSuite.scala
AngersZhuuuu Mar 1, 2021
63a4fb4
follow comment
AngersZhuuuu Mar 2, 2021
4f61e60
follow comment
AngersZhuuuu Mar 2, 2021
08c55f6
Update SQLInsertTestSuite.scala
AngersZhuuuu Mar 2, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/sql-migration-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,8 @@ license: |

- In Spark 3.2, the output schema of `SHOW TBLPROPERTIES` becomes `key: string, value: string` whether you specify the table property key or not. In Spark 3.1 and earlier, the output schema of `SHOW TBLPROPERTIES` is `value: string` when you specify the table property key. To restore the old schema with the builtin catalog, you can set `spark.sql.legacy.keepCommandOutputSchema` to `true`.

- In Spark 3.2, we support typed literals in the partition spec of INSERT and ADD/DROP/RENAME PARTITION. For example, `ADD PARTITION(dt = date'2020-01-01')` adds a partition with date value `2020-01-01`. In Spark 3.1 and earlier, the partition value will be parsed as string value `date '2020-01-01'`, which is an illegal date value, and we add a partition with null value at the end.

## Upgrading from Spark SQL 3.0 to 3.1

- In Spark 3.1, statistical aggregation function includes `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, `corr` will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` applied on a single element set. In Spark version 3.0 and earlier, it will return `Double.NaN` in such case. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
Expand Down
8 changes: 4 additions & 4 deletions docs/sql-ref-syntax-ddl-alter-table.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ ALTER TABLE table_identifier partition_spec RENAME TO partition_spec

* **partition_spec**

Partition to be renamed.
Partition to be renamed. Note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec.

**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`

Expand Down Expand Up @@ -126,7 +126,7 @@ ALTER TABLE table_identifier ADD [IF NOT EXISTS]

* **partition_spec**

Partition to be added.
Partition to be added. Note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec.

**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`

Expand All @@ -152,7 +152,7 @@ ALTER TABLE table_identifier DROP [ IF EXISTS ] partition_spec [PURGE]

* **partition_spec**

Partition to be dropped.
Partition to be dropped. Note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec.

**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`

Expand Down Expand Up @@ -217,7 +217,7 @@ ALTER TABLE table_identifier [ partition_spec ] SET LOCATION 'new_location'

* **partition_spec**

Specifies the partition on which the property has to be set.
Specifies the partition on which the property has to be set. Note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec.

**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`

Expand Down
15 changes: 14 additions & 1 deletion docs/sql-ref-syntax-dml-insert-into.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ INSERT INTO [ TABLE ] table_identifier [ partition_spec ] [ ( column_list ) ]
* **partition_spec**

An optional parameter that specifies a comma-separated list of key and value pairs
for partitions.
for partitions. Note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec.

**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`

Expand Down Expand Up @@ -206,6 +206,19 @@ SELECT * FROM students;
+-------------+--------------------------+----------+
```

#### Insert Using a Typed Date Literal for a Partition Column Value
```sql
CREATE TABLE students (name STRING, address STRING) PARTITIONED BY (birthday DATE);

INSERT INTO students PARTITION (birthday = date'2019-01-02')
VALUES ('Amy Smith', '123 Park Ave, San Jose');

SELECT * FROM students;
+-------------+-------------------------+-----------+
| name| address| birthday|
+-------------+-------------------------+-----------+
| Amy Smith| 123 Park Ave, San Jose| 2019-01-02|
+-------------+-------------------------+-----------+
#### Insert with a column list

```sql
Expand Down
25 changes: 24 additions & 1 deletion docs/sql-ref-syntax-dml-insert-overwrite-table.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ INSERT OVERWRITE [ TABLE ] table_identifier [ partition_spec [ IF NOT EXISTS ] ]
* **partition_spec**

An optional parameter that specifies a comma-separated list of key and value pairs
for partitions.
for partitions. Note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec.

**Syntax:** `PARTITION ( partition_col_name [ = partition_col_val ] [ , ... ] )`

Expand Down Expand Up @@ -179,6 +179,29 @@ SELECT * FROM students;
+-----------+-------------------------+----------+
```

#### Insert Using a Typed Date Literal for a Partition Column Value
```sql
CREATE TABLE students (name STRING, address STRING) PARTITIONED BY (birthday DATE);

INSERT INTO students PARTITION (birthday = date'2019-01-02')
VALUES ('Amy Smith', '123 Park Ave, San Jose');

SELECT * FROM students;
+-------------+-------------------------+-----------+
| name| address| birthday|
+-------------+-------------------------+-----------+
| Amy Smith| 123 Park Ave, San Jose| 2019-01-02|
+-------------+-------------------------+-----------+

INSERT INTO students PARTITION (birthday = date'2019-01-02')
VALUES('Jason Wang', '908 Bird St, Saratoga');

SELECT * FROM students;
+-----------+-------------------------+-----------+
| name| address| birthday|
+-----------+-------------------------+-----------+
| Jason Wang| 908 Bird St, Saratoga| 2019-01-02|
+-----------+-------------------------+-----------+
#### Insert with a column list

```sql
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -509,10 +509,16 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
protected def visitStringConstant(
ctx: ConstantContext,
legacyNullAsString: Boolean): String = withOrigin(ctx) {
ctx match {
case _: NullLiteralContext if !legacyNullAsString => null
case s: StringLiteralContext => createString(s)
case o => o.getText
expression(ctx) match {
case Literal(null, _) if !legacyNullAsString => null
case l @ Literal(null, _) => l.toString
case l: Literal =>
// TODO For v2 commands, we will cast the string back to its actual value,
// which is a waste and can be improved in the future.
Cast(l, StringType, Some(SQLConf.get.sessionLocalTimeZone)).eval().toString
case other =>
throw new IllegalArgumentException(s"Only literals are allowed in the " +
s"partition spec, but got ${other.sql}")
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,14 +20,14 @@ package org.apache.spark.sql.catalyst.parser
import java.util.Locale

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.analysis.{AnalysisTest, GlobalTempView, LocalTempView, PersistedView, UnresolvedAttribute, UnresolvedFunc, UnresolvedNamespace, UnresolvedRelation, UnresolvedStar, UnresolvedTable, UnresolvedTableOrView, UnresolvedView}
import org.apache.spark.sql.catalyst.analysis.{AnalysisTest, GlobalTempView, LocalTempView, PersistedView, UnresolvedAttribute, UnresolvedFunc, UnresolvedInlineTable, UnresolvedNamespace, UnresolvedRelation, UnresolvedStar, UnresolvedTable, UnresolvedTableOrView, UnresolvedView}
import org.apache.spark.sql.catalyst.catalog.{ArchiveResource, BucketSpec, FileResource, FunctionResource, JarResource}
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Literal}
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Hex, Literal}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.connector.catalog.TableChange.ColumnPosition.{after, first}
import org.apache.spark.sql.connector.expressions.{ApplyTransform, BucketTransform, DaysTransform, FieldReference, HoursTransform, IdentityTransform, LiteralValue, MonthsTransform, Transform, YearsTransform}
import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructType, TimestampType}
import org.apache.spark.unsafe.types.UTF8String
import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}

class DDLParserSuite extends AnalysisTest {
import CatalystSqlParser._
Expand Down Expand Up @@ -2497,4 +2497,28 @@ class DDLParserSuite extends AnalysisTest {

testCreateOrReplaceDdl(sql, expectedTableSpec, expectedIfNotExists = false)
}

test("SPARK-33474: Support typed literals as partition spec values") {
def insertPartitionPlan(part: String): InsertIntoStatement = {
InsertIntoStatement(
UnresolvedRelation(Seq("t")),
Map("part" -> Some(part)),
Seq.empty[String],
UnresolvedInlineTable(Seq("col1"), Seq(Seq(Literal("a")))),
overwrite = false, ifPartitionNotExists = false)
}
val binaryStr = "Spark SQL"
val binaryHexStr = Hex.hex(UTF8String.fromString(binaryStr).getBytes).toString
val dateTypeSql = "INSERT INTO t PARTITION(part = date'2019-01-02') VALUES('a')"
val interval = new CalendarInterval(7, 1, 1000).toString
val intervalTypeSql = s"INSERT INTO t PARTITION(part = interval'$interval') VALUES('a')"
val timestamp = "2019-01-02 11:11:11"
val timestampTypeSql = s"INSERT INTO t PARTITION(part = timestamp'$timestamp') VALUES('a')"
val binaryTypeSql = s"INSERT INTO t PARTITION(part = X'$binaryHexStr') VALUES('a')"

comparePlans(parsePlan(dateTypeSql), insertPartitionPlan("2019-01-02"))
comparePlans(parsePlan(intervalTypeSql), insertPartitionPlan(interval))
comparePlans(parsePlan(timestampTypeSql), insertPartitionPlan(timestamp))
comparePlans(parsePlan(binaryTypeSql), insertPartitionPlan(binaryStr))
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,11 @@
package org.apache.spark.sql

import org.apache.spark.SparkConf
import org.apache.spark.sql.catalyst.expressions.Hex
import org.apache.spark.sql.connector.InMemoryPartitionTableCatalog
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils}
import org.apache.spark.unsafe.types.UTF8String

/**
* The base trait for DML - insert syntax
Expand Down Expand Up @@ -209,6 +211,52 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils {
}
}

test("SPARK-33474: Support typed literals as partition spec values") {
withTable("t1") {
val binaryStr = "Spark SQL"
val binaryHexStr = Hex.hex(UTF8String.fromString(binaryStr).getBytes).toString
sql(
"""
| CREATE TABLE t1(name STRING, part1 DATE, part2 TIMESTAMP, part3 BINARY,
| part4 STRING, part5 STRING, part6 STRING, part7 STRING)
| USING PARQUET PARTITIONED BY (part1, part2, part3, part4, part5, part6, part7)
""".stripMargin)

sql(
s"""
| INSERT OVERWRITE t1 PARTITION(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we can test with only one insert: part4 to test plain string, part5, part6 and part7 to test type conversion from date/timestamp/binary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

| part1 = date'2019-01-01',
| part2 = timestamp'2019-01-01 11:11:11',
| part3 = X'$binaryHexStr',
| part4 = 'p1',
| part5 = date'2019-01-01',
| part6 = timestamp'2019-01-01 11:11:11',
| part7 = X'$binaryHexStr'
| ) VALUES('a')
""".stripMargin)
checkAnswer(sql(
"""
| SELECT
| name,
| CAST(part1 AS STRING),
| CAST(part2 as STRING),
| CAST(part3 as STRING),
| part4,
| part5,
| part6,
| part7
| FROM t1
""".stripMargin),
Row("a", "2019-01-01", "2019-01-01 11:11:11", "Spark SQL", "p1",
"2019-01-01", "2019-01-01 11:11:11", "Spark SQL"))

val e = intercept[AnalysisException] {
sql("CREATE TABLE t2(name STRING, part INTERVAL) USING PARQUET PARTITIONED BY (part)")
}.getMessage
assert(e.contains("Cannot use interval"))
}
}

test("SPARK-34556: " +
"checking duplicate static partition columns should respect case sensitive conf") {
withTable("t") {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -181,4 +181,12 @@ trait AlterTableAddPartitionSuiteBase extends QueryTest with DDLCommandTestUtils
checkPartitions(t, Map("id" -> "1"), Map("id" -> "2"))
}
}

test("SPARK-33474: Support typed literals as partition spec values") {
withNamespaceAndTable("ns", "tbl") { t =>
sql(s"CREATE TABLE $t(name STRING, part DATE) USING PARQUET PARTITIONED BY (part)")
sql(s"ALTER TABLE $t ADD PARTITION(part = date'2020-01-01')")
checkPartitions(t, Map("part" ->"2020-01-01"))
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -230,4 +230,14 @@ trait AlterTableDropPartitionSuiteBase extends QueryTest with DDLCommandTestUtil
}
}
}

test("SPARK-33474: Support typed literals as partition spec values") {
withNamespaceAndTable("ns", "tbl") { t =>
sql(s"CREATE TABLE $t(name STRING, part DATE) USING PARQUET PARTITIONED BY (part)")
sql(s"ALTER TABLE $t ADD PARTITION(part = date'2020-01-01')")
checkPartitions(t, Map("part" -> "2020-01-01"))
sql(s"ALTER TABLE $t DROP PARTITION (part = date'2020-01-01')")
checkPartitions(t)
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -210,4 +210,15 @@ trait AlterTableRenamePartitionSuiteBase extends QueryTest with DDLCommandTestUt
}
}
}

test("SPARK-33474: Support typed literals as partition spec values") {
withNamespaceAndTable("ns", "tbl") { t =>
sql(s"CREATE TABLE $t(name STRING, part DATE) USING PARQUET PARTITIONED BY (part)")
sql(s"ALTER TABLE $t ADD PARTITION(part = date'2020-01-01')")
checkPartitions(t, Map("part" -> "2020-01-01"))
sql(s"ALTER TABLE $t PARTITION (part = date'2020-01-01')" +
s" RENAME TO PARTITION (part = date'2020-01-02')")
checkPartitions(t, Map("part" -> "2020-01-02"))
}
}
}