Skip to content
Closed
Show file tree
Hide file tree
Changes from 30 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
adb1842
[SPARK-33474][SQL] Support TypeConstructed partition spec value
AngersZhuuuu Nov 19, 2020
ae59115
Update AstBuilder.scala
AngersZhuuuu Nov 19, 2020
c9a97d0
Update AstBuilder.scala
AngersZhuuuu Nov 19, 2020
bcdc7e5
Update SQLQuerySuite.scala
AngersZhuuuu Nov 19, 2020
d171377
FOLLOW COMMENT
AngersZhuuuu Nov 24, 2020
516c070
Update SQLQuerySuite.scala
AngersZhuuuu Nov 24, 2020
251c36b
Update DDLParserSuite.scala
AngersZhuuuu Nov 24, 2020
1590e8a
Merge branch 'master' into SPARK-33474
AngersZhuuuu Nov 24, 2020
6adefa7
Update DDLParserSuite.scala
AngersZhuuuu Nov 24, 2020
05f1962
Update SQLQuerySuite.scala
AngersZhuuuu Nov 24, 2020
e312697
follow comment
AngersZhuuuu Nov 24, 2020
edd270e
Update AstBuilder.scala
AngersZhuuuu Nov 24, 2020
a8f26a1
follow comment
AngersZhuuuu Nov 25, 2020
98986a0
Update AstBuilder.scala
AngersZhuuuu Nov 25, 2020
e2749c3
Update sql-migration-guide.md
AngersZhuuuu Nov 25, 2020
713da66
follow comment
AngersZhuuuu Nov 27, 2020
4898fb4
follow comment
AngersZhuuuu Nov 29, 2020
e875bcf
Update SQLQuerySuite.scala
AngersZhuuuu Nov 29, 2020
0228292
Update SQLQuerySuite.scala
AngersZhuuuu Nov 30, 2020
2db3f14
Merge branch 'master' into SPARK-33474
AngersZhuuuu Dec 1, 2020
055a903
Update
AngersZhuuuu Dec 4, 2020
e6092c1
Merge branch 'master' into SPARK-33474
AngersZhuuuu Dec 9, 2020
06a9321
Update DDLParserSuite.scala
AngersZhuuuu Dec 9, 2020
bc3e347
Merge branch 'master' into SPARK-33474
AngersZhuuuu Dec 21, 2020
0358274
Merge branch 'master' into SPARK-33474
AngersZhuuuu Dec 24, 2020
ca69533
Merge branch 'SPARK-33474' of https://github.com/AngersZhuuuu/spark i…
AngersZhuuuu Dec 24, 2020
894ac90
Merge branch 'master' into SPARK-33474
AngersZhuuuu Feb 20, 2021
0b4b211
update
AngersZhuuuu Feb 20, 2021
7264a3d
Update DDLParserSuite.scala
AngersZhuuuu Feb 20, 2021
b68ee81
Fix build issue
AngersZhuuuu Feb 20, 2021
dc0783a
Update sql-migration-guide.md
AngersZhuuuu Feb 20, 2021
d3b1960
Update AstBuilder.scala
AngersZhuuuu Feb 20, 2021
2e9058a
follow comment
AngersZhuuuu Feb 24, 2021
31c1092
Merge branch 'master' into SPARK-33474
AngersZhuuuu Feb 24, 2021
9f5d569
Update SQLQuerySuite.scala
AngersZhuuuu Feb 24, 2021
5e05adb
follow comment
AngersZhuuuu Feb 26, 2021
dd34027
follow comment
AngersZhuuuu Feb 26, 2021
4cdb4a7
Update AlterTableDropPartitionSuiteBase.scala
AngersZhuuuu Feb 26, 2021
093b062
Update SQLInsertTestSuite.scala
AngersZhuuuu Feb 27, 2021
adc3984
Merge branch 'master' into SPARK-33474
AngersZhuuuu Mar 1, 2021
f6bdbbe
follow comment
AngersZhuuuu Mar 1, 2021
2ef0fd0
Update SQLInsertTestSuite.scala
AngersZhuuuu Mar 1, 2021
4023041
Update SQLInsertTestSuite.scala
AngersZhuuuu Mar 1, 2021
fe5095a
Update SQLInsertTestSuite.scala
AngersZhuuuu Mar 1, 2021
63a4fb4
follow comment
AngersZhuuuu Mar 2, 2021
4f61e60
follow comment
AngersZhuuuu Mar 2, 2021
08c55f6
Update SQLInsertTestSuite.scala
AngersZhuuuu Mar 2, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/sql-migration-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,8 @@ license: |

- Since Spark 3.1, CHAR/CHARACTER and VARCHAR types are supported in the table schema. Table scan/insertion will respect the char/varchar semantic. If char/varchar is used in places other than table schema, an exception will be thrown (CAST is an exception that simply treats char/varchar as string like before). To restore the behavior before Spark 3.1, which treats them as STRING types and ignores a length parameter, e.g. `CHAR(4)`, you can set `spark.sql.legacy.charVarcharAsString` to `true`.

- In Spark 3.1, we support using corresponding typed literal of partition column value type as partition column value in SQL, such as if we have a partition table with partition column of date type, we can use typed date literal `date '2020-01-01'` as partition spec `PARTITION (dt = date '2020-01-01')`, it will be treated as partition column value `2020-01-01`. In Spark 3.0 the partition value will be treated as string value `date '2020-01-01'` and it's a illegal date type string value and will been converted to `__HIVE_DEFAULT_PARTITION__`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be In Spark 3.2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be In Spark 3.2?

Hmm, yea, updated.


- In Spark 3.1, `AnalysisException` is replaced by its sub-classes that are thrown for tables from Hive external catalog in the following situations:
* `ALTER TABLE .. ADD PARTITION` throws `PartitionsAlreadyExistException` if new partition exists already
* `ALTER TABLE .. DROP PARTITION` throws `NoSuchPartitionsException` for not existing partitions
Expand Down
17 changes: 15 additions & 2 deletions docs/sql-ref-syntax-dml-insert-into.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,8 +40,8 @@ INSERT INTO [ TABLE ] table_identifier [ partition_spec ] [ ( column_list ) ]

* **partition_spec**

An optional parameter that specifies a comma-separated list of key and value pairs
for partitions.
An optional parameter that specifies a comma separated list of key and value pairs
for partitions. Note that one can use a typed literal (e.g., date'2019-01-02') for a partition column value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we update the sql doc for ADD/DROP/RENAME PARTITION as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done


**Syntax:** `PARTITION ( partition_col_name = partition_col_val [ , ... ] )`

Expand Down Expand Up @@ -206,6 +206,19 @@ SELECT * FROM students;
+-------------+--------------------------+----------+
```

#### Insert Using a Typed Date Literal for a Partition Column Value
```sql
CREATE TABLE students (name STRING, address STRING) PARTITIONED BY (birthday DATE);

INSERT INTO students PARTITION (birthday = date'2019-01-02')
VALUES ('Amy Smith', '123 Park Ave, San Jose');

SELECT * FROM students;
+-------------+-------------------------+-----------+
| name| address| birthday|
+-------------+-------------------------+-----------+
| Amy Smith| 123 Park Ave, San Jose| 2019-01-02|
+-------------+-------------------------+-----------+
#### Insert with a column list

```sql
Expand Down
27 changes: 25 additions & 2 deletions docs/sql-ref-syntax-dml-insert-overwrite-table.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,8 +40,8 @@ INSERT OVERWRITE [ TABLE ] table_identifier [ partition_spec [ IF NOT EXISTS ] ]

* **partition_spec**

An optional parameter that specifies a comma-separated list of key and value pairs
for partitions.
An optional parameter that specifies a comma separated list of key and value pairs
for partitions. Note that one can use a typed literal (e.g., date'2019-01-02') for a partition column value.

**Syntax:** `PARTITION ( partition_col_name [ = partition_col_val ] [ , ... ] )`

Expand Down Expand Up @@ -179,6 +179,29 @@ SELECT * FROM students;
+-----------+-------------------------+----------+
```

#### Insert Using a Typed Date Literal for a Partition Column Value
```sql
CREATE TABLE students (name STRING, address STRING) PARTITIONED BY (birthday DATE);

INSERT INTO students PARTITION (birthday = date'2019-01-02')
VALUES ('Amy Smith', '123 Park Ave, San Jose');

SELECT * FROM students;
+-------------+-------------------------+-----------+
| name| address| birthday|
+-------------+-------------------------+-----------+
| Amy Smith| 123 Park Ave, San Jose| 2019-01-02|
+-------------+-------------------------+-----------+

INSERT INTO students PARTITION (birthday = date'2019-01-02')
VALUES('Jason Wang', '908 Bird St, Saratoga');

SELECT * FROM students;
+-----------+-------------------------+-----------+
| name| address| birthday|
+-----------+-------------------------+-----------+
| Jason Wang| 908 Bird St, Saratoga| 2019-01-02|
+-----------+-------------------------+-----------+
#### Insert with a column list

```sql
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -505,10 +505,13 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
protected def visitStringConstant(
ctx: ConstantContext,
legacyNullAsString: Boolean): String = withOrigin(ctx) {
ctx match {
case _: NullLiteralContext if !legacyNullAsString => null
case s: StringLiteralContext => createString(s)
case o => o.getText
expression(ctx) match {
case l: Literal if l.value == null & !legacyNullAsString => null
case l: Literal =>
Cast(l, StringType, Some(SQLConf.get.sessionLocalTimeZone)).eval().toString
case _ =>
throw new IllegalArgumentException("Only support convert Literal to string when visit" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only literals are allowed in the partition spec, but got ${other.sql}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

" partition spec value")
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,14 +20,14 @@ package org.apache.spark.sql.catalyst.parser
import java.util.Locale

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.analysis.{AnalysisTest, GlobalTempView, LocalTempView, PersistedView, UnresolvedAttribute, UnresolvedFunc, UnresolvedNamespace, UnresolvedRelation, UnresolvedStar, UnresolvedTable, UnresolvedTableOrView, UnresolvedView}
import org.apache.spark.sql.catalyst.analysis.{AnalysisTest, GlobalTempView, LocalTempView, PersistedView, UnresolvedAttribute, UnresolvedFunc, UnresolvedInlineTable, UnresolvedNamespace, UnresolvedRelation, UnresolvedStar, UnresolvedTable, UnresolvedTableOrView, UnresolvedView}
import org.apache.spark.sql.catalyst.catalog.{ArchiveResource, BucketSpec, FileResource, FunctionResource, JarResource}
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Literal}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.connector.catalog.TableChange.ColumnPosition.{after, first}
import org.apache.spark.sql.connector.expressions.{ApplyTransform, BucketTransform, DaysTransform, FieldReference, HoursTransform, IdentityTransform, LiteralValue, MonthsTransform, Transform, YearsTransform}
import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructType, TimestampType}
import org.apache.spark.unsafe.types.UTF8String
import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}

class DDLParserSuite extends AnalysisTest {
import CatalystSqlParser._
Expand Down Expand Up @@ -2500,4 +2500,27 @@ class DDLParserSuite extends AnalysisTest {

testCreateOrReplaceDdl(sql, expectedTableSpec, expectedIfNotExists = false)
}

test("SPARK-33474: Support TypeConstructed partition spec value") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Support typed literals as partition spec values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Support typed literals as partition spec values

Both done

def insertPartitionPlan(part: String): InsertIntoStatement = {
InsertIntoStatement(
UnresolvedRelation(Seq("t")),
Map("part" -> Some(part)),
Seq.empty[String],
UnresolvedInlineTable(Seq("col1"), Seq(Seq(Literal("a")))),
overwrite = false, ifPartitionNotExists = false)
}

val dateTypeSql = "INSERT INTO t PARTITION(part = date'2019-01-02') VALUES('a')"
val interval = new CalendarInterval(7, 1, 1000).toString
val intervalTypeSql = s"INSERT INTO t PARTITION(part = interval'$interval') VALUES('a')"
val timestamp = "2019-01-02 11:11:11"
val timestampTypeSql = s"INSERT INTO t PARTITION(part = timestamp'$timestamp') VALUES('a')"
val binaryTypeSql = s"INSERT INTO t PARTITION(part = X'537061726B2053514C') VALUES('a')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we generate the binary of 537061726B2053514C instead of hardcode it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we generate the binary of 537061726B2053514C instead of hardcode it?

Done


comparePlans(parsePlan(dateTypeSql), insertPartitionPlan("2019-01-02"))
comparePlans(parsePlan(intervalTypeSql), insertPartitionPlan(interval))
comparePlans(parsePlan(timestampTypeSql), insertPartitionPlan(timestamp))
comparePlans(parsePlan(binaryTypeSql), insertPartitionPlan("Spark SQL"))
}
}
21 changes: 21 additions & 0 deletions sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Original file line number Diff line number Diff line change
Expand Up @@ -4021,6 +4021,27 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
}
}
}

test("SPARK-33474: Support TypeConstructed partition spec value") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move the test to SQLInsertTestSuite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move the test to SQLInsertTestSuite

Done

withTable("t1", "t2", "t4") {
sql("CREATE TABLE t1(name STRING, part DATE) USING PARQUET PARTITIONED BY (part)")
sql("INSERT INTO t1 PARTITION(part = date'2019-01-02') VALUES('a')")
checkAnswer(sql("SELECT name, CAST(part AS STRING) FROM t1"), Row("a", "2019-01-02"))

sql("CREATE TABLE t2(name STRING, part TIMESTAMP) USING PARQUET PARTITIONED BY (part)")
sql("INSERT INTO t2 PARTITION(part = timestamp'2019-01-02 11:11:11') VALUES('a')")
checkAnswer(sql("SELECT name, CAST(part AS STRING) FROM t2"), Row("a", "2019-01-02 11:11:11"))

val e = intercept[AnalysisException] {
sql("CREATE TABLE t3(name STRING, part INTERVAL) USING PARQUET PARTITIONED BY (part)")
}.getMessage
assert(e.contains("Cannot use interval for partition column"))

sql("CREATE TABLE t4(name STRING, part BINARY) USING CSV PARTITIONED BY (part)")
sql(s"INSERT INTO t4 PARTITION(part = X'537061726B2053514C') VALUES('a')")
checkAnswer(sql("SELECT name, cast(part as string) FROM t4"), Row("a", "Spark SQL"))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last thing that I'm concerned about is whether we already have tests corresponding to the @cloud-fan 's last comment.

#30421 (comment)

Let's make sure this feature works correctly:

 - All the literals are supported. Non-literals are forbidden. e.g. part_col=array(1) does not create a string value "array(1)".
 - Null literal is supported. We should use null instead of "null" to represent it.
 - If the literal data type doesn't match the partition column data type, we should do type check and cast like normal table insertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these three concern

All the literals are supported. Non-literals are forbidden. e.g. part_col=array(1) does not create a string value "array(1)".

Now type constructor only support DATE TIMESTAMP INTERVAL X, so that's not a concern.

Null literal is supported. We should use null instead of "null" to represent it.

It's solved since #30538

If the literal data type doesn't match the partition column data type, we should do type check and cast like normal table insertion.

Seems there is auto type conversion. I will check this with UT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm Add some UT case. Got the concern that since now we treat partition spec use string, it's not type safe.
I have chat with @cloud-fan that we can make partition sepc's value as literal. Then it will be more type safe.

So should we still continue this pr or just start work on treat partition value as Literal?

}
}

case class Foo(bar: Option[String])