[SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2 #24798
Conversation
Test build #106170 has finished for PR 24798 at commit
        ident, query.schema, partitioning.toArray, properties.asJava)
      writeToStagedTable(stagedTable, writeOptions, ident)
    case _ =>
      // Note that this operation is potentially unsafe, but these are the strict semantics of
I think we talked about this, and we concluded that this is the appropriate behavior - but I'm still not sure it is wise to support an inherently unsafe and potentially inconsistent operation. It's worth considering if we should throw UnsupportedOperationException here.
Yeah, I'm still on the fence about this, too.
public interface StagedTable extends Table {

  void commitStagedChanges();
It's not immediately obvious if this API belongs in StagedTable, or if it should be tied to the BatchWrite's commit() operation. The idea I had with tying it to StagedTable is:
- Make the atomic swap part more explicit from the perspective of the physical plan execution, and
- Allow both StagedTable and Table to share the same WriteBuilder and BatchWrite implementations that persist the rows, and decouple the atomic swap in this module only.
If we wanted to move the swap implementation behind the BatchWrite#commit and BatchWrite#abort APIs, then it's worth asking if we need the StagedTable interface at all - so TransactionalTableCatalog would return plain Table objects.
I like this. So the write's commit stashes changes in the staged table, which can finish or roll back.
This also solves the problem of where to document how to complete the changes staged in a StagedTable. Can you add docs that describe what these methods should do, and for the StagedTable interface?
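For illustration, a minimal Scala sketch of the contract being asked for, with the commit documented and an abort counterpart; the abort method and the exact wording are assumptions rather than this PR's final javadoc:

```scala
import org.apache.spark.sql.sources.v2.Table // package location as of this PR

// Sketch only: a Scala rendering of the Java interface with the docs filled in.
trait StagedTable extends Table {

  /**
   * Publish the staged table: make its metadata and all rows written through its
   * WriteBuilder visible in the catalog, atomically replacing any existing table
   * when the stage was opened for a REPLACE operation.
   */
  def commitStagedChanges(): Unit

  /**
   * Discard the staged table and any data written to it, leaving the catalog's
   * visible state exactly as it was before staging began.
   */
  def abortStagedChanges(): Unit
}
```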
import org.apache.spark.sql.sources.v2.StagedTable;
import org.apache.spark.sql.types.StructType;

public interface TransactionalTableCatalog {
TransactionalTableCatalog is proposed in the SPIP, but we don't really encode any formal notion of transactions in these APIs. Transactionality has a particular connotation in the DBMS nomenclature, e.g. START TRANSACTION statements. Perhaps we can rename this to AtomicTableCatalog or SupportsAtomicOperations?
I think renaming this is a good idea. How about StagingTableCatalog? The main capability it introduces is staging a table so that it can be used for a write, but doesn't yet exist.
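If the rename lands, the mix-in might look roughly like the Scala sketch below; the Identifier and Transform import paths are assumptions, and whether the final API keeps exactly these two methods is up to the PR:

```scala
import java.util

import org.apache.spark.sql.catalog.v2.Identifier            // assumed package location
import org.apache.spark.sql.catalog.v2.expressions.Transform  // assumed package location
import org.apache.spark.sql.sources.v2.StagedTable
import org.apache.spark.sql.types.StructType

// A catalog that can stage a table so it can receive a write before it becomes
// visible, avoiding the "transactional" naming.
trait StagingTableCatalog {

  // Stage creation of a new table; it becomes visible only on commitStagedChanges().
  def stageCreate(
      ident: Identifier,
      schema: StructType,
      partitions: Array[Transform],
      properties: util.Map[String, String]): StagedTable

  // Stage replacement of an existing table; the swap happens atomically on commit.
  def stageReplace(
      ident: Identifier,
      schema: StructType,
      partitions: Array[Transform],
      properties: util.Map[String, String]): StagedTable
}
```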
    (AS? query)? #createHiveTable
    | CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier
        LIKE source=tableIdentifier locationSpec? #createTableLike
    | replaceTableHeader ('(' colTypeList ')')? tableProvider
Are there other flavors of REPLACE TABLE that we need to support?
I'm not sure that we should support all of what's already here, at least not to begin with.
I think that the main use of REPLACE TABLE as an atomic operation is REPLACE TABLE ... AS SELECT. That's because the replacement should only happen if the write succeeds and the write could easily fail for a lot of reasons. Without a write, this is just syntactic sugar for a combined drop and create.
I think the initial PR should focus on just the RTAS case. That simplifies this because it no longer needs the type list. What do you think?
I think this should support the USING clause that is used to pass the provider name.
Because of the tableProvider field at the end, I think USING is still supported, right? As mentioned elsewhere, this is copied from CTAS.
Is there a test for it?
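A hedged sketch of what such a parser test could look like, modeled on the neighboring create/replace tests; parsePlan and the ReplaceTableAsSelectStatement field names are assumptions, not necessarily this PR's exact API:

```scala
test("replace table as select - USING provider is parsed") {
  val sql = "REPLACE TABLE my_tab USING parquet AS SELECT * FROM src"
  parsePlan(sql) match {
    case replace: ReplaceTableAsSelectStatement =>
      // The provider from the USING clause should be carried on the parsed statement.
      assert(replace.provider == "parquet")
    case other =>
      fail(s"Expected ReplaceTableAsSelectStatement, got ${other.getClass.getName}: $sql")
  }
}
```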
Test build #106171 has finished for PR 24798 at commit
Test build #106178 has finished for PR 24798 at commit
@rdblue @gatorsmile @HyukjinKwon this should be ready to go now, modulo the questions I've posted inline.
Test build #106213 has finished for PR 24798 at commit
Test build #106216 has finished for PR 24798 at commit
    | replaceTableHeader ('(' colTypeList ')')? tableProvider
        ((OPTIONS options=tablePropertyList) |
        (PARTITIONED BY partitioning=transformList) |
        bucketSpec |
Should bucketing be added using BUCKET BY? Or should we rely on bucket as a transform in the PARTITIONED BY clause?
Same as #24798 (comment) - this is copied from the create table spec.
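For reference, the transform-based spelling that PARTITIONED BY would carry looks roughly like this; the catalog, provider, and table names are placeholders, and this assumes a SparkSession named spark with a v2 catalog configured as in the test suite's before block:

```scala
// Bucketing expressed as a transform inside PARTITIONED BY rather than a
// dedicated bucketing clause ("testcat" and "foo" are placeholders).
spark.sql(
  """CREATE TABLE testcat.db.events (id BIGINT, ts TIMESTAMP, data STRING)
    |USING foo
    |PARTITIONED BY (bucket(16, id), days(ts))
    |""".stripMargin)
```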
    | CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier
        LIKE source=tableIdentifier locationSpec? #createTableLike
    | replaceTableHeader ('(' colTypeList ')')? tableProvider
        ((OPTIONS options=tablePropertyList) |
Should OPTIONS be supported in v2? Right now, we copy options into table properties because v2 has no separate options. I also think it is confusing to users that there are table properties and options.
In general I copied this entirely from the equivalent create table statement. How does the syntax for REPLACE TABLE differ from that of the existing CREATE TABLE? My understanding is REPLACE TABLE is exactly equivalent to CREATE TABLE with the exception of not having an IF NOT EXISTS option.
True, it should be the same as CREATE TABLE. That's a good reason to carry this forward.
    ;

replaceTableHeader
    : REPLACE TEMPORARY? TABLE multipartIdentifier
I'd probably remove TEMPORARY to begin with. What is the behavior of a temporary table? I think it used to be a view.
Actually, it looks fine since this is not allowed in the AST builder.
Let's get rid of TEMPORARY TABLE. It was a mistake, and we've removed almost everything about TEMPORARY TABLE in Spark; only a few parser rules are left for backward compatibility reasons.
To clarify, there is no TEMPORARY TABLE in Spark and there never has been. Spark only has TABLE, VIEW and TEMP VIEW.
    location: Option[String],
    comment: Option[String]) extends ParsedStatement {

  override def output: Seq[Attribute] = Seq.empty
ParsedStatement now defaults these methods, so you can remove them.
| s"got ${other.getClass.getName}: $sql") | ||
| test("create/replace table using - schema") { | ||
| val createSql = "CREATE TABLE my_tab(a INT COMMENT 'test', b STRING) USING parquet" | ||
| val replaceSql = "CREATE TABLE my_tab(a INT COMMENT 'test', b STRING) USING parquet" |
REPLACE?
      throw new TableAlreadyExistsException(ident)
    }
    catalog match {
      case txnCatalog: TransactionalTableCatalog =>
Because so little is shared between the two implementations, I think I would probably separate them into different exec nodes. An added benefit of that is that the physical plan would tell users whether Spark is going to use an atomic operation. That way I could check EXPLAIN and run if it is atomic or do more testing if it is not.
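A toy sketch of that split; all names below are placeholders rather than the PR's actual classes, but it shows how planning on the catalog's capability makes the atomic path visible in EXPLAIN output:

```scala
// Placeholder types standing in for the catalog API and physical plan nodes.
trait TableCatalogLike { def name: String }
trait StagingCapability extends TableCatalogLike

sealed trait ReplaceExec
// Non-atomic path: drop the table, create it, then write.
case class ReplaceTableAsSelectExec(catalogName: String) extends ReplaceExec
// Atomic path: stage the replacement, write, then commit the swap.
case class AtomicReplaceTableAsSelectExec(catalogName: String) extends ReplaceExec

// Choosing a distinct exec node per capability means the physical plan itself
// tells the user whether the REPLACE will be atomic.
def planReplace(catalog: TableCatalogLike): ReplaceExec = catalog match {
  case staging: StagingCapability => AtomicReplaceTableAsSelectExec(staging.name)
  case other                      => ReplaceTableAsSelectExec(other.name)
}
```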
        stagedTable.commitStagedChanges()
        writtenRows
      case _ =>
        // table does not support writes
Could you also note that the catch block will abort the staged changes?
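For context, a simplified sketch of the flow under discussion: the write runs against the staged table, the commit only happens after the write succeeds, and the catch path aborts. The abortStagedChanges name and the helper's exact shape are assumptions layered on the commitStagedChanges call shown above:

```scala
// Placeholder trait mirroring the staged-table calls discussed in this thread.
trait StagedTableLike {
  def commitStagedChanges(): Unit
  def abortStagedChanges(): Unit
}

def writeToStagedTable(staged: StagedTableLike)(doWrite: => Long): Long = {
  try {
    val writtenRows = doWrite      // run the actual batch write
    staged.commitStagedChanges()   // publish the table only after the write succeeds
    writtenRows
  } catch {
    case t: Throwable =>
      staged.abortStagedChanges()  // roll back staged metadata and data on failure
      throw t
  }
}
```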
  before {
    spark.conf.set("spark.sql.catalog.testcat", classOf[TestInMemoryTableCatalog].getName)
    spark.conf.set(
      "spark.sql.catalog.testcatatomic", classOf[TestTransactionalInMemoryCatalog].getName)
Nit: consider adding an underscore to make the catalog name more readable.
    checkAnswer(spark.internalCreateDataFrame(rdd, table.schema), spark.table("source"))
  }

  test("ReplaceTableAsSelect: basic v2 implementation using atomic catalog.") {
Nit: these test cases are dense because they have no new lines. Blank lines between tasks, like creating the original table, replacing it, and assertions, would help readability.
    checkAnswer(
      spark.internalCreateDataFrame(rdd, replacedTable.schema),
      spark.table("source").select("id"))
  }
All of the success cases should be applied to both atomic and non-atomic catalogs because we expect a difference in behavior only in failure cases.
I modified some of the success cases; I don't know if there are more that need to be adjusted. I don't think we have to be completely exhaustive here.
| s" AS SELECT id FROM source") | ||
| } | ||
| val replacedTable = testCatalog.loadTable(Identifier.of(Array(), "table_name")) | ||
| assert(replacedTable != table, "Table should have been replaced.") |
I think a better test assertion is that the schema matches the new table. This test could be true for the same underlying metadata if two separate instances of a table are loaded from a catalog.
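Concretely, that suggestion amounts to something like the following inside the same test, assuming the id column selected from source is a LongType as elsewhere in the suite:

```scala
import org.apache.spark.sql.types.{LongType, StructType}

// Assert on the replaced schema itself rather than on object inequality between
// two Table instances loaded from the catalog.
val expectedSchema = new StructType().add("id", LongType)
assert(replacedTable.schema == expectedSchema,
  "Replaced table should carry the schema of the RTAS query (a single id column)")
```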
  }

  test("ReplaceTableAsSelect: Non-atomic catalog creates the empty table, but leaves the" +
    " table empty if the write fails.") {
Why isn't the table dropped in this case? I would expect this to have the behavior of non-atomic CTAS after the initial delete.
I think the behavior here is ambiguous. Suppose then another user went and started writing to the table concurrently - should this job drop the table that the other job is writing to?
I think the intent of RTAS is to run a combined DROP TABLE and CREATE TABLE ... AS SELECT .... CTAS doesn't worry about concurrent writes because the table doesn't "exist" until the write completes. That's why we delete after a CTAS if the write fails, even though it also has a non-atomic case where the table exists and could technically be written to concurrently.
Hm so I took a closer look at this. It turns out that using Utils.tryWithSafeFinallyAndFailureCallbacks is risky in all of these code paths as it is currently implemented.
That method, when it tries to run the catch block, will first try to set the failure reason on the task context via TaskContext.get().markTaskFailed. But since we're running this try...finally block on the driver, there is no such task context to get via TaskContext.get.
What happens in this case then is that this test passes when it should fail, because indeed, the table should be dropped. But the catch block that drops the table never gets run, because TaskContext.get().markTaskFailed NPEs before the catch block can be run.
I think there are a few ways forward:
1. Don't use the Utils method to do try-catch-finally.
2. Patch the Utils method to check for null on the current task context before trying to mark the task failure reason on it.
I'm going with 2) for now, but 1) is very reasonable as well.
Either way, yeah the table should end up being dropped at the end, so this test also has to be patched.
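A minimal sketch of option 2, assuming only the null guard changes; the helper below paraphrases Utils.tryWithSafeFinallyAndFailureCallbacks rather than reproducing it, and note that markTaskFailed is package-private to Spark, so this shape only compiles inside Spark's own packages:

```scala
import org.apache.spark.TaskContext

// On failure, mark the task failed only when we are actually inside a task
// (TaskContext.get() returns null on the driver), then run the caller's catch
// block, e.g. dropping the partially created table, before rethrowing.
def tryWithFailureCallback[T](block: => T)(catchBlock: => Unit): T = {
  try {
    block
  } catch {
    case cause: Throwable =>
      val ctx = TaskContext.get()
      if (ctx != null) {
        ctx.markTaskFailed(cause) // package-private to org.apache.spark
      }
      catchBlock
      throw cause
  }
}
```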
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/TestInMemoryTableCatalog.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateTableExec.scala
Approach and interface LGTM! +9000 on "keep[ing] commits smaller and more focused" in the future. Would really help speed up the development cycle.
Test build #107756 has finished for PR 24798 at commit
Test build #107764 has finished for PR 24798 at commit
 * @param partitions transforms to use for partitioning data in the table
 * @param properties a string map of table properties
 * @return metadata for the new table
 * @throws TableAlreadyExistsException If a table or view already exists for the identifier
I'm a little confused here: REPLACE TABLE does require that the table exists, right?
 *
 * A new table will be created using the schema of the query, and rows from the query are appended.
 * If the table exists, its contents and schema should be replaced with the schema and the contents
 * of the query. This is a non-atomic implementation that drops the table and then runs non-atomic
According to https://github.com/apache/spark/pull/24798/files#r302746896, this is a broken implementation. RTAS should be able to query any existing tables, including the one that is being replaced. If we do want to have a non-atomic version, how about the following (sketched in code below):
- create a table with a random but unique name (like UUID), insert data to it
- drop the target table
- rename the table created in step 1 to the target table.
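A hedged sketch of those three steps; the SimpleCatalog operations are placeholders for whatever create/drop/rename support the catalog API ends up exposing:

```scala
import java.util.UUID

// Placeholder catalog operations for the sake of the sketch.
trait SimpleCatalog {
  def createTableAs(name: String, query: String): Unit
  def dropTable(name: String): Unit
  def renameTable(from: String, to: String): Unit
}

def nonAtomicReplaceTableAsSelect(catalog: SimpleCatalog, target: String, query: String): Unit = {
  // 1. Create a table with a random but unique name and insert the query result.
  val staging = s"${target}_staging_${UUID.randomUUID().toString.replace("-", "")}"
  catalog.createTableAs(staging, query)
  // 2. Drop the target table (the query above may have read from it).
  catalog.dropTable(target)
  // 3. Rename the staged table to the target name.
  catalog.renameTable(staging, target)
}
```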
IMO the non-atomic version is allowed to have undefined behavior when a failure happens midway. But it should work the same as the atomic version if no failure happens.
That comment applies only to RTAS queries that read the table that will be replaced. We can fix that in a follow-up.
  def maybeSimulateFailedTableCreation(tableProperties: util.Map[String, String]): Unit = {
    if (tableProperties.containsKey(TestInMemoryTableCatalog.SIMULATE_FAILED_CREATE_PROPERTY)
        && tableProperties.get(TestInMemoryTableCatalog.SIMULATE_FAILED_CREATE_PROPERTY)
          .equalsIgnoreCase("true")) {
nit: we can just write "true".equalsIgnoreCase(tableProperties.get(TestInMemoryTableCatalog.SIMULATE_FAILED_CREATE_PROPERTY))
if the key doesn't exist, "true".equalsIgnoreCase(null) returns false.
  override def commitStagedChanges(): Unit = {
    if (replaceIfExists) {
      tables.put(ident, delegateTable)
nit: when committing REPLACE TABLE, we should fail if the table has already been dropped by others.
Agree with @brkyvz that it's too late to split, as this PR has already received many reviews. Please try to keep the PR smaller and more focused next time. Generally looks good; only a few comments.
    extends StagedTable with SupportsWrite with SupportsRead {

  override def commitStagedChanges(): Unit = {
    if (droppedTables.contains(ident)) {
It's weird to record all the dropped tables in the history. I think a simpler version is:
if (replaceIfExists) {
  if (!tables.containsKey(ident)) {
    throw new RuntimeException("table already dropped")
  }
  tables.put(ident, delegateTable)
}
That doesn't work because the implementation of stageCreate doesn't actually put the table in the tables map at all. So you can't necessarily say the table was dropped just because the table is not in the tables map.
It's unclear to me if this is the correct behavior - if something dropped the table from underneath this, the subsequent commit of the replace or atomic-create operation should have the final say, right?
Think about a REPLACE TABLE and a DROP TABLE happening at the same time. It doesn't matter which one gets executed first, but the final result must be reachable by one certain execution order.
If REPLACE TABLE executes first, then there should be no table at the end as it's dropped.
If DROP TABLE executes first, then REPLACE TABLE should fail and there is still no table at the end.
BTW I think my proposal works for REPLACE TABLE, right? stageCreate is for CTAS, and I think your current code (without tracking dropped tables) already works.
I thought about this a bit more and chatted with @rdblue and I realized the confusion is that the staging catalog API doesn't even support passing through the orCreate flag to the catalog. I think we need to pass this information along to the catalog, otherwise the catalog won't know that the user wanted CREATE OR REPLACE semantics.
I'm more inclined to add an extra method to StagingTableCatalog called stageCreateOrReplace, in addition to the other methods we have here already. Then the behavior of commitStagedChanges depends on whether or not the table was instantiated via stageCreateOrReplace vs. stageReplace vs. stageCreate.
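A sketch of how the commit could branch on the staged intent, using a toy stand-in for the test catalog's tables map; the enum and class names are illustrations, not the PR's final shape:

```scala
import java.util.concurrent.ConcurrentHashMap

// Which staging entry point produced this staged table.
object StagedOperation extends Enumeration {
  val Create, Replace, CreateOrReplace = Value
}

class StagedCommit[T](
    tables: ConcurrentHashMap[String, T],
    ident: String,
    delegateTable: T,
    operation: StagedOperation.Value) {

  def commitStagedChanges(): Unit = operation match {
    case StagedOperation.Create =>
      // CTAS: fail if someone created the table while we were staging.
      if (tables.putIfAbsent(ident, delegateTable) != null) {
        throw new IllegalStateException(s"Table $ident already exists")
      }
    case StagedOperation.Replace =>
      // RTAS: fail if the table was dropped out from under us.
      if (tables.replace(ident, delegateTable) == null) {
        throw new IllegalStateException(s"Table $ident was concurrently dropped")
      }
    case StagedOperation.CreateOrReplace =>
      // CREATE OR REPLACE: always publish, whether or not the table still exists.
      tables.put(ident, delegateTable)
  }
}
```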
Test build #107806 has finished for PR 24798 at commit
Latest patch adds
Test build #107871 has finished for PR 24798 at commit
 *
 * Expected format:
 * {{{
 *   REPLACE TABLE [IF NOT EXISTS] [db_name.]table_name
This doesn't match the actual syntax now.
 * @param properties a string map of table properties
 * @return metadata for the new table
 * @throws UnsupportedOperationException If a requested partition transform is not supported
 * @throws NoSuchNamespaceException If the identifier namespace does not exist (optional)
I think the implementation should throw TableNotFoundException if the table to replace doesn't exist.
    orCreate: Boolean) extends AtomicTableWriteExec {

  override protected def doExecute(): RDD[InternalRow] = {
    val stagedTable = if (catalog.tableExists(ident)) {
I think we can simplify this to:
val stagedTable = if (orCreate) {
  catalog.stageCreateOrReplace(
    ident, query.schema, partitioning.toArray, properties.asJava)
} else {
  catalog.stageReplace(
    ident, query.schema, partitioning.toArray, properties.asJava)
}
stageReplace should throw an exception itself if the table doesn't exist. The implementation already needs to do it before committing, so it doesn't hurt to also do it at the beginning.
I disagree that there is no need to check whether the table exists. We had a similar discussion on CREATE TABLE. Spark should check existence to ensure that the error is consistently thrown. If the table does not exist and orCreate is false, then Spark should throw an exception and not rely on the source to do it.
That said, I think it would be simpler to update the logic a little:
if (orCreate) {
  catalog.stageCreateOrReplace(
    ident, query.schema, partitioning.toArray, properties.asJava)
} else if (catalog.tableExists(ident)) {
  catalog.stageReplace(
    ident, query.schema, partitioning.toArray, properties.asJava)
} else {
  throw new CannotReplaceMissingTableException(ident)
}
It's minor so I don't want to block this PR on it, but Spark is unable to make sure the error is consistently thrown because anything can happen after you check the table existence and before you do the actual operation.
That said, this is just a best-effort, which is not that useful as it's not a guarantee.
Updated some docs and cleaned up implementations based on comments.
    try {
      catalog.stageReplace(
        identifier, tableSchema, partitioning.toArray, tableProperties.asJava)
    } catch {
The try...catch here is more for flavor and consistency, since @cloud-fan suggested that StagingTableCatalog#stageReplace should be able to throw NoSuchTableException, which could theoretically happen if the table is dropped between the above tableExists call and the catalog.stageReplace call. This ensures that the same type of exception is thrown from the code path for the same kind of illegal state.
Test build #107924 has finished for PR 24798 at commit
thanks, merging to master!
Closes apache#24798 from mccheah/spark-27724. Authored-by: mcheah <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Implements the `REPLACE TABLE` and `REPLACE TABLE AS SELECT` logical plans. `REPLACE TABLE` is now a valid operation in spark-sql provided that the tables being modified are managed by V2 catalogs.
This also introduces an atomic mix-in that table catalogs can choose to implement. Table catalogs can now implement `TransactionalTableCatalog`. The semantics of this API are that table creation and replacement can be "staged" and then "committed".
On the execution of `REPLACE TABLE AS SELECT`, `REPLACE TABLE`, and `CREATE TABLE AS SELECT`, if the catalog implements transactional operations, the physical plan will use said functionality. Otherwise, these operations fall back on non-atomic variants. For `REPLACE TABLE` in particular, the usage of non-atomic operations can unfortunately lead to inconsistent state.
How was this patch tested?
Unit tests - multiple additions to `DataSourceV2SQLSuite`.
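For readers who want to try this out end to end, a hedged example; the catalog name, its implementation class, and the foo provider are placeholders for any V2 catalog and source on the classpath:

```scala
import org.apache.spark.sql.SparkSession

// "testcat" and com.example.MyV2Catalog are illustrative; point
// spark.sql.catalog.<name> at any TableCatalog implementation you have.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("replace-table-example")
  .config("spark.sql.catalog.testcat", "com.example.MyV2Catalog")
  .getOrCreate()

spark.sql("CREATE TABLE testcat.db.target USING foo AS SELECT * FROM range(10)")

// Replaces the table's schema and contents; atomic if the catalog supports staging,
// otherwise it falls back to the non-atomic drop-then-create path.
spark.sql("REPLACE TABLE testcat.db.target USING foo AS SELECT id * 2 AS id FROM range(5)")
```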