sql: allow CREATE TABLE after DROP TABLE in the same txn #19112

vivekmenezes · 2017-10-08T03:21:16Z

dt

LGTM, though I don't remember all the subtle cases with name re-use that @andreimatei talked about the last time we were playing with the drop code.

andreimatei · 2017-11-01T22:11:12Z

With this patch, is the following possible?

node A: drop table foo; create table foo; insert into foo values (1);
node B: insert into foo values (2);

And when the dust settles, foo only contains (1)?

If so, is that OK? Your comment in the code seems to say that it is, but it sounds wrong to me. Moreover, it seems to me that this behavior is possible even if the txn doing the insert on node B synchronized with a transaction on node A happening after the drop. No?

vivekmenezes · 2017-11-02T18:14:24Z

@andreimatei I've updated the comment to address your concern. Do let me know if you have other concerns.

vivekmenezes · 2017-11-02T18:35:39Z

Actually, my comment is incorrect because of #18354, but if that issue gets fixed it will become accurate. It is safer to leave the comment as is.

vivekmenezes · 2017-11-03T14:20:50Z

Can I merge this?

andreimatei · 2017-11-03T16:27:36Z

// the insert will fail because a DROP only returns back
// to the user once it has confirmed that the table descriptor has been
// purged from every cache across the cluster.
// In particular, the DROP is executed through two kv transactions that map
// to one SQL transaction; the first kv transaction places the table in the
// DROPPED state, and the second increments the descriptor version and waits
// until the previous version has been purged from the cluster.

The moment when the DROP returns to the client seems irrelevant to me.
Would the following be possible with this patch?

client 1 (using node A): DROP TABLE foo*;
client 2 (using node A), concurrently with the statement above: SHOW CREATE foo; -> error: table does not exist
client 2 (using node A): CREATE TABLE foo**;
client 2 gives a "causality token" to client 3;
client 3 (using node B): INSERT INTO foo; -> inserts into foo* instead of foo**
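The two kv transactions described in the quoted code comment can be pictured with a toy Go model. Everything in it is hypothetical: `Cluster`, `Node`, `dropTable`, and integer versions keyed by descriptor ID are stand-ins, not CockroachDB's lease manager. The first step marks the descriptor DROPPED; the second bumps its version and loops until no node still caches an older version. The loop that refreshes one stale node per round is only a stand-in for leases being released or expiring.

```go
package main

import "fmt"

// State is a hypothetical, simplified table-descriptor state.
type State int

const (
	Public State = iota
	Dropped
)

// Descriptor is a toy table descriptor: a version and a state.
type Descriptor struct {
	Version int
	State   State
}

// Node models one node's cache of leased descriptor versions.
type Node struct {
	cached map[int]int // descriptor ID -> cached version
}

// Cluster models the committed descriptors plus every node's cache.
type Cluster struct {
	desc  map[int]*Descriptor
	nodes []*Node
}

// isStale reports whether n caches a version older than the committed one.
func (c *Cluster) isStale(n *Node, id int) bool {
	v, ok := n.cached[id]
	return ok && v < c.desc[id].Version
}

// oldVersionsRemain reports whether any node still caches an old version.
func (c *Cluster) oldVersionsRemain(id int) bool {
	for _, n := range c.nodes {
		if c.isStale(n, id) {
			return true
		}
	}
	return false
}

// dropTable sketches the two kv transactions from the quoted comment and
// returns how many wait rounds the second one needed.
func (c *Cluster) dropTable(id int) int {
	c.desc[id].State = Dropped // kv txn 1: mark the table DROPPED
	c.desc[id].Version++       // kv txn 2: bump the version...
	rounds := 0
	for c.oldVersionsRemain(id) { // ...and wait for old versions to drain
		// Stand-in for lease release/expiry: one stale node refreshes per round.
		for _, n := range c.nodes {
			if c.isStale(n, id) {
				n.cached[id] = c.desc[id].Version
				break
			}
		}
		rounds++
	}
	return rounds
}

func main() {
	a := &Node{cached: map[int]int{51: 1}}
	b := &Node{cached: map[int]int{51: 1}}
	c := &Cluster{desc: map[int]*Descriptor{51: {Version: 1}}, nodes: []*Node{a, b}}
	rounds := c.dropTable(51)
	fmt.Println(rounds, c.oldVersionsRemain(51)) // 2 false
}
```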

vivekmenezes · 2017-11-07T20:52:33Z

I've completely reworked this PR so that all code paths reusing a dropped name within the transaction are able to do so. I've also added a test, TestInsertWhileTableBeingDropCreated, to describe the limitations of this change.

vivekmenezes · 2017-11-07T20:55:15Z

@andreimatei thanks for pointing out the problem with the original change. I hope this is good enough. While it has one limitation, I think it's still a major benefit for our users.

This change allows general name reuse within a transaction and also covers CREATE VIEW and the ALTER TABLE RENAME cases. fixes cockroachdb#12123

andreimatei · 2017-11-08T19:39:42Z

When we were chatting offline just now, I had confused myself when talking about causality tokens. Causality tokens are needed only when we care about the ordering of non-overlapping transactions from the perspective of a third-party observer. They should not come into play when discussing the consistency hazards of this change. Here, I believe the most immediate problem is akin to a read not seeing the latest write:

Three transactions executed serially, potentially even by a single client:

txn 1: "insert into foo* values (1)"
txn 2: "begin; drop foo*; create foo**; insert into foo** values (2); commit"
txn 3: "select * from foo**" -> returns 1; not seeing the latest value

Not sure how this fares with serializability, but I believe this is a violation of crdb's "linearizability on individual keys" guarantee (related to seeing your own writes).
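A toy Go model of the three-transaction sequence above, assuming (as this thread describes) that each node keeps a name-to-table-ID cache that is not invalidated when another node reuses the name. `Store`, `Node`, `dropCreateInsert`, and the table IDs 51/52 are hypothetical; real CockroachDB resolves names through leased descriptors, which this sketch does not model.

```go
package main

import "fmt"

// Store is a toy model of committed state: the namespace (name -> table ID)
// and the rows stored under each table ID. Hypothetical, not CockroachDB code.
type Store struct {
	namespace map[string]int
	rows      map[int][]int
}

// Node is a toy gateway node with a name cache that is never invalidated
// when another node changes the namespace.
type Node struct {
	nameCache map[string]int
}

// resolve maps a name to a table ID, preferring the node's cache.
func (n *Node) resolve(s *Store, name string) int {
	if id, ok := n.nameCache[name]; ok {
		return id
	}
	id := s.namespace[name]
	n.nameCache[name] = id
	return id
}

func (n *Node) insert(s *Store, name string, v int) {
	id := n.resolve(s, name)
	s.rows[id] = append(s.rows[id], v)
}

func (n *Node) selectAll(s *Store, name string) []int {
	return s.rows[n.resolve(s, name)]
}

// dropCreateInsert models txn 2: the name is rebound to a new table ID
// (DROP foo*; CREATE foo**), only this node's cache learns about it, and
// the insert goes into the new table.
func (n *Node) dropCreateInsert(s *Store, name string, newID, v int) {
	s.namespace[name] = newID
	n.nameCache[name] = newID
	n.insert(s, name, v)
}

func main() {
	s := &Store{namespace: map[string]int{"foo": 51}, rows: map[int][]int{}}
	a := &Node{nameCache: map[string]int{}}
	b := &Node{nameCache: map[string]int{}}

	b.insert(s, "foo", 1)               // txn 1 via node B; B caches foo -> 51
	a.dropCreateInsert(s, "foo", 52, 2) // txn 2 via node A
	fmt.Println(a.selectAll(s, "foo"))  // txn 3 via node A: [2]
	fmt.Println(b.selectAll(s, "foo"))  // txn 3 via node B: [1], the stale read
}
```

In this model, which node serves txn 3 decides the answer: node A sees the new table, while node B, whose cache still maps `foo` to the old ID, sees the old row.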

pkg/sql/session.go, line 1418 at r3 (raw file):

// transaction and a user is running another transaction inserting data
// through node B in a coordinated manner such that the insert is executed
// after the create is complete, the insert will normally add data into

nit: what does "normally" mean? If nothing, I'd strike it.

vivekmenezes · 2017-11-08T22:00:34Z

txn 3 will see 2 unless it is directed to another node, in which case it might see 1.
#19925, which I just created, documents this; it has nothing to do with this change.

This change creates a new problem: a transaction whose timestamp falls between the commit of the DROP-CREATE kv transaction and the execution of the schema change can see the dropped table.

You know what: I can put the newly created table in the ADDING state (as is done for tables with FKs) and make it public once the old mapping has been flushed out. That way, a user at least sees an error rather than their data disappearing. If you feel that is mergeable, I can work on it.

vivekmenezes · 2017-11-09T02:12:57Z

What I suggested above won't work for the RENAME case, which this PR also covers and which, BTW, an important customer has asked for. So I'm not going ahead with that plan. I think the current proposal is pretty good.

andreimatei · 2017-11-09T15:40:05Z

#19925 is a current bug, but not a fundamental one; we can fix it. This PR introduces something fundamental: an incoherent cache in crdb. This is akin to embedding memcache in crdb to serve some reads without integrating it with the rest of the system.
My preference would be to explore blocking on the schema change protocol inside the transaction doing the drop/create or rename, even at the cost of blocking, or even failing, every other use of the table being dropped while the transaction doing the drop/rename is pending. A sane user would not have any traffic on the table while doing this anyway.

dianasaur323 · 2017-11-09T15:50:54Z

I actually need this for a current use case: I want to automate a restore, but I can't restore into an existing table, so I have to drop the table and then restore it. Currently, I have to do this in two separate steps, so there is some brief downtime.

vivekmenezes · 2017-11-09T16:11:08Z

@andreimatei I think a better solution is to put the new table in the ADDING state and wait for the old table to be dropped from all caches before making the new table public. That would remove RENAME from this PR, and I can very easily do that. BTW, this is exactly what we do for TRUNCATE.
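The ADDING-state proposal can be sketched as a toy Go model: the reused name points at the new table immediately, but name resolution returns an error until an asynchronous step sees that no node still holds the old table, and only then publishes the new one. `Catalog`, `staleHolders`, `maybePublish`, and the error text are hypothetical illustrations, not the PR's code or CockroachDB's schema changer.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical, simplified descriptor state.
type State int

const (
	Adding State = iota
	Public
)

// Table is a toy table descriptor.
type Table struct {
	ID    int
	State State
}

// errNotAvailable is a hypothetical error returned while the new table is ADDING.
var errNotAvailable = errors.New("table is being added")

// Catalog models the committed name -> table mapping plus the set of nodes
// that still hold a lease on the old (dropped) table under the same name.
type Catalog struct {
	byName       map[string]*Table
	staleHolders map[string]bool
}

// dropCreate commits the new table under the reused name in the ADDING state.
func (c *Catalog) dropCreate(name string, newID int) {
	c.byName[name] = &Table{ID: newID, State: Adding}
}

// resolve returns an error, rather than a table, while the table is ADDING.
func (c *Catalog) resolve(name string) (*Table, error) {
	t, ok := c.byName[name]
	if !ok {
		return nil, fmt.Errorf("table %q does not exist", name)
	}
	if t.State == Adding {
		return nil, errNotAvailable
	}
	return t, nil
}

// releaseOldLease simulates a node dropping its cached copy of the old table.
func (c *Catalog) releaseOldLease(node string) {
	delete(c.staleHolders, node)
}

// maybePublish is the asynchronous schema-change step: it makes the table
// public only once no node still holds the old table.
func (c *Catalog) maybePublish(name string) bool {
	if len(c.staleHolders) > 0 {
		return false
	}
	c.byName[name].State = Public
	return true
}

func main() {
	c := &Catalog{byName: map[string]*Table{}, staleHolders: map[string]bool{"B": true}}
	c.dropCreate("foo", 52)
	_, err := c.resolve("foo")
	fmt.Println(err)                   // table is being added
	fmt.Println(c.maybePublish("foo")) // false: node B still holds the old table
	c.releaseOldLease("B")
	fmt.Println(c.maybePublish("foo")) // true
	t, _ := c.resolve("foo")
	fmt.Println(t.ID) // 52
}
```

This trades the window in which a name can refer to two tables for a window in which the name returns an error, which corresponds to approach 1 in the January 2018 comment.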

nvanbenschoten · 2018-01-22T22:17:42Z

Will this change be making it into 2.0?

vivekmenezes · 2018-01-23T02:49:06Z

I doubt it. If we move forward with this change, we have to decide which behavior we want:

1. The name is transferred in the transaction, followed by a short period during which the name is unavailable while old uses of it are drained from the cluster.
2. The name is transferred in the transaction, and for a short period the name refers to two tables.

This change goes with approach 2; depending on timing, either approach can confuse users.

The default is to do nothing, that is, not support this until a user who needs it comes along and is okay with approach 2.

vivekmenezes requested review from a team October 8, 2017 03:21

vivekmenezes force-pushed the create branch 2 times, most recently from 1f73a21 to ecad42b, October 16, 2017 16:40

vivekmenezes requested a review from dt October 16, 2017 17:35

dt approved these changes Nov 1, 2017

vivekmenezes force-pushed the create branch 2 times, most recently from 29551f0 to 4f8fbba, November 2, 2017 18:11

sql: remove duplicate write descriptor in RENAME

65ca55b

vivekmenezes force-pushed the create branch from 4f8fbba to 72a5d85, November 7, 2017 20:49

vivekmenezes force-pushed the create branch from 72a5d85 to 16cf8cb, November 7, 2017 20:53

sql: allow CREATE TABLE after DROP TABLE in the same txn

7cbfc48

This change allows general name reuse within a transaction and also covers CREATE VIEW and the ALTER TABLE RENAME cases. fixes cockroachdb#12123

vivekmenezes force-pushed the create branch from 16cf8cb to 7cbfc48, November 7, 2017 21:39

nvanbenschoten mentioned this pull request Jan 22, 2018

CockroachDB: infer_schema fails with Database Error (information_schema.referential_constraints does not exist) diesel-rs/diesel#1134 (closed)

tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019

jordanlewis closed this Sep 19, 2019
