Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: allow CREATE TABLE after DROP TABLE in the same txn #19112

Closed
wants to merge 2 commits into from

Conversation

vivekmenezes
Copy link
Contributor

fixes #12123

@vivekmenezes vivekmenezes requested review from a team October 8, 2017 03:21
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@vivekmenezes vivekmenezes force-pushed the create branch 2 times, most recently from 1f73a21 to ecad42b Compare October 16, 2017 16:40
@vivekmenezes vivekmenezes requested a review from dt October 16, 2017 17:35
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I don't remember all the subtle cases with name re-use that @andreimatei talked about the last time we were playing with the drop code.

@andreimatei
Copy link
Contributor

With this patch, is it possible for the following?

node A: drop table foo; create table foo; insert into foo (1);
node B: insert into foo(2);

And when the dust settles, foo only contains (1)?

If so, is that OK? Your comment in the code seems to say that it is, but it sounds wrong to me. Moreover, it seems to me that this behavior is possible even if the txn doing the insert on node B synchronized with a transaction on node A happening after the drop. No?


Review status: 0 of 9 files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@vivekmenezes vivekmenezes force-pushed the create branch 2 times, most recently from 29551f0 to 4f8fbba Compare November 2, 2017 18:11
@vivekmenezes
Copy link
Contributor Author

@andreimatei I've updated the comment to address your concern . Do let me know if you have other concerns.


Review status: 0 of 9 files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

actually my comment is incorrect #18354 but if that issue does get fixed it will be accurate. It is safer to leave the comment in as is.


Review status: 0 of 9 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

can I merge this?

@andreimatei
Copy link
Contributor

the insert will fail because a DROP only returns back
// to the user once it has confirmed that the table descriptor has been
// purged from every cache across the cluster.
// In particular, the DROP
// is executed through two kv transactions that map to one SQL transaction;
// the first kv transaction places the table in the DROPPED state, and
// the second increments the descriptor version and waits until the previous
// version has been purged from the cluster.

The moment when the DROP returns to the client seems irrelevant to me.
Would the following be possible with this patch:

client 1 (using node A): DROP TABLE foo*;
client 2 (using node A), concurrently with the statement above: SHOW CREATE foo; -> error table does not exist
client 2 (using node A): CREATE TABLE foo**;
client 2 gives a "causality token" to client 3;
client 3 (using node B): INSERT INTO foo;  -> insert into foo*, instead of inserting into foo**

Reviewed 1 of 9 files at r1.
Review status: 1 of 9 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

I've completed reworked this PR such that all sections of the code reusing a dropped name within the
transaction are able to do so. I've also added a test TestInsertWhileTableBeingDropCreated to describe
the limitations of this change.


Review status: 0 of 8 files reviewed at latest revision, all discussions resolved, some commit checks pending.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

@andreimatei thanks for pointing out the problem with the original change. Hope this is good enough. I think while it has one limitation it's still a major benefit for our users.

This change allows general name reuse within a transaction
and also covers CREATE VIEW and the ALTER TABLE RENAME
cases.

fixes cockroachdb#12123
@andreimatei
Copy link
Contributor

When we were chatting offline just now, I had confused myself when talking about causality tokens. Causality token are needed only when caring about the ordering of non-overlapping transactions from the perspective of a 3rd party observer. No such confusion should be at play when discussing about the consistency hazards of this change. Here, the most immediate problem is, I believe, akin to a read not seeing the latest write:

3 transactions executed serially, even potentially by a single client:

txn 1: "insert into foo* values (1)"
txn 2: "begin; drop foo*; create foo**; insert into foo** values (2); commit"
txn 3: "select * from foo**" -> returns 1; not seeing the latest value

Not sure how this fares with serializability, but I believe this is a violation of crdb's "linearizability on individual keys" guarantee (related to seeing your own writes).


Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/sql/session.go, line 1418 at r3 (raw file):

// transaction and a user is running another transaction inserting data
// through node B in a coordinated manner such that the insert is executed
// after the create is complete, the insert will normally add data into

nit: what does "normally" mean? If nothing, I'd strike it.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

txn 3 will see 2 unless it is directed to another node in which case it might see 1.
#19925 which I just created documents this and has nothing to do with this change.

This change creates a new problem: A transaction that lies in between (timestamp wise) the commit of the DROP-CREATE kv transaction and the execution of the schema change can see the dropped table.

You know what I can put the newly created table in the ADDING state (as is done for tables with FKs) and place it in the public visible state once the old map has been flushed out. At least that way a user sees an error rather than their data disappearing. If you feel that is mergeable I can work on it.


Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor Author

What I'm suggesting above is not going to work for the RENAME case that this PR also includes and which BTW an important customer has asked for. So I'm not going ahead with that plan. I think the current proposal is pretty good.


Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


Comments from Reviewable

@andreimatei
Copy link
Contributor

#19925 is a current bug, but not a fundamental one; we can fix it. This PR is a fundamental thing - we're introducing an incoherent cache in crdb. This is akin to embedding memcache in crdb to serve some reads without integrating it with the rest of the system.
My preference would be to explore blocking for the schema change protocol inside the transaction doing the drop/create or rename, even at the cost of either blocking or even failing every other use of the table that's being dropped even while the transaction doing the drop/rename is pending. A sane user would not have any traffic on the table while doing this anyway.


Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


Comments from Reviewable

@dianasaur323
Copy link
Contributor

I actually need this for a current use case -> I want to be able to automate a restore, but I can't restore into an existing table, so I have to be able to drop the table and then restore it. Currently, I would have to do this in two separate steps, so there would be some brief downtime.

@vivekmenezes
Copy link
Contributor Author

vivekmenezes commented Nov 9, 2017

@andreimatei I think a better solution will be to put the new table in the ADDING state waiting for the old table to get dropped from all caches before making the table public. That will remove RENAME from this PR. I can very easily do that. BTW for the TRUNCATE case we do just that.


Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Will this change be making it into 2.0?

@vivekmenezes
Copy link
Contributor Author

I doubt it. If we move forward with this change we have to make a decision whether we want 1. The name is transferred in the transaction but is followed by a short period of unavailability for the name, when the old uses of the name are being drained from the cluster, or 2. the name is transferred in the transaction and the name refers to two tables for a short period of time.

This change is going with approach 2, and both approaches can confuse users depending on timing.

The default is to do nothing which is to not support this until some user comes along with a need for this and is okay with approach 2.

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: Can't drop and replace a table within a transaction
8 participants