-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: allow CREATE TABLE after DROP TABLE in the same txn #19112
Conversation
1f73a21
to
ecad42b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though I don't remember all the subtle cases with name re-use that @andreimatei talked about the last time we were playing with the drop code.
With this patch, is it possible for the following?
And when the dust settles, foo only contains (1)? If so, is that OK? Your comment in the code seems to say that it is, but it sounds wrong to me. Moreover, it seems to me that this behavior is possible even if the txn doing the insert on node B synchronized with a transaction on node A happening after the drop. No? Review status: 0 of 9 files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
29551f0
to
4f8fbba
Compare
@andreimatei I've updated the comment to address your concern . Do let me know if you have other concerns. Review status: 0 of 9 files reviewed at latest revision, all discussions resolved. Comments from Reviewable |
actually my comment is incorrect #18354 but if that issue does get fixed it will be accurate. It is safer to leave the comment in as is. Review status: 0 of 9 files reviewed at latest revision, all discussions resolved, all commit checks successful. Comments from Reviewable |
can I merge this? |
The moment when the DROP returns to the client seems irrelevant to me.
Reviewed 1 of 9 files at r1. Comments from Reviewable |
I've completed reworked this PR such that all sections of the code reusing a dropped name within the Review status: 0 of 8 files reviewed at latest revision, all discussions resolved, some commit checks pending. Comments from Reviewable |
@andreimatei thanks for pointing out the problem with the original change. Hope this is good enough. I think while it has one limitation it's still a major benefit for our users. |
This change allows general name reuse within a transaction and also covers CREATE VIEW and the ALTER TABLE RENAME cases. fixes cockroachdb#12123
When we were chatting offline just now, I had confused myself when talking about causality tokens. Causality token are needed only when caring about the ordering of non-overlapping transactions from the perspective of a 3rd party observer. No such confusion should be at play when discussing about the consistency hazards of this change. Here, the most immediate problem is, I believe, akin to a read not seeing the latest write: 3 transactions executed serially, even potentially by a single client: txn 1: "insert into foo* values (1)" Not sure how this fares with serializability, but I believe this is a violation of crdb's "linearizability on individual keys" guarantee (related to seeing your own writes). Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful. pkg/sql/session.go, line 1418 at r3 (raw file):
nit: what does "normally" mean? If nothing, I'd strike it. Comments from Reviewable |
txn 3 will see 2 unless it is directed to another node in which case it might see 1. This change creates a new problem: A transaction that lies in between (timestamp wise) the commit of the DROP-CREATE kv transaction and the execution of the schema change can see the dropped table. You know what I can put the newly created table in the ADDING state (as is done for tables with FKs) and place it in the public visible state once the old map has been flushed out. At least that way a user sees an error rather than their data disappearing. If you feel that is mergeable I can work on it. Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful. Comments from Reviewable |
What I'm suggesting above is not going to work for the RENAME case that this PR also includes and which BTW an important customer has asked for. So I'm not going ahead with that plan. I think the current proposal is pretty good. Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. Comments from Reviewable |
#19925 is a current bug, but not a fundamental one; we can fix it. This PR is a fundamental thing - we're introducing an incoherent cache in crdb. This is akin to embedding memcache in crdb to serve some reads without integrating it with the rest of the system. Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. Comments from Reviewable |
I actually need this for a current use case -> I want to be able to automate a restore, but I can't restore into an existing table, so I have to be able to drop the table and then restore it. Currently, I would have to do this in two separate steps, so there would be some brief downtime. |
@andreimatei I think a better solution will be to put the new table in the ADDING state waiting for the old table to get dropped from all caches before making the table public. That will remove RENAME from this PR. I can very easily do that. BTW for the TRUNCATE case we do just that. Review status: 0 of 8 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. Comments from Reviewable |
Will this change be making it into 2.0? |
I doubt it. If we move forward with this change we have to make a decision whether we want 1. The name is transferred in the transaction but is followed by a short period of unavailability for the name, when the old uses of the name are being drained from the cluster, or 2. the name is transferred in the transaction and the name refers to two tables for a short period of time. This change is going with approach 2, and both approaches can confuse users depending on timing. The default is to do nothing which is to not support this until some user comes along with a need for this and is okay with approach 2. |
fixes #12123