Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomic cutover is unsound on mysql 5.6.41 #653

Closed
syrnick opened this issue Oct 21, 2018 · 7 comments
Closed

Atomic cutover is unsound on mysql 5.6.41 #653

syrnick opened this issue Oct 21, 2018 · 7 comments

Comments

@syrnick
Copy link

syrnick commented Oct 21, 2018

We observed gh-ost not performing migration atomically and I was investigating possible causes. I was reading the description of the atomic cutover #82 and found that it relies on undocumented MySQL internal engine behavior. I tested it on 5.6.41 and observed behavior that breaks the assumptions necessary for the cutover to be atomic. Here's a basic example.

Setup: Mysql 5.6.41 (on Mac OS), 3 interactive sessions. First session does the write lock, second session does the INSERT, third session does ALTER TABLE RENAME.

Expected - INSERT is blocked until after RENAME goes through and errors out.

Observed - INSERT goes through before RENAME.

Session 1

-- step 1 [session 1]: prepare by creating the table
CREATE TABLE `reallyBigTable` (
  `autoincr_id` int(11) NOT NULL AUTO_INCREMENT,
  `id` char(17) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  PRIMARY KEY (`autoincr_id`),
  UNIQUE KEY `IDX_id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;


-- step 3 [session 1]
LOCK TABLE reallyBigTable WRITE;

select * from reallyBigTable;

-- step 6 [session 1]
unlock tables;

Session 2

-- step 2 [session 2]
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET AUTOCOMMIT=0;
START TRANSACTION;

-- select * from reallyBigTable where autoincr_id = 1;


-- step 4 [session 2]


INSERT INTO `reallyBigTable` (`autoincr_id`, `id`)
VALUES
    (20000, 'DkiIkDL5PE5VIyTyk');
    
-- step 7 [session 2]

COMMIT;

Session 3

-- step 5 [session 3]
alter table reallyBigTable rename to reallyBigTable_old;


-- step 8 [session 3]
select * from reallyBigTable_old;

-- expected: 0 records
-- actual: 1 record

In this example, DML was allowed to go through between WRITE lock being released and ALTER TABLE RENAME taking effect. All those DMLs would be lost by gh-ost during cutover.

Obviously, the right thing is to RENAME from the connection holding the lock. That's available in 8.x only.

For other versions, If this is specific to only some mysql versions, it needs to be documented with a giant warning sign and ideally gh-ost should explicitly assert that it runs against only supported version. It could possibly run the provided test explicitly before migration to catch change in behavior. Discussion in #82 suggests that it's a highly stable behavior, but my test contradicts that.

@shlomi-noach
Copy link
Contributor

Thank you for this well written report! I will proceed to test it.

@shlomi-noach
Copy link
Contributor

Sorry, this report does not in fact reproduce the atomic cut-over in gh-ost. gh-ost uses an additional control table, and a multi-table locking mechanism, as well as using a rename statement as opposed to a alter table ... rename (this last fact may be just syntactic sugar).

I see you use a singular LOCK TABLE statement, where gh-ost uses LOCK TABLES tbl WRITE, tbl_old WRITE. You use a singular RENAME where gh-ost uses RENAME TABLE tbl TO tbl_old, ghost TO tbl.

Would you like to try and reproduce using the exact gh-ost cut-over logic? I'm happy to look deeply into an exact reproduction of the logic that finds a flaw.

@syrnick
Copy link
Author

syrnick commented Oct 23, 2018

I will try to get the same behavior with exact cut-over that gh-ost uses. I just found and reported a specific counter-example to this statement in #82

  • A blocked RENAME is always prioritized over a blocked INSERT/UPDATE/DELETE, no matter who came first

I haven't read mysql lock management code to judge how correct that statement is, but this example shows that it's not generally true.

@shlomi-noach
Copy link
Contributor

I just found and reported a specific counter-example to this statement

Agreed, and concerning. At this exact moment my availability to experiment this on my own is limited or zero. I can look into it another week, or if you are capable to reconstruct the problem with the gh-ost flow that would be helpful. I appreciate your contribution!

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Oct 28, 2018

@syrnick I'm able to reproduce your use case on a 5.7.17 server.

However, the error does not reproduce on the logic used by gh-ost. Please consider the following flow:

session 1

-- step 1 [session 1]
drop table if exists `reproduce`;
drop table if exists `reproduce_old`;
drop table if exists `reproduce_gho`;
CREATE  TABLE `reproduce` (
  `autoincr_id` int(11) NOT NULL AUTO_INCREMENT,
  `id` char(17) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  PRIMARY KEY (`autoincr_id`),
  UNIQUE KEY `IDX_id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8mb4;

insert into `reproduce` values (11, 'initial');
create table `reproduce_old` like `reproduce`;
create table `reproduce_gho` like `reproduce`;
-- step 3 [session 1]
LOCK TABLE reproduce WRITE, reproduce_old WRITE;
-- step 6a [session 1]
DROP TABLE reproduce_old;
-- step 6b [session 1]
unlock tables;

session 2

-- step 2 [session 2]
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET AUTOCOMMIT=0;
START TRANSACTION;
-- step 4 [session 2]
INSERT INTO `reproduce` VALUES (20000, 'DkiIkDL5PE5VIyTyk');
-- step 7 [session 2]
COMMIT;

session 3

-- step 5 [session 3]
rename table reproduce to reproduce_old, reproduce_gho to reproduce;
-- step 8a [session 3]
select * from reproduce_old;

-- expected: 1 record (the "initial" test)
-- actual: 1 record (the "initial" test)

-- step 8b [session 3]
select * from reproduce;

-- expected: 1 record (id = 20000)
-- found: 1 record (id = 20000)

I will explain shortly why this flow, the one that gh-ost uses for atomic cut-over, is inherently different from your use case. I've got to attend to much stuff, but wanted to first clear this off my table.

@syrnick
Copy link
Author

syrnick commented Oct 28, 2018

Thanks! I can confirm that with this flow the cut-over works as intended. I'll try poking holes in it, but I couldn't find anything so far.

@timvaillancourt
Copy link
Collaborator

👋 closing this as only MySQL 5.7+ is supported and this requirement is documented

Let us know if you see problems on the latest-stable and MySQL 5.7+ 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants