Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When the _xxxx_gho table is queried, data may be lost. #887

Open
brightgogo opened this issue Oct 13, 2020 · 24 comments
Open

When the _xxxx_gho table is queried, data may be lost. #887

brightgogo opened this issue Oct 13, 2020 · 24 comments

Comments

@brightgogo
Copy link

brightgogo commented Oct 13, 2020

gh-ost Refer to this document for switching order :#82

Suppose now operating DDL on table a, three temporary tables _a_del, _a_gho, and_a_ghc will be generated.

rename operation is: rename a to _a_del, _a_gho to a,
lock table statement is: lock table a write, _a_del write
According to the order of acquiring the lock:

first acquire the _a_del table lock,
then _a_gho table lock ,
and finally the a table lock.

knk_2020-10-13  12-14-13

Thank you!

@shlomi-noach
Copy link
Contributor

There seems to be a corruption in the original comment. On my mobile GitHub app, I can actually see what the question is. Copy+pasting here (as is, including typos):

If there is a query transaction on the _a_gho table lock that causes the acquisition if the lock to be delayed, other threads may be before rename First obtain the lock of table a, write the data in table, and then perform table rename operation, at this time the data is lost

This situation does not happen. First, gh-ost locks the original table, then, it waits until the _gho table has received all the writes from the original table up to the lock. From that point, there can be no mor ewrites on _gho table. gh-ost then proceeds to lock _gho table.

@wukongHH
Copy link

wukongHH commented Oct 13, 2020

How to repeat :

create table a (id int);
create table _a_del (id int);
create table _a_gho (id int);
session 1 session 2 session 3 session 4
lock table a write, _a_del write;      
  insert into a values (1);    
      begin;select * from _a_gho;
    rename table a to _a_del,_a_gho to a;  
drop table _a_del;unlock tables;      
      commit;

mysql> select * from a;
Empty set (0.00 sec)

mysql> select * from _a_del;
+------+
| id |
+------+
| 1 |
+------+
1 row in set (0.00 sec)

@wukongHH
Copy link

https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html

DDL statements, LOCK TABLES, and other similar statements try to reduce the number of possible deadlocks between concurrent DDL statements by acquiring locks on explicitly named tables in name order.

@brightgogo
Copy link
Author

Sorry for accidentally closed, reopen.

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Oct 13, 2020

@wukongHH

session 3: rename table a to _a_del,_a_gho to a;

when does this statement run? In gh-ost, this statement only runs after _gho table is known to have received the insert from a. Did you mimic this behavior?

DDL statements, LOCK TABLES, and other similar statements try to reduce the number of possible deadlocks between concurrent DDL statements by acquiring locks on explicitly named tables in name order.

I'm not sure what this quote means to this issue's context. Can you please clarify?

@zhongliangkang
Copy link

zhongliangkang commented Oct 13, 2020

@wukongHH

session 3: rename table a to _a_del,_a_gho to a;

when does this statement run? In gh-ost, this statement only runs after _gho table is known to have received the insert from a. Did you mimic this behavior?

Yes. Rename statement happened after all data sync from table a.
Here the insert operation is from APP SERVERS, this operation comes after gh-ost have got the write lock, normally this insert should be blocked until rename finished.

In our case, there are other sessions acces to the _gho table (got one transaction start before unlock tables , and not finished when unlock), because rename table a to _a_del,_a_gho to a acquiring locks on explicitly named tables in name order, when there got a trx on the _gho table, rename is waiting for the trx on _gho table to commit before acquire table a lock , this cause the insert from APP SERVERS run before rename finished(which should blocked after rename ok), data lost happend here.

Session1 session2 Session3 Session4  
create table a (id int);
create table _a_del(id int);
create table _a_gho(id int);
lock table a write,_a_del write; #OK
       
  insert into a set id=1; #blocked      
    rename table a to _a_del, _a_gho to a;
#blocked
#acquiring locks order:  _a_del, _a_gho, a
#waiting lock of:   _a_del
   
      begin;
select * from _a_gho;
 
drop table _a_del;        
    #get table lock:   _a_del
#waiting lock of: _a_gho  ,cause session 4 trx not commited
   
unlock tables;        
  —OK      
    #now table a with no WRITE lock
#session 2 insert may insert to table a
   
      commit;  
    —OK    
  select * from a;
Empty set (0.00 sec)
     
  select * from _a_del;
+------+
| id   |
+------+
|    1 |
+------+
1 row in set (0.01 sec)
     

DDL statements, LOCK TABLES, and other similar statements try to reduce the number of possible deadlocks between concurrent DDL statements by acquiring locks on explicitly named tables in name order.

I'm not sure what this quote means to this issue's context. Can you please clarify?

This is the acquiring locks order of DDL statement from MySQL manual.

@shlomi-noach
Copy link
Contributor

Thank you, I will look into it.

I'm meanwhile curious: who would be reading from _a_gho? This table is unknown to anyone outside gh-ost itself or the engineer that runs the migration.

@zhongliangkang
Copy link

zhongliangkang commented Oct 13, 2020

I'm meanwhile curious: who would be reading from _a_gho? This table is unknown to anyone outside gh-ost itself or the engineer that runs the migration.

--
DB manage program such as table checksum/data replication program/migration table check may access all tables in MySQL.
Table _a_gho is unkown to user, but known for DBAs ,we think this read comes from DBA's program:)

We met a case two days ago which cause a table lost one row after ALTER table through gh-ost.
This is the first time we meet this case, after check ,we think there maybe a risk of data loss if someone access the _gho table(get lock on _a_gho table before unlock tables command) when cut-over。

Thanks for you reply ,waiting for your review result.

@zhongliangkang
Copy link

Thank you, I will look into it.

I'm meanwhile curious: who would be reading from _a_gho? This table is unknown to anyone outside gh-ost itself or the engineer that runs the migration.

hi Shlomi-noach , is there anly update for the case please ^_^?

Sorry to bother, but we look forward fix this ASAP for there is a risk of lossing data, we run gh-ost doing DDL every day。

A fix for the case above we think is just change the _a_ghc and _a_gho to another name which name order is after table a, so the rename waiting to get the lock , there would no time "gap" for DML to get lock and run between unlock tables and the rename

Thanks.

@shlomi-noach
Copy link
Contributor

Hi,

we look forward fix this ASAP

I'm not sure I will be able to provide a fix in the next few days.

A fix for the case above we think is just change the _a_ghc and _a_gho to another name which name order is after table a, so the rename waiting to get the lock , there would no time "gap" for DML to get lock and run between unlock tables and the rename 。

I'm not sure I understand exactly what you mean. Can you please paste the code change? Better yet, if you think you have the correct solution, could you create a pull request?

I'm not the owner of https://github.com/github/gh-ost and will not be able to merge any changes here. But I may merge such changes in https://github.com/openark/gh-ost

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Nov 9, 2020

I can confirm I get the same behavior as yours, but something's strange here:

At first I thought this might be related to the fact your table doesn;t have a PRIMARY KEY; I added a PRIMARY KEY and I'm still getting your results.

I then tries a different table structure, and I got different results, which are good. See below.

Can you please clarify what is the output of your:

select @@tx_isolation;
select @@sql_mode;
select @@version;

I'm using the following:

select @@sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@sql_mode                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------------------------------------------------------+


select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+

select @@version;
+-----------+
| @@version |
+-----------+
| 5.7.30    |
+-----------+

Here's a different table definition which works correctly:

CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `n` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=latin1;
create table tdel like t;
create table tgho like t;


session1 session2 session3 session4
lock tables t write, tdel write;
OK
insert into t values(null,13);#blocked
#blocked rename table t to tdel, tgho to t;
#blocked
#blocked #blocked begin;
select * from tgho;
#blocked #blocked OK, transaction open
drop table tdel; #blocked #blocked transaction open
OK #blocked #blocked transaction open
unlock tables; #blocked #blocked transaction open
OK #blocked #blocked transaction open
#blocked #blocked commit;
OK OK OK
select * from t;
+----+------+
| id | n    |
+----+------+
|  1 |   13 |
+----+------+

elect * from tdel;
Empty set (0.01 sec)

Still looking into why it behaves different than yours, and correctly so.

@shlomi-noach
Copy link
Contributor

it's not the AUTO_INCREMENT. Still trying to understand what.

@shlomi-noach
Copy link
Contributor

Sorry, I did not execute the rename and the select * from tgho in the same order as you did.

@zhongliangkang
Copy link

I can confirm I get the same behavior as yours, but something's strange here:

At first I thought this might be related to the fact your table doesn;t have a PRIMARY KEY; I added a PRIMARY KEY and I'm still getting your results.

I then tries a different table structure, and I got different results, which are good. See below.

Can you please clarify what is the output of your:

select @@tx_isolation;
select @@sql_mode;
select @@version;

I'm using the following:

select @@sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@sql_mode                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------------------------------------------------------+


select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+

select @@version;
+-----------+
| @@version |
+-----------+
| 5.7.30    |
+-----------+

Here's a different table definition which works correctly:

CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `n` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=latin1;
create table tdel like t;
create table tgho like t;

session1 session2 session3 session4
lock tables t write, tdel write;
OK
insert into t values(null,13);#blocked
#blocked rename table t to tdel, tgho to t;
#blocked
#blocked #blocked begin;
select * from tgho;
#blocked #blocked OK, transaction open
drop table tdel; #blocked #blocked transaction open
OK #blocked #blocked transaction open
unlock tables; #blocked #blocked transaction open
OK #blocked #blocked transaction open
#blocked #blocked commit;
OK OK OK

select * from t;
+----+------+
| id | n    |
+----+------+
|  1 |   13 |
+----+------+

elect * from tdel;
Empty set (0.01 sec)

Still looking into why it behaves different than yours, and correctly so.

The prefix of tables' name in your test is different with me ,that's the biggest difference.
You can change the tdel to _t_del,tgho to _t_gho and try again, you'll get a defferent result.

The reason why tables' name affect the result is metioned in the manual:

https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html

DDL statements, LOCK TABLES, and other similar statements try to reduce the number of possible deadlocks between concurrent DDL statements by acquiring locks on explicitly named tables in name order.

The DEL/GHO tables' name are in different order, as follow:

+---------------+
| Tables_in_sg2 |
+---------------+
| t             |
| tdel          |
| tgho          |
+---------------+
3 rows in set (0.00 sec)

+---------------+
| Tables_in_sg3 |
+---------------+
| _t_del        |
| _t_gho        |
| t             |
+---------------+
3 rows in set (0.00 sec)

In database sg2 ,the table t comes first in table name order , and in database sg3, the table t comes last.
The name order affect the acquiring locks order in rename operation.

When you use t,tdel,tgho, the rename operation try acquire lock on table t first, after session1 unlock tables, the rename session get the table lock first; but when you use t,_t_del,_t_gho, when you lock table t,_t_del in session1, the rename operation try acquire lock on _t_del first(waiting for the lock of _t_del table), and after session1 drop table _t_del, the rename try to acquire lock on _t_gho table, and if there is some transaction on the _t_gho table, the rename is waiting for _t_gho lock, if now session1 unlock tables, the rename still waiting for _g_gho until the transaction on _t_gho end, and that the problem happen: the insert in session2 could get the lock on t and could insert success.

Don't know if i made it clear,the table name order affect the test result.

---the variables:

> select @@tx_isolation;
+----------------+
| @@tx_isolation |
+----------------+
| READ-COMMITTED |
+----------------+
1 row in set (0.00 sec)
> select @@sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@sql_mode                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

> select @@version;
+------------------+
| @@version        |
+------------------+
| 5.7.18|
+------------------+
1 row in set (0.00 sec)

@shlomi-noach
Copy link
Contributor

The name order affect the acquiring locks order in rename operation.

WOW. OK, let me digest this and see how to proceed.

@shlomi-noach
Copy link
Contributor

from https://dev.mysql.com/doc/refman/5.7/en/rename-table.html:

Renaming operations are performed left to right

Metadata locks on tables are acquired in name order, which in some cases can make a difference in operation outcome when multiple transactions execute concurrently. See Section 8.11.4, “Metadata Locking”.

I'll keep looking at the implications.

@zhongliangkang
Copy link

from https://dev.mysql.com/doc/refman/5.7/en/rename-table.html:

Renaming operations are performed left to right

Metadata locks on tables are acquired in name order, which in some cases can make a difference in operation outcome when multiple transactions execute concurrently. See Section 8.11.4, “Metadata Locking”.

I'll keep looking at the implications.

Hi,shlomi, is there any update here ? Look forward the fix for this case :)

@shlomi-noach
Copy link
Contributor

Sorry for the delays, I hope to give this a shot by end of month

@shlomi-noach
Copy link
Contributor

Again, sorry for the delay. I took some time to review this issue again and can confirm it is as you describe. I appreciate your findings and analysis. OK, so there's an immediate solution for you (sorry, I could have figured it out much sooner), and then we need to consider long term solutions.

Immediate term

use e.g.: gh-ost --force-table-name=zzz_8e57cbfa9eabd7d50d0e ..., where:

  • zzz is to ensure table name is greater than your real table name (assuming you don't have tables called zzz*
  • 8e57cbfa9eabd7d50d0e is some random value generated by application, e.g. via openssl rand -hex 10

This will ensure gh-ost creates the following tables:

  • zzz_8e57cbfa9eabd7d50d0e_gho
  • zzz_8e57cbfa9eabd7d50d0e_ghc
  • zzz_8e57cbfa9eabd7d50d0e_del

long term

Is a bit of a problem. _ is a common prefix for "temporary" or "internal" tables. And it's a valid ascii character which does not require escaping. In the ascii table, z is only followed by {, '|', '}' and '~'.. None of which are valid characters for a non-escaped table.

So I'm going to bang my head a bit around fixing the problem without modifying the prefix _. This will not be easy and I'mn not sure if I can crack it. The trick for atomic table renames was complex enough without this added constraint.

Cross fingers.

Oh, another alternative is to use -cut-over=two-step, which leaves a gap in time when the table does not exist... It's a compromise.

@shlomi-noach
Copy link
Contributor

Sorry, my suggested short term solution above will not work as I hoped, because right now gh-ost always prepends a _ to the table name. This would be an easy fix if needed.

@zhongliangkang
Copy link

Thanks for your solutions 👍

I had discussed the short term solution with @shatianjiang , maybe there is still exists risk of lost data when there got a table named 'zzzzzXXX'(with 'zzzz' prefix...), but it would work most of time:)

So look forward the long term fix for this issue.
And we're keep finding a solution also, if there is any progress ,we'll update here.

Thanks again.

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Dec 31, 2020

The change you'd need to make for short term solution is in:

gh-ost/go/base/context.go

Lines 280 to 286 in c681c54

func getSafeTableName(baseName string, suffix string) string {
name := fmt.Sprintf("_%s_%s", baseName, suffix)
if len(name) <= mysql.MaxTableNameLength {
return name
}
extraCharacters := len(name) - mysql.MaxTableNameLength
return fmt.Sprintf("_%s_%s", baseName[0:len(baseName)-extraCharacters], suffix)

As you can see gh-ost always prepends _ to table names. I'll likely add a simple option (e.g. --force-table-name-prefix or something) to control that.

@rudy-gao
Copy link

@zhongliangkang Even if the _a_gho table is not queried, I have also encountered a scenario where data is lost in the rename phase. Because in the rename phase, the order acquires table locks is: _a_del, _a_gho, a. When gh-ost acquired _a_del table lock and try to _a_gho table lock, although the time is very short, it is also possible that the a table has dml SQL to be allowed to execute.
I think the following methods may be solve it:

  1. after drop table _a_del then time.sleep(100ms), finally unlock tables
  2. For the generation of the gh-ost table of gh-ost, use ascii ~ to replace _, such as ~a_gho replace _a_gho

@cenkore For pull code #987 , I think it is very good, but I am not sure why code not be merged?

@shlomi-noach For "I'll likely add a simple option (e.g. --force-table-name-prefix or something) to control that" having any update?

@zhongliangkang
Copy link

Yes. Even if no query on the _gho table, data may lost, add a query make it easy to reproduce.
We fix it by add 'zz' prefix temporary..

#987 may need more deep thinking..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants