When running gh-ost version 1.1.6 with the following arguments, I am encountering an issue where the RENAME gets stuck in the "Waiting for table metadata lock" state.
While this command is running, it emits these logs:
2024-09-05 12:32:33 INFO Grabbing voluntary lock: gh-ost.110215429.lock
2024-09-05 12:32:33 INFO Setting LOCK timeout as 20 seconds
2024-09-05 12:32:33 INFO Looking for magic cut-over table
2024-09-05 12:32:33 INFO Creating magic cut-over table `qwebertest`.`_activity_log_small_del`
2024-09-05 12:32:33 INFO Magic cut-over table created
2024-09-05 12:32:33 INFO Locking `qwebertest`.`activity_log_small`, `qwebertest`.`_activity_log_small_del`
2024-09-05 12:32:33 INFO Tables locked
2024-09-05 12:32:33 INFO Session locking original & magic tables is 110215429
2024-09-05 12:32:33 INFO Writing changelog state: AllEventsUpToLockProcessed:1725564753915176000
2024-09-05 12:32:34 INFO Intercepted changelog state AllEventsUpToLockProcessed
2024-09-05 12:32:34 INFO Handled changelog state AllEventsUpToLockProcessed
2024-09-05 12:32:35 INFO Waiting for events up to lock
2024-09-05 12:32:35 INFO Waiting for events up to lock: got AllEventsUpToLockProcessed:1725564753915176000
2024-09-05 12:32:35 INFO Done waiting for events up to lock; duration=1.135605542s
2024-09-05 12:32:35 INFO Setting RENAME timeout as 10 seconds
2024-09-05 12:32:35 INFO Session renaming tables is 110198088
2024-09-05 12:32:36 INFO Issuing and expecting this to block: rename /* gh-ost */ table `qwebertest`.`activity_log_small` to `qwebertest`.`_activity_log_small_del`, `qwebertest`.`_activity_log_small_gho` to `qwebertest`.`activity_log_small`
2024-09-05 12:32:46 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2024-09-05 12:32:46 INFO Will now proceed to drop magic table and unlock tables
2024-09-05 12:32:46 INFO Dropping magic cut-over table
2024-09-05 12:32:46 INFO Releasing lock from `qwebertest`.`activity_log_small`, `qwebertest`.`_activity_log_small_del`
2024-09-05 12:32:46 INFO Tables unlocked
2024-09-05 12:32:46 INFO Looking for magic cut-over table
2024-09-05 12:32:47 INFO Checking session lock: gh-ost.110215429.lock
2024-09-05 12:32:48 ERROR Session lock gh-ost.110215429.lock expected to be found but wasn't
2024-09-05 12:32:48 INFO Removing socket file: /tmp/gh-ost.qwebertest.activity_log_small.sock
2024-09-05 12:32:48 FATAL Session lock gh-ost.110215429.lock expected to be found but wasn't
Unable to find the expected entry in the processlist
After the "Issuing and expecting this to block..." log was emitted, I looked at the processlist:
mysql> show processlist;
+----------+---------+-------------------+--------------------+-------------+------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----------+---------+-------------------+--------------------+-------------+------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| 16851464 | foouser | 00.00.0.003:43342 | qwebertest | Query | 2 | Waiting for table metadata lock | rename /* gh-ost */ table `qwebertest`.`activity_log_small` to `qwebertest`.`_activity_log_small_del |
| 16852076 | foouser | 00.00.00.00:54836 | qwebertest | Sleep | 0 | | NULL |
| 16852531 | foouser | 00.00.0.005:39916 | qwebertest | Sleep | 4 | | NULL |
| 16852552 | foouser | 00.00.0.005:39958 | qwebertest | Sleep | 135 | | NULL |
| 16852793 | foouser | 00.00.00.00:58424 | information_schema | Sleep | 132 | | NULL |
| 16852812 | foouser | 00.00.00.00:58478 | information_schema | Query | 0 | init | show processlist |
| 16852995 | foouser | 00.00.00.00:59596 | information_schema | Binlog Dump | 136 | Source has sent all binlog to replica; waiting for more updates | NULL |
+----------+---------+-------------------+--------------------+-------------+------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
7 rows in set (0.09 sec)
You can clearly see the RENAME is there with the state of "Waiting for table metadata lock", which is what the tool expects (gh-ost/go/logic/migrator.go, line 694 in 59c1a24):
this.migrationContext.Log.Infof("Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)")
However, we never see the log that comes immediately after that (gh-ost/go/logic/migrator.go, line 703 in 59c1a24). Therefore, the tool gets stuck retrying the SHOW PROCESSLIST query.
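For context on what the tool is doing at this point: it issues the RENAME on one session and then polls the processlist, keyed by that session's connection id, waiting to see the statement blocked on the metadata lock. Below is a minimal sketch of that kind of check, purely to illustrate the mechanism; it is not gh-ost's actual code, and the helper name, DSN and session id are made up:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

// expectBlockedRename is a hypothetical stand-in for the kind of check gh-ost
// performs here: look for the session with the given connection id whose
// RENAME is blocked on a table metadata lock.
func expectBlockedRename(db *sql.DB, sessionID int64) (bool, error) {
	const query = `
		SELECT COUNT(*)
		FROM information_schema.processlist
		WHERE id = ?
		  AND state LIKE '%metadata lock%'
		  AND info LIKE 'rename %'`
	var found int
	if err := db.QueryRow(query, sessionID).Scan(&found); err != nil {
		return false, err
	}
	return found > 0, nil
}

func main() {
	// Placeholder DSN; the session id is the one logged as "Session renaming tables is ...".
	db, err := sql.Open("mysql", "foouser:secret@tcp(db-host:3306)/qwebertest")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// gh-ost passes the connection_id() of the session it issued the RENAME on.
	// Behind ProxySQL that id never matches the backend thread id shown in the
	// processlist, so a check like this keeps coming back empty and is retried.
	blocked, err := expectBlockedRename(db, 110198088)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("blocked rename visible:", blocked)
}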
I think this is because of the following oddity of my session:
mysql> select id, command, info from information_schema.processlist;
+----------+---------+--------------------------------------------------------------+
| id | command | info |
+----------+---------+--------------------------------------------------------------+
| 16858880 | Sleep | NULL |
| 16858899 | Sleep | NULL |
| 16858891 | Sleep | NULL |
| 16858533 | Query | select id, command, info from information_schema.processlist |
+----------+---------+--------------------------------------------------------------+
4 rows in set (0.08 sec)
mysql> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 125465300 |
+-----------------+
1 row in set (0.09 sec)
mysql> SELECT @@pseudo_thread_id;
+--------------------+
| @@pseudo_thread_id |
+--------------------+
| 16858533 |
+--------------------+
1 row in set (0.09 sec)
You can see that the entry in the processlist is using the @@pseudo_thread_id, not the connection_id().
But the connection_id() is what gets passed in to the query (gh-ost/go/logic/applier.go, line 952 and line 875 in 75a346b). So, in conclusion, the RENAME is running on an unexpected connection identifier.
I believe this also explains the "Session lock gh-ost.110215429.lock expected to be found but wasn't" error: the connection ID returned by is_used_lock is not the value we are expecting (gh-ost/go/logic/applier.go, line 861 in 75a346b).
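To make the is_used_lock part concrete: IS_USED_LOCK() returns the connection id of the session holding a GET_LOCK() lock, and the tool compares that against the connection_id() it recorded for its own session. Here is a small illustrative sketch of that comparison, again not gh-ost's code; the DSN and lock name are made up:

package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	ctx := context.Background()

	// Placeholder DSN; point it at whatever host you actually connect through.
	db, err := sql.Open("mysql", "foouser:secret@tcp(proxysql-host:6033)/qwebertest")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pin a single session so GET_LOCK and the follow-up checks share one connection.
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	var acquired int
	if err := conn.QueryRowContext(ctx, "SELECT GET_LOCK('gh-ost.example.lock', 0)").Scan(&acquired); err != nil {
		log.Fatal(err)
	}

	// IS_USED_LOCK returns the connection id of the lock holder, or NULL if the lock is free.
	var connectionID, lockHolder sql.NullInt64
	if err := conn.QueryRowContext(ctx, "SELECT CONNECTION_ID()").Scan(&connectionID); err != nil {
		log.Fatal(err)
	}
	if err := conn.QueryRowContext(ctx, "SELECT IS_USED_LOCK('gh-ost.example.lock')").Scan(&lockHolder); err != nil {
		log.Fatal(err)
	}

	// Connected directly to MySQL these two should match; in my setup, going
	// through ProxySQL, the client-side connection_id() diverged from the id
	// reported server-side, which is the mismatch the sanity check trips over.
	fmt.Printf("connection_id()=%v is_used_lock=%v match=%v\n",
		connectionID.Int64, lockHolder.Int64,
		connectionID.Valid && lockHolder.Valid && connectionID.Int64 == lockHolder.Int64)
}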
qsweber changed the title from "Atomic cut-over failing due to RENAME being stuck 'waiting for table metadata lock'" to "Atomic cut-over failing due to RENAME running on an unexpected connection id" on Sep 6, 2024.
I figured this out. It is related to me running the command against a ProxySQL host, which caused the divergence between the thread id and the connection_id(). I was able to fix it by connecting directly to the MySQL host.
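For anyone else hitting this: a quick pre-flight check before starting a migration is to compare CONNECTION_ID() with @@pseudo_thread_id on the same session, which is effectively the divergence shown above. This is just a sketch of that idea (placeholder DSN), not something gh-ost does itself:

package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	ctx := context.Background()

	// Placeholder DSN for whatever endpoint gh-ost would be pointed at.
	db, err := sql.Open("mysql", "foouser:secret@tcp(db-host:3306)/qwebertest")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Use one pinned session so both values come from the same connection.
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	var connectionID, pseudoThreadID int64
	if err := conn.QueryRowContext(ctx,
		"SELECT CONNECTION_ID(), @@pseudo_thread_id").Scan(&connectionID, &pseudoThreadID); err != nil {
		log.Fatal(err)
	}

	if connectionID != pseudoThreadID {
		// Mirrors the divergence above (125465300 vs 16858533): a sign you are
		// likely going through a proxy rather than talking to MySQL directly.
		log.Fatalf("connection_id()=%d but @@pseudo_thread_id=%d; atomic cut-over is likely to fail here",
			connectionID, pseudoThreadID)
	}
	fmt.Println("connection_id() matches @@pseudo_thread_id; no proxy-style id divergence detected")
}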