Cut-off transaction is very slow, leading to downtimes in production [Question] #1191

Open
Shukla-Ankur opened this issue Oct 21, 2022 · 1 comment

@Shukla-Ankur

Shukla-Ankur commented Oct 21, 2022

MySQL v8.0.23

  1. I recently ran multiple migrations in production and noticed that the table switch at the end of each gh-ost run takes a very long time.
  2. The row counts of these tables ranged from 15k to 5M. I did not see any correlation between table size and lock duration, which is as expected.
  3. Below are actual logs from the production runs.

`2022-10-19 07:53:14 INFO Lock & rename duration: 1.017494674s. During this time, queries on entity were blocked

2022-10-19 07:21:28 INFO Lock & rename duration: 1.042562025s. During this time, queries on entity were blocked

2022-10-19 07:50:31 INFO Lock & rename duration: 2.022935482s. During this time, queries on entity were blocked

2022-10-19 07:45:12 INFO Lock & rename duration: 2.020735566s. During this time, queries on entity were blocked

2022-10-19 07:42:29 INFO Lock & rename duration: 1.01602586s. During this time, queries on entity were blocked`

  4. Such a high cut-over time means downtime. I expected this transaction to take single-digit milliseconds. One question: does the reported duration also include the time the transaction spends waiting to acquire the lock? Note that these tables had almost no traffic.
@dm-2
Contributor

dm-2 commented Oct 21, 2022

👋 @Shukla-Ankur the cut-over step is a complicated process which requires coordination between a few different operations to ensure that the table cut-over happens atomically - and as a result, it takes some time to complete.
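For readers wondering why the swap needs coordination at all, here is a rough sketch of the general lock-then-rename pattern, based on how the gh-ost cut-over documentation describes it. This is not gh-ost's actual code: the function name, the connection labels `A` and `B`, and the table names are hypothetical, and the real implementation adds safety checks (for example, confirming that the rename is actually waiting before the lock is released) that are omitted here.

```python
def cutover_steps(table, ghost, sentinel):
    """Return ordered (connection, SQL) pairs for a lock-then-rename swap.

    Illustrative only: all names are hypothetical and the checks a real
    tool performs between steps are left out.
    """
    return [
        # Connection A creates a placeholder table and locks it together
        # with the original table, so queries on the original start queuing.
        ("A", f"CREATE TABLE `{sentinel}` (id INT PRIMARY KEY)"),
        ("A", f"LOCK TABLES `{table}` WRITE, `{sentinel}` WRITE"),
        # Connection B issues the swap. It blocks behind A's lock and
        # cannot succeed while the placeholder table still exists.
        ("B", f"RENAME TABLE `{table}` TO `{sentinel}`, `{ghost}` TO `{table}`"),
        # Connection A drops the placeholder and releases the lock, which
        # lets B's rename go through and unblocks the queued queries.
        ("A", f"DROP TABLE `{sentinel}`"),
        ("A", "UNLOCK TABLES"),
    ]


if __name__ == "__main__":
    for conn, sql in cutover_steps("entity", "_entity_gho", "_entity_del"):
        print(f"[{conn}] {sql}")
```

The point of the sketch is only that two connections have to hand off to each other in a fixed order, which is one reason the step is not a single instantaneous statement.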

The times you are seeing are normal. Here is a sample of cut-over times from some of our recent schema migrations (on both very small and very large tables):

INFO Lock & rename duration: 1.034362751s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.958744449s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 2.274112085s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 5.060551315s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.622521302s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.026753275s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.042612384s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 2.702527513s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.07640219s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.404149521s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.322024889s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 2.652230173s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.254592072s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.073429469s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.555971072s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 3.083120692s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.188444113s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 2.136634135s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 2.634421897s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.066720756s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 2.542204541s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.185863156s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.073191921s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.218012359s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.320745324s. During this time, queries on <redacted> were blocked
INFO Lock & rename duration: 1.320744386s. During this time, queries on <redacted> were blocked
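If you want to compare your own runs against a sample like the one above, a small script can pull the durations out of the log and summarize them. This is a minimal sketch, not part of gh-ost: the function names are made up for illustration, and the pattern assumes every duration is printed with a plain `s` suffix as in the lines shown here (a value logged in another unit, such as milliseconds, would be skipped).

```python
import re
import statistics

# Matches the duration in lines such as:
#   INFO Lock & rename duration: 1.034362751s. During this time, ...
# Assumption: durations always carry a plain "s" (seconds) suffix.
DURATION_RE = re.compile(r"Lock & rename duration: ([0-9]+(?:\.[0-9]+)?)s\.")


def cutover_durations(log_text):
    """Return every lock & rename duration found in log_text, in seconds."""
    return [float(m.group(1)) for m in DURATION_RE.finditer(log_text)]


def summarize(durations):
    """Return count, min, median and max for a non-empty list of durations."""
    return {
        "count": len(durations),
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    import sys

    # Usage: python cutover_stats.py < gh-ost.log
    found = cutover_durations(sys.stdin.read())
    if found:
        print(summarize(found))
    else:
        print("no lock & rename durations found")
```

Looking at the median and maximum side by side makes it easier to tell a typical cut-over from an outlier.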

We'd certainly welcome any PRs with optimisations that reduce the cut-over time whilst maintaining it as a safe, atomic operation. That being said, we find these times to be acceptable for our needs, and we regularly perform online schema migrations using gh-ost whilst our services remain up and in use by customers 😄
