Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix binlog end log pos over lost data #1367

Merged
merged 14 commits into from
Mar 12, 2024

Conversation

shaohk
Copy link
Contributor

@shaohk shaohk commented Jan 17, 2024

Related issue: #1366

Description

This PR [Fix binlog end log pos over lost data]

In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

shaohoukun added 2 commits January 16, 2024 15:25
…er-lost-data

Calculate whether the binlog log_pos overflows beyond 4G using end_log_pos and event_size.
@dikang123
Copy link

when will this PR be merged?

@timvaillancourt
Copy link
Collaborator

@shaohk thanks a lot for this PR, this sounds like a difficult problem to debug!

Can you think of any ways that this check can be tested in go unit tests? I wonder if a test could:

  1. Test that a normal *replication.RowsEvent event will be accepted
  2. Test that an oversized *replication.RowsEvent event it not accepted

Luckily, handleRowEvents doesn't need a real MySQL server as it pushes events to entriesChannel chan<- *BinlogEntry, which should make this relatively simple to test

Copy link
Collaborator

@timvaillancourt timvaillancourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Added one comment requesting unit tests 🙇

@shaohk
Copy link
Contributor Author

shaohk commented Feb 7, 2024

@shaohk thanks a lot for this PR, this sounds like a difficult problem to debug!

Can you think of any ways that this check can be tested in go unit tests? I wonder if a test could:

  1. Test that a normal *replication.RowsEvent event will be accepted
  2. Test that an oversized *replication.RowsEvent event it not accepted

Luckily, handleRowEvents doesn't need a real MySQL server as it pushes events to entriesChannel chan<- *BinlogEntry, which should make this relatively simple to test

@timvaillancourt Thank you for reviewing this PR. I have added unit tests as requested. Could you please take a look again? Thank you for your hard work.

@@ -74,3 +75,23 @@ func (this *BinlogCoordinates) SmallerThanOrEquals(other *BinlogCoordinates) boo
}
return this.LogFile == other.LogFile && this.LogPos == other.LogPos
}

// IsLogPosOverflowBeyond4Bytes returns true if the coordinate endpos is overflow beyond 4 bytes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shaohk last request, can we add a reference to why a value over 4 bytes is an overflow?

Copy link
Contributor Author

@shaohk shaohk Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timvaillancourt OK, I have added. Please take a look again, Thks.

change IsLogPosOverflowBeyond4Bytes comment.
@meiji163
Copy link
Contributor

@shaohk Can you take a look at the failing tests and linter?

@shaohk
Copy link
Contributor Author

shaohk commented Feb 29, 2024

@shaohk Can you take a look at the failing tests and linter?

@meiji163
OK, Please take a look again, Thks.

@meiji163
Copy link
Contributor

LGTM, can you take another look @timvaillancourt?

@timvaillancourt
Copy link
Collaborator

LGTM, can you take another look @timvaillancourt?

@meiji163 / @shaohk: I think erroring here is the easiest next step, so LGTM 👍

But longer term, what do we "want" gh-ost to do here? Or to put it differently, what does the MySQL replica do in this scenario? I'm assuming it doesn't error right? 🤔

@timvaillancourt timvaillancourt self-requested a review March 12, 2024 21:29
Copy link
Collaborator

@timvaillancourt timvaillancourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with longer-term question

@meiji163 meiji163 merged commit 48b34bc into github:master Mar 12, 2024
7 checks passed
@meiji163
Copy link
Contributor

I'm not sure how MySQL handles it
I found this bug report from last year https://bugs.mysql.com/bug.php?id=112189

@shaohk
Copy link
Contributor Author

shaohk commented Mar 13, 2024

LGTM, can you take another look @timvaillancourt?

@meiji163 / @shaohk: I think erroring here is the easiest next step, so LGTM 👍

But longer term, what do we "want" gh-ost to do here? Or to put it differently, what does the MySQL replica do in this scenario? I'm assuming it doesn't error right? 🤔

https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_max_binlog_cache_size.

When [gtid_mode](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html#sysvar_gtid_mode) is not ON, the maximum recommended value is 4GB, due to the fact that, in this case, MySQL cannot work with binary log positions greater than 4GB; when gtid_mode is ON, this limitation does not apply, and the server can work with binary log positions of arbitrary size.

So I believe that MySQL master-slave replication is based on the GTID mode, so there is no problem with MySQL master-slave replication. The gh-ost tool for synchronizing binlogs uses a non-GTID mode, which is why this issue arises.

@timvaillancourt
Copy link
Collaborator

So I believe that MySQL master-slave replication is based on the GTID mode, so there is no problem with MySQL master-slave replication. The gh-ost tool for synchronizing binlogs uses a non-GTID mode, which is why this issue arises.

@shaohk great, thanks for clarifying! I made an attempt to add GTID support to gh-ost long ago and I may revive this effort for many reasons. This adds a new reason to the pile

@ramshad-sha ramshad-sha mentioned this pull request Mar 25, 2024
Closed
@timvaillancourt timvaillancourt added this to the v1.1.7 milestone Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants