Skip to content

mysqlctl: attempt to fix MySQL replication bug#5116

Closed
derekperkins wants to merge 1 commit intovitessio:masterfrom
nozzle:delete-sock-lock
Closed

mysqlctl: attempt to fix MySQL replication bug#5116
derekperkins wants to merge 1 commit intovitessio:masterfrom
nozzle:delete-sock-lock

Conversation

@derekperkins
Copy link
Copy Markdown
Member

Fixes #5067

@derekperkins derekperkins requested a review from sougou as a code owner August 20, 2019 04:16
@derekperkins derekperkins requested a review from enisoc August 20, 2019 04:16
@derekperkins
Copy link
Copy Markdown
Member Author

cc @sjmudd

@derekperkins derekperkins force-pushed the delete-sock-lock branch 2 times, most recently from 02f7b58 to 27ee7fa Compare August 20, 2019 11:35
Signed-off-by: Derek Perkins <derek@derekperkins.com>
case err == nil:
return nil

case strings.ToUpper(query) == "START SLAVE" && isReplicationErr1872(err):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a generic "execute query" function is the right place for this specific logic. If there's not a subroutine like mysqld.StartSlave() that all the code paths you're trying to catch use, then we should add one and change the appropriate sites to use it.

Or alternatively, would it work for you if we only check for this error in the repair section of the replication health reporter? That seems safer than injecting the fix at this low level, unless for some reason the periodic repair doesn't work for your needs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it felt weird to do this at this low level. I started at repairReplication and kept digging. That code path never called StartSlave that I found.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're ok with only doing this check in the repairReplication code path, then we can just make sure sufficient error detail gets propagated back up to that level to detect this specific case, and put the logic there. WDYT?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works for me. This was just a quick stab in the dark to get some feedback.

@derekperkins derekperkins added this to the v4.0 milestone Sep 23, 2019
@morgo morgo modified the milestones: v4.0, v5.0 Nov 4, 2019
@deepthi
Copy link
Copy Markdown
Collaborator

deepthi commented Dec 31, 2019

I have tried to fix this at a different place in #5627

@yzsfly
Copy link
Copy Markdown

yzsfly commented Jan 24, 2022

Cannot find the fix code in the branch 12.0.0, still met the issue , could u help check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

repairReplication: attempt to fix 'Slave failed to initialize relay log info structure from the repository (errno 1872)'

5 participants