This repository was archived by the owner on Feb 18, 2025. It is now read-only.

region or datacenter causes failover to fail #1455

@lazzyfu


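Preconditions for triggering the bug (either one is enough; both may be enabled)

  • PreventCrossRegionMasterFailover is true
  • PreventCrossDataCenterMasterFailover is true

When it triggers

It comes down to luck, since the failure depends on timing.

Result

The failover fails.

Error screenshots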

image
image

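Analyzing the error

Reviewing the code:

image
The code expects a value after "in", but in the log output the value after "in" is empty. The comparison therefore fails, the region protection kicks in, and the failover fails.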
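
The effect of an empty value after "in" can be sketched in a few lines. This is a minimal illustration, not orchestrator's actual code: the `candidate` type, the `allowPromotion` function, the region name `cn-east` and the error text are all hypothetical, chosen only to show why an empty region on the dead master rejects every candidate.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for a replica considered for promotion.
type candidate struct {
	Host   string
	Region string
}

// allowPromotion mimics a cross-region guard: a candidate is accepted only
// when its region equals the region recorded for the failed master.
func allowPromotion(c candidate, deadMasterRegion string, preventCrossRegion bool) error {
	if preventCrossRegion && c.Region != deadMasterRegion {
		return fmt.Errorf("will not promote server in %q when dead server in %q", c.Region, deadMasterRegion)
	}
	return nil
}

func main() {
	replica := candidate{Host: "replica-1", Region: "cn-east"}

	// Region recorded correctly: the replica in the same region is accepted.
	fmt.Println(allowPromotion(replica, "cn-east", true))

	// Region recorded as empty: no replica can ever match, so promotion is refused.
	fmt.Println(allowPromotion(replica, "", true))
}
```

With the guard disabled (`preventCrossRegion` set to false) the empty region is harmless, which matches the precondition that at least one of the two Prevent* options must be enabled.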

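How to reproduce

Source file: go/inst/instance_dao.go
Add a for loop directly below instanceFound = true (any number of seconds works, but keep it small; a long delay slows down detection and the effect takes longer to show)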
image

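Shut down the Master node

Timing matters here: the window is the 5-second delay added by the debugging code, so the shutdown command must be run while the loop is still running.

Shut down the master while the following output is being printed: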
image

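Check the topology

The topology recovery has failed at this point.
image

Confirm in the logs

image

Check the records in the ORC table

image
The region recorded for the failed Master node 10.10.1.220 is now empty.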

The fix (skipping the details of how I got there)

The fix is actually simple: change where the DetectRegionQuery code sits in the function.

image
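
Why the position of the query matters can be illustrated with a small simulation. This is one plausible reading of the bug rather than a description of orchestrator's internals: `fakeServer`, `record` and `probe` are hypothetical, and the model assumes that a failed critical query aborts the probe while a failed region query is only logged.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeServer is a hypothetical stand-in for a MySQL instance that goes away
// after serving a fixed number of queries.
type fakeServer struct {
	queriesLeft int
	region      string
}

func (s *fakeServer) query() error {
	if s.queriesLeft <= 0 {
		return errors.New("connection refused")
	}
	s.queriesLeft--
	return nil
}

// record is the row that would be written to the backend table.
type record struct {
	Region string
}

// probe runs one critical query and one region query against s.
// It returns the record and whether that record would be persisted.
// A failed critical query aborts the probe; a failed region query is
// tolerated and simply leaves Region empty.
func probe(s *fakeServer, regionFirst bool) (record, bool) {
	var rec record
	readRegion := func() {
		if err := s.query(); err == nil {
			rec.Region = s.region
		}
	}
	if regionFirst {
		readRegion()
	}
	if err := s.query(); err != nil {
		return rec, false // instance not found: the stored row is left alone
	}
	// From here on the instance counts as found and will be written.
	if !regionFirst {
		readRegion()
	}
	return rec, true
}

func main() {
	// The server dies after answering exactly one query.
	late, writeLate := probe(&fakeServer{queriesLeft: 1, region: "cn-east"}, false)
	early, writeEarly := probe(&fakeServer{queriesLeft: 1, region: "cn-east"}, true)
	fmt.Printf("region query last:  write=%v region=%q\n", writeLate, late.Region)
	fmt.Printf("region query first: write=%v region=%q\n", writeEarly, early.Region)
}
```

Under these assumptions, running the region query last produces a record that is written with an empty region, while running it first means a server that dies mid-probe never reaches the write, so the previously stored region survives.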

Where to add the code

image

Code to add

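// Fetch datacenter, region and other metadata concurrently; each goroutine
// writes to a different field of instance, and Wait blocks until all finish.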
func() {
    var getMetaWaitGroup sync.WaitGroup
    if config.Config.DetectDataCenterQuery != "" && !isMaxScale {
        getMetaWaitGroup.Add(1)
        go func() {
            defer getMetaWaitGroup.Done()
            err := db.QueryRow(config.Config.DetectDataCenterQuery).Scan(&instance.DataCenter)
            logReadTopologyInstanceError(instanceKey, "DetectDataCenterQuery", err)
        }()
    }
    if config.Config.DetectRegionQuery != "" && !isMaxScale {
        getMetaWaitGroup.Add(1)
        go func() {
            defer getMetaWaitGroup.Done()
            err := db.QueryRow(config.Config.DetectRegionQuery).Scan(&instance.Region)
            logReadTopologyInstanceError(instanceKey, "DetectRegionQuery", err)
        }()
    }
    if config.Config.DetectPhysicalEnvironmentQuery != "" && !isMaxScale {
        getMetaWaitGroup.Add(1)
        go func() {
            defer getMetaWaitGroup.Done()
            err := db.QueryRow(config.Config.DetectPhysicalEnvironmentQuery).Scan(&instance.PhysicalEnvironment)
            logReadTopologyInstanceError(instanceKey, "DetectPhysicalEnvironmentQuery", err)
        }()
    }
 
    if config.Config.DetectInstanceAliasQuery != "" && !isMaxScale {
        getMetaWaitGroup.Add(1)
        go func() {
            defer getMetaWaitGroup.Done()
            err := db.QueryRow(config.Config.DetectInstanceAliasQuery).Scan(&instance.InstanceAlias)
            logReadTopologyInstanceError(instanceKey, "DetectInstanceAliasQuery", err)
        }()
    }
 
    if config.Config.DetectSemiSyncEnforcedQuery != "" && !isMaxScale {
        getMetaWaitGroup.Add(1)
        go func() {
            defer getMetaWaitGroup.Done()
            err := db.QueryRow(config.Config.DetectSemiSyncEnforcedQuery).Scan(&instance.SemiSyncPriority)
            logReadTopologyInstanceError(instanceKey, "DetectSemiSyncEnforcedQuery", err)
        }()
    }
    getMetaWaitGroup.Wait()
}()
