docs: Document StaleTopoPrimary VTOrc analysis and recovery #2059
| `DeadPrimary` | VTOrc detects when the primary tablet is dead | VTOrc runs EmergencyReparentShard to elect a different primary |
| `PrimaryIsReadOnly`, `PrimarySemiSyncMustBeSet`, `PrimarySemiSyncMustNotBeSet` | VTOrc detects when the primary tablet has configuration issues like being read-only, semi-sync being set or not being set | VTOrc fixes the configurations on the primary. |
| `NotConnectedToPrimary`, `ConnectedToWrongPrimary`, `ReplicationStopped`, `ReplicaIsWritable`, `ReplicaSemiSyncMustBeSet`, `ReplicaSemiSyncMustNotBeSet` | VTOrc detects when a replica has configuration issues like not being connected to the primary, connected to the wrong primary, replication stopped, replica being writable, semi-sync being set or not being set | VTOrc fixes the configurations on the replica. |
| `StaleTopoPrimary` | VTOrc detects when a tablet still has type PRIMARY in the topology but a newer primary has already been elected. This can happen if a topology update fails during an emergency reparent operation. | VTOrc demotes the stale primary to a read-only replica and updates its type to REPLICA in the topology. |
Citation: Based on PR #19173 which adds the StaleTopoPrimary analysis code in go/vt/vtorc/inst/analysis.go and the demoteStaleTopoPrimary recovery function in go/vt/vtorc/logic/topology_recovery.go. The description and fix action are derived from the PR description and implementation code.
Citation: Updated recovery description based on PR review feedback from @nickvanw. The demoteStaleTopoPrimary function in go/vt/vtorc/logic/topology_recovery.go was updated to call setReplicationSource() to configure the demoted tablet to replicate from the current primary.
Added a new row to the VTOrc recovery table documenting the `StaleTopoPrimary` analysis and recovery (PR #19173). This recovery detects tablets that still have type PRIMARY in the topology after a newer primary has been elected, which can occur if topology updates fail during emergency reparent operations. VTOrc automatically demotes these stale primaries to read-only replicas and updates the topology accordingly.

Trigger Events
`StaleTopoPrimary` analysis and recovery