tm revamp: remove most topo.ChangeType #6139
We deprecated the TER (TabletExternallyReparented) vttablet RPC in 4.0 but forgot to remove it in 5.0/6.0. We can delete it, and then finalizeTabletExternallyReparented can be deleted too.
We are calling an RPC within an RPC, so it blocks on the actionMutex. Specifically, we call agent.lock(ctx) just before this call to ChangeType, and ChangeType itself calls agent.lock(ctx) as the first thing. I don't think it is necessary to take the lock at the beginning of this code block anymore.
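A minimal sketch of why the nested call blocks, assuming the action lock behaves like a plain Go sync.Mutex (the real agent.lock(ctx) may differ, for example by honoring ctx cancellation). Go mutexes are not reentrant, so a goroutine that already holds the lock and then calls an entry point that locks again ends up waiting on itself. The agent type and method names are hypothetical; the sketch uses TryLock (Go 1.18+) so it reports the conflict instead of hanging.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// agent is a hypothetical stand-in for the tablet ActionAgent; only the
// action mutex matters for this illustration.
type agent struct {
	actionMutex sync.Mutex
}

// changeType models an RPC entry point that takes the action lock first.
// TryLock is used so the demo returns an error where a plain Lock would
// block forever.
func (a *agent) changeType() error {
	if !a.actionMutex.TryLock() {
		return errors.New("action lock already held: nested lock would block")
	}
	defer a.actionMutex.Unlock()
	return nil
}

// outerRPC takes the lock and then calls another lock-taking entry point,
// which is the "RPC within an RPC" situation.
func (a *agent) outerRPC() error {
	a.actionMutex.Lock()
	defer a.actionMutex.Unlock()
	return a.changeType()
}

func main() {
	a := &agent{}
	fmt.Println("nested call:", a.outerRPC())
	fmt.Println("standalone call:", a.changeType())
}
```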
I think we were taking the lock in this block so that, in the case of drain-for-backup, we exclude all other RPC actions while the backup is happening. If that's true, we still need to hold the lock throughout the rest of this function, not just while changing type. The pattern we typically use in Vitess for this is to pull the meat of ChangeType() out into changeTypeLocked() and call the latter whenever we already hold the lock.
Good point. I forgot that defer only gets executed at the end of the function, not at the end of the block.
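A small illustration of that point: a deferred call runs when the enclosing function returns, not when the block that registered it ends. Wrapping the block in a function literal is one way to scope the defer to the block. Both function names are hypothetical.

```go
package main

import "fmt"

// runWithBlockDefer registers a defer inside a plain block; the deferred
// "unlock" still runs only when the whole function returns.
func runWithBlockDefer(events *[]string) {
	{
		*events = append(*events, "lock")
		// Runs when runWithBlockDefer returns, not at the closing brace.
		defer func() { *events = append(*events, "unlock (deferred)") }()
		*events = append(*events, "work inside block")
	}
	*events = append(*events, "work after block (lock still held)")
}

// runWithScopedDefer wraps the block in a function literal, so the defer
// fires as soon as that literal returns.
func runWithScopedDefer(events *[]string) {
	func() {
		*events = append(*events, "lock")
		defer func() { *events = append(*events, "unlock (deferred)") }()
		*events = append(*events, "work inside block")
	}()
	*events = append(*events, "work after block (lock released)")
}

func main() {
	var a, b []string
	runWithBlockDefer(&a)
	runWithScopedDefer(&b)
	fmt.Println(a)
	fmt.Println(b)
}
```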
The next call to refreshTablet is unnecessary because ChangeType already calls it. The same goes for the call to agent.runHealthCheckLocked.
enisoc left a comment
I'm generally concerned about almost all of the removals of agent.lock() calls. This seems to be defeating an important protection against multiple actions/RPCs running concurrently.
This looks unsafe. There's a lot happening now without holding the lock. The current goal of the lock is to serialize entire RPCs, not just certain changes. I don't think we can relax that without thinking through a full redesign of the locking scheme.
sougou left a comment
I've restored all the action locks. I'll revisit them on a case by case basis.
Why were these changes needed?
Because we now use agent.ChangeType instead of updating the topo directly, the ActionAgent must be initialized for ChangeType to work properly.
You could revert all changes to rpc_external_reparent.go and this test, and delete the functions in that file along with their tests, either in this PR or in a subsequent one. It doesn't make sense to change code that we should be deleting.
It's fine. I'll ping you separately for the cleanup.
local_example has been failing, but it passes on my machine. I'll keep retrying.
ae42484 to 3abdcf4
All those calls have been replaced with agent.ChangeType. agent.ChangeType is the only one that should call topo.ChangeType. One exception is agent.finalizeTabletExternallyReparented. It actually changes the tablet's internal state before updating topo, which seems to break the main rule, so I've left it alone for now.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
In many of those flows, the action lock wasn't needed. We'll eventually deprecate actionLock altogether.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
7cb5b97 to d89f7e4
changeTypeLocked runs the health check, but InitMaster should not.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>