Skip to content

Fix VTOrc holding locks after shutdown#11442

Merged
GuptaManan100 merged 2 commits intovitessio:mainfrom
planetscale:vtorc-lock-shard
Oct 6, 2022
Merged

Fix VTOrc holding locks after shutdown#11442
GuptaManan100 merged 2 commits intovitessio:mainfrom
planetscale:vtorc-lock-shard

Conversation

@GuptaManan100
Copy link
Contributor

Description

This PR fixes the issue described in #11441. On further investigation it was found that this problem was introduced when we started using the servenv package in VTOrc.

The proposed fix is to run the code that wait for the shards to be let go as part of the synchronous on termination hooks.
Unit tests validating the wait function have also been added.

Related Issue(s)

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

…rm hooks and add tests for it

Signed-off-by: Manan Gupta <manan@planetscale.com>
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Oct 5, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@GuptaManan100 GuptaManan100 requested a review from rsajwani October 5, 2022 10:41
@deepthi
Copy link
Collaborator

deepthi commented Oct 5, 2022

I think this is good practice anyway, but I do wonder if we have a bug in the lock code where we are not checking for a canceled context.

Copy link
Collaborator

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, we already had the code for this, we just needed to register it properly with servenv, correct?

Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100
Copy link
Contributor Author

@deepthi Yes, we had code for this already. The thing is that when we started using servenv, it also came with its own handling of SIGTERM. So this code and servenv code would both see the SIGTERM, this code would block until all the locks were let go, but in the meantime servenv would finish, which would cause the main go routine to finish and the program would stop killing all the other go-routines including this one.

Now, we register this function as a on-termination hook to be run, so SIGTERM is only caught by servenv, but as part of the closing of the servenv, this function is called and this blocks until all the locks are released.

@GuptaManan100 GuptaManan100 merged commit ce049a8 into vitessio:main Oct 6, 2022
@GuptaManan100 GuptaManan100 deleted the vtorc-lock-shard branch October 6, 2022 06:05
GuptaManan100 added a commit that referenced this pull request Oct 6, 2022
* feat: call the logic for waiting for shard locks on synchronous on-term hooks and add tests for it

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: fix comment for acceptSighupSignal function

Signed-off-by: Manan Gupta <manan@planetscale.com>

Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Manan Gupta <manan@planetscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: VTOrc Vitess Orchestrator integration Type: Bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug Report: VTOrc does not let go of the shards it is holding during shutdown

2 participants