Modify the default grpc keepalive time parameter #5478

Closed
inolddays wants to merge 957 commits into vitessio:master from inolddays:master

@inolddays
Contributor

1. Set grpc_keepalive_time to 10 seconds.
2. Set grpc_keepalive_timeout to 10 seconds.

These settings help avoid the situation where a query hangs when a pod or bare-metal host is broken.

@inolddays inolddays requested a review from sougou as a code owner November 27, 2019 09:47
@inolddays inolddays force-pushed the master branch 2 times, most recently from 03fbdcc to 1fbdd6c Compare November 27, 2019 10:23
@sougou
Contributor

sougou commented Dec 1, 2019

This looks safe to me, but let's have @rafael and @tirsen (or @mpawliszyn ) who are sensitive to grpc behavior changes have a look before we merge.

@rafael
Member

rafael commented Dec 5, 2019

This particular change shouldn't create issues on our end as we already set values for this.

However, there is something I think is worth calling out. We've been super conservative in setting default values for external dependencies. The rationale is that introducing them silently could create weird behavior for folks who were relying on the default values of the library itself.

For instance, in this case we even add some extra logic in the case where this is zero to not pass the value at all: https://github.com/vitessio/vitess/blob/master/go/vt/grpcclient/client.go#L69

If we would like to change or reconsider this pattern, we should discuss it more broadly, as we use this approach in many other places in the code base.

Personally, for external libraries like grpc/s3/xtrabackup, I prefer not setting any defaults from the Vitess side and letting users be explicit about it. We can come up with recommendations, but not set them by default.

@sougou
Contributor

sougou commented Dec 10, 2019

I think @rafael has a point. However, VTGate hanging forever on a down tablet is a problem that still needs solving.

In reality, there are already specific changes in behavior that we make based on different use cases within vitess. For example, VTGate dials vttablet with the FailFast option, while the default behavior is not to fail fast. This change is required so that vtgate can quickly retry another tablet if the current one is down.

Along the same lines, I think VTGate should explicitly pass these flags, and I believe these should be the same as the healthcheck interval. In other words, there should be no need to export a separate value.

Based on my understanding, if the existing (global) flag is overridden, it will end up superseding the values supplied by vtgate (later values win). So, it should be a non-breaking change for anyone that's already overriding these values.

In other words, I think we should make the tablet dialer require these values on Dial, and then pass them through to grpcclient.Dial.

@morgo
Contributor

morgo commented Feb 6, 2020

@Johnny-Three Can you rebase this PR on master so the testsuite passes? From the feedback, it looks like we should be good to merge this.

deepthi and others added 23 commits February 6, 2020 14:34
Merge Sharding tests in Go migrated from Python
Signed-off-by: Saif Alharthi <saif@saifalharthi.me>
Signed-off-by: Saif Alharthi <saif@saifalharthi.me>
…cleanup

Signed-off-by: Saif Alharthi <saif@saifalharthi.me>
* initial commit for backup_only

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* changes in package structure

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* updating package name and fixing a test

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* removed unrequired teardown code

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* Removed debug code

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* initial commit for xtrabackup

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* fix sequencing of cleanup

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* fix terminate restore for xtrabackup

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* updated config for xtrabackup

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* backup-mysqlctld: mysqlctld setup module created.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* added xtrabackup stream mode

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* updated config for xtrabackup stream

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* minor changes to config

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* rebased to resolve conflict

Signed-off-by: Arindam Nayak <arindam.nayak@outlook.com>

* backup-mysqlctld: mysqlctld health check changes.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* backup_mysqlctld: mysqlctld teardown fixes. vttablet restart method created.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* backup-mysqlctld: mysqlctld restart fixed, backup utils refactor.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* backup_transform_mysqlctld: backup_transform testing using mysqlctld.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* Added percona 56 new dependency

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* Putting the dependency at right place

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* updated apt to apt-get

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* corrected the typo

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* fixed a comma in config.json

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* review changes.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* test added in config.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* package name changed.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* mysql_ctld: file name change.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* mysqlctld: review changes,
code refactor, removed some functions.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* backup_transform: refactor.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* backup_mysqlctld: config changes.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* restore vtBackup test

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* redistribute travis tests

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* updated shard matrix for transform test

Signed-off-by: Ajeet jain <ajeet@planetscale.com>

* mysqlctld: mysqlctld teardown issue resolved.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* mysqlctld: mysqlctld teardown issue resolved.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* mysql-ctld: process teardown changes.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* review changes and newConnection method modified.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

* mysqlctld: process hang issue resolved.

Signed-off-by: pradip parmar <prince.soamedia@gmail.com>

Co-authored-by: Ajeet Jain <ajeet.jain@gmail.com>
Co-authored-by: Arindam Nayak <arindamnayak@users.noreply.github.com>
Co-authored-by: Deepthi Sigireddi <deepthi.sigireddi@gmail.com>
…metadata

Signed-off-by: deepthi <deepthi@planetscale.com>
Signed-off-by: pradip parmar <prince.soamedia@gmail.com>
Signed-off-by: pradip parmar <prince.soamedia@gmail.com>
Signed-off-by: pradip parmar <prince.soamedia@gmail.com>
Signed-off-by: Ajeet jain <ajeet@planetscale.com>
Signed-off-by: Ajeet jain <ajeet@planetscale.com>
Signed-off-by: Ajeet jain <ajeet@planetscale.com>
Signed-off-by: Ajeet jain <ajeet@planetscale.com>
Signed-off-by: Ajeet jain <ajeet@planetscale.com>
Fixes vitessio#5800

The function hasAnotherCommit is supposed to separate out events
that have to be applied in a transaction from events that are
applied as autocommits. But it was not checking for the OTHER
and JOURNAL events. They should not be batched with regular
transactions.

This caused a bug where a regular transaction occurred followed
by an OTHER event. The apply of this event happened assuming
autocommit behavior, but it actually got added to the previous
uncommitted transaction.

If the stop position is reached at this point, vplayer exits
because it thinks the autocommit happened, and the entire
transaction gets rolled back, and the stop position ends
up not being actually reached.

The copier which expects this behavior will then start applying
the next set of rows, but they are not consistent with the current
stop position. This will cause the target to go out of sync.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Ajeet jain <ajeet@planetscale.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
migrating messaging python testcase to go
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Fetch MariaDB 10.1 from MariaDB repos (works more consistently)

Signed-off-by: Morgan Tocker <tocker@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
sougou and others added 22 commits March 8, 2020 21:10
Also fix an incorrect test.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Not all code paths were updating the stats. This was causing
the stats reported in /debug/status to be unreliable. This refactor
keeps the stats and reporting in sync by performing the updates
at the lower level functions that also update the vreplication table.

Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Morgan Tocker <tocker@gmail.com>
Do not drop leading zeroes in microsecond timestamps for prepared statements.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
vstreamer: send immediate GTID on "current"
vrepl: more documentation and fix stats
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Signed-off-by: Andres Taylor <andres@planetscale.com>
1.set grpc_keepalive_time to 10 seconds
2.set grpc_keepalive_timeout to 10 seconds
These settings will help to avoid the situation which query will hang there when pod or Bare metal is broken.

Signed-off-by: JohnnyThree <whereshallyoube@gmail.com>
@inolddays
Contributor Author

Closing this; I created 5922 to replace it.

@inolddays inolddays closed this Mar 16, 2020
@morgo morgo mentioned this pull request Mar 16, 2020
@inolddays inolddays deleted the master branch March 17, 2020 10:14