Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
be89e30
vtbackup, mysqlctl: detailed backup and restore metrics
maxenglander Dec 15, 2022
1504861
move fakemysqldaemon.go to mysqlctl package
maxenglander Dec 19, 2022
21d82f4
go/vt/mysqlctl: add backup/restore stats tests
maxenglander Dec 19, 2022
cc324e1
address pr comments round 1
maxenglander Dec 27, 2022
fef5519
checkpoint
maxenglander Jan 5, 2023
cdc748c
reorganize metric names
maxenglander Jan 9, 2023
86871e2
track bytes processed in backup/restore
maxenglander Jan 9, 2023
3d15817
address pr comments: remove 'this file' from comments
maxenglander Jan 10, 2023
b64eac4
Update go/vt/mysqlctl/backupstats/stats.go
maxenglander Jan 17, 2023
6a01830
Update go/vt/mysqlctl/backupstats/stats.go
maxenglander Jan 17, 2023
d5364e6
Update go/vt/mysqlctl/backupstats/stats.go
maxenglander Jan 17, 2023
9657a8d
pr review: improve comments
maxenglander Jan 17, 2023
1142e5f
make stats more expvars-friendly
maxenglander Jan 18, 2023
a86ecc6
-backupstorage.Parameterizer, +backupstorage.BackupStorage.WithParams
maxenglander Jan 18, 2023
71d0680
merge <- main
maxenglander Feb 6, 2023
76cd27f
remove deleted field
maxenglander Feb 6, 2023
05622d3
Merge branch 'main' into maxeng-details-backup-stats
maxenglander Feb 8, 2023
8f97dd6
merge <- main
maxenglander Feb 10, 2023
26b62de
Update go/ioutil/meter.go
maxenglander Feb 10, 2023
143d47e
Update go/ioutil/meter.go
maxenglander Feb 10, 2023
bbb8fb3
Update go/ioutil/meter.go
maxenglander Feb 10, 2023
49a5b94
-constants.go +release notes
maxenglander Feb 10, 2023
7294e92
add stats example to release notes
maxenglander Feb 10, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
135 changes: 135 additions & 0 deletions doc/releasenotes/17_0_0_summary.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,26 @@
### Table of Contents

- **[Major Changes](#major-changes)**
- **[Breaking Changes](#breaking-changes)**
- [Deprecated Stats](#deprecated-stats)
- **[New command line flags and behavior](#new-flag)**
- [Builtin backup: read buffering flags](#builtin-backup-read-buffering-flags)
- **[New stats](#new-stats)**
- [Detailed backup and restore stats](#detailed-backup-and-restore-stats)

## <a id="major-changes"/> Major Changes

### <a id="breaking-changes"/>Breaking Changes

#### <a id="deprecated-stats"/>Deprecated Stats

These stats are deprecated in v17.

| Deprecated stat | Supported alternatives |
|-|-|
| `backup_duration_seconds` | `BackupDurationNanoseconds` |
| `restore_duration_seconds` | `RestoreDurationNanoseconds` |

### <a id="new-flag"/> New command line flags and behavior

#### <a id="builtin-backup-read-buffering-flags" /> Backup --builtinbackup-file-read-buffer-size and --builtinbackup-file-write-buffer-size
Expand All @@ -25,3 +40,123 @@ These flags are applicable to the following programs:
- `vtctld`
- `vttablet`
- `vttestserver`

### <a id="new-stats"/> New stats

#### <a id="detailed-backup-and-restore-stats"/> Detailed backup and restore stats

##### Backup metrics

Metrics related to backup operations are available in both Vtbackup and VTTablet.

**BackupBytes, BackupCount, BackupDurationNanoseconds**

Depending on the Backup Engine and Backup Storage in-use, a backup may be a complex pipeline of operations, including but not limited to:

* Reading files from disk.
* Compressing files.
* Uploading compress files to cloud object storage.

These operations are counted and timed, and the number of bytes consumed or produced by each stage of the pipeline are counted as well.

##### Restore metrics

Metrics related to restore operations are available in both Vtbackup and VTTablet.

**RestoreBytes, RestoreCount, RestoreDurationNanoseconds**

Depending on the Backup Engine and Backup Storage in-use, a restore may be a complex pipeline of operations, including but not limited to:

* Downloading compressed files from cloud object storage.
* Decompressing files.
* Writing decompressed files to disk.

These operations are counted and timed, and the number of bytes consumed or produced by each stage of the pipeline are counted as well.

##### Vtbackup metrics

Vtbackup exports some metrics which are not available elsewhere.

**DurationByPhaseSeconds**

Vtbackup fetches the last backup, restores it to an empty mysql installation, replicates recent changes into that installation, and then takes a backup of that installation.

_DurationByPhaseSeconds_ exports timings for these individual phases.

##### Example

**A snippet of vtbackup metrics after running it against the local example after creating the initial cluster**

(Processed with `jq` for readability.)

```
{
"BackupBytes": {
"BackupEngine.Builtin.Source:Read": 4777,
"BackupEngine.Builtin.Compressor:Write": 4616,
"BackupEngine.Builtin.Destination:Write": 162,
"BackupStorage.File.File:Write": 163
},
"BackupCount": {
"-.-.Backup": 1,
"BackupEngine.Builtin.Source:Open": 161,
"BackupEngine.Builtin.Source:Close": 322,
"BackupEngine.Builtin.Compressor:Close": 161,
"BackupEngine.Builtin.Destination:Open": 161,
"BackupEngine.Builtin.Destination:Close": 322
},
"BackupDurationNanoseconds": {
"-.-.Backup": 4188508542,
"BackupEngine.Builtin.Source:Open": 10649832,
"BackupEngine.Builtin.Source:Read": 55901067,
"BackupEngine.Builtin.Source:Close": 960826,
"BackupEngine.Builtin.Compressor:Write": 278358826,
"BackupEngine.Builtin.Compressor:Close": 79358372,
"BackupEngine.Builtin.Destination:Open": 16456627,
"BackupEngine.Builtin.Destination:Write": 11021043,
"BackupEngine.Builtin.Destination:Close": 17144630,
"BackupStorage.File.File:Write": 10743169
},
"DurationByPhaseSeconds": {
"InitMySQLd": 2,
"RestoreLastBackup": 6,
"CatchUpReplication": 1,
"TakeNewBackup": 4
},
"RestoreBytes": {
"BackupEngine.Builtin.Source:Read": 1095,
"BackupEngine.Builtin.Decompressor:Read": 950,
"BackupEngine.Builtin.Destination:Write": 209,
"BackupStorage.File.File:Read": 1113
},
"RestoreCount": {
"-.-.Restore": 1,
"BackupEngine.Builtin.Source:Open": 161,
"BackupEngine.Builtin.Source:Close": 322,
"BackupEngine.Builtin.Decompressor:Close": 161,
"BackupEngine.Builtin.Destination:Open": 161,
"BackupEngine.Builtin.Destination:Close": 322
},
"RestoreDurationNanoseconds": {
"-.-.Restore": 6204765541,
"BackupEngine.Builtin.Source:Open": 10542539,
"BackupEngine.Builtin.Source:Read": 104658370,
"BackupEngine.Builtin.Source:Close": 773038,
"BackupEngine.Builtin.Decompressor:Read": 165692120,
"BackupEngine.Builtin.Decompressor:Close": 51040,
"BackupEngine.Builtin.Destination:Open": 22715122,
"BackupEngine.Builtin.Destination:Write": 41679581,
"BackupEngine.Builtin.Destination:Close": 26954624,
"BackupStorage.File.File:Read": 102416075
},
"backup_duration_seconds": 4,
"restore_duration_seconds": 6
}
```

Some notes to help understand these metrics:

* `BackupBytes["BackupStorage.File.File:Write"]` measures how many bytes were read from disk by the `file` Backup Storage implementation during the backup phase.
* `DurationByPhaseSeconds["CatchUpReplication"]` measures how long it took to catch-up replication after the restore phase.
* `DurationByPhaseSeconds["RestoreLastBackup"]` measures to the duration of the restore phase.
* `RestoreDurationNanoseconds["-.-.Restore"]` also measures to the duration of the restore phase.
19 changes: 19 additions & 0 deletions go/cmd/vtbackup/vtbackup.go
Original file line number Diff line number Diff line change
Expand Up @@ -74,10 +74,12 @@ import (
"vitess.io/vitess/go/cmd"
"vitess.io/vitess/go/exit"
"vitess.io/vitess/go/mysql"
"vitess.io/vitess/go/stats"
"vitess.io/vitess/go/vt/dbconfigs"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/logutil"
"vitess.io/vitess/go/vt/mysqlctl"
"vitess.io/vitess/go/vt/mysqlctl/backupstats"
"vitess.io/vitess/go/vt/mysqlctl/backupstorage"
topodatapb "vitess.io/vitess/go/vt/proto/topodata"
"vitess.io/vitess/go/vt/servenv"
Expand Down Expand Up @@ -119,6 +121,11 @@ var (
detachedMode bool
keepAliveTimeout = 0 * time.Second
disableRedoLog = false
durationByPhase = stats.NewGaugesWithSingleLabel(
"DurationByPhaseSeconds",
"How long it took vtbackup to perform each phase (in seconds).",
"phase",
)
)

func registerFlags(fs *pflag.FlagSet) {
Expand Down Expand Up @@ -258,9 +265,11 @@ func takeBackup(ctx context.Context, topoServer *topo.Server, backupStorage back
}
initCtx, initCancel := context.WithTimeout(ctx, mysqlTimeout)
defer initCancel()
initMysqldAt := time.Now()
if err := mysqld.Init(initCtx, mycnf, initDBSQLFile); err != nil {
return fmt.Errorf("failed to initialize mysql data dir and start mysqld: %v", err)
}
durationByPhase.Set("InitMySQLd", int64(time.Since(initMysqldAt).Seconds()))
// Shut down mysqld when we're done.
defer func() {
// Be careful not to use the original context, because we don't want to
Expand Down Expand Up @@ -291,6 +300,7 @@ func takeBackup(ctx context.Context, topoServer *topo.Server, backupStorage back
Keyspace: initKeyspace,
Shard: initShard,
TabletAlias: topoproto.TabletAliasString(tabletAlias),
Stats: backupstats.BackupStats(),
}
// In initial_backup mode, just take a backup of this empty database.
if initialBackup {
Expand All @@ -312,12 +322,14 @@ func takeBackup(ctx context.Context, topoServer *topo.Server, backupStorage back
if err := mysqlctl.Backup(ctx, backupParams); err != nil {
return fmt.Errorf("backup failed: %v", err)
}
durationByPhase.Set("InitialBackup", int64(time.Since(backupParams.BackupTime).Seconds()))
log.Info("Initial backup successful.")
return nil
}

backupDir := mysqlctl.GetBackupDir(initKeyspace, initShard)
log.Infof("Restoring latest backup from directory %v", backupDir)
restoreAt := time.Now()
params := mysqlctl.RestoreParams{
Cnf: mycnf,
Mysqld: mysqld,
Expand All @@ -328,6 +340,7 @@ func takeBackup(ctx context.Context, topoServer *topo.Server, backupStorage back
DbName: dbName,
Keyspace: initKeyspace,
Shard: initShard,
Stats: backupstats.RestoreStats(),
}
backupManifest, err := mysqlctl.Restore(ctx, params)
var restorePos mysql.Position
Expand All @@ -345,6 +358,7 @@ func takeBackup(ctx context.Context, topoServer *topo.Server, backupStorage back
default:
return fmt.Errorf("can't restore from backup: %v", err)
}
durationByPhase.Set("RestoreLastBackup", int64(time.Since(restoreAt).Seconds()))

// Disable redo logging (if we can) before we start replication.
disabledRedoLog := false
Expand Down Expand Up @@ -413,6 +427,7 @@ func takeBackup(ctx context.Context, topoServer *topo.Server, backupStorage back
// We're caught up on replication to at least the point the primary
// was at when this vtbackup run started.
log.Infof("Replication caught up to %v after %v", status.Position, time.Since(waitStartTime))
durationByPhase.Set("CatchUpReplication", int64(time.Since(waitStartTime).Seconds()))
break
}
if !status.Healthy() {
Expand Down Expand Up @@ -446,6 +461,7 @@ func takeBackup(ctx context.Context, topoServer *topo.Server, backupStorage back
}

if restartBeforeBackup {
restartAt := time.Now()
log.Info("Proceeding with clean MySQL shutdown and startup to flush all buffers.")
// Prep for full/clean shutdown (not typically the default)
if err := mysqld.ExecuteSuperQuery(ctx, "SET GLOBAL innodb_fast_shutdown=0"); err != nil {
Expand All @@ -459,12 +475,15 @@ func takeBackup(ctx context.Context, topoServer *topo.Server, backupStorage back
if err := mysqld.Start(ctx, mycnf); err != nil {
return fmt.Errorf("Could not start MySQL after full shutdown: %v", err)
}
durationByPhase.Set("RestartBeforeBackup", int64(time.Since(restartAt).Seconds()))
}

// Now we can take a new backup.
backupAt := time.Now()
if err := mysqlctl.Backup(ctx, backupParams); err != nil {
return fmt.Errorf("error taking backup: %v", err)
}
durationByPhase.Set("TakeNewBackup", int64(time.Since(backupAt).Seconds()))

// Return a non-zero exit code if we didn't meet the replication position
// goal, even though we took a backup that pushes the high-water mark up.
Expand Down
18 changes: 18 additions & 0 deletions go/ioutil/doc.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
/*
Copyright 2022 The Vitess Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package ioutil provides wrappers around golang IO interfaces.
package ioutil
58 changes: 58 additions & 0 deletions go/ioutil/meter.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
/*
Copyright 2022 The Vitess Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

/*
The meter struct contains time-and-byte-tracking functionality used by
MeteredReader, MeteredReadCloser, MeteredWriter, MeteredWriteCloser.
*/

package ioutil

import "time"

// meter contains time-and-byte-tracking functionality.
type meter struct {
fs []func(b int, d time.Duration)
bytes int64
duration time.Duration
}

// Bytes reports the total bytes processed in calls to measure().
func (mtr *meter) Bytes() int64 {
return mtr.bytes
}

// Duration reports the total time spent in calls to measure().
func (mtr *meter) Duration() time.Duration {
return mtr.duration
}

// measure tracks the time spent and bytes processed by f. Time is accumulated into total,
// and reported to callback fns.
func (mtr *meter) measure(f func(p []byte) (int, error), p []byte) (b int, err error) {
s := time.Now()
b, err = f(p)
d := time.Since(s)

mtr.bytes += int64(b)
mtr.duration += d

for _, cb := range mtr.fs {
cb(b, d)
}

return
}
Loading