expose /metrics from vtbackup at http --port #76

Closed
maxenglander wants to merge 1 commit into vtbackup_flags from maxeng-vtbackup-prom-stats

@maxenglander maxenglander commented Sep 28, 2022

NB: I have opened this PR against https://github.com/planetscale/vitess/tree/vtbackup_flags, but I don't want to merge it into that branch. I plan to wait for that branch to merge to main before considering merging this one.

Description

As far as I can tell, vtbackup does not currently publish metrics. It would be awesome if it did. In particular, it would be great to have the following timings:

  • How long it takes to download the last backup.
  • How long it takes to start and stop MySQL (MySQL startup can be slow sometimes, e.g. during InnoDB initialization).
  • How long it takes to connect to the primary.
  • How long it takes to download the binary log.
  • How long it takes to apply the binlog.
  • How long it takes to perform and upload the new backup.

This PR modifies the vtbackup command so that an HTTP server is started on --port. The server is similar to the one launched by other VT components, and includes a /metrics endpoint. Currently this endpoint includes two metrics, which are managed by the mysqlctl package:

  • vtbackup_restore_duration_seconds
  • vtbackup_backup_duration_seconds
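
To show roughly what a scrape of such an endpoint returns, here is a stdlib-only sketch that renders the two metric names above in the Prometheus text exposition format. It is a stand-in, not how Vitess does it: the PR relies on servenv and the existing stats plumbing. The `gaugeSet` type, the helper functions, the choice of the gauge type, and the sample values are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// gaugeSet holds named float values and serves them over HTTP.
// Hypothetical type; rendering as gauges is an assumption.
type gaugeSet struct {
	mu     sync.Mutex
	values map[string]float64
}

func newGaugeSet() *gaugeSet {
	return &gaugeSet{values: make(map[string]float64)}
}

// Set stores the latest value for a metric name.
func (g *gaugeSet) Set(name string, value float64) {
	g.mu.Lock()
	g.values[name] = value
	g.mu.Unlock()
}

// Render writes every metric in Prometheus text format, sorted by name.
func (g *gaugeSet) Render() string {
	g.mu.Lock()
	defer g.mu.Unlock()
	names := make([]string, 0, len(g.values))
	for name := range g.values {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "# TYPE %s gauge\n%s %g\n", name, name, g.values[name])
	}
	return b.String()
}

// ServeHTTP makes gaugeSet usable as the /metrics handler.
func (g *gaugeSet) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprint(w, g.Render())
}

// newMux registers the handler at /metrics.
func newMux(g *gaugeSet) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/metrics", g)
	return mux
}

// scrape issues an in-memory GET /metrics against h and returns the body.
func scrape(h http.Handler) string {
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	metrics := newGaugeSet()
	// Sample values only; real values would come from the backup and restore code.
	metrics.Set("vtbackup_restore_duration_seconds", 12.5)
	metrics.Set("vtbackup_backup_duration_seconds", 42)
	fmt.Print(scrape(newMux(metrics)))
}
```

In a real process the mux would be passed to `http.ListenAndServe` on the address derived from --port; the in-memory scrape here just keeps the example self-terminating.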

I'm keeping this PR small to lay the groundwork. The other metrics can come in a later PR if we're OK with this overall approach.

Use cases

PlanetScale makes ongoing internal and public efforts to improve backup and restore performance. It would be great to have detailed metrics on current performance so that we can make more informed decisions on where to put our energy.

Signed-off-by: Max Englander <max@planetscale.com>
}

func init() {
	mathrand.Seed(time.Now().UnixNano())

Not sure if we want/need this. Copy-pasted from elsewhere.

func init() {
	mathrand.Seed(time.Now().UnixNano())
	servenv.RegisterDefaultFlags()
	servenv.RegisterFlags()

@maxenglander maxenglander Sep 28, 2022

I think we could maybe do without servenv.RegisterFlags.

// Catch SIGTERM and SIGINT so we get a chance to clean up.
ctx, cancel := context.WithCancel(context.Background())
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

Signal handling is now done by servenv.

exit.Return(1)
}

if keepAliveTimeout > 0 {

Added this for local testing, but I think this could also be useful in a K8s context to allow the next Prometheus scrape interval to pass before the process exits.

@maxenglander maxenglander changed the title from "expose /metrics from vtbackup http --port" to "expose /metrics from vtbackup at http --port" Sep 28, 2022
@maxenglander maxenglander marked this pull request as ready for review September 28, 2022 21:51
@maxenglander maxenglander deleted the maxeng-vtbackup-prom-stats branch September 28, 2022 21:59
@maxenglander maxenglander restored the maxeng-vtbackup-prom-stats branch September 28, 2022 21:59
@maxenglander

Closed this in favor of vitessio#11388
