Skip to content

xtrabackup: Add a timeout on closing backup files.#5193

Merged
enisoc merged 2 commits intovitessio:masterfrom
planetscale:xtrabackup-close-timeout
Sep 16, 2019
Merged

xtrabackup: Add a timeout on closing backup files.#5193
enisoc merged 2 commits intovitessio:masterfrom
planetscale:xtrabackup-close-timeout

Conversation

@enisoc
Copy link
Member

@enisoc enisoc commented Sep 14, 2019

We've seen backup attempts that apparently stalled while waiting for
Close() on the file returned by AddFile() to return. We've only seen
this on xtrabackup backups, likely because we perform a small number of
long-running file uploads, instead of uploading each file individually.

This adds a timeout to the Close() step. If it times out, the backup
will be aborted and will need to be retried from scratch. However,
that's better than getting stuck forever.

Signed-off-by: Anthony Yeh enisoc@planetscale.com

We've seen backup attempts that apparently stalled while waiting for
Close() on the file returned by AddFile() to return. We've only seen
this on xtrabackup backups, likely because we perform a small number of
long-running file uploads, instead of uploading each file individually.

This adds a timeout to the Close() step. If it times out, the backup
will be aborted and will need to be retried from scratch. However,
that's better than getting stuck forever.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc enisoc requested a review from deepthi September 14, 2019 06:33
@enisoc enisoc requested a review from sougou as a code owner September 14, 2019 06:33
Copy link
Collaborator

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

// returned to abort.
cancelAddFiles()
}
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to merge, but... I'm not sure this solves the problem as described. The above go func will cancel the context and exit on the time out, but the outer defer func can still hang because closeFile doesn't care about cancelAddFiles.

Did you mean to do it the other way round, where the closeFile runs in a goroutine but the defer func should exit on timeout without waiting for closeFile to finish?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fail to make Close() return, then abandoning its goroutine will leak it, which could have unpredictable side effects down the line. It seems safer to me to hang if Close() hangs. We could also force a hard crash, but I'd rather give this a try first and add the crash only if we really need it. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment to document the above caveat.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants