Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapply "storage/copy-to-s3: emit empty file even if input is empty" #30844

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

benesch
Copy link
Member

@benesch benesch commented Dec 16, 2024

This reverts commit b1b2c28.

Fix https://github.com/MaterializeInc/database-issues/issues/8599.

Motivation

  • This PR adds a known-desirable feature.

Tips for reviewer

Still TODO: track down test flake that caused this to get reverted in the first place. Test flake is solved! See commit messages for details.

image

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@benesch
Copy link
Member Author

benesch commented Dec 16, 2024

cc @pH14 @doy-materialize — lightly poking at this to get the catalog exporter work moving again.

@benesch benesch force-pushed the copy-to-s3-empty branch 3 times, most recently from c8279d9 to c8fc988 Compare December 22, 2024 08:19
@benesch benesch requested review from petrosagg and teskje December 22, 2024 17:00
@benesch benesch marked this pull request as ready for review December 22, 2024 17:00
@benesch benesch requested review from a team as code owners December 22, 2024 17:00
@benesch
Copy link
Member Author

benesch commented Dec 22, 2024

The second commit in this PR fixes the CI flakiness that caused the original revert. I've written up a more detailed diagnosis of the problem and the fix on the issue: https://github.com/MaterializeInc/database-issues/issues/8599#issuecomment-2558518257

@benesch
Copy link
Member Author

benesch commented Dec 22, 2024

I've been using this commit to run CI on the cloudtest testdrive script that was causing the flake 10 times: bdf7b1a

IME this has reliably reproduced the flake in about 4/10 runs.

// partial data.
if shutdown_token.strong_count() == 0 {
// Wedge until the operator is dropped.
future::pending::<()>().await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the mechanism that will drop the iterator? I think it could be the button AsyncOperatorBuilder::build returns, but we are not using that for this operator. Doesn't that mean we will keep this future alive forever?

Copy link
Member Author

@benesch benesch Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded on Slack, but whoops, thanks! Updated the approach here to use the async operator builder's buttons.

@benesch benesch requested a review from a team as a code owner December 24, 2024 06:52
@benesch benesch requested a review from ParkMyCar December 24, 2024 06:52
@benesch benesch force-pushed the copy-to-s3-empty branch 4 times, most recently from 01932d6 to 6da67dd Compare December 24, 2024 14:37
@benesch
Copy link
Member Author

benesch commented Dec 24, 2024

Whew, ok, test flakes tracked down. I had CI run "full testdrive in cloudtest" 10 times against this patchset and it passed all ten times.

Details the fixes are in commit messages.

@@ -20,7 +20,7 @@ bytes = "1.3.0"
bytesize = "1.1.0"
http = "1.1.0"
hyper-tls = "0.5.0"
mz-ore = { path = "../ore", default-features = false }
mz-ore = { path = "../ore", default-features = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we list the specific features required instead? just enabling default-features is going to break cloud, because it'll make aws-util start pulling in the workspace-hack crate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah this was just part of the revert commit. I'll just back this out. Doesn't seem to be required.

The COPY TO S3 sink must not flush files to S3 during shutdown, as those
files may only be partially written.

This can cause correctness issues when running copy operations on
multiprocess replicas, as the lagging replica's copy sink will get
dropped partway through.

The solution is to properly convert the buttons returned by the async
operators in the S3 sink's implementation into tokens that are dropped
when the sink is dropped. This ensures that the oneshot sink is properly
shut down when dropped, before it can write partial data to S3.

Fix MaterializeInc/database-issues#8599.
A copy to operation involves some preflight checks: validating that the
requested prefix of S3 is empty, and writing an INCOMPLETE sentinel
file.

Previously these checks were run as the first operation on every replica
participating in the copy to operation. This was racy though: a lagging
replica could write the INCOMPLETE sentinel file *after* the leading
replica finished the copy and had thus *deleted* the INCOMPLETE sentinel
file. Since the lagging replica's work is canceled once the leading
replica's work is finished, this could result in a INCOMPLETE sentinel
file that never got removed.

This commit moves these preflight checks to the adapter, where they are
performed exactly once before any replica is notified of the copy to
operation. This ensures that the INCOMPLETE file is written at most once
in an copy to operation, and in particular that lagging replicas can't
cause the file to reappear afer the leading replica finishes the
operation.
So that the adapter crate doesn't need to depend on the
mz-storage-operators crate, which is verboten.

This commit is pure code movement.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants