Commit 159fa77

cbi42 authored and facebook-github-bot committed
Fix an assertion failure in error handler (#13251)
Summary: We saw this [assertion](https://github.com/facebook/rocksdb/blob/02b4197544f758bdf84d80fe9319238611848c48/db/error_handler.cc#L576) fail in the crash test. The LOG shows a call to SetOptions() concurrent with ResumeImpl(). It's possible that while ResumeImpl() was waiting for the error recovery flush (with the mutex released), SetOptions() failed to write to MANIFEST and added a file to be quarantined. This triggered the assertion failure when ResumeImpl() called ClearBGError(). This PR fixes the issue by setting a background error when SetOptions() fails to write to MANIFEST.

Pull Request resolved: #13251

Test Plan: monitor future crash test failures.

Reviewed By: hx235

Differential Revision: D67660106

Pulled By: cbi42

fbshipit-source-id: 1b52bb23005c4b544f8f9bceefd3b9dcbaf0edfa

1 parent e48ccc2 commit 159fa77

3 files changed: +9 -1 lines changed

db/db_impl/db_impl.cc

+5 -1

@@ -1299,11 +1299,15 @@ Status DBImpl::SetOptions(
       VersionEdit dummy_edit;
       s = versions_->LogAndApply(cfd, new_options, read_options, write_options,
                                  &dummy_edit, &mutex_, directories_.GetDbDir());
+      if (!versions_->io_status().ok()) {
+        assert(!s.ok());
+        error_handler_.SetBGError(versions_->io_status(),
+                                  BackgroundErrorReason::kManifestWrite);
+      }
       // Trigger possible flush/compactions. This has to be before we persist
       // options to file, otherwise there will be a deadlock with writer
       // thread.
       InstallSuperVersionAndScheduleWork(cfd, &sv_context, new_options);
-
       persist_options_status =
           WriteOptionsFile(write_options, true /*db_mutex_already_held*/);
       bg_cv_.SignalAll();
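
The race described in the summary can be illustrated with a toy model. This is a minimal sketch and not RocksDB code: `ToyErrorHandler`, `SimulateConcurrentSetOptions`, and the file name are hypothetical, and the sketch assumes that setting a background error while recovery is in progress records a recovery error, which keeps ClearBGError() out of the branch that asserts the quarantine list is empty.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model, not RocksDB code: every name below is a hypothetical stand-in.
struct ToyErrorHandler {
  bool recovery_in_progress = false;
  bool recovery_error = false;  // stands in for a non-OK recovery_error_
  std::vector<std::string> files_to_quarantine;

  // A failed MANIFEST write leaves a file that must be quarantined.
  void AddFileToQuarantine(const std::string& name) {
    files_to_quarantine.push_back(name);
  }

  // Assumption: an error reported during recovery marks the recovery failed.
  void SetBGError() {
    if (recovery_in_progress) recovery_error = true;
  }

  // True if the "recovery succeeded" branch would not trip the assertion:
  // either recovery is marked failed (branch skipped) or nothing is quarantined.
  bool ClearPreconditionHolds() const {
    return recovery_error || files_to_quarantine.empty();
  }
};

// Replays the interleaving from the summary in a single thread.
bool SimulateConcurrentSetOptions(bool report_manifest_failure) {
  ToyErrorHandler handler;
  handler.recovery_in_progress = true;  // resume is waiting on the recovery flush

  // Concurrent SetOptions(): the MANIFEST write fails and quarantines a file.
  handler.AddFileToQuarantine("MANIFEST-000005");  // hypothetical file name
  if (report_manifest_failure) handler.SetBGError();  // the fix in this commit

  // Resume continues and is about to clear the background error.
  return handler.ClearPreconditionHolds();
}
```

With `report_manifest_failure` set to `false` the model reproduces the pre-fix state (a quarantined file with no error recorded); with `true` the precondition holds.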

db/error_handler.cc

+2

@@ -573,6 +573,8 @@ Status ErrorHandler::ClearBGError() {

   // Signal that recovery succeeded
   if (recovery_error_.ok()) {
+    // If this assertion fails, it means likely bg error is not set after a
+    // file is quarantined during MANIFEST write.
     assert(files_to_quarantine_.empty());
     Status old_bg_error = bg_error_;
     // old_bg_error is only for notifying listeners, so may not be checked

db/version_set.cc

+2

@@ -5939,6 +5939,8 @@ Status VersionSet::LogAndApply(
   }
   TEST_SYNC_POINT_CALLBACK("VersionSet::LogAndApply:WakeUpAndDone", mu);
 #endif /* !NDEBUG */
+  // FIXME: One MANIFEST write failure can cause all writes to SetBGError,
+  // should only SetBGError once.
   return first_writer.status;
 }
 
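
The FIXME above leaves the deduplication open. As a rough sketch of one possible approach, and not something this commit implements, the toy model below assumes that a failed MANIFEST write is shared by a batch of writers and compares every writer reporting the error against only the batch leader reporting it. `ToyWriter`, `ReportsPerWriter`, `ReportsLeaderOnly`, and `MakeFailedBatch` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy model, not RocksDB code: every name below is a hypothetical stand-in.
struct ToyWriter {
  bool is_leader;  // true for the writer that performed the MANIFEST write
  bool failed;     // write status shared by the whole batch
};

// Behavior the FIXME describes: every writer in a failed batch reports.
int ReportsPerWriter(const std::vector<ToyWriter>& batch) {
  int reports = 0;
  for (const ToyWriter& w : batch) {
    if (w.failed) ++reports;
  }
  return reports;
}

// One possible alternative: only the batch leader reports the failure.
int ReportsLeaderOnly(const std::vector<ToyWriter>& batch) {
  int reports = 0;
  for (const ToyWriter& w : batch) {
    if (w.failed && w.is_leader) ++reports;
  }
  return reports;
}

// Builds a failed batch of n writers whose first element is the leader.
std::vector<ToyWriter> MakeFailedBatch(int n) {
  std::vector<ToyWriter> batch;
  for (int i = 0; i < n; ++i) batch.push_back({i == 0, true});
  return batch;
}
```

For a failed batch of three writers, the first function counts three reports and the second counts one.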
