feat: implement irreversible vacuum drop table protection #18809
+461
−0
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
🤖 Generated with Claude Code
This PR implements a critical data integrity feature that prevents undrop operations on tables whose data may have been partially or fully cleaned up by vacuum processes. The core principle is: once vacuum has started for a retention period, tables dropped before that period can never be restored, ensuring users cannot accidentally restore tables with incomplete or inconsistent data.
Problem Statement
Previously, there was a dangerous race condition where:
This could lead to silent data corruption and inconsistent database state.
Design Philosophy
Tenant-Level Global Watermark Design
We chose a tenant-level global vacuum watermark approach instead of per-table or per-database granularity for several key reasons:
Implementation Approach
Core Mechanism: Monotonic Timestamp Protection
Vacuum sets the watermark timestamp (retention_time = now() - retention_days)
Undrop checks drop_time against vacuum timestamp
drop_time <= vacuum_timestamp → undrop FORBIDDEN
Atomic Operations & Concurrent Safety
Monotonic Timestamp Updates: Uses crud_upsert_with with CAS semantics to ensure the vacuum watermark only advances forward, preventing timestamp rollback.
Safety & Behavior
Protection Matrix (Sample Scenarios)
Safety Guarantees
Tables whose data may have been cleaned by vacuum processes cannot be restored via undrop, preventing restoration of incomplete or corrupted data.
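The check and the forward-only update described under Core Mechanism can be sketched as follows. This is a minimal illustration, not the PR's code: the `Watermark` struct, the function names, and the millisecond unit are assumptions made for the example, and the real update goes through the meta-store rather than a pure function.

```rust
/// Hypothetical stand-in for the tenant-level vacuum watermark.
/// The field name and the millisecond unit are assumptions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Watermark {
    pub time_ms: i64,
}

/// Returns true when undrop must be rejected: drop_time <= vacuum_timestamp.
/// With no watermark recorded, vacuum has never started, so undrop is allowed.
pub fn undrop_forbidden(drop_time_ms: i64, watermark: Option<Watermark>) -> bool {
    match watermark {
        Some(w) => drop_time_ms <= w.time_ms,
        None => false,
    }
}

/// Forward-only update: keep the existing watermark unless the candidate is newer.
pub fn advance(current: Option<Watermark>, candidate_ms: i64) -> Watermark {
    match current {
        Some(w) if w.time_ms >= candidate_ms => w,
        _ => Watermark { time_ms: candidate_ms },
    }
}
```

A table dropped exactly at the watermark is treated as unrecoverable, matching the `<=` in the rule above.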
Technical Implementation
Data Structures
MetaStore Storage
Integration Points
VacuumDropTablesInterpreter::execute2() sets watermark before cleanup
handle_undrop_table() validates drop_time vs watermark
Critical Flow
Concurrent Safety Example
Race condition protection during undrop:
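One way to picture this protection is optimistic concurrency: undrop reads the watermark together with a version number and commits only if that version is unchanged. The sketch below is an assumption about the general technique, not the PR's implementation; `WatermarkCell`, its methods, and the in-memory `Mutex` are hypothetical stand-ins for a versioned meta-store record.

```rust
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for a versioned meta-store record.
/// `seq` increases every time the watermark value changes.
#[derive(Debug, Default)]
pub struct WatermarkCell {
    state: Mutex<(u64, i64)>, // (seq, watermark in ms)
}

impl WatermarkCell {
    /// Snapshot the current (seq, watermark).
    pub fn read(&self) -> (u64, i64) {
        *self.state.lock().unwrap()
    }

    /// Vacuum side: raise the watermark, never lower it.
    pub fn advance(&self, candidate_ms: i64) {
        let mut s = self.state.lock().unwrap();
        if candidate_ms > s.1 {
            *s = (s.0 + 1, candidate_ms);
        }
    }

    /// Undrop side: succeed only if the watermark is unchanged since
    /// `seen_seq` and the table was dropped strictly after it.
    pub fn try_undrop(&self, seen_seq: u64, drop_time_ms: i64) -> Result<(), &'static str> {
        let s = self.state.lock().unwrap();
        if s.0 != seen_seq {
            return Err("watermark changed, retry");
        }
        if drop_time_ms <= s.1 {
            return Err("undrop forbidden");
        }
        Ok(())
    }
}
```

If vacuum advances the watermark between the read and the commit, the undrop fails and must re-read, at which point the drop_time check rejects it.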
Timeline Example
Timeline (Scenario: data_retention_time_in_days = 30):
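The arithmetic behind a 30-day scenario can be worked through in a few lines. The day numbers used in the usage note are illustrative only and are not taken from the PR; the function names and the seconds-since-epoch unit are likewise assumptions.

```rust
const SECS_PER_DAY: u64 = 24 * 60 * 60;

/// retention_time = now - retention_days, in seconds since the Unix epoch.
/// Saturates at 0 instead of underflowing for very small `now_secs`.
pub fn retention_time(now_secs: u64, retention_days: u64) -> u64 {
    now_secs.saturating_sub(retention_days * SECS_PER_DAY)
}

/// A table can still be undropped only if it was dropped after the watermark.
pub fn can_undrop(drop_secs: u64, watermark_secs: u64) -> bool {
    drop_secs > watermark_secs
}
```

For example, if vacuum starts on day 60 with a 30-day retention, the watermark lands on day 30: a table dropped on day 10 or exactly on day 30 can no longer be undropped, while one dropped on day 31 still can.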
Test Coverage
Files Modified
Core Implementation
src/meta/api/src/garbage_collection_api.rs - Vacuum watermark timestamp management
src/meta/api/src/schema_api.rs - Undrop protection logic with concurrent safety
src/query/service/src/interpreters/interpreter_vacuum_drop_tables.rs - Integration point for vacuum operations
Data Model & Serialization
src/meta/app/src/schema/vacuum_watermark.rs - Core VacuumWatermark structure
src/meta/app/src/schema/vacuum_watermark_ident.rs - Storage identifier
src/meta/proto-conv/src/vacuum_watermark_from_to_protobuf_impl.rs - Protobuf conversion
src/meta/protos/proto/vacuum_watermark.proto - Protobuf definition
Error Handling
src/meta/app/src/app_error.rs - UndropTableRetentionGuard error handling
src/common/exception/src/exception_code.rs - New error code for vacuum protection
Tests
src/meta/api/src/schema_api_test_suite.rs - Comprehensive test coverage
src/meta/proto-conv/tests/it/v152_vacuum_retention.rs - Backward compatibility tests
Migration Safety
Performance Impact
This implementation provides robust data integrity protection while maintaining performance and operational simplicity. The tenant-level design offers a balanced approach between safety, simplicity, and future extensibility.
Tests
Type of change