@dantengsky
Member

@dantengsky dantengsky commented Sep 29, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

🤖 Generated with Claude Code

This PR implements a critical data integrity feature that prevents undrop operations on tables whose data may have been partially or fully cleaned up by vacuum processes. The core principle: once a vacuum run has started with a given retention cutoff, tables dropped at or before that cutoff can never be restored, so users cannot accidentally restore tables with incomplete or inconsistent data.

Problem Statement

Previously, there was a dangerous race condition where:

  1. User drops a table
  2. Vacuum process starts cleaning up the table's data
  3. User attempts undrop while vacuum is in progress
  4. Undrop succeeds but table data is incomplete/corrupted

This could lead to silent data corruption and inconsistent database state.

Design Philosophy

Tenant-Level Global Watermark Design

We chose a tenant-level global vacuum watermark approach instead of per-table or per-database granularity for several key reasons:

  1. Simplicity & Clarity: A single timestamp per tenant is easier to reason about, implement, and maintain
  2. Sufficient for Use Case: Vacuum operations are typically tenant-wide administrative tasks, making global coordination natural
  3. Consistent Semantics: All undrop operations within a tenant follow the same rules, reducing cognitive overhead
  4. Performance: Single KV read/write per tenant vs. potentially thousands for per-table tracking
  5. Smooth Evolution Path: The current design can be extended to table/DB-level granularity without breaking changes

Implementation Approach

Core Mechanism: Monotonic Timestamp Protection

VacuumWatermark {
    time: DateTime<Utc>, // Monotonically increasing, never decreases
}
  • Vacuum Phase: When vacuum starts, sets the watermark to the retention cutoff (retention_time = now() - retention_days)
  • Undrop Phase: Compares table's drop_time against vacuum timestamp
  • Safety Rule: drop_time <= vacuum_timestamp → undrop FORBIDDEN
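The safety rule above can be sketched as a small pure function. This is only an illustration, not the code in this PR: `UndropDecision` and `check_undrop` are hypothetical names, and timestamps are plain Unix seconds rather than the `DateTime<Utc>` the PR stores.

```rust
/// Outcome of the undrop safety check (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum UndropDecision {
    Allowed,
    Forbidden,
}

/// Decide whether a dropped table may be restored.
///
/// Assumption: timestamps are Unix seconds; `None` means the tenant's
/// vacuum watermark has never been set.
pub fn check_undrop(drop_time: i64, watermark: Option<i64>) -> UndropDecision {
    match watermark {
        // No vacuum has run for this tenant, so nothing has been cleaned.
        None => UndropDecision::Allowed,
        // drop_time <= watermark: data may already be cleaned.
        Some(w) if drop_time <= w => UndropDecision::Forbidden,
        // Dropped after the watermark: vacuum has not touched it.
        Some(_) => UndropDecision::Allowed,
    }
}
```

Note that the comparison is inclusive: a table dropped exactly at the watermark is treated as unsafe to restore.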

Atomic Operations & Concurrent Safety

Monotonic Timestamp Updates: Uses crud_upsert_with with CAS semantics to ensure vacuum watermark only advances forward, preventing timestamp rollback.
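To illustrate the "only advances forward" property, here is a minimal in-memory sketch that uses a compare-and-swap loop on an atomic integer. It does not use `crud_upsert_with` or the meta-service API; the `Watermark` type and `advance` method are hypothetical, and the timestamp is assumed to be Unix seconds.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// In-memory stand-in for the stored watermark (hypothetical).
pub struct Watermark {
    secs: AtomicI64,
}

impl Watermark {
    pub fn new(initial: i64) -> Self {
        Self { secs: AtomicI64::new(initial) }
    }

    /// Try to move the watermark to `candidate`.
    /// Returns the value in effect after the call; it never decreases.
    pub fn advance(&self, candidate: i64) -> i64 {
        let mut current = self.secs.load(Ordering::Acquire);
        loop {
            if candidate <= current {
                // A stale or equal candidate must not roll the watermark back.
                return current;
            }
            match self.secs.compare_exchange_weak(
                current,
                candidate,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return candidate,
                // Another writer won the race; re-check against its value.
                Err(observed) => current = observed,
            }
        }
    }
}
```

The same shape applies to a sequence-guarded KV upsert: read the current value, decide whether the candidate is newer, and write only if nothing changed in between.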

Safety & Behavior

Protection Matrix (Sample Scenarios)

| Scenario | Vacuum Watermark | Table Drop Time | Undrop Result | Reason |
| --- | --- | --- | --- | --- |
| Pre-vacuum State | None (never set) | Any time | ALLOWED | No vacuum has run yet - data guaranteed safe |
| Post-vacuum Risk | Set to 2023-12-01 (example) | Dropped 2023-11-20 (example) | BLOCKED | Drop predates vacuum - data may be cleaned |
| Post-vacuum Safe | Set to 2023-12-01 (example) | Dropped 2023-12-05 (example) | ALLOWED | Drop postdates vacuum - data guaranteed intact |

Safety Guarantees

Tables whose data may have been cleaned by vacuum processes cannot be restored via undrop, preventing restoration of incomplete or corrupted data.

Technical Implementation

Data Structures

// Rust structure
pub struct VacuumWatermark {
    pub time: DateTime<Utc>,
}
// Protobuf serialization (v152)
message VacuumWatermark {
  uint64 ver = 100;
  uint64 min_reader_ver = 101;
  string time = 1;         // Timestamp string
}

MetaStore Storage

Key Format:   __fd_vacuum_watermark_ts/{tenant_name}
Key Example:  __fd_vacuum_watermark_ts/default
Value Type:   VacuumWatermark (protobuf serialized)
Scope:        Global per tenant (one watermark per tenant)
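As a rough sketch of the key layout above, the key can be built by joining the prefix and the tenant name. The constant and function names are hypothetical, and the sketch assumes the tenant name needs no escaping; the real key encoding in the meta service may differ.

```rust
/// Key prefix taken from the key format above.
pub const VACUUM_WATERMARK_PREFIX: &str = "__fd_vacuum_watermark_ts";

/// Build the per-tenant watermark key (hypothetical helper).
/// Assumption: `tenant` contains no characters that require escaping.
pub fn vacuum_watermark_key(tenant: &str) -> String {
    format!("{}/{}", VACUUM_WATERMARK_PREFIX, tenant)
}
```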

Integration Points

  1. Vacuum Trigger: VacuumDropTablesInterpreter::execute2() sets watermark before cleanup
  2. Protection Check: handle_undrop_table() validates drop_time vs watermark

Critical Flow

1. VACUUM DROP TABLE → Set watermark (fail-safe: abort if fails)
2. Data cleanup proceeds only after watermark is established
3. UNDROP TABLE → Check drop_time <= watermark → REJECT if true
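The ordering in steps 1 and 2 can be sketched as follows. This is an assumption-laden illustration, not the interpreter code: `vacuum_dropped_tables` is a hypothetical function, and the two closures stand in for the meta-service call and the data cleanup.

```rust
/// Run vacuum in the fail-safe order: publish the watermark first,
/// and only clean up data if that succeeded (hypothetical sketch).
pub fn vacuum_dropped_tables<S, C>(
    retention_cutoff: i64,
    set_watermark: S,
    cleanup: C,
) -> Result<(), String>
where
    S: FnOnce(i64) -> Result<(), String>,
    C: FnOnce(i64) -> Result<(), String>,
{
    // Step 1: if the watermark cannot be set, abort before touching data.
    set_watermark(retention_cutoff)?;
    // Step 2: cleanup runs only after the watermark is established.
    cleanup(retention_cutoff)
}
```

Because the watermark is written before any data is removed, an undrop that still passes the check cannot be looking at a table this vacuum run has started to clean.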

Concurrent Safety Example

Race condition protection during undrop:

1. `Undrop` operation reads watermark with seq=N
2. Concurrent `vacuum` updates watermark (seq=N+1)
3. KV transaction submitted by `Undrop` operation fails due to seq mismatch → Safe abort
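The three steps above can be modelled with a sequence-numbered cell. This is a simplified in-memory sketch under stated assumptions: `WatermarkCell`, `UndropError`, and `undrop_with_seq_check` are hypothetical names, timestamps are Unix seconds, and the real implementation submits a KV transaction to the meta service rather than comparing fields directly.

```rust
/// Watermark value plus the sequence number of its last write (hypothetical).
#[derive(Debug, Default)]
pub struct WatermarkCell {
    seq: u64,
    watermark: Option<i64>,
}

impl WatermarkCell {
    /// Snapshot read: returns (seq, watermark).
    pub fn read(&self) -> (u64, Option<i64>) {
        (self.seq, self.watermark)
    }

    /// Vacuum side: advance the watermark and bump seq if it moved forward.
    pub fn vacuum_advance(&mut self, candidate: i64) {
        if self.watermark.map_or(true, |w| candidate > w) {
            self.watermark = Some(candidate);
            self.seq += 1;
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum UndropError {
    RetentionGuard,
    SeqMismatch,
}

/// Undrop side: validate against the snapshot, then commit only if the
/// cell still carries the seq that was read.
pub fn undrop_with_seq_check(
    cell: &WatermarkCell,
    seen: (u64, Option<i64>),
    drop_time: i64,
) -> Result<(), UndropError> {
    let (seq, watermark) = seen;
    if let Some(w) = watermark {
        if drop_time <= w {
            return Err(UndropError::RetentionGuard);
        }
    }
    if cell.read().0 != seq {
        // A concurrent vacuum moved the watermark after the read.
        return Err(UndropError::SeqMismatch);
    }
    Ok(())
}
```

A caller that hits `SeqMismatch` can re-read the watermark and retry; the retry then sees the new watermark and is rejected by the retention guard if the table is no longer safe to restore.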

Timeline Example

Timeline (Scenario: data_retention_time_in_days = 30):

Oct-15        Nov-01       Nov-20       Dec-01       Jan-05
│             │            │            │            │
TableA        │            TableB       VACUUM       UNDROP
Dropped       │            Dropped     EXECUTION    Requests
│             │                         (sets        │
│             │                         watermark    │
│             │                         = Nov-01)    │
│             │                                      │
│             │                                      └─ TableA: ❌ BLOCKED
│             │                                         (Oct-15 ≤ Nov-01)
│             │
│             │                                      └─ TableB: ✅ ALLOWED
│             │                                         (Nov-20 > Nov-01)
│             │
│             └─ Watermark boundary
│                (retention cutoff)
│
└─ TableA dropped before watermark
   (data potentially cleaned)

Note: Watermark = vacuum_execution_time - data_retention_time_in_days
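The arithmetic in the timeline can be checked with a small worked example. To stay independent of any date library, days are counted from Oct-01 = 0, so Oct-15 = 14, Nov-01 = 31, Nov-20 = 50, and Dec-01 = 61. The function names are hypothetical and exist only for this sketch.

```rust
/// Watermark = vacuum execution day - retention days.
/// Assumption: all values are day offsets from Oct-01 (day 0).
pub fn watermark_day(vacuum_day: i64, retention_days: i64) -> i64 {
    vacuum_day - retention_days
}

/// A table is blocked from undrop when it was dropped on or before the watermark.
pub fn undrop_blocked(drop_day: i64, watermark: i64) -> bool {
    drop_day <= watermark
}
```

With vacuum on Dec-01 (day 61) and 30 days of retention, the watermark is day 31 (Nov-01). TableA (day 14) is blocked and TableB (day 50) is allowed, matching the diagram.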

Test Coverage

  • Unit Tests: Core API behavior and monotonic property validation
  • Integration Tests: End-to-end vacuum-undrop workflows
  • Concurrency Tests: Race condition handling validation
  • Compatibility Tests: Protobuf serialization/deserialization (v152)
  • Error Handling: Failure mode validation and fail-safe behavior

Files Modified

Core Implementation

  • src/meta/api/src/garbage_collection_api.rs - Vacuum watermark timestamp management
  • src/meta/api/src/schema_api.rs - Undrop protection logic with concurrent safety
  • src/query/service/src/interpreters/interpreter_vacuum_drop_tables.rs - Integration point for vacuum operations

Data Model & Serialization

  • src/meta/app/src/schema/vacuum_watermark.rs - Core VacuumWatermark structure
  • src/meta/app/src/schema/vacuum_watermark_ident.rs - Storage identifier
  • src/meta/proto-conv/src/vacuum_watermark_from_to_protobuf_impl.rs - Protobuf conversion
  • src/meta/protos/proto/vacuum_watermark.proto - Protobuf definition

Error Handling

  • src/meta/app/src/app_error.rs - UndropTableRetentionGuard error handling
  • src/common/exception/src/exception_code.rs - New error code for vacuum protection

Tests

  • src/meta/api/src/schema_api_test_suite.rs - Comprehensive test coverage
  • src/meta/proto-conv/tests/it/v152_vacuum_retention.rs - Backward compatibility tests

Migration Safety

  • No Breaking Changes: Existing functionality preserved when no vacuum watermark exists
  • Backward Compatible: Protobuf v152 maintains compatibility with existing deployments
  • Graceful Migration: Systems without watermarks continue to work normally
  • Safe Rollback: Can be disabled without data loss or corruption

Performance Impact

  • Minimal Overhead: Single KV read/write per tenant during vacuum operations
  • Efficient Storage: Compact protobuf representation for watermark timestamps
  • Fast Validation: Simple timestamp comparison for undrop protection
  • No Query Impact: Zero performance impact on normal table operations

This implementation provides robust data integrity protection while maintaining performance and operational simplicity. The tenant-level design offers a balanced approach between safety, simplicity, and future extensibility.

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Sep 29, 2025
@dantengsky dantengsky marked this pull request as ready for review October 10, 2025 06:36
@dantengsky dantengsky requested a review from SkyFan2002 October 10, 2025 06:36
@dantengsky dantengsky force-pushed the feat/irreversible-vacuum-drop-table branch from 871dd71 to 320e0fc Compare October 14, 2025 11:39
@dantengsky
Member Author

@drmingdrmer Meta proto upgraded to v152 with new VacuumWatermark message type, please have a look.

Member

@drmingdrmer drmingdrmer left a comment

@drmingdrmer reviewed 12 of 15 files at r1, all commit messages.
Reviewable status: 12 of 15 files reviewed, all discussions resolved

@dantengsky dantengsky force-pushed the feat/irreversible-vacuum-drop-table branch from 197e889 to 0067f1c Compare October 16, 2025 05:53
@dantengsky dantengsky merged commit 692f0ce into databendlabs:main Oct 16, 2025
87 of 88 checks passed