Remote Compaction (Experimental)

Jay Zhuang edited this page Oct 8, 2021 · 5 revisions

The Remote Compaction feature lets the user run compactions remotely, either in a different process or on a different host. It separates background compaction from the primary host, which improves both performance and flexibility. In particular, when compactions are offloaded to a remote host, no background compaction job competes with read/write requests on the primary. A host dedicated to compaction can be tuned purely for that workload and can run compactions for several different DBs. Currently, the remote host must be able to access the DB in order to run the compaction.

Here is an overview of the Remote Compaction feature:

1. Schedule

First, the primary DB triggers a compaction. Instead of running it locally, the primary passes the compaction information to a callback in CompactionService. The user needs to implement CompactionService::Start(), which sends the compaction information to a remote process to schedule the compaction.

2. Compact

The remote Compaction Worker needs to run DB::OpenAndCompact() with the compaction information sent from the primary. Based on that information, the worker opens the DB in read-only mode and runs the compaction. The compaction worker cannot change the LSM tree; it writes the compaction result to a temporary location that the user needs to set.
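
One way to manage the temporary output location is to give every compaction job its own scratch directory. The sketch below illustrates that idea with the C++17 standard library only. `PrepareJobOutputDir`, the `job-<id>` naming scheme, and the `output_root` parameter are hypothetical conventions chosen for illustration; none of them is part of RocksDB.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not a RocksDB API): create an empty, job-specific
// directory under `output_root` for the worker to write its result into.
fs::path PrepareJobOutputDir(const fs::path& output_root, uint64_t job_id) {
  assert(!output_root.empty());
  const fs::path dir = output_root / ("job-" + std::to_string(job_id));
  fs::remove_all(dir);          // drop leftovers from an earlier failed attempt
  fs::create_directories(dir);  // also creates output_root if it is missing
  return dir;
}
```

The returned path would then be used as the worker's output location for that job. Keeping one directory per job makes it easy to clean up after a failed or abandoned compaction.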

3. Return Result

Once the compaction is done, the result needs to be sent back to the primary. It includes metadata about the compacted SSTs and some internal information. As with scheduling, the user needs to implement the communication between the primary and the compaction workers.

4. Install & Purge

The primary waits for the result in the callback CompactionService::WaitForComplete(). The user's implementation should pass the result to that API and then return from the call. After that, the primary installs the result by renaming the result SST files from the temporary workplace into the LSM tree, and then purges the compaction input files. Because RocksDB renames the result SST files, make sure the temporary workplace and the DB are on the same file system. If they are not, the user needs to copy the files to the DB file system before returning from the WaitForComplete() call.
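
The same-file-system requirement can be handled with a small staging step that runs before the result is handed back to the primary. The following is a minimal sketch using only `std::filesystem`. `StageOnDbFileSystem` and the `db_fs_dir` staging directory are hypothetical names, not RocksDB APIs, and the sketch assumes that a failed rename means the two paths are on different file systems.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not a RocksDB API): make sure a worker output file
// lives in `db_fs_dir`, a directory on the DB's file system, and return the
// path that should be reported back to the primary.
fs::path StageOnDbFileSystem(const fs::path& worker_output,
                             const fs::path& db_fs_dir) {
  assert(worker_output.has_filename());
  fs::create_directories(db_fs_dir);
  const fs::path staged = db_fs_dir / worker_output.filename();

  // Try a cheap rename first; it only succeeds within one file system.
  std::error_code ec;
  fs::rename(worker_output, staged, ec);
  if (!ec) {
    return staged;
  }

  // Rename failed (for example, across devices): fall back to copy + delete.
  // copy_file throws on failure, so a missing source file is not ignored.
  fs::copy_file(worker_output, staged, fs::copy_options::overwrite_existing);
  fs::remove(worker_output);
  return staged;
}
```

After this step, the later rename performed by the primary stays within a single file system.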

Here is an overview of the API between the primary and the Compaction Worker. The Compaction Service part needs to be implemented by the user and set through Options.CompactionService.
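
The division of labor between the two sides can be illustrated with a single-process toy model. It is a sketch only. `ToyTransport`, `ToyCompactionService`, and `RunOneJob` are hypothetical, the method names merely mirror the callbacks named above, and the signatures are assumptions rather than the ones declared in the RocksDB headers. The in-memory queue stands in for whatever transport (RPC, message queue, shared storage) the user chooses.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// In-memory stand-in for the user-provided transport. Purely hypothetical.
struct ToyTransport {
  std::deque<std::pair<uint64_t, std::string>> requests;  // primary -> worker
  std::map<uint64_t, std::string> results;                // worker -> primary
};

// Primary side. Names mirror the text; signatures are assumptions and are
// NOT copied from the RocksDB headers.
class ToyCompactionService {
 public:
  explicit ToyCompactionService(ToyTransport* transport)
      : transport_(transport) {
    assert(transport_ != nullptr);
  }

  // Schedule: forward the opaque compaction input and return immediately.
  void Start(uint64_t job_id, const std::string& compaction_input) {
    transport_->requests.emplace_back(job_id, compaction_input);
  }

  // Fetch the result for `job_id`. A real implementation would block until
  // the worker replies; this toy returns std::nullopt if nothing has arrived.
  std::optional<std::string> WaitForComplete(uint64_t job_id) {
    auto it = transport_->results.find(job_id);
    if (it == transport_->results.end()) return std::nullopt;
    std::string result = std::move(it->second);
    transport_->results.erase(it);
    return result;
  }

 private:
  ToyTransport* transport_;
};

// Worker side: take one request, run `compact` on its input, and send the
// serialized result back. `compact` stands in for the real compaction call.
bool RunOneJob(ToyTransport* transport,
               const std::function<std::string(const std::string&)>& compact) {
  if (transport->requests.empty()) return false;
  auto [job_id, input] = std::move(transport->requests.front());
  transport->requests.pop_front();
  transport->results[job_id] = compact(input);
  return true;
}
```

In this model the primary calls `Start()`, a worker later picks the job up through `RunOneJob()`, and the primary collects the serialized result with `WaitForComplete()`. The compaction input and result are treated as opaque strings, so the transport never needs to understand their contents.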
