Tailing Iterator
Since version 2.7, RocksDB supports a special type of iterator (named tailing iterator) optimized for a use case in which new data is read as soon as it's added to the database. Its key features are:
- Tailing iterator doesn't create a snapshot when it's created. Therefore, it can also be used to read newly added data (whereas ordinary iterators won't see any records added after the iterator was created).
- It's optimized for doing sequential reads -- it might avoid doing potentially expensive seeks on SST files and immutable memtables in many cases.
To enable it, set ReadOptions::tailing to true when creating a new iterator. Note that tailing iterator currently supports only moving in the forward direction (in other words, Prev() and SeekToLast() are not supported).
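The polling pattern this enables can be sketched as follows. This is a minimal illustration, not code from RocksDB: PollNewKeys and FakeTailingIterator are hypothetical names, and the helper is templated on the iterator type so that it compiles without the RocksDB headers. It relies only on the forward-only calls Seek(), SeekToFirst(), Valid(), Next() and key().

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns every key that appeared after `last_key`.
// `Iter` is any type offering the forward-only subset of the
// rocksdb::Iterator interface. With RocksDB itself, the iterator would
// come from something like:
//   rocksdb::ReadOptions ro;
//   ro.tailing = true;
//   std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
template <typename Iter>
std::vector<std::string> PollNewKeys(Iter& it, std::string& last_key,
                                     bool& started) {
  std::vector<std::string> out;
  if (started) {
    it.Seek(last_key);  // resume at the last key already delivered
  } else {
    it.SeekToFirst();   // first poll: start at the beginning
  }
  for (; it.Valid(); it.Next()) {
    // Works for std::string and for any Slice-like type with data()/size().
    std::string k(it.key().data(), it.key().size());
    if (started && k == last_key) continue;  // already delivered
    out.push_back(k);
    last_key = k;
    started = true;
  }
  return out;
}

// Stand-in for a tailing iterator, for demonstration only. It reads the
// live std::map on every Seek, so records added later are visible.
class FakeTailingIterator {
 public:
  explicit FakeTailingIterator(const std::map<std::string, std::string>& db)
      : db_(db), pos_(db.end()) {}
  void SeekToFirst() { pos_ = db_.begin(); }
  void Seek(const std::string& target) { pos_ = db_.lower_bound(target); }
  bool Valid() const { return pos_ != db_.end(); }
  void Next() { ++pos_; }
  const std::string& key() const { return pos_->first; }
  const std::string& value() const { return pos_->second; }

 private:
  const std::map<std::string, std::string>& db_;
  std::map<std::string, std::string>::const_iterator pos_;
};
```

Calling PollNewKeys repeatedly on the same iterator returns only keys greater than the last one delivered, so this pattern suits workloads where new keys sort after existing ones (for example, log-like data). Records inserted with smaller keys would not be picked up by this sketch.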
A tailing iterator provides a merged view of two internal iterators:
- a mutable iterator, used to access current memtable contents only
- an immutable iterator, used to read data from SST files and immutable memtables
Both of these internal iterators are created by specifying kMaxSequenceNumber, effectively disabling filtering based on internal sequence numbers and enabling access to records inserted after the creation of these iterators.
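To see why reading at the maximum sequence number removes the snapshot cutoff, consider the toy model below. It is only a sketch: Record, VisibleKeys and kToyMaxSequenceNumber are hypothetical names, the constant's value is an assumption chosen for the example rather than RocksDB's actual kMaxSequenceNumber, and real visibility checks involve more than this single comparison.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Toy record: a key plus the sequence number assigned when it was written.
struct Record {
  std::string key;
  uint64_t seq;
};

// Stand-in for kMaxSequenceNumber (assumed value, for illustration only).
constexpr uint64_t kToyMaxSequenceNumber =
    std::numeric_limits<uint64_t>::max();

// Returns the keys visible to a reader pinned at `read_seq`:
// a record is visible only if it was written at or before that point.
std::vector<std::string> VisibleKeys(const std::vector<Record>& records,
                                     uint64_t read_seq) {
  std::vector<std::string> out;
  for (const Record& r : records) {
    if (r.seq <= read_seq) out.push_back(r.key);
  }
  return out;
}
```

A reader pinned at the sequence number current when it was created behaves like an ordinary snapshot-based iterator and never sees later writes. A reader using kToyMaxSequenceNumber passes the comparison for every record, including ones written afterwards.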
In addition, each tailing iterator keeps track of the database's state changes (such as memtable flushes and compactions) and invalidates its internal iterators when they happen. This keeps the iterator up to date.
Since SST files and immutable memtables don't change, a tailing iterator can often get away with performing a seek operation only on the mutable iterator. For this purpose, it maintains the interval (prev_key, current_key] currently covered by the immutable iterator (in other words, there are no records with key k such that prev_key < k < current_key in either SST files or immutable memtables).
Therefore, when Seek(target) is called and target is within that interval, the first immutable record at or after target is current_key, so the immutable iterator is already at the correct position and it is not necessary to move it.
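The interval test described above can be written as a one-line predicate. This is a sketch under stated assumptions, not RocksDB's implementation: CanSkipImmutableSeek is a hypothetical name, and keys are compared with std::string's bytewise ordering, whereas a real database would use its configured comparator.

```cpp
#include <string>

// Returns true when `target` lies in the half-open interval
// (prev_key, current_key]. In that case the first immutable record at or
// after `target` is current_key, where the immutable iterator already sits.
bool CanSkipImmutableSeek(const std::string& prev_key,
                          const std::string& current_key,
                          const std::string& target) {
  return prev_key < target && target <= current_key;
}
```

With prev_key = "b" and current_key = "f", a seek to "d" or "f" can skip the immutable iterator. A seek to "b" cannot, because the correct position would then be "b" itself, and a seek to "g" cannot, because it lies past the covered interval.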