-
Notifications
You must be signed in to change notification settings - Fork 258
Data structures
TiedotDatabase # A database "TiedotDatabase" ├── CollectionA # A collection called "CollectionA" │ ├── Book!Author!Name # An index on path "Book" -> "Author" -> "Name" │ │ ├── 0 # Index data partition 0 │ │ └── 1 # Index data partition 1 │ ├── dat_0 # Document data partition 0 │ ├── dat_1 # Document data partition 1 │ ├── id_0 # Document ID lookup table for partition 0 │ └── id_1 # Document ID lookup table for partition 1 ├── CollectionB # Another collection called "CollectionB" │ ├── Day!Temperature!High │ │ ├── 0 │ │ └── 1 │ ├── dat_0 │ ├── dat_1 │ ├── id_0 │ └── id_1 └── number_of_partitions
Collection data file contains document data. Every document has a binary header and UTF-8 text content. The file has an initial size (32MB) and will grow beyond the initial size (by 32MB incrementally) to fit more documents.
New documents are inserted to end-of-data position, and they are left with room for future updates and size growth. Every document is assigned to a randomly generated, practically unique document ID, which also decides into which partition the document goes.
Updating document usually happens in-place, however if there is not pre-allocated enough room for the updated version, the document has to be deleted and re-inserted; document ID remains the same.
Deleted documents are marked as deleted, the wasted space is recovered in the next scrub operation.
Type | Size (bytes) | Description | |
---|---|---|---|
Byte (8 bit signed integer) | 1 | Validity | 0 - deleted, 1 - valid |
Signed 64-bit integer | 10 | Allocated room | How much room is left for the document |
Char Array | Size of document content | Document content | Encoded in UTF-8 |
Char Array | Allocated room - size of document | Padding (UTF-8 spaces) | Room for future updates, for the document to grow its size |
Hash table file contains binary content; it implements a static hash table made of hash buckets and integer entries.
Every bucket has a fixed number of entries. When a bucket becomes full, a new bucket is chained to it in order to store more entries. Every entry has an integer key and value.
An entry key may have multiple values assigned to it, however the combination of entry key and value must be unique across the entire hash table.
Type | Size (bytes) | Description | |
---|---|---|---|
Signed 64-bit integer | 10 | Next chained bucket number | When a bucket is the last in its chain, this number is 0. |
Bucket Entry | 21 * number of entries per bucket | Bucket entries | See "Bucket entry format" |
Type | Size (bytes) | Description | |
---|---|---|---|
Byte (signed 8-bit integer) | 1 | Validity | 0 - deleted, 1 - valid |
Signed 64-bit integer | 10 | Key | Entry key |
Signed 64-bit integer | 10 | Value | Entry value |