mtrencseni edited this page Aug 23, 2011 · 1 revision

ScalienDB 2.0 is a key-value store, which means data is fundamentally organized into pairs of the form

key => value

Both key and value can be arbitrary binary data, although it is recommended to keep keys human-readable, e.g. ASCII or UTF-8 encoded.

Additionally, data is organized into databases and tables, similar to SQL databases. Databases contain tables, and tables contain key-value pairs:

+
|
+-- database1
|   |
|   +-- table1
|   |    |
|   |    +-- key1 => value1
|   |    +-- key2 => value2
|   |    ..
|   |
|   +-- table2
|        |
|        +-- key1 => value1
|        +-- key2 => value2
|        ..
|
+-- database2
    |
    +-- table1
    |    |
    |    +-- key1 => value1
    |    +-- key2 => value2
    |    ..
    |
    +-- table2
         |
         +-- key1 => value1
         +-- key2 => value2
         ..
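The hierarchy above can be pictured as nested maps. The following is a minimal in-memory sketch, not the ScalienDB client API: `store`, `put` and `get` are hypothetical names chosen for illustration, and keys and values are plain `bytes` to reflect that both can be binary data.

```python
from typing import Dict, Optional

# In-memory stand-in for the hierarchy: database -> table -> key -> value.
# These names are hypothetical and are NOT part of the ScalienDB client API.
store: Dict[str, Dict[str, Dict[bytes, bytes]]] = {}


def put(database: str, table: str, key: bytes, value: bytes) -> None:
    """Store a key-value pair, creating the database and table if needed."""
    store.setdefault(database, {}).setdefault(table, {})[key] = value


def get(database: str, table: str, key: bytes) -> Optional[bytes]:
    """Return the value for a key, or None if any level is missing."""
    return store.get(database, {}).get(table, {}).get(key)


# The same key can exist independently in different tables.
put("database1", "table1", b"key1", b"value1")
put("database1", "table2", b"key1", b"another value")
```

Note that a key is only unique within its table, so `key1` in `table1` and `key1` in `table2` are unrelated entries.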

Unlike SQL databases, ScalienDB does not support indices. This means that indices have to be maintained by hand in separate tables.

For example, to store comments in ScalienDB, you create a table named comments with entries of the form:

 comment_id => comment_object

For example:

 56837 => { user_id: 345, user_name: "johnf", text: "Hello world.", date: "2011-08-10 17:55:45"}

Suppose you want to retrieve all comments by a certain user, identified by user_id. To support this query, you create a table named index_comments_user_id with entries of the form:

user_id => list of comment_ids

The index entry for the comment above is:

345 => { 56837 , ... }
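Maintaining such an index by hand means every write to the comments table is paired with a write to the index table. The sketch below illustrates this with two Python dictionaries standing in for the two tables. It does not use the ScalienDB client API: the helper names `add_comment` and `comments_by_user` and the JSON serialization are assumptions made for the example.

```python
import json

# Two dictionaries stand in for the two tables (hypothetical, in-memory only).
comments = {}                # comment_id => serialized comment_object
index_comments_user_id = {}  # user_id    => serialized list of comment_ids


def add_comment(comment_id, comment):
    """Write the comment, then update the hand-maintained index."""
    comments[str(comment_id)] = json.dumps(comment)

    user_key = str(comment["user_id"])
    ids = json.loads(index_comments_user_id.get(user_key, "[]"))
    if comment_id not in ids:  # keep the index free of duplicates
        ids.append(comment_id)
    index_comments_user_id[user_key] = json.dumps(ids)


def comments_by_user(user_id):
    """Answer the query via the index: one index read, then one read per id."""
    ids = json.loads(index_comments_user_id.get(str(user_id), "[]"))
    return [json.loads(comments[str(i)]) for i in ids if str(i) in comments]


add_comment(56837, {"user_id": 345, "user_name": "johnf",
                    "text": "Hello world.", "date": "2011-08-10 17:55:45"})
```

The two writes in `add_comment` are separate operations, so a failure between them would leave a comment without its index entry.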

This organization of data and indices has both pros and cons. The pros are related to sharding: we get finer control over how data and its indices are organized, which is especially important when dealing with large data sets. For example, when designing a Gmail-like service, we can choose to

  • Keep all per-user data (messages, meta-data, indices) together by using prefixes in one table. This will result in high physical locality both on disk and in the cluster when sharding.
  • Keep different kinds of data in different tables but keep related data on the same servers by appropriate placement of shards into quorums.
  • Keep messages on one set of servers and indices on another set, e.g. have "servers for messages", "servers for indices", and so on.
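The first option, keeping all per-user data together with key prefixes, can be sketched as follows. This assumes that keys within a table are kept in lexicographic order, so that keys sharing a prefix are adjacent. The key layout `user/<id>/<kind>/<id>` and the helpers `make_key` and `prefix_scan` are hypothetical, and a sorted dictionary stands in for the table.

```python
import bisect

# One dictionary stands in for a single table holding all per-user data.
table = {}


def make_key(user_id, kind, item_id):
    """Build a prefixed key; zero-padding keeps numeric and lexicographic order aligned."""
    return f"user/{user_id:08d}/{kind}/{item_id:08d}"


def prefix_scan(prefix):
    """Return all keys starting with prefix, in sorted order."""
    keys = sorted(table)
    start = bisect.bisect_left(keys, prefix)
    result = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match the prefix
        result.append(key)
    return result


table[make_key(345, "message", 1)] = "Hello"
table[make_key(345, "message", 2)] = "Re: Hello"
table[make_key(345, "index", 1)] = "[1, 2]"
table[make_key(346, "message", 1)] = "Unrelated user"

user_345_keys = prefix_scan("user/00000345/")
```

Because everything belonging to one user shares a prefix, a single range scan retrieves that user's messages, metadata and index entries together.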

The cons are that indices have to be maintained by hand and that, until transactions are introduced in a future version, indices and object data can go out of sync.
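One possible way to cope with drift in the meantime is to periodically recompute the index from the object data and compare it with the stored index. The sketch below is only an illustration of that idea, not a ScalienDB feature: `rebuild_user_index` and `find_drift` are hypothetical helpers operating on plain dictionaries that stand in for the comments table and its index table.

```python
def rebuild_user_index(comments):
    """Recompute user_id => sorted list of comment_ids from the comments table."""
    index = {}
    for comment_id, comment in comments.items():
        index.setdefault(comment["user_id"], []).append(comment_id)
    for ids in index.values():
        ids.sort()
    return index


def find_drift(comments, index):
    """Return the set of user_ids whose stored index entry disagrees with the data."""
    expected = rebuild_user_index(comments)
    users = set(expected) | set(index)
    return {u for u in users
            if sorted(index.get(u, [])) != expected.get(u, [])}


# Example: the second comment was written, but its index update was lost.
comments = {
    56837: {"user_id": 345, "text": "Hello world."},
    56838: {"user_id": 345, "text": "Second comment."},
}
stale_index = {345: [56837]}
drifted_users = find_drift(comments, stale_index)
```

A full rebuild scans the whole comments table, so on large data sets it would more likely run as an occasional background check than on every write.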

To address these issues, the next version of ScalienDB will support a richer data model, indices and transactions.