Key Value Operations
At the end of this guide, you should be familiar with:
- Working with buckets and bucket properties
- Listing buckets and keys
- Fetching, storing, and deleting values
- Using value metadata
This and the other guides in this wiki assume you have Riak installed locally. If you don't have Riak already, please read and follow how to install Riak and then come back to this guide. If you haven't yet installed the client library, please do so before starting this guide.
This guide also assumes you know how to connect the client to Riak. All examples assume a local variable named client, which is an instance of Riak::Client and points at some Riak node you want to work with.
Riak is a near-pure key-value store, which means that you have no tables, collections, or databases. Each key stands on its own, independent of all the others. Keys are namespaced in Buckets, which also encapsulate properties that are common among all keys in the namespace (for example, the replication factor n_val). If you conceived of Riak as a big distributed Hash, it might look like this:
{
(bucket + key) => value
}
What's missing from that picture:
- Each pair is given a specific spot within a cluster of Riak nodes, so any node in the cluster can find it without having to ask. This is also known as "consistent hashing".
- There are 3 copies of the pair by default.
- The value has metadata that you can manipulate as well as the raw value.
- A key may have multiple values in the case of race conditions and network errors. (Don't worry about this for now, but do read Resolving Conflicts when you're ready.)
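The placement idea described in the list above can be sketched in a few lines of plain Ruby. This is a toy model, not Riak's actual ring: the partition count, the way the bucket and key are joined before hashing, and the helper names partition_for and preference_list are all assumptions made for illustration.

```ruby
require 'digest/sha1'

# Toy model only: Riak's real ring lives inside the server and differs in detail.
RING_SIZE = 64       # number of partitions (assumed for illustration)
RING_TOP  = 2**160   # SHA-1 yields integers in a 160-bit space

# Map a bucket/key pair to the index of the partition that owns it.
def partition_for(bucket, key)
  hash = Digest::SHA1.hexdigest("#{bucket}/#{key}").to_i(16)
  hash / (RING_TOP / RING_SIZE)
end

# The partitions that would hold copies: the owner plus its successors,
# wrapping around the end of the ring.
def preference_list(bucket, key, n_val = 3)
  first = partition_for(bucket, key)
  (0...n_val).map { |offset| (first + offset) % RING_SIZE }
end
```

Because the result depends only on the bucket and key, every node that runs the same calculation arrives at the same partitions, which is why no lookup is needed.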
Enough with the exposition, let's look at some data!
Since our keys are all grouped into buckets, in the Ruby client we get a Riak::Bucket object before doing any key-value operations.
Here's how to get one:
client.bucket('guides')
# or
client['guides']
# => #<Riak::Bucket {guides}>
This gives a Riak::Bucket object with the name 'guides' that is linked to the Riak::Client instance.
"But wait", you say, "doesn't that bucket need to exist in Riak already? How do we know which bucket to request?" Buckets are virtual namespaces as we mentioned above; they have no schema, no manifest other than any properties you set on them (see below) and so you don't need to explicitly create them. In fact, the above code doesn't even talk to Riak! Generally, we pick bucket names that have meaning to our application, or are chosen for us automatically by another framework like Ripple or Risky.
If you are just starting out and don't know which buckets have data in them, you can use the "list buckets" feature. Note that this will give you a warning about its usage with a backtrace. You shouldn't run this operation in your application.
client.buckets
# Riak::Client#buckets is an expensive operation that should not be used in production.
# (irb):5:in `irb_binding'
# ... (snipped)
#
# => []
Looks like we don't have any buckets stored. Why? We haven't stored any data yet! Riak gets the list of buckets by examining all the keys for unique bucket names.
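To see why listing buckets is expensive, here is a rough sketch of a store that, like the picture earlier in this guide, keeps one flat mapping of (bucket + key) to value. ToyStore is a hypothetical in-memory stand-in, not part of the client, and it says nothing about how Riak stores data on disk.

```ruby
# Hypothetical in-memory stand-in for Riak's keyspace:
# a single flat Hash keyed by [bucket, key].
class ToyStore
  def initialize
    @pairs = {}
  end

  def put(bucket, key, value)
    @pairs[[bucket, key]] = value
  end

  # There is no separate list of buckets, so every stored pair is visited.
  # The cost grows with the total number of keys, not the number of buckets.
  def buckets
    @pairs.keys.map { |bucket, _key| bucket }.uniq
  end
end
```

An empty store returns an empty list, and a bucket name only shows up once at least one key has been stored under it.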
You can also list the keys that are in the bucket to know what's there. Like listing buckets, this operation is for experimentation only and has horrible performance in production. Don't do it.
client['guides'].keys
# Riak::Bucket#keys is an expensive operation that should not be used in production.
# (irb):10:in `irb_binding'
# ... (snipped)
#
# => []
You can "stream" keys through a block (where the block will be passed an Array of keys as the server sends them in chunks), which is slightly more efficient for large key lists, but we'll skip that for now. Check out the API docs for more information.
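The shape of chunked streaming can be illustrated without a server. In this sketch, stream_keys is a hypothetical stand-in for the server side, and the chunk size of 2 is arbitrary; the point is only that the block receives one Array per chunk rather than one big Array at the end.

```ruby
# Hypothetical stand-in for a server that sends keys in chunks.
def stream_keys(all_keys, chunk_size = 2)
  all_keys.each_slice(chunk_size) { |chunk| yield chunk }
end

# Consume chunks as they arrive instead of building the full list first.
def count_keys(all_keys)
  total = 0
  stream_keys(all_keys) { |chunk| total += chunk.size }
  total
end
```

Processing each chunk as it arrives keeps memory use proportional to the chunk size rather than to the whole key list.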
Earlier we alluded to bucket properties. If you want to grab the properties from a bucket, call the props method (which is also aliased to properties).
pp client['guides'].props
{"name"=>"guides",
"allow_mult"=>false,
"basic_quorum"=>false,
"big_vclock"=>50,
"chash_keyfun"=>{"mod"=>"riak_core_util", "fun"=>"chash_std_keyfun"},
"dw"=>"quorum",
"last_write_wins"=>false,
"linkfun"=>{"mod"=>"riak_kv_wm_link_walker", "fun"=>"mapreduce_linkfun"},
"n_val"=>3,
"notfound_ok"=>true,
"old_vclock"=>86400,
"postcommit"=>[],
"pr"=>0,
"precommit"=>[],
"pw"=>0,
"r"=>"quorum",
"rw"=>"quorum",
"small_vclock"=>50,
"w"=>"quorum",
"young_vclock"=>20}
There are a lot of things in this Hash that we don't need to care about. The most commonly-used properties are detailed on the Riak wiki. Let's set the replication factor, n_val.
client['guides'].props = {"n_val" => 5}
# or
client['guides'].n_val = 5
A number of the most common properties are exposed directly on the Bucket object, as shown above. Note that you can pass an incomplete Hash of properties, and only the properties that are part of the Hash will be changed.
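The partial-update behaviour is the same idea as Ruby's Hash#merge. The sketch below uses a trimmed copy of the properties and two hypothetical helpers, apply_props and quorum. It assumes that the symbolic "quorum" value means a majority of the n_val replicas.

```ruby
# A few of the properties printed above, as the server might hold them.
CURRENT_PROPS = { 'n_val' => 3, 'allow_mult' => false, 'r' => 'quorum' }.freeze

# Only the keys present in `changes` are replaced; everything else is kept.
def apply_props(current, changes)
  current.merge(changes)
end

# Majority of replicas (assumed meaning of the symbolic "quorum" value).
def quorum(n_val)
  n_val / 2 + 1
end
```

Under that assumption, raising n_val from 3 to 5 also raises the number of replicas a quorum read or write waits for, from 2 to 3.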
pp client['guides'].props
{"name"=>"guides",
"allow_mult"=>false,
"basic_quorum"=>false,
"big_vclock"=>50,
"chash_keyfun"=>{"mod"=>"riak_core_util", "fun"=>"chash_std_keyfun"},
"dw"=>"quorum",
"last_write_wins"=>false,
"linkfun"=>{"mod"=>"riak_kv_wm_link_walker", "fun"=>"mapreduce_linkfun"},
"n_val"=>5, # Here's the one we changed above
"notfound_ok"=>true,
"old_vclock"=>86400,
"postcommit"=>[],
"pr"=>0,
"precommit"=>[],
"pw"=>0,
"r"=>"quorum",
"rw"=>"quorum",
"small_vclock"=>50,
"w"=>"quorum",
"young_vclock"=>20}
The other bucket property we might care about is allow_mult, which allows your application to detect and resolve conflicting writes. It is also exposed directly:
client['guides'].allow_mult = true
Now let's fetch a key from our bucket:
client['guides'].get('key-value')
# or
client['guides']['key-value']
You'll get an exception. Its exact class and message depend on which protocol and backend you chose when connecting, but it will look something like this:
Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404. not found
This means that the object does not exist (if you rescue the Riak::FailedRequest exception, its not_found? method will return true, telling you that the error represents a missing key). If you want to avoid the error for a key you're not sure about, you can check for its existence explicitly:
client['guides'].exist?('key-value')
# => false
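If you would rather rescue the error than check first, the pattern looks roughly like the sketch below. ToyFailedRequest and ToyBucket are stand-ins written for this example so that it runs without a Riak node, and fetch_or_nil is a hypothetical helper. With the real client you would rescue Riak::FailedRequest, as described above.

```ruby
# Stand-in for the client's failed-request error; only not_found? is modelled.
class ToyFailedRequest < StandardError
  attr_reader :code

  def initialize(code)
    @code = code
    super("Expected [200, 300] from Riak but received #{code}.")
  end

  def not_found?
    code == 404
  end
end

# Stand-in bucket backed by a Hash; raises on a missing key.
class ToyBucket
  def initialize(values = {})
    @values = values
  end

  def get(key)
    @values.fetch(key) { raise ToyFailedRequest.new(404) }
  end
end

# Return the value, or nil when the error only says "missing key".
# Any other failure is re-raised so real problems are not hidden.
def fetch_or_nil(bucket, key)
  bucket.get(key)
rescue ToyFailedRequest => error
  raise unless error.not_found?
  nil
end
```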
If you don't care whether the key exists yet or not, but want to start working with the value so you can store it, use get_or_new:
kv_guide = client['guides'].get_or_new('key-value')
# => #<Riak::RObject {guides,key-value} [application/json]:nil>
This gives us a new Riak::RObject (http://rdoc.info/gems/riak-client/Riak/RObject) to work with, which is a container for the value. In Riak's terminology, the combination of bucket, key, metadata and value is called an "object" -- please do not confuse this with Ruby's concept of an Object. All "Riak objects" are wrapped by the Riak::RObject class. Since this is a new object, the client assumes we want to store Ruby data as JSON in Riak and so sets the content-type for us to "application/json", which we can see in the inspect output. The default value of the object is nil. Let's set the data to something useful:
kv_guide.data = [1,2,3,4,5]
# => [1, 2, 3, 4, 5]
Now we can persist that object to Riak using the store method.
kv_guide.store
# => #<Riak::RObject {guides,key-value} [application/json]:[1, 2, 3, 4, 5]>
If we list the keys again, we can see that the key is now part of the bucket (this time we use the bucket accessor on the object instead of going from the client object):
kv_guide.bucket.keys
# => ["key-value"]
Now let's fetch our object again:
client['guides']['key-value']
# => #<Riak::RObject {guides,key-value} [application/json]:[1, 2, 3, 4, 5]>
Assuming we're done with the object, we can delete it:
kv_guide.delete
# => #<Riak::RObject {guides,key-value} [application/json]:[1, 2, 3, 4, 5]>
client['guides'].exist?('key-value')
# => false
Note: Deleting an RObject will freeze the object, making modifications to it impossible.
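Freezing here is ordinary Ruby object freezing, which the following sketch demonstrates with a plain Struct. ToyObject and try_update are hypothetical names used only for this illustration; no Riak classes are involved.

```ruby
# Plain Ruby stand-in for an object that has been frozen.
ToyObject = Struct.new(:key, :data)

# Attempt a modification and report what happened.
def try_update(object, data)
  object.data = data
  :updated
rescue RuntimeError # FrozenError is a RuntimeError subclass on Ruby 2.5+
  :frozen
end
```

A frozen object rejects the change, but a copy made with dup is not frozen, so it can be modified again.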
We mentioned before that every value in Riak also has metadata, and the Riak::RObject lets you manipulate it. The only piece we've really seen so far is the content type metadata, so let's examine that more closely.
For the sake of interoperability and ease of working with your data, Riak requires every value to have a content-type. Let's look at our previous object's content type:
# We deleted the object earlier, so let's store in Riak again. Skip
# this step if you didn't do the deletion.
kv_guide = kv_guide.dup.store
# Now, onto the content type.
kv_guide.content_type
# => "application/json"
Under the covers, the Ruby client will automatically convert the data to and from JSON when storing and retrieving the value. If we want to serialize our Ruby data as a different type, we can just change the content-type:
kv_guide.content_type = "application/x-yaml"
Now our object will be serialized to YAML. The Ruby client automatically supports JSON, YAML, Marshal, and plain-text serialization. (If you want to add your own, check out the Serializers guide.)
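Conceptually, choosing a serializer by content type is a lookup table. The sketch below is not the client's implementation. SERIALIZERS, serialize and deserialize are names invented here, and only the JSON, YAML and plain-text cases are modelled, using Ruby's standard library.

```ruby
require 'json'
require 'yaml'

# Content type => how to turn Ruby data into a String and back (illustrative).
SERIALIZERS = {
  'application/json' => {
    dump: ->(data) { JSON.generate(data) },
    load: ->(raw) { JSON.parse(raw) }
  },
  'application/x-yaml' => {
    dump: ->(data) { YAML.dump(data) },
    load: ->(raw) { YAML.safe_load(raw) }
  },
  'text/plain' => {
    dump: ->(data) { data.to_s },
    load: ->(raw) { raw }
  }
}.freeze

# Unknown content types raise KeyError in this sketch.
def serialize(content_type, data)
  SERIALIZERS.fetch(content_type)[:dump].call(data)
end

def deserialize(content_type, raw)
  SERIALIZERS.fetch(content_type)[:load].call(raw)
end
```

The same Ruby Array produces different bytes depending on the content type, but reading it back with the matching type returns the original data.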
But what if the data we want to store is not a Ruby data type, but some binary chunk of information that comes from another system? Not to worry, you can bypass serializers altogether using the raw_data accessors. Let's say I want to store a PNG image that I have on my desktop. I could do it like so:
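client['images'].new('riak.png').tap do |robject|
  robject.content_type = 'image/png'
  # Ruby does not expand "~" on its own, and binread avoids text-mode translation
  robject.raw_data = File.binread(File.expand_path('~/Desktop/riak.png'))
  robject.store
end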
# => #<Riak::RObject {images,riak.png} [image/png]:(100294 bytes)>
When the client doesn't know how to deserialize the content type, it will simply display the byte size on inspection. Now here's a fun part: since I just stored an image, I can open it with my browser:
node = client.nodes.first
# Use open on Mac OS X; your system may have something else, like `xdg-open`
system "open http://#{node.host}:#{node.http_port}/buckets/images/keys/riak.png"
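The URL in that command follows a fixed pattern, /buckets/BUCKET/keys/KEY, so it can be built by a small helper. object_url is a hypothetical function, not part of the client. The port in the usage note is an assumption; use whatever HTTP port your node reports.

```ruby
require 'erb'

# Build the HTTP URL for an object, escaping characters that are not URL-safe.
def object_url(host, port, bucket, key)
  escaped_bucket = ERB::Util.url_encode(bucket)
  escaped_key    = ERB::Util.url_encode(key)
  "http://#{host}:#{port}/buckets/#{escaped_bucket}/keys/#{escaped_key}"
end
```

For example, object_url('127.0.0.1', 8098, 'images', 'riak.png') gives http://127.0.0.1:8098/buckets/images/keys/riak.png, and a key containing a space has it encoded as %20.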
You can also specify a bunch of free-form metadata on an RObject using the meta accessor, which is simply a Hash. For example, if we wanted to credit the PNG image we stored above to a specific person, we could add that and it would not affect the value of the object:
png = client['images']['riak.png']
png.meta['creator'] = "Bryan Fink"
png.meta['tool'] = 'Inkscape'
png.store
Now the next time we fetch the object, we'll get back that metadata too:
client['images']['riak.png'].meta
# => {"tool"=>["Inkscape"], "creator"=>["Bryan Fink"]}
The values come back as Arrays because HTTP allows multiple values per header, and user metadata is sent as HTTP headers.
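A rough model of that round trip is shown below. Riak's HTTP interface carries user metadata in headers prefixed with X-Riak-Meta-. The helper names meta_to_headers and headers_to_meta are invented for this sketch and are not the client's own code.

```ruby
META_PREFIX = 'X-Riak-Meta-'

# Turn a meta Hash into header name => Array of values.
def meta_to_headers(meta)
  meta.each_with_object({}) do |(name, values), headers|
    headers["#{META_PREFIX}#{name}"] = Array(values).map(&:to_s)
  end
end

# Recover the meta Hash from response headers, ignoring unrelated headers.
def headers_to_meta(headers)
  headers.each_with_object({}) do |(name, values), meta|
    next unless name.start_with?(META_PREFIX)
    meta[name.sub(META_PREFIX, '')] = Array(values)
  end
end
```

A single String written into meta therefore comes back wrapped in a one-element Array after the trip through the headers.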
The Vector clock is Riak's means of internal accounting; that is, tracking different versions of your data and automatically updating them where appropriate. You don't usually need to worry about the vector clock, but it is accessible on the RObject as well:
png.vclock
# => "a85hYGBgzGDKBVIcypz/fvpP21GawZTInMfK4NHAfZIvCwA="
That vector clock will automatically be threaded through any operations you perform directly on the RObject (like store, delete, and reload) so that you don't have to worry about it.
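For intuition about what a vector clock tracks, here is a toy version in plain Ruby. It bears no relation to the opaque encoded string the client hands you; the Hash representation and the helper names increment, descends? and concurrent? are assumptions made for illustration.

```ruby
# Toy vector clock: a Hash of actor name => number of updates seen from it.
def increment(clock, actor)
  clock.merge(actor => clock.fetch(actor, 0) + 1)
end

# True when `a` has seen every update that `b` has seen.
def descends?(a, b)
  b.all? { |actor, count| a.fetch(actor, 0) >= count }
end

# Neither version includes the other: the writes conflict.
def concurrent?(a, b)
  !descends?(a, b) && !descends?(b, a)
end
```

If two writers both start from the same version and update it independently, neither resulting clock descends from the other, which is how conflicting writes can be told apart from ordinary successive updates.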
The last_modified and etag values are especially useful if you're using the HTTP interface. When reloading your object, they will be used to prevent full fetches when the object hasn't changed in Riak. They can also be used as a form of optimistic concurrency control (with very weak guarantees, mind you) by setting the prevent_stale_writes flag:
png # previously loaded
png.etag
# => "\"3z8BJENgFyE9nYtWaLNcQR\""
png.last_modified
# => 2012-04-24 15:31:57 -0400
png2 = client['images']['riak.png']
png2.meta['date'] = Time.now.to_s
png2.store
# => #<Riak::RObject {images,riak.png} [image/png]:(100294 bytes)>
png.meta['date'] = Time.now.to_s
png.prevent_stale_writes = true
png.store
# Riak::HTTPFailedRequest: Expected [200, 204, 300] from Riak but received 412.
Riak prevented the stale write by sending a 412 Precondition Failed response over HTTP.
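The general mechanism behind a 412 is an HTTP conditional request: the write carries the ETag the writer last saw, and the server refuses if the stored ETag has since changed. The sketch below models that idea in memory. ToyObjectStore, StaleWrite and the if_match parameter are invented for this example, and whether the client sends exactly this kind of condition is an assumption.

```ruby
require 'digest/md5'

class StaleWrite < StandardError; end

# In-memory model of a server that honours a conditional write.
class ToyObjectStore
  def initialize
    @values = {}
  end

  def get(key)
    @values[key]
  end

  # The ETag changes whenever the stored value changes.
  def etag(key)
    value = @values[key]
    value && Digest::MD5.hexdigest(value)
  end

  # When if_match is given, refuse the write unless it equals the current ETag.
  def put(key, value, if_match: nil)
    raise StaleWrite, '412 Precondition Failed' if if_match && if_match != etag(key)
    @values[key] = value
    etag(key)
  end
end
```

The guarantee is weak because nothing prevents another writer from changing the object between the check and the write on a different replica.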
You can also access the Secondary Indexes and Links directly from the RObject, but we won't cover those here.
Congratulations, you finished the "Key-Value Operations" guide! From here, you can move on to more advanced querying methods, or take advantage of extended features of the client. Secondary Indexes are a very popular feature, and as all good Rubyists have thorough test suites, the Test Server is also a good next step.