A Ruby-on-Rails interface to Datastax Enterprise (specifically DSE Search nodes). Replaces the majority of ActiveRecord functionality.
This gem is based heavily on the excellent CassandraObject gem (github.com/data-axle/cassandra_object) as well as some work I initially did in the form of SolandraObject (github.com/jasonmk/solandra_object). We made the decision to move away from Solandra and to Datastax Enterprise, thus datastax_rails was born.
Significant changes from SolandraObject:
-
Cassandra communication is now entirely CQL-based
-
Solr communication is now handled directly via RSolr
-
Bifurcation of data is no longer necessary as DSE writes data to SOLR automatically
As of 2.4.0, DSE 4.7 is supported and live indexing is enabled by default. If you are using any version prior to DSE 4.7 or do not wish to use live indexing, you will want to override the following setting:
DatastaxRails::Base.live_indexing = false
Before using this gem, you should probably take a strong look at the type of problem you are trying to solve. Cassandra is primarily designed as a solution to Big Data problems. This gem is not. You will notice that it still carries a lot of relational logic with it. We are using DSE to solve a replication problem more so than a Big Data problem. That’s not to say that this gem won’t work for Big Data problems, it just might not be ideal. You’ve been warned…
First add it to your Gemfile:
gem 'datastax_rails'
Configure the config/datastax.yml file:
development: servers: ["127.0.0.1"] port: 9042 ssl: true keyspace: "<my_app>_development" strategy_class: "org.apache.cassandra.locator.NetworkTopologyStrategy" strategy_options: {"DC1": "1"} connection_options: timeout: 10 solr: port: 8983 path: /solr
The above is configured to use NetworkTopologyStrategy. If you go with this, you’ll need to configure Datastax to use the NetworkTopologySnitch and set up the cassandra-topology.properties file. See the Datastax documentation for more information.
For a more simple, single datacenter setup, something like this should probably work:
development: servers: ["127.0.0.1"] port: 9042 ssl: false keyspace: "datastax_rails_development" strategy_class: "org.apache.cassandra.locator.SimpleStrategy" strategy_options: {"replication_factor": "1"} connection_options: timeout: 10 solr: port: 8983 path: /solr
See DatastaxRails::Connection::ClassMethods for a description of what options are available.
Create your keyspace:
rake ds:create rake ds:create RAILS_ENV=test
Once you’ve created some models, the following will upload the solr schemas and create the column families:
rake ds:migrate rake ds:migrate RAILS_ENV=test
It is safe to run ds:migrate over and over. In fact, it is necessary to re-run it any time you change the attributes on any model. DSR will only upload schema files if they have changed.
Create a sample Model. See Base documentation for more details:
class Person < DatastaxRails::Base uuid :id string :first_name string :user_name text :bio date :birthdate boolean :active timestamps end
In addition to the simple model above, there are several special case models that take advantage of special features of Cassandra. For examples of using those, see the following classes:
-
{DatastaxRails::WideStorageModel}
-
{DatastaxRails::PayloadModel}
-
{DatastaxRails::DynamicModel}
When doing a grouped query, the total groups reported is only the total groups included in your query, not the total matching groups (if you’re paginating).
This is due to the way solr sharding works with grouped queries.
The documentation for {DatastaxRails::Base} and {DatastaxRails::SearchMethods} will give you quite a few examples of things you can do. You can find a copy of the latest documentation on line at rdoc.info/github/jasonmk/datastax_rails/master/frames.