Requirements
Logly is a log crawler written in Ruby with a NoSQL backend. The main motivation for it is the lack of free or open source solutions: Splunk is the best tool available, but it costs an enormous amount of money.
Preliminary requirements:
- Cross-platform (works on Windows and Solaris)
- Provide the ability to search through logs for a particular string
- Store log files and provide the ability to view not only current log files but also files that have already been rotated
It should work like Splunk: you should be able to view logs directly from the web frontend without logging into the actual boxes, including logs that have already been cleaned up (such as yesterday's logs). There should also be search functionality that looks for an entered string across all logs (current and historical) and shows a list of matching log lines, with links to those lines for further investigation.
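The search requirement can be illustrated with a minimal sketch. It assumes the logs are already available as an in-memory hash of file name to lines; the `LogHit` struct, the `search_logs` and `hit_link` helpers, and the `/logs/...#L<n>` link format are hypothetical names chosen for illustration, not part of any existing Logly code.

```ruby
# Hypothetical record for one search hit: which file, which line, and the text.
LogHit = Struct.new(:file, :line_number, :text)

# Scan every stored log (current and rotated) for a plain substring.
# `logs` is an assumed in-memory stand-in: file name => array of lines.
def search_logs(logs, query)
  hits = []
  logs.each do |file, lines|
    lines.each_with_index do |text, index|
      # Line numbers are 1-based so they match what a log viewer would show.
      hits << LogHit.new(file, index + 1, text) if text.include?(query)
    end
  end
  hits
end

# Build a link to the matching line (the URL layout is an assumption).
def hit_link(hit)
  "/logs/#{hit.file}#L#{hit.line_number}"
end

logs = {
  "app.log"   => ["INFO started", "ERROR disk full"],  # current file
  "app.log.1" => ["ERROR timeout", "INFO stopped"]     # rotated file
}

search_logs(logs, "ERROR").each do |hit|
  puts "#{hit_link(hit)}  #{hit.text}"
end
```

A real implementation would query an index in the chosen backend rather than scan every line, but the result shape (file, line number, link) would stay similar.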
Some additional requirements:
- The search index and log storage should be distributed in order to make use of all available boxes and space (for instance, we have 3 development boxes with 10 GB of space each, and we need to use them all because our logs are 20 GB each day).
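As a rough illustration of the distribution requirement, the sketch below assigns log chunks to nodes by hashing a chunk key, then works through the capacity figures quoted above. The host names, the chunk key format, and the `node_for` helper are assumptions made for this example; CRC32 is used only because it gives the same result on every run.

```ruby
require "zlib"

# Hypothetical host names for the three development boxes.
NODES = ["dev1", "dev2", "dev3"].freeze

# Pick a node for a chunk of log data by hashing its key.
# The same key always maps to the same node.
def node_for(chunk_key, nodes = NODES)
  nodes[Zlib.crc32(chunk_key) % nodes.size]
end

puts node_for("app.log/2010-01-01/chunk-0001")

# Capacity check using the figures from the requirement:
# 3 boxes x 10 GB each, 20 GB of new logs per day.
total_gb       = NODES.size * 10
daily_gb       = 20
retention_days = total_gb.to_f / daily_gb
puts "Uncompressed retention: #{retention_days} days"  # prints 1.5 days
```

With only about a day and a half of uncompressed retention, compact storage matters as much as distribution. Note that simple modulo hashing remaps most keys when a node is added or removed; consistent hashing is a common alternative if the set of nodes is expected to change.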
Technical aspects:
- Make use of Ruby?
- Make use of NoSQL?
- Work on Windows and Solaris
- Sharding of data across nodes
- Storing data in compact format
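To give a feel for the "compact format" item, here is a minimal sketch that compresses a batch of log lines with Ruby's standard `zlib` library before it would be written to a backend. The `pack_lines` and `unpack_lines` helpers and the sample log lines are hypothetical, and the actual ratio depends entirely on how repetitive the real logs are.

```ruby
require "zlib"

# Join a batch of log lines and deflate them into one compact blob.
def pack_lines(lines)
  Zlib::Deflate.deflate(lines.join("\n"), Zlib::BEST_COMPRESSION)
end

# Reverse of pack_lines: inflate the blob and split it back into lines.
def unpack_lines(blob)
  Zlib::Inflate.inflate(blob).split("\n")
end

# Made-up, highly repetitive sample data, typical of application logs.
lines  = Array.new(1000) { |i| "2010-01-01 12:00:00 INFO request #{i} handled" }
raw    = lines.join("\n")
packed = pack_lines(lines)

puts "raw: #{raw.bytesize} bytes, packed: #{packed.bytesize} bytes"
```

Compressing in batches rather than line by line usually gives better ratios, at the cost of having to inflate a whole batch to read a single line.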
MongoDB is a document-oriented database written in C++.
Pros:
- Has good-quality drivers for Ruby and other languages.
- Supports sharding (via collection sharding).
- Supports capped collections.
- Supports sorting.
- Fast on inserts.
- As a document-oriented database, it is well suited for storing varied log messages.
Cons:
- (showstopper) Doesn’t work on Solaris.
- Limited to about 2 GB of data on 32-bit operating systems.
CouchDB is a document-oriented database written in Erlang.
Pros:
- Possibly works on Solaris.
- Supports querying for ranges
- Supports sorting (sorts only by index keys).
- Has frameworks for Ruby.
- As a document-oriented database, it is well suited for storing varied log messages.
Cons:
- Possibly a disk space hog.
- Possibly best suited for small amounts of data.
- (showstopper) Sharding support is unofficial (via a smart proxy; collections are distributed randomly across shards).
- Slow on inserts.
Cassandra is a BigTable variation written in Java.
Pros:
- Supports sharding (via partitioning).
- Supports querying for ranges.
- Supports sorting (sorting is defined in schema).
- Has frameworks for Ruby.
Cons:
- Lacks good documentation.
- Disk space hog (200 MB of logs are stored in Cassandra as 550 MB of files, roughly 2.75 times the original size).
- Semi-automatic load balancing (the loadbalance tool must be run frequently to rebalance the cluster).
- Schema driven.
HBase is a BigTable variation written in Java running on Hadoop.
Pros:
- Supports compression (via Hadoop).
- Supports querying for ranges (via scanners).
- Supports sorting (sorting is defined in schema).
- Supports sharding (via regions).
- Flexible schema (very similar to a document-oriented database).
Cons:
- Lacks good documentation.
- Requires Cygwin on Windows.
- Lacks good Ruby frameworks.
- Keys are sorted as bytes (log10 sorts before log2).
- Requires considerable effort to set up and deploy.
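The byte-ordering issue with keys can be reproduced in plain Ruby, since Ruby also compares strings byte by byte. The sketch below shows the problem and one common workaround, zero-padding the numeric part of the key; the `padded_key` helper and the width of 5 digits are assumptions for illustration.

```ruby
# Byte-wise sorting puts "log10" between "log1" and "log2".
names = ["log2", "log10", "log1"]
p names.sort  # => ["log1", "log10", "log2"]

# Hypothetical helper: zero-pad the number so byte order matches numeric order.
def padded_key(prefix, number, width = 5)
  format("%s%0#{width}d", prefix, number)
end

keys = [2, 10, 1].map { |n| padded_key("log", n) }
p keys.sort   # => ["log00001", "log00002", "log00010"]
```

The padding width has to be chosen up front and must be large enough for the highest number expected, because keys of different widths would again sort incorrectly.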
HDFS is a distributed file store written in Java as part of Hadoop.
Pros:
- Fits well for storing unstructured data in a distributed environment.
- Supports compression.
- Supports querying for ranges (via File API).
- Supports sharding (files are split into blocks distributed across data nodes).
Cons:
- Lacks good documentation.
- Requires Cygwin on Windows.
- API exists only for Java.
- Requires some effort to set up and deploy.