This repository has been archived by the owner on Apr 11, 2024. It is now read-only.

Commit

Merge pull request #1 from eladamitpxi/dev-logstash2
A new version for logstash 2.x
eladamitpxi committed Apr 24, 2016
2 parents d7b9ec7 + ed85604 commit 2c11b76
Showing 20 changed files with 1,701 additions and 316 deletions.
6 changes: 5 additions & 1 deletion .gitignore
@@ -2,4 +2,8 @@
 Gemfile.lock
 .bundle
 vendor
-/nbproject/private/
+/nbproject/private/
+.idea
+coverage
+tmp
+.sonar
24 changes: 5 additions & 19 deletions CONTRIBUTORS
@@ -1,21 +1,7 @@
-The following is a list of people who have contributed ideas, code, bug
-reports, or in general have helped logstash along its way.
+The following is a list of people who have contributed (in chronological order) ideas, code, bug
+reports, or in general have helped this plugin along its way.
 
 Contributors:
-* Aaron Mildenstein (untergeek)
-* Graham Bleach (bleach)
-* John E. Vincent (lusis)
-* Jordan Sissel (jordansissel)
-* Kevin Amorin (kamorin)
-* Kevin O'Connor (kjoconnor)
-* Kurt Hurtado (kurtado)
-* Mathias Gug (zimathias)
-* Pete Fritchman (fetep)
-* Pier-Hugues Pellerin (ph)
-* Richard Pijnenburg (electrical)
-* bitsofinfo (bitsofinfo)
-
-Note: If you've sent us patches, bug reports, or otherwise contributed to
-Logstash, and you aren't on the list above and want to be, please let us know
-and we'll make sure you're here. Contributions from folks like you are what make
-open source awesome.
+* Oleg Tokarev (otokarev)
+* Elad Amit (eladamitpxi, amitelad7)
+* Valentin Fischer (valentinul)
4 changes: 3 additions & 1 deletion Gemfile
@@ -1,2 +1,4 @@
+#ruby=jruby
+#ruby-gemset=logstash-output-cassandra
 source 'https://rubygems.org'
-gemspec
+gemspec
144 changes: 92 additions & 52 deletions README.md
@@ -1,72 +1,104 @@
 # Logstash Cassandra Output Plugin
 
-This is a plugin for [Logstash](https://github.com/elasticsearch/logstash).
+This is a plugin for [Logstash](https://github.com/elastic/logstash).
 
 It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
 
+It was originally a fork of the [logstash-output-cassandra](https://github.com/otokarev/logstash-output-cassandra) plugin by [Oleg Tokarev](https://github.com/otokarev), which has gone unmaintained and went through a major re-design in this version we built.
+
 ## Usage
 
 <pre><code>
 output {
     cassandra {
-        # Credentials of a target Cassandra, keyspace and table
-        # where you want to stream data to.
-        username => "cassandra"
-        password => "cassandra"
-        hosts => ["127.0.0.1"]
-        keyspace => "logs"
-        table => "query_log"
+        # List of Cassandra hostname(s) or IP-address(es)
+        hosts => [ "cass-01", "cass-02" ]
+
+        # The port cassandra is listening to
+        port => 9042
+
+        # The protocol version to use with cassandra
+        protocol_version => 4
 
         # Cassandra consistency level.
-        # Options: "any", "one", "two", "three", "quorum", "all",
-        # "local_quorum", "each_quorum", "serial", "local_serial",
-        # "local_one"
+        # Options: "any", "one", "two", "three", "quorum", "all", "local_quorum", "each_quorum", "serial", "local_serial", "local_one"
         # Default: "one"
-        consistency => 'any'
+        consistency => "all"
 
-        # Where from the event hash to take a message
-        source => "payload"
-
-        # if cassandra does not understand the formats of the data
-        # you feed it, just provide some hints here
+        # The keyspace to use
+        keyspace => "a_ks"
+
+        # The table to use (event level processing (e.g. %{[key]}) is supported)
+        table => "%{[@metadata][cassandra_table]}"
+
+        # Username
+        username => "cassandra"
+
+        # Password
+        password => "cassandra"
+
+        # An optional hints hash which will be used in case filter_transform or filter_transform_event_key are not in use
+        # It is used to trigger a forced type casting to the cassandra driver types in
+        # the form of a hash from column name to type name in the following manner:
         hints => {
             id => "int"
            at => "timestamp"
            resellerId => "int"
            errno => "int"
            duration => "float"
-            ip => "inet"}
-
-        # Sometimes it's useful to ignore malformed messages
-        # (e.g. source contains nothing);
-        # in that case set ignore_bad_messages to true.
-        # By default it is false
-        ignore_bad_messages => true
-
-        # Sometimes it's useful to ignore problems with the conversion
-        # of a received value to the Cassandra format and set some default
-        # value (inet: 0.0.0.0, float: 0.0, int: 0,
-        # uuid: 00000000-0000-0000-0000-000000000000,
-        # timestamp: 1970-01-01 00:00:00); in that case set
-        # ignore_bad_values to true.
-        # By default it is false
-        ignore_bad_values => true
-
-        # The Datastax cassandra driver supports batch inserts.
-        # You can define the batch size explicitly.
-        # By default it is 1.
-        batch_size => 100
-
-        # Every batch_processor_thread_period sec. a special thread
-        # pushes all collected messages to Cassandra. By default it is 1 (sec.)
-        batch_processor_thread_period => 1
-
-        # The plugin will push failed batches to Cassandra at most
-        # max_retries times before giving up. By default it is 3.
-        max_retries => 3
-
-        # retry_delay secs. between two sequential tries to push a failed batch
-        # to Cassandra. By default it is 3 (secs.)
-        retry_delay => 3
+            ip => "inet"
+        }
+
+        # The retry policy to use (the default is the default retry policy)
+        # the hash requires the name of the policy and the params it requires
+        # The available policy names are:
+        # * default => retry once if needed / possible
+        # * downgrading_consistency => retry once with a best guess lowered consistency
+        # * failthrough => fail immediately (i.e. no retries)
+        # * backoff => a version of the default retry policy but with configurable backoff retries
+        # The backoff options are as follows:
+        # * backoff_type => either * or ** for linear and exponential backoffs respectively
+        # * backoff_size => the left operand for the backoff type in seconds
+        # * retry_limit => the maximum amount of retries to allow per query
+        # example:
+        # using { "type" => "backoff" "backoff_type" => "**" "backoff_size" => 2 "retry_limit" => 10 } will perform 10 retries with the following wait times: 1, 2, 4, 8, 16, ... 1024
+        # NOTE: there is an underlying assumption that the insert query is idempotent !!!
+        # NOTE: when the backoff retry policy is used, it will also be used to handle pure client timeouts and not just ones coming from the coordinator
+        retry_policy => { "type" => "default" }
+
+        # The command execution timeout
+        request_timeout => 1
+
+        # Ignore bad values
+        ignore_bad_values => false
+
+        # In Logstash >= 2.2 this setting defines the maximum sized bulk request Logstash will make
+        # You may want to increase this to be in line with your pipeline's batch size.
+        # If you specify a number larger than the batch size of your pipeline it will have no effect,
+        # save for the case where a filter increases the size of an inflight batch by outputting
+        # events.
+        #
+        # In Logstash <= 2.1 this plugin uses its own internal buffer of events.
+        # This config option sets that size. In these older versions this size may
+        # have a significant impact on heap usage, whereas in 2.2+ it will never increase it.
+        # To make efficient bulk API calls, we will buffer a certain number of
+        # events before flushing that out to Cassandra. This setting
+        # controls how many events will be buffered before sending a batch
+        # of events. Increasing the `flush_size` has an effect on Logstash's heap size.
+        # Remember to also increase the heap size using `LS_HEAP_SIZE` if you are sending big commands
+        # or have increased the `flush_size` to a higher value.
+        flush_size => 500
+
+        # The amount of time since last flush before a flush is forced.
+        #
+        # This setting helps ensure slow event rates don't get stuck in Logstash.
+        # For example, if your `flush_size` is 100, and you have received 10 events,
+        # and it has been more than `idle_flush_time` seconds since the last flush,
+        # Logstash will flush those 10 events automatically.
+        #
+        # This helps keep both fast and slow log streams moving along in
+        # near-real-time.
+        idle_flush_time => 1
     }
 }
 </code></pre>
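The `hints` hash in the configuration above forces event fields into Cassandra driver types by column name. As a rough illustration of that idea only (a minimal sketch with hypothetical names `CASTS` and `apply_hints` — the real plugin maps these type names onto the Datastax Ruby driver's types instead of plain Ruby classes):

```ruby
require "ipaddr"
require "time"

# Hypothetical mapping from hint type names to casting functions;
# illustrates the forced type casting the hints hash triggers.
CASTS = {
  "int"       => ->(v) { Integer(v) },
  "float"     => ->(v) { Float(v) },
  "inet"      => ->(v) { IPAddr.new(v) },
  "timestamp" => ->(v) { Time.parse(v) }
}

# Cast each hinted column of an event hash; leave unhinted columns as-is.
def apply_hints(event, hints)
  event.map { |column, value|
    cast = CASTS[hints[column]]
    [column, cast ? cast.call(value) : value]
  }.to_h
end

hints = { "id" => "int", "duration" => "float", "ip" => "inet" }
row = apply_hints({ "id" => "42", "duration" => "0.25", "ip" => "10.0.0.1" }, hints)
```

With the hints above, the string values arrive at the driver as an integer, a float, and an inet address rather than raw strings.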
@@ -106,4 +138,12 @@ bin/logstash -e 'output {cassandra {}}'
```

## TODO

* Fix the authentication bug (no user;pass in cassandra plugin?!)
* Finish integration specs
* it "properly works with counter columns"
* it "properly adds multiple events to multiple tables in the same bulk"
* Improve retries to include (but probably only handle Errors::Timeout and Errors::NoHostsAvailable):
* \#get_query
* \#execute_async
* Upgrade / test with logstash 2.3
* Upgrade / test with cassandra 3
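The `backoff` retry policy described in the usage section derives its wait time from `backoff_type` and `backoff_size`. A minimal sketch of that arithmetic (the helper name and the attempt indexing are assumptions, not taken from the plugin's source):

```ruby
# Sketch of the backoff wait-time rule: "*" means linear backoff,
# "**" exponential; backoff_size is the left operand and the retry
# attempt number the right one, yielding seconds to wait.
def backoff_wait(backoff_type, backoff_size, attempt)
  case backoff_type
  when "*"  then backoff_size * attempt
  when "**" then backoff_size ** attempt
  else raise ArgumentError, "unknown backoff_type: #{backoff_type.inspect}"
  end
end

# With backoff_type "**", backoff_size 2 and retry_limit 10, attempts
# 1..10 under this sketch wait 2, 4, 8, ..., 1024 seconds.
waits = (1..10).map { |attempt| backoff_wait("**", 2, attempt) }
```

This also shows why the README stresses idempotent inserts: under an exponential policy a single query may be replayed many times over a long window.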