Skip to content

Event driven groupware archival system for e-discovery and data loss prevention .

Notifications You must be signed in to change notification settings

kolab-groupware/egara

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Egara is an event driven groupware archival system for e-discovery and data loss prevention. The name means "storehouse" in Sumerian.

It is designed to be used with the Kolab Groupware server and is written (primarily) in Erlang.

== Functional Overview

Egara is a parallelized pipeline that accepts events, normalizes them by standardizing and finalizing the data passed in, and then stores the resulting data in a centralized key/value store for later processing and retrieval.

This implies a three-stage system:

0. Listen for incoming events and normalize data if needed
1. Gather any additional data implied by the event and add it to the data
2. Store the final dataset

Incoming events are accepted by "incoming handlers" which subsequently process those into a normalized set of Erlang terms. For example, the cyrus imap handler listens for notifications on a local dgram socket, parses the included json into Erlang terms and passes the results on to the egara notifications receiver process. At this point the events are required to be in a normalized form. The receiver process then inserts the event into a queue for further processing.

Worker processes claim items in the queue, identify any missing data (such as user information), add the missing data and finally store the results. A worker may also decide to defer processing of the event or even drop the event as uninteresting.

Unless dropped by the worker, the event is removed from the incoming queue. This is also handled by the worker process, allowing for a linear pipeline of processing per event.

This system can be extended at the following points:

* additional sources of events can be added as incoming handlers
* workers can have new processing logic added

Access to certain resources, along with the workers themselves, are handled by processes kept in stateful pools. Currently these pools include:

* IMAP backchannel to cyrus imap
* LDAP connections
* Riak connections
* Workers

Once the data is stored in the key/value store it can be queried via a web service or direct access to the Riak cluster. Additional processing may occur at this point, however event data stored is considered immutable: once written they may never be changed. The only alteration to events that is allowed for is deletion.

This paves the way for collecting events from multiple sources to be used for eDiscovery, data loss prevention and usage analytics.

Currently, Egara focuses on capturing the following event data:

* new messages in the Kolab IMAP store (email, calendaring, etc.)
* deletions and modifications of message in the Kolab IMAP store
* folder creation, deletion and modifications
* user sessions

== Building

On the first build run make deps-up then apply the patches in the deps_diffs directory. (Mostly this adjusts the dependencies in these applications to be in line with each other. Eventually this will be either automated or fixed in upstream repositories; probably the latter but for now this is convenient as dependencies are still shifting at this stage of development.)

Note that the riak diff requires using git apply rather than the usual patch command line utility. Also, you may need to manually go into deps/lager and perform a make there before the top-level make will work. Ah, dependency hell. ;)

Next, simply run make. This should get you all the dependencies required and build the application which can be run from in the base directory with make run.

After the first build, make egara is faster and should be all that is needed unless the dependencies need updating.

== Configuration

=== Event Selection

=== LDAP

=== Riak

deployment/riak.sh

=== Event Receivers

==== Cyrus

== Usage

About

Event driven groupware archival system for e-discovery and data loss prevention .

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages