Set of Hadoop, Spark and Storm based tools for web and customer analytic

pranab/visitante

Introduction

The original goal of visitante was to calculate various web analytics metrics, as defined by Avinash Kaushik (http://www.kaushik.net/avinash/), on the Hadoop, Spark and Storm platforms. However, it has evolved into a general-purpose log analytics and mining solution that goes beyond web server logs.

It also includes customer (or marketing) analytics solutions. Since customer behavior data is mostly captured in logs, customer analytics and log analytics are closely related. Search analytics solutions have also been added recently.

Philosophy

  • Simple, easy-to-use batch and real-time web analytics
  • Highly configurable

Blogs

The following blogs of mine are a good source of details about visitante

Solutions

  • Hadoop-based batch analytics for

    • Number of pages visited
    • Total time spent
    • Last page visited
    • Flow status (e.g., whether a checkout flow was not entered, entered but not completed, or completed)
    • Incident detection
    • Pattern-based event detection with context
    • Customer lifetime value
  • Storm-based real-time analytics for

    • Bounce rate
    • Visit depth distribution
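
To make the batch session metrics above concrete, here is a minimal Python sketch, not visitante's actual Hadoop implementation, that groups log hits by session and computes pages visited, time spent, last page visited and checkout flow status. It assumes each log record is a hypothetical `(session_id, timestamp_in_seconds, page)` tuple and that the checkout flow is identified by two hypothetical pages, `/checkout` and `/order-confirmation`. Time spent is approximated as the gap between the first and last hit, since a log carries no timestamp for leaving the last page.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical checkout flow pages (assumption for illustration only).
FLOW_ENTRY_PAGE = "/checkout"
FLOW_COMPLETION_PAGE = "/order-confirmation"


@dataclass
class SessionSummary:
    pages_visited: int
    time_spent: int   # seconds between first and last hit (lower bound)
    last_page: str
    flow_status: str  # "not_entered", "entered" or "completed"


def summarize_sessions(records):
    """Group (session_id, timestamp, page) records by session and compute
    per-session metrics from the time-ordered hits of each session."""
    by_session = defaultdict(list)
    for session_id, timestamp, page in records:
        by_session[session_id].append((timestamp, page))

    summaries = {}
    for session_id, hits in by_session.items():
        hits.sort()  # order hits by timestamp
        pages = [page for _, page in hits]
        if FLOW_COMPLETION_PAGE in pages:
            status = "completed"
        elif FLOW_ENTRY_PAGE in pages:
            status = "entered"
        else:
            status = "not_entered"
        summaries[session_id] = SessionSummary(
            pages_visited=len(hits),
            time_spent=hits[-1][0] - hits[0][0],
            last_page=pages[-1],
            flow_status=status,
        )
    return summaries
```

In a Hadoop job, the grouping step would correspond roughly to the map and shuffle phases keyed on session ID, and the per-session loop to the reducer.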

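The real-time metrics can be sketched in the same spirit. The Python class below is a hypothetical, framework-free illustration of the per-event logic a Storm bolt might run; it does not use Storm's API and is not visitante's code. It assumes a session ends after a configurable idle timeout (the 1800-second default reflects the common 30-minute web analytics convention), counts page views per finished session to build the visit depth distribution, and defines bounce rate as the fraction of finished sessions with exactly one page view.

```python
from collections import Counter


class VisitDepthTracker:
    """Incrementally tracks visit depth and bounce rate from a hit stream.
    A session is treated as finished once it has been idle for more than
    `timeout` seconds (hypothetical parameter)."""

    def __init__(self, timeout=1800):
        self.timeout = timeout
        self.open_sessions = {}           # session_id -> (last_timestamp, page_count)
        self.depth_histogram = Counter()  # depth -> number of finished sessions

    def on_hit(self, session_id, timestamp):
        """Record one page view, then close any sessions that have gone idle."""
        _, count = self.open_sessions.get(session_id, (timestamp, 0))
        self.open_sessions[session_id] = (timestamp, count + 1)
        self.expire(timestamp)

    def expire(self, now):
        """Close sessions idle longer than the timeout and record their depth."""
        expired = [sid for sid, (last, _) in self.open_sessions.items()
                   if now - last > self.timeout]
        for sid in expired:
            _, count = self.open_sessions.pop(sid)
            self.depth_histogram[count] += 1

    def bounce_rate(self):
        """Fraction of finished sessions that viewed exactly one page."""
        total = sum(self.depth_histogram.values())
        return self.depth_histogram[1] / total if total else 0.0
```

For example, hits `("a", 0)`, `("b", 10)`, `("b", 70)` and `("c", 100)` followed by `expire(5000)` leave two one-page sessions and one two-page session in the histogram, so `bounce_rate()` returns 2/3.
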
Build

For Hadoop 1

  • mvn clean install

For Hadoop 2 (non-YARN)

  • git checkout nuovo
  • mvn clean install

For Hadoop 2 (YARN)

  • git checkout nuovo
  • mvn clean install -P yarn

For Spark

  • Build chombo first, on the master branch, with
    • mvn clean install
    • sbt publishLocal
  • Build chombo-spark in the chombo/spark directory with
    • sbt clean package

Need help?

Please feel free to email me at [email protected]

Contribution

Contributors are welcome. Please email me at [email protected]
