Skip to content

Latest commit

 

History

History
53 lines (34 loc) · 1.36 KB

README.org

File metadata and controls

53 lines (34 loc) · 1.36 KB

Approximate frequency counts over data streams for Go

Countish implements two approximate counting algorithms outlined in “Approximate Frequency Counts over Data Streams”.

http://www.vldb.org/conf/2002/S10P03.pdf

Use cases

Have you ever needed to do something like calculate the top URLs or top ips from an infinite stream? This package provides probabalistic frequency counters, with accuracy guarantees and low memory usage.

countish provides an extremely simple interface consisting of an “Observe” method and a “ItemsAboveThreshold” method.

Example:

cat urls.txt | go run ./cmd/countish/main.go -threshold .3
0.428671 /

3 counting implementations are provided.

  1. Naive: exact counts are held in a map
  2. Lossy: corresponding to “lossy counting”
  3. Sticky: corresponding to “sticky sampling”

Example:

counter := countish.NewLossyCounter(.01, .01)
for i:=0;i<100;i++ {
    counter.Observe("value")
}
counter.Observe("another value")
// print all items which *might* occur more than 90% of the time,
// along with their estimated frequency
entries := counter.ItemsAboveThreshold(.9)
fmt.Println(entries)
[{value 1.00009900990099}]

examples showing memory usage comparisons