Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Burst Detection Monoid #162

Open
sritchie opened this issue May 31, 2013 · 4 comments
Open

Burst Detection Monoid #162

sritchie opened this issue May 31, 2013 · 4 comments

Comments

@sritchie
Copy link
Collaborator

I'd like to see a monoid for detecting bursts in streams of data:

http://www.cs.cornell.edu/home/kleinber/bhs.pdf

/cc @johnynek

@avibryant
Copy link
Contributor

Neat paper. Is this different from fitting a piecewise-constant model to the rate at which the emails come? (I guess, to the parameter of a poisson process generating the emails?) I've had good luck with that kind of model in the past, but not sure how to implement it as a monoid...

@johnynek
Copy link
Collaborator

johnynek commented Jun 2, 2013

Actually that's an excellent point, Avi. I bet you can equate this algorithm to fitting piecewise linear with a regularization parameter (since the exponential model gives a constant rate of arrival between each transition).

This probably can be done as a moniod in the same way that IO is a monad: you use the list Monoid along with two data types: BurstHistory, ArrivalTime. So you combine ArrivalTimes into a list, but when the list gets long enough, we fit a history to it and replace the fitted points with a history of rate changes.

I think something like this could work, but you don't really want to keep the whole history of slope changes, instead you want to output the ones that can't change the future anymore, and only keep the last point for further fitting.

Avi: have you seen (even approximate) methods for doing a streaming fit to the data to the piecewise linear fit?

@avibryant
Copy link
Contributor

Re constant rate of arrival between each transition - that's why I said piecewise-constant, not piecewise-linear. Right?
My experience is that this kind of piecewise-constant model is great on short time scales (like looking at a day or week of traffic), or sometimes on single keys (ie, all traffic from referrer X), but bad for long time scales of aggregate traffic (where you might see steady or exponential growth and this is going to model it as a staircase).

I've never seen a true streaming fit - it always involves buffering up a window, like you describe... once you have a window there are a million ways to do the fit.... this paper's cost function looks pretty plausible (in particular I like the asymmetry of it being expensive to change the rate upwards but free to change the rate downwards).

Somewhat related and maybe interesting: http://www2.research.att.com/~sen/pub/nad-imc03.pdf keeps a Count-Min-Sketch for each time period, and then shows how to combine a series of historical CMS to produce a CMS of the moving average, or the Holt-Winters forecast, etc; then you can use that as the basis for alerting on the current CMS.

(Actually they don't use min() as their estimator so it's not quite a CMS, they do something slightly fancier, but I think the data structure is identical).

@sritchie
Copy link
Collaborator Author

Here's a working link for that approach: https://www.cs.utexas.edu/~yzhang/papers/nad-imc03.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants