Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement standard pattern for updateInto #291

Open
avibryant opened this issue Mar 20, 2014 · 5 comments · May be fixed by #294
Open

implement standard pattern for updateInto #291

avibryant opened this issue Mar 20, 2014 · 5 comments · May be fixed by #294

Comments

@avibryant
Copy link
Contributor

HyperLogLog has an updateInto method which updates a mutable buffer representing its current state, instead of doing immutable operations. This allows for an efficient implementation of sumOption. It would be helpful to standardize this so that we could use it from eg SummingCache. see #290

https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala#L241

@jnievelt
Copy link
Contributor

Something like this?

trait Accumulator[T] {
  def apply(t: T): Unit
  def get: T
}

trait Semigroup[T] { sg =>
  ...
  def accumulator(initial: T): Accumulator[T] = new Accumulator[T] {
    var item = initial
    def apply(t: T) {
      item = sg.plus(item, t)
    }
    def get = item
  }
  def sumOption(s: Seq[T]): Option[T] = s match {
    case head :: tail => {
      val acc = accumulator(head)
      tail foreach acc
      Some(acc.get)
    }
    case _ => None
  }
}

@ianoc
Copy link
Collaborator

ianoc commented Mar 20, 2014

I think we might want ...
trait Accumulator[T] {
def apply(t: T) : Accumulator[T]
def get: T
}

to account for the case your just using another data structure and don't need to have it all immutable. returning the mutable this if its mutable is fine then too.

@avibryant
Copy link
Contributor Author

@ianoc: but you could always just use a mutable binding inside the Accumulator for that, like in @jnievelt's default example. I think I like the explicitly mutable API on the trait.

@jnievelt
Copy link
Contributor

jnievelt commented Apr 3, 2014

Looked over this, and there are a few related concepts in this space, IMO:

  1. Alternate mode of storage for optimizing individual addition (HLL -> Array, and then Array + HLL is faster)
  2. Alternate way of adding a buffered set of elements by #sumOption on sub-elements
  3. Means of buffering a stream of elements to leverage 2 above

1 above is done concretely in HLL but is abstracted by StatefulSummer. 2 and 3 are done in SketchMap but are also abstract in ArrayBufferedOperation.

I'm thinking it might make sense to a) put HLL#updateInto more into a StatefulSummer context, b) let Semigroup own the summing of buffered elements, via a #bufferedSumOption (default to current #sumOption implementation), c) let Semigroup own defining a specialized #statefulSummer (default to SumAll()(this)), and d) implement Semigroup#sumOption in terms of #statefulSummer.

But I'm a little worried that this over-complicates Semigroup. What do you guys think?

@sritchie
Copy link
Collaborator

Seems like this is a duplicate of #515 .

@sritchie sritchie changed the title Standard pattern for updateInto implement standard pattern for updateInto Oct 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants