Eliminate use of stdlib rand #13

kentquirk · 2023-06-30T22:44:36Z

I've been using this project in one of mine to track traceIDs that have passed through our sampling proxy. While profiling under heavy load, we found a significant performance bottleneck in the default global random number generator from Go's standard library. The global source in math/rand takes a lock on every call, which can be needlessly expensive.

PR #12 addresses that issue as well as others, but the submitter has been asked to break it up into smaller pieces. In particular, it modifies Encode and Decode for better performance, but as my use case doesn't use those, I haven't touched them here.

Like #12, this PR uses a fast, well-distributed hashing algorithm (wyhash) as a pseudorandom number generator and attaches it to the filter object, so no additional locks are needed. The version of wyhash used here differs from the one in #12; I chose it mainly because I have experience with it and already use it elsewhere in my project.
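As a rough illustration of the idea, here is a minimal wyrand-style generator stored as a filter field. This is a sketch, not the code from the PR's fork: the type name `wyRand`, the stand-in `Filter` struct, and the choice of constants (taken from one published wyrand variant) are assumptions. Each filter owns its generator, so no lock is taken, but the generator is also not safe for concurrent use.

```go
package main

import (
	"fmt"
	"math/bits"
)

// wyRand is a minimal wyrand-style generator (illustrative, not the fork's code).
// The state is one uint64 that advances by a fixed odd constant; each output XORs
// the high and low halves of a 128-bit product of the state with a mixed copy.
type wyRand uint64

// Uint64 advances the state and returns the next pseudorandom value.
// It does no locking, so a single wyRand must not be shared across goroutines.
func (r *wyRand) Uint64() uint64 {
	*r += 0xa0761d6478bd642f
	s := uint64(*r)
	hi, lo := bits.Mul64(s, s^0xe7037ed1a0b428db)
	return hi ^ lo
}

// Filter is a hypothetical stand-in for the cuckoo filter; only rng matters here.
type Filter struct {
	rng *wyRand
}

func main() {
	seed := wyRand(42)
	f := &Filter{rng: &seed}
	for i := 0; i < 3; i++ {
		fmt.Println(f.rng.Uint64())
	}
}
```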

For now, we're going to depend on our own fork, but I'd much rather see the issue addressed upstream, either by merging this, #12, or some other variant.

Another option would be to allow an RNG matching an interface to be injected rather than depending on the global one. This would also make certain kinds of testing easier.
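A sketch of what injection could look like, assuming a small interface with a single `Uint64` method. `RandSource`, `fixedSource`, and `pickBucket` are hypothetical names, not part of the library. A replayable source makes eviction choices deterministic, which is the testing benefit mentioned above.

```go
package main

import "fmt"

// RandSource is a hypothetical interface for injecting randomness into the
// filter; the name and method set are assumptions, not the library's API.
type RandSource interface {
	Uint64() uint64
}

// fixedSource replays a fixed sequence of values, cycling when exhausted.
type fixedSource struct {
	vals []uint64
	i    int
}

func (s *fixedSource) Uint64() uint64 {
	v := s.vals[s.i%len(s.vals)]
	s.i++
	return v
}

// Filter is a hypothetical stand-in that depends only on the interface.
type Filter struct {
	rng RandSource
}

// pickBucket chooses between two candidate buckets using the low bit of the
// injected source.
func (f *Filter) pickBucket(i1, i2 uint) uint {
	if f.rng.Uint64()&1 == 0 {
		return i1
	}
	return i2
}

func main() {
	f := &Filter{rng: &fixedSource{vals: []uint64{0, 1}}}
	fmt.Println(f.pickBucket(3, 7), f.pickBucket(3, 7)) // prints "3 7"
}
```

Production code would pass in a fast generator; tests can pass a `fixedSource` and assert on exact bucket choices.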

This PR includes:

Move the RNG into a member of the filter
Implement Intn and CoinFlip functions on it
Write some probabilistic tests for them to verify they're reasonably well-distributed
Call them in place of math.rand functions
Replace custom bitbanging in getNextPow2 with Go's standard library for it
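For the last item, `math/bits` can round up to a power of two directly. The sketch below assumes a `uint64` argument and a `uint` result; it shows the technique and is not necessarily the PR's exact code.

```go
package main

import (
	"fmt"
	"math/bits"
)

// getNextPow2 returns the smallest power of two that is >= n.
// bits.Len64(n-1) is the number of bits needed to represent n-1, so shifting 1
// left by that amount rounds n up to a power of two. For n == 0, n-1 wraps to
// all ones and the shift by 64 yields 0, the same result as the usual
// decrement / OR-shift / increment bit trick on a uint64.
func getNextPow2(n uint64) uint {
	return uint(1) << bits.Len64(n-1)
}

func main() {
	fmt.Println(getNextPow2(1), getNextPow2(5), getNextPow2(1000000)) // 1 8 1048576
}
```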

Thank you for this library!

panmari

Thanks for the contribution, this is looking pretty good! Please also post a benchmark comparison in the PR.

panmari · 2023-09-05T19:19:52Z

cuckoofilter.go

 	return &Filter{
 		buckets:         buckets,
 		count:           0,
 		bucketIndexMask: uint(len(buckets) - 1),
+		rng:             &rng,


Inline here as &wyhash.Rng(time.Now().UnixNano())
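One wrinkle with inlining: Go does not allow taking the address of a conversion or function-call result, so `&wyhash.Rng(...)` will not compile as written. A small helper that returns a pointer keeps the struct literal to one line. In this sketch `Rng` is a hypothetical stand-in assumed to be a named `uint64`, and `[]uint64` stands in for the real bucket type.

```go
package main

import (
	"fmt"
	"time"
)

// Rng is a hypothetical stand-in for the PR's wyhash.Rng, assumed to be a
// named uint64 holding the generator state.
type Rng uint64

// newRng seeds a generator and returns a pointer to it. The helper is needed
// because &Rng(seed) is rejected: a conversion result is not addressable.
func newRng(seed int64) *Rng {
	r := Rng(seed)
	return &r
}

// Filter holds only the fields needed for this sketch.
type Filter struct {
	buckets         []uint64
	count           uint
	bucketIndexMask uint
	rng             *Rng
}

// newFilter assumes numBuckets is a power of two so the index mask works.
func newFilter(numBuckets uint) *Filter {
	buckets := make([]uint64, numBuckets)
	return &Filter{
		buckets:         buckets,
		count:           0,
		bucketIndexMask: uint(len(buckets) - 1),
		rng:             newRng(time.Now().UnixNano()),
	}
}

func main() {
	f := newFilter(8)
	fmt.Println(f.bucketIndexMask, f.rng != nil) // 7 true
}
```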

panmari · 2023-09-05T19:20:57Z

cuckoofilter.go

+// means the bias is on the order of 10^-13. For our use case, that's well below
+// the noise floor.
+func (cf *Filter) Intn(n int) int {
+	// we need to make sure it's strictly positive, so mask off the sign bit


Casting to uint would make this more straightforward.
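A sketch of the reviewer's suggestion: reduce in unsigned arithmetic so the sign bit never comes into play. The `wyRand` generator and `Filter` struct are illustrative stand-ins, not the PR's code. Note that this version reduces a full 64-bit value, so its modulo bias is even smaller than the 63-bit estimate in the doc comment.

```go
package main

import (
	"fmt"
	"math/bits"
)

// wyRand is an illustrative wyrand-style generator (not the PR's code).
type wyRand uint64

func (r *wyRand) Uint64() uint64 {
	*r += 0xa0761d6478bd642f
	s := uint64(*r)
	hi, lo := bits.Mul64(s, s^0xe7037ed1a0b428db)
	return hi ^ lo
}

// Filter is a hypothetical stand-in holding the per-filter generator.
type Filter struct{ rng *wyRand }

// intn returns a pseudorandom int in [0, n); n must be positive.
// Working in uint64 avoids the sign bit entirely, so no masking is needed.
// Plain modulo carries a tiny bias toward smaller results when n does not
// divide 2^64.
func (cf *Filter) intn(n int) int {
	if n <= 0 {
		panic("intn: n must be positive")
	}
	return int(cf.rng.Uint64() % uint64(n))
}

func main() {
	seed := wyRand(1)
	f := &Filter{rng: &seed}
	fmt.Println(f.intn(10), f.intn(10), f.intn(10))
}
```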

panmari · 2023-09-05T19:21:58Z

cuckoofilter.go

+// purposes since n is on the order of 10^6 and our rng is 63 bits (10^19); this
+// means the bias is on the order of 10^-13. For our use case, that's well below
+// the noise floor.
+func (cf *Filter) Intn(n int) int {


lowercase, no need to make this public
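The bias estimate in the doc comment can be checked with a short calculation. It takes the comment's figures (a 63-bit generator, $n$ around $10^6$) and assumes plain modulo reduction, where $X$ is the returned index:

```latex
\begin{aligned}
N &= 2^{63} = q\,n + r, \qquad 0 \le r < n,\\
\Pr[X = k] &= \begin{cases} \dfrac{q+1}{N}, & k < r,\\[4pt] \dfrac{q}{N}, & k \ge r, \end{cases}\\
\text{relative bias} &= \frac{(q+1)/N - q/N}{q/N} = \frac{1}{q} \approx \frac{n}{N} = \frac{10^{6}}{2^{63}} \approx 1.1 \times 10^{-13}.
\end{aligned}
```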

panmari · 2023-09-05T19:22:30Z

cuckoofilter.go

+
+// Coinflip returns either i1 or i2 randomly by examining the least significant
+// bit of the RNG.
+func (cf Filter) Coinflip(i1, i2 uint) uint {


lowercase, no need to make this public.

panmari · 2023-09-05T19:23:06Z

cuckoofilter.go

+}
+
+// Coinflip returns either i1 or i2 randomly by examining the least significant
+// bit of the RNG.


Comment nit:

Coinflip returns either i1 or i2 randomly with about equal chance.

The rest is an implementation detail.
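A sketch of `coinflip` with the suggested doc comment, plus a distribution check in the spirit of the PR's probabilistic tests. The `wyRand` generator and `Filter` struct are illustrative stand-ins. The 48,000 to 52,000 tolerance for 100,000 flips is an assumed threshold, more than 12 standard deviations from the mean of a fair coin.

```go
package main

import (
	"fmt"
	"math/bits"
)

// wyRand is an illustrative wyrand-style generator (not the PR's code).
type wyRand uint64

func (r *wyRand) Uint64() uint64 {
	*r += 0xa0761d6478bd642f
	s := uint64(*r)
	hi, lo := bits.Mul64(s, s^0xe7037ed1a0b428db)
	return hi ^ lo
}

// Filter is a hypothetical stand-in holding the per-filter generator.
type Filter struct{ rng *wyRand }

// coinflip returns either i1 or i2 with about equal chance.
func (cf *Filter) coinflip(i1, i2 uint) uint {
	// Implementation detail: test the lowest bit of the next output.
	if cf.rng.Uint64()&1 == 0 {
		return i1
	}
	return i2
}

func main() {
	seed := wyRand(1)
	f := &Filter{rng: &seed}
	const trials = 100000
	heads := 0
	for i := 0; i < trials; i++ {
		if f.coinflip(1, 2) == 1 {
			heads++
		}
	}
	fmt.Printf("%d of %d flips chose i1\n", heads, trials)
}
```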

panmari · 2023-09-18T10:44:49Z

I've been doing some digging as well. The runtime has an internal fastrand that is thread-local, i.e. it doesn't use any mutex. Here's what the change would look like: fcadf94

It's an open question whether depending on a runtime internal makes the code too brittle.

panmari · 2023-09-22T16:08:29Z

Thanks for the suggestion. I want to avoid adding dependencies, so I'm going with #15 as the alternative to the standard library's random.

I'd still be interested in merging the following change:

Replace custom bitbanging in getNextPow2 with Go's standard library for it

Could you split this off into a separate pull request?

kentquirk added 2 commits June 30, 2023 17:47

Eliminate use of stdlib rand

7068c71

Cleanup

cf48793

panmari reviewed Sep 5, 2023


panmari closed this Sep 22, 2023

kentquirk mentioned this pull request Sep 22, 2023

Faster thread local random for inserts #15

Merged
