
Buffered PRNG #45

Open
obazoud opened this issue Nov 23, 2024 · 2 comments

Comments

@obazoud

obazoud commented Nov 23, 2024

Hello,

I'm using ulidx in my project and was concerned about its performance based on benchmarks. Since the built-in generation seemed slow (compared to UUID v7, for example), I decided to investigate further.

Local Machine Benchmark

Here are the results of the benchmark on my local machine, a Mac M1 with 16 GB of RAM:

Simple ulid: 26,597 ops/sec ±0.79% (95 runs)
ulid with timestamp: 26,823 ops/sec ±2.37% (74 runs)

Profiling and Bottleneck Identification:

I profiled ulidx (for Node.js only with --prof) to pinpoint the performance bottleneck. The results indicated that the encodeRandom method was the primary culprit. Upon closer inspection, I observed that this method calls crypto.getRandomValues 16 times per ulid generation, which seemed excessive.

Proposed Solution: Buffered PRNG

To address this, I implemented a custom PRNG that reduces the number of calls to crypto.getRandomValues while using a larger buffer size. Here's the code:

class BufferedPRNG {
  constructor(size) {
    this.buffer = new Uint8Array(size);
    this.cursor = 0xffff; // start past the end so the first next() refills the buffer
  }

  next = () => {
    if (this.cursor >= this.buffer.length) {
      // Refill the whole buffer in one call instead of one call per byte
      crypto.getRandomValues(this.buffer);
      this.cursor = 0;
    }
    return this.buffer[this.cursor++] / 0xff; // scale to [0, 1], matching ulidx's PRNG contract
  };
}

This custom PRNG generates random values from a pre-filled buffer, reducing the number of calls to crypto.getRandomValues.

Integration with Benchmark

I incorporated the BufferedPRNG class into the benchmark to compare performance:

const bufferedPRNG_2 = new BufferedPRNG(2);
const bufferedPRNG_4 = new BufferedPRNG(4);
const bufferedPRNG_16 = new BufferedPRNG(16);
const bufferedPRNG_32 = new BufferedPRNG(32);
const bufferedPRNG_64 = new BufferedPRNG(64);

suite.add("ulid with bufferedPRNG 2", function () {
  ulid(undefined, bufferedPRNG_2.next);
});

// ... similar tests for other buffer sizes

Benchmark Results:

Simple ulid: 27,561 ops/sec ±0.76% (90 runs)
ulid with timestamp: 26,556 ops/sec ±0.85% (93 runs)
ulid with bufferedPRNG 2: 74,244 ops/sec ±0.69% (93 runs)
ulid with bufferedPRNG 4: 137,098 ops/sec ±0.38% (97 runs)
ulid with bufferedPRNG 16: 381,865 ops/sec ±0.35% (97 runs)
ulid with bufferedPRNG 32: 531,374 ops/sec ±0.52% (96 runs)
ulid with bufferedPRNG 64: 668,991 ops/sec ±0.20% (98 runs)

The results demonstrate significant performance improvements. Using a buffer size of 16 (which is the minimum required for a single ulid) yields a roughly 14x speedup.
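As a quick sanity check on the "roughly 14x" figure, dividing the buffered-16 throughput by the unbuffered baseline from the run above:

```javascript
// Back-of-the-envelope check of the speedup claim, using the ops/sec
// numbers reported in the benchmark results above.
const simpleUlid = 27_561;  // Simple ulid
const buffered16 = 381_865; // ulid with bufferedPRNG 16
const speedup = buffered16 / simpleUlid;
console.log(speedup.toFixed(1)); // "13.9" — roughly 14x
```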

Request for Feedback

Please let me know if there are any errors in my implementation or approach. I'm open to any suggestions because I plan to use this optimization in my project.

@MarcelWaldvogel
Contributor

I also looked into the PRNG behavior, as the need for an `==` comparison here looked like it might indicate a bias in the resulting character frequencies:

ulidx/source/ulid.ts

Lines 283 to 289 in bb685ba

export function randomChar(prng: PRNG): string {
    let rand = Math.floor(prng() * ENCODING_LEN);
    if (rand === ENCODING_LEN) {
        rand = ENCODING_LEN - 1;
    }
    return ENCODING.charAt(rand);
}

… together with the use of 0xff instead of 0x100 here:

return () => crypto.randomBytes(1).readUInt8() / 0xff;

But it turned out just to be an unusual construct for someone used to either

  1. doing integer arithmetic for this or
  2. thinking of the standard Math.random() behavior of returning a value strictly smaller than 1.
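To make that concrete, here is a sketch enumerating all 256 possible byte values through the /0xff scaling and the clamp from randomChar (ENCODING is the Crockford base32 alphabet specified by ULID); every character turns out to be hit exactly 8 times, i.e. there is no bias:

```javascript
// Sketch: push every possible byte through the /0xff scaling and the
// `rand === ENCODING_LEN` clamp quoted above, then count hits per char.
const ENCODING = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"; // Crockford base32
const ENCODING_LEN = ENCODING.length; // 32

const counts = new Array(ENCODING_LEN).fill(0);
for (let byte = 0; byte <= 0xff; byte++) {
  let rand = Math.floor((byte / 0xff) * ENCODING_LEN);
  if (rand === ENCODING_LEN) {
    rand = ENCODING_LEN - 1; // only byte === 255 yields exactly 1.0
  }
  counts[rand]++;
}
// Every character is selected for exactly 8 of the 256 byte values:
console.log(counts.every((c) => c === 8)); // true
```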

For anyone interested in the frequency analysis, here is a branch supporting it: https://github.com/MarcelWaldvogel/ulidx/tree/frequency-analysis

(I am not creating a PR for this, as it would increase the public API unnecessarily. Even though such a test might be a good one to have included.)

@MarcelWaldvogel
Contributor

This is the benchmark output on my development machine after applying #46:

Simple ulid x 23,766 ops/sec ±1.57% (84 runs sampled)
ulid with timestamp x 20,757 ops/sec ±0.71% (75 runs sampled)

I compared a few pieces of code that would result in relatively small code changes but big performance improvements. The code run by npm run bench is here; ulid() uses "randomness wasteful bits" (8 bits of randomness per character) with "string concatenation from back" (normal append).

Simple ulid x 335,049 ops/sec ±0.46% (94 runs sampled)
ulid with timestamp x 318,268 ops/sec ±1.00% (91 runs sampled)
string concatenation from front x 7,057,222 ops/sec ±0.78% (94 runs sampled)
string concatenation from back x 7,484,660 ops/sec ±1.02% (87 runs sampled)
string concatenation from back with forEach x 3,676,662 ops/sec ±0.75% (95 runs sampled)
string concatenation by map/join x 998,182 ops/sec ±0.56% (93 runs sampled)
randomness 1by1 x 23,247 ops/sec ±4.41% (77 runs sampled)
randomness wasteful bits x 307,968 ops/sec ±0.85% (72 runs sampled)
randomness more computation x 380,385 ops/sec ±2.15% (90 runs sampled)

The result is about a 14-fold improvement with comparable code and complexity as before.

According to the benchmark, the "more computation" alternative would be some 20% faster. However, the difference was unusually high in this particular benchmark run; it is often only half as large and sometimes even reversed:

randomness wasteful bits x 352,064 ops/sec ±0.83% (83 runs sampled)
randomness more computation x 308,060 ops/sec ±2.42% (73 runs sampled)

Given that the extra complexity would require additional careful testing of the logic (that each random bit is really used only once and that all 5 bits are always fresh from the random pool), I suggest sticking to the simple code.
