Implement XORoshiro-256 PRNG for Enhanced Random Number Generation Efficiency #555

Merged
merged 6 commits into from
Mar 21, 2024


Contributor

@Knogle Knogle commented Mar 15, 2024

This pull request introduces the implementation of the XORoshiro-256 pseudorandom number generator (PRNG), a significant enhancement to our project's capabilities in generating high-quality random numbers. XORoshiro-256, standing for "XOR/rotate/shift/rotate", is part of the well-regarded XORoshiro family of PRNGs. This PR aims to leverage the algorithm's advantages, including speed, efficiency, and the quality of randomness, to benefit various non-cryptographic applications within our project.

Key Benefits of XORoshiro-256:

1. High Performance:
XORoshiro-256 is designed for speed, making it an excellent choice for applications requiring a high volume of random numbers with minimal computational overhead. Its efficient use of XOR, shift, and rotate operations ensures fast execution on modern hardware. Throughput is around 1.8 GB/s when writing to a ramdisk.

2. Excellent Statistical Properties:
The algorithm has been rigorously tested and exhibits excellent statistical properties for a wide range of applications. Its randomness quality meets the standards of most non-cryptographic use cases, including simulations, gaming, and randomized algorithms.

3. Low Memory Footprint:
With a state size of only 256 bits, XORoshiro-256 maintains a minimal memory footprint, making it suitable for applications with limited memory resources or those requiring multiple independent PRNG instances.

4. Easy to Implement and Use:
The simplicity of the XORoshiro family's design philosophy is preserved in XORoshiro-256, ensuring an easy implementation and integration process. This PR includes both the core algorithm implementation and utility functions for initialization and random number generation, providing a ready-to-use solution for immediate application benefits.

Integration Details:

The integration of XORoshiro-256 into our project is straightforward, with the PR including all necessary source and header files. Developers can initialize the PRNG with a seed and then generate random numbers as needed, with functions provided for both individual number generation and filling buffers with random data.
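As a rough illustration of the seed, generate, and fill-buffer flow described above, here is a minimal sketch. It uses the publicly documented xoshiro256** update step by Blackman and Vigna as a stand-in for a generator with 256 bits of state, and the merged code may differ. All names (`xoro256_state`, `xoro256_init`, `xoro256_next`, `xoro256_fill`) are hypothetical, and expanding a 64-bit seed with splitmix64 is an assumption, not necessarily how nwipe seeds its PRNGs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; not taken from the PR's source files. */
typedef struct {
    uint64_t s[4]; /* 256 bits of state */
} xoro256_state;

static inline uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

/* splitmix64: expands one 64-bit seed into well-mixed state words
   (assumed seeding scheme for this sketch). */
static uint64_t splitmix64(uint64_t *x) {
    uint64_t z = (*x += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Derive the whole state from the seed, so the same seed always
   reproduces the same stream (needed for a verification pass). */
void xoro256_init(xoro256_state *st, uint64_t seed) {
    for (int i = 0; i < 4; i++) {
        st->s[i] = splitmix64(&seed);
    }
}

/* One xoshiro256** step: only XOR, shift, rotate and two small multiplies. */
uint64_t xoro256_next(xoro256_state *st) {
    uint64_t *s = st->s;
    const uint64_t result = rotl64(s[1] * 5, 7) * 9;
    const uint64_t t = s[1] << 17;

    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl64(s[3], 45);

    return result;
}

/* Fill a buffer of arbitrary length, 8 bytes at a time. */
void xoro256_fill(xoro256_state *st, unsigned char *buf, size_t len) {
    while (len >= 8) {
        uint64_t v = xoro256_next(st);
        memcpy(buf, &v, 8);
        buf += 8;
        len -= 8;
    }
    if (len > 0) {
        uint64_t v = xoro256_next(st);
        memcpy(buf, &v, len); /* partial tail */
    }
}
```

Because the state is passed explicitly and nothing is global, each wipe thread can own its own `xoro256_state` without sharing data with the others.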

This addition represents a strategic enhancement of our project's foundational components, supporting a broad spectrum of future development opportunities and performance optimizations.

A total of 188 tests (some of the 15 tests actually consist of multiple sub-tests)
were conducted to evaluate the randomness of 32 bitstreams of 1048576 bits from:

	/dev/loop0

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The numerous empirical results of these tests were then interpreted with
an examination of the proportion of sequences that pass a statistical test
(proportion analysis) and the distribution of p-values to check for uniformity
(uniformity analysis). The results were the following:

	186/188 tests passed successfully both the analyses.
	2/188 tests did not pass successfully both the analyses.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Here are the results of the single tests:

 - The "Frequency" test passed both the analyses.

 - The "Block Frequency" test passed both the analyses.

 - The "Cumulative Sums" (forward) test passed both the analyses.
   The "Cumulative Sums" (backward) test passed both the analyses.

 - The "Runs" test passed both the analyses.

 - The "Longest Run of Ones" test passed both the analyses.

 - The "Binary Matrix Rank" test FAILED both the analyses.

 - The "Discrete Fourier Transform" test passed both the analyses.

 - 148/148 of the "Non-overlapping Template Matching" tests passed both the analyses.

 - The "Overlapping Template Matching" test passed both the analyses.

 - The "Maurer's Universal Statistical" test passed both the analyses.

 - The "Approximate Entropy" test passed both the analyses.

 - 8/8 of the "Random Excursions" tests passed both the analyses.

 - 17/18 of the "Random Excursions Variant" tests passed both the analyses.
   1/18 of the "Random Excursions Variant" tests FAILED the proportion analysis.

 - The "Serial" (first) test passed both the analyses.
   The "Serial" (second) test passed both the analyses.

 - The "Linear Complexity" test passed both the analyses.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The missing tests (if any) were whether disabled manually by the user or disabled
at run time due to input size requirements not satisfied by this run.
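For readers unfamiliar with the suite, the first entry in the report, the "Frequency" (monobit) test, is simple enough to sketch. The version below follows the definition in NIST SP 800-22: map bits to +1/-1, sum them, and compute p = erfc(|S| / sqrt(2n)). To stay free of the math library it uses the equivalent check S² ≤ 6.635·n for the 0.01 significance level, where 6.635 is a rounded constant. The function name `monobit_pass` is hypothetical, and this is not the code of the tool that produced the report.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the first nbits bits of data pass the frequency (monobit)
   test at significance level 0.01, otherwise 0.
   data must hold at least (nbits + 7) / 8 bytes. */
int monobit_pass(const unsigned char *data, size_t nbits) {
    int64_t sum = 0;

    for (size_t i = 0; i < nbits; i++) {
        int bit = (data[i / 8] >> (7 - i % 8)) & 1;
        sum += bit ? 1 : -1; /* ones count +1, zeros count -1 */
    }

    /* p = erfc(|S| / sqrt(2n)) >= 0.01  is equivalent to
       |S| / sqrt(n) <= 2.5758, i.e. S^2 <= ~6.635 * n. */
    double s2 = (double)sum * (double)sum;
    return s2 <= 6.635 * (double)nbits;
}
```

An all-zero buffer fails this check, while a strictly alternating pattern such as 0xAA bytes passes it despite being obviously non-random, which is why the suite combines many different tests.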

Screenshot from 2024-03-14 18-19-12

@Knogle
Contributor Author

Knogle commented Mar 15, 2024

In evaluating this algorithm, I observe significant advantages, especially for legacy computing environments. Notably, CPUs from the Pentium 4 era and earlier stand to gain from the algorithm's simplicity and efficiency. The algorithm not only produces statistically high-quality numbers but also impressively passes the NIST test in 186 out of 188 instances. This level of performance suggests it is more than adequate for tasks like HDD wiping. In terms of speed, it is estimated to operate approximately 4-5 times faster than the MT19937 algorithm, presenting a potential area for further exploration on older hardware platforms. The suggestion here is that in the context of pseudorandom number generation for these applications, the bottleneck might more likely be the computational capacity of the CPU rather than the HDD's capabilities. It would be interesting to see this algorithm's performance tested on such older systems to validate these observations.

Please also check the implementation for thread safety.

Screenshot from 2024-03-14 19-17-31

@Knogle
Contributor Author

Knogle commented Mar 15, 2024

Another thing we might consider: the code could probably be sped up by around another 100% using AVX2 instructions, but that would make it much less portable. I just wanted to mention it. My current implementation uses only simple operations, so it works regardless of the platform.

@PartialVolume
Collaborator

Initial four drive test passes. I'll run a 22 drive test next.

Screenshot_20240316_103257

@PartialVolume
Collaborator

@Knogle xoroshiro256 passes, no verification errors on an 18-drive wipe. Looking good! 👍

The only problem with merging this is that the AES-CTR code is present in the xoro branch. I really should have got you to put the AES-CTR code into its own branch as well. That's why you should always create a branch rather than make changes directly to master (except when you are working alone, and even then I think it's a good idea to always create a branch).

Personally, what I would do is make a backup of the xoro and AES code somewhere, delete your fork, and refork nwipe. Then create an AES-CTR branch (git checkout -b AES-CTR) and a xoroshiro256 branch (git checkout -b xoroshiro256), switch to each branch in turn, add the relevant code, and push each branch to your GitHub fork. Then open a new pull request from each branch.

Your master will then be zero commits ahead of the upstream master. Once I merge them, you can switch to your local master, delete your branches, and run git pull upstream master to update your local copy to nwipe upstream. Then run git push to push the updated local copy to your GitHub fork.

There are no doubt cleverer ways of doing this, like stashing and popping stashes, but I tend to only do that from within the KDevelop IDE rather than on the command line. However, keeping AES and xoro in separate branches makes it much easier to merge and to keep different features separate, just in case something needs to be reverted due to some unforeseen problem.

@PartialVolume
Collaborator

The problem you have is that your master contains the unmerged AES-CTR code and your xoro branch contains both the unmerged AES-CTR and xoro code.

Your master should match the upstream master: it can be x commits behind upstream but not x commits ahead, with the new work living in the two branches. That's why I suggest reforking. You could probably revert your master, but it might be cleaner and quicker to delete your GitHub fork and refork it.

@Knogle
Contributor Author

Knogle commented Mar 19, 2024

Hey, I am back from my vacation. I will try to take a look today or tomorrow to fix that :)

@Knogle
Contributor Author

Knogle commented Mar 19, 2024

Did an interactive rebase now, can you check if it works?
Resolved all conflicts to not interfere with AES.

@Knogle
Contributor Author

Knogle commented Mar 19, 2024

I'm very satisfied with those performance results!

Screenshot from 2024-03-19 19-45-05

@Knogle
Contributor Author

Knogle commented Mar 19, 2024

What I've found out: I'm currently printing all random numbers into a text file.
When we do 2 rounds without blanking, genrand is called 4 times when verifying all passes.

ROUND 1 + ROUND 1 VERIFICATION: Both files match, everything OK!
ROUND 2 + ROUND 2 VERIFICATION: Both files differ, something doesn't work here for the last pass! Even though the keys seem to match.

@Knogle
Contributor Author

Knogle commented Mar 20, 2024

What I've found out: I'm currently printing all random numbers into a text file. When we do 2 rounds without blanking, genrand is called 4 times when verifying all passes.

ROUND 1 + ROUND 1 VERIFICATION: Both files match, everything OK! ROUND 2 + ROUND 2 VERIFICATION: Both files differ, something doesn't work here for the last pass! Even though the keys seem to match.

Sorry for creating confusion. This was related to AES.
I think I'm done now on my end. Do you think there is anything we can still test for Xoro?
It seems to work like the other PRNGs and doesn't have issues with its state.

@PartialVolume
Collaborator

PartialVolume commented Mar 20, 2024

I think xoro is looking good, I ran simultaneously on 18 discs without any verification issues so should be able to merge that soon.

What are your plans for AES? Are you going to use openssl's thread locking functions or write your own AES prng?

@Knogle
Contributor Author

Knogle commented Mar 20, 2024

Sounds great!
For AES, I've found out that the issue isn't related to the thread locking itself.
It also fails when running the PRNG on a single drive using 2 rounds.
The reason is that it uses 3 parameters to initialize and generate random numbers:

  1. Random key, derived from the nwipe key.

  2. ecount

  3. init-vector

Items 2 and 3 are not derived from the seed itself, so they differ after the PRNG has been run at least once.
So I have to make sure that ecount and init-vector for AES give the same result when using the same seed, as is the case for verify.

CRYPTO_ctr128_encrypt(bufpos, bufpos, 16, &state->aes_key, state->ivec, state->ecount, &state->num, (block128_f) AES_encrypt );

In other words: let's say the key is 12345678, ivec is 0, ecount is 0, and num is 0.

CRYPTO_ctr128_encrypt(bufpos, bufpos, 16, 12345678, 0, 0, 0, (block128_f) AES_encrypt );

result will be 12345678

Now nwipe wants to verify the stream, but after the function was called, ivec, ecount and num will have been incremented by 1.

CRYPTO_ctr128_encrypt(bufpos, bufpos, 16, 12345678, 1, 1, 1, (block128_f) AES_encrypt );

result will be 45678910

The AES key, which nwipe.c derives from the seed, is still the same, but the other values differ. So I have to make sure somehow that the complete state is the same and deterministic.
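One way to get that determinism, sketched under stated assumptions: reset every field of the counter-mode state whenever the PRNG is seeded, not just the key. The sketch below is self-contained and uses a toy mixing function in place of AES, so it shows only the state handling and is not cryptographically meaningful. The struct layout and the names `ctr_state`, `ctr_init`, `ctr_fill` and `toy_block` are hypothetical and only mirror the fields named above (key, ivec, ecount, num); this is not the fix that was eventually applied to the AES-CTR code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state layout mirroring the fields named in the comment. */
typedef struct {
    uint64_t key;            /* stand-in for the AES key schedule */
    uint64_t ivec;           /* counter block */
    unsigned char ecount[8]; /* current keystream block */
    unsigned int num;        /* bytes of ecount already consumed */
} ctr_state;

/* Toy block function: NOT AES, only here to keep the sketch self-contained. */
static uint64_t toy_block(uint64_t key, uint64_t counter) {
    uint64_t z = key ^ (counter * 0x9E3779B97F4A7C15ULL);
    z ^= z >> 31;
    z *= 0xBF58476D1CE4E5B9ULL;
    z ^= z >> 29;
    return z;
}

/* Seeding resets the COMPLETE state, so the stream depends on the seed only. */
void ctr_init(ctr_state *st, uint64_t seed) {
    st->key = seed;
    st->ivec = 0;
    memset(st->ecount, 0, sizeof st->ecount);
    st->num = 0;
}

/* Counter mode: produce a new keystream block whenever the old one is used up. */
void ctr_fill(ctr_state *st, unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (st->num == 0) {
            uint64_t ks = toy_block(st->key, st->ivec++);
            memcpy(st->ecount, &ks, sizeof ks);
        }
        buf[i] = st->ecount[st->num];
        st->num = (st->num + 1) % 8;
    }
}
```

With this layout, calling `ctr_init` with the same seed before the write pass and again before the verify pass yields identical streams. Re-setting only `key` while leaving `ivec` and `num` at their advanced values reproduces the mismatch described above.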

@Knogle
Contributor Author

Knogle commented Mar 20, 2024

As requested by others, performance of ISAAC-64 regarding the NIST suite.

A total of 188 tests (some of the 15 tests actually consist of multiple sub-tests)
were conducted to evaluate the randomness of 32 bitstreams of 1048576 bits from:

	/dev/loop0

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The numerous empirical results of these tests were then interpreted with
an examination of the proportion of sequences that pass a statistical test
(proportion analysis) and the distribution of p-values to check for uniformity
(uniformity analysis). The results were the following:

	184/188 tests passed successfully both the analyses.
	4/188 tests did not pass successfully both the analyses.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Here are the results of the single tests:

 - The "Frequency" test passed both the analyses.

 - The "Block Frequency" test passed both the analyses.

 - The "Cumulative Sums" (forward) test passed both the analyses.
   The "Cumulative Sums" (backward) test passed both the analyses.

 - The "Runs" test passed both the analyses.

 - The "Longest Run of Ones" test passed both the analyses.

 - The "Binary Matrix Rank" test passed both the analyses.

 - The "Discrete Fourier Transform" test passed both the analyses.

 - 148/148 of the "Non-overlapping Template Matching" tests passed both the analyses.

 - The "Overlapping Template Matching" test passed both the analyses.

 - The "Maurer's Universal Statistical" test passed both the analyses.

 - The "Approximate Entropy" test passed both the analyses.

 - 8/8 of the "Random Excursions" tests passed both the analyses.

 - 15/18 of the "Random Excursions Variant" tests passed both the analyses.
   3/18 of the "Random Excursions Variant" tests FAILED the proportion analysis.

 - The "Serial" (first) test passed both the analyses.
   The "Serial" (second) test FAILED the proportion analysis.

 - The "Linear Complexity" test passed both the analyses.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The missing tests (if any) were whether disabled manually by the user or disabled
at run time due to input size requirements not satisfied by this run.

@PartialVolume PartialVolume merged commit 204a82f into martijnvanbrummelen:master Mar 21, 2024
2 checks passed
@PartialVolume
Collaborator

Checking a 16 drive wipe on Xoro tonight, then I'll work on merging the sha-hmac tomorrow.

nwipe_xoro_test-2024-03-21_23.04.18.mp4

@PartialVolume
Collaborator

XORoshiro-256 16-drive test: no issues, verification passed. Confirmed the drive is being overwritten with PRNG data by zeroing the drive first; the PRNG data looks good.

Screenshot_20240322_154406

@Knogle
Contributor Author

Knogle commented Mar 24, 2024

Sounds great! Is the data throughput also acceptable?

@PartialVolume
Collaborator

Yes, throughput is very good. Thanks for adding those two prngs. Much appreciated.
