Implement RC4 PRNG with AVX2 and SSE4.2 Optimizations #604
Re
AM_CFLAGS = -mavx2 -msse4.2
What happens on processors that don't support these instructions? Does nwipe crash with an illegal instruction?
If they could crash the program, it may be better to leave them out and add a note that RC4 may be faster on supported processors when this compiler option is enabled. It's then up to the user to build nwipe from source if they are that concerned about speed.
The problem is that if it does crash on unsupported processors, it will generate a stream of issues from people wiping older hardware.
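One common way to address this concern is to leave `-mavx2` out of the global flags, enable AVX2 for a single function only, and pick the implementation at run time. The following is a minimal sketch of that idea, assuming GCC or Clang; the names `xor_scalar`, `xor_avx2`, `select_xor` and `HAVE_X86_DISPATCH` are hypothetical and are not taken from this PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Only x86 builds with GCC/Clang get the SIMD path; ARM and others
 * compile the scalar code alone. HAVE_X86_DISPATCH is a made-up macro. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Portable fallback: XOR keystream bytes into dst one byte at a time. */
static void xor_scalar(uint8_t *dst, const uint8_t *ks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] ^= ks[i];
}

#ifdef HAVE_X86_DISPATCH
/* AVX2 variant: the target attribute enables AVX2 for this function only,
 * so the rest of the program is still built for the baseline CPU. */
__attribute__((target("avx2")))
static void xor_avx2(uint8_t *dst, const uint8_t *ks, size_t n)
{
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(dst + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(ks + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_xor_si256(a, b));
    }
    for (; i < n; i++) /* remaining tail bytes */
        dst[i] ^= ks[i];
}
#endif

typedef void (*xor_fn)(uint8_t *, const uint8_t *, size_t);

/* Choose an implementation once at run time; the AVX2 code is only
 * ever called when the CPU reports support for it. */
static xor_fn select_xor(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return xor_avx2;
#endif
    return xor_scalar;
}
```

A caller would store the result of `select_xor()` in a function pointer at start-up and use it for every buffer, so an older CPU never executes an AVX2 instruction.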
Ahoy.
It should now compile and work perfectly fine, even on older platforms :)
This commit introduces a high-performance RC4-based pseudorandom number generator (PRNG) optimized for modern CPU architectures. Key changes and improvements over the traditional RC4 implementation include:

- **CTR Mode**: Added a counter-based mode to ensure unique pseudorandom streams and prevent repetition.
- **RC4-Drop**: Discarded the first 256 bytes of the stream to mitigate known biases in the initial output of RC4.
- **SIMD Optimizations**: Leveraged SSE4.2 and AVX2 instructions to process data in parallel, improving throughput by handling 16 bytes (SSE4.2) or 32 bytes (AVX2) per iteration.
- **Hardware Prefetching**: Implemented prefetching to optimize memory access to the S-Box, reducing cache misses and latency.
- **PRNG Purpose**: Designed specifically as a pseudorandom number generator (PRNG) for non-cryptographic purposes.

This RC4 PRNG is now faster and more suitable for generating large volumes of random data, taking full advantage of modern hardware capabilities. It is **not** intended for cryptographic security purposes.
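For readers unfamiliar with the RC4-Drop item in the list above, here is a minimal scalar sketch of textbook RC4 with a configurable number of discarded initial bytes. It is not the code from this PR: the names `rc4_state`, `rc4_generate` and `rc4_init` are hypothetical, and the counter mode, SIMD paths and prefetching are left out because their exact construction is not shown in this conversation.

```c
#include <stddef.h>
#include <stdint.h>

/* RC4 state: the 256-byte S-box plus the two running indices. */
typedef struct {
    uint8_t s[256];
    uint8_t i, j;
} rc4_state;

/* Produce n keystream bytes (standard RC4 output loop). */
static void rc4_generate(rc4_state *st, uint8_t *out, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        st->i = (uint8_t)(st->i + 1);
        st->j = (uint8_t)(st->j + st->s[st->i]);
        uint8_t tmp = st->s[st->i];
        st->s[st->i] = st->s[st->j];
        st->s[st->j] = tmp;
        out[k] = st->s[(uint8_t)(st->s[st->i] + st->s[st->j])];
    }
}

/* Key schedule followed by discarding the first `drop` output bytes.
 * keylen must be at least 1. Pass drop = 256 for the variant described
 * above, or drop = 0 for plain RC4. */
static void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen,
                     size_t drop)
{
    for (int k = 0; k < 256; k++)
        st->s[k] = (uint8_t)k;

    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->s[k] + key[(size_t)k % keylen]);
        uint8_t tmp = st->s[k];
        st->s[k] = st->s[j];
        st->s[j] = tmp;
    }
    st->i = 0;
    st->j = 0;

    uint8_t sink;
    while (drop--)
        rc4_generate(st, &sink, 1); /* throw away biased early output */
}
```

With `drop = 0` and the ASCII key `Key`, the first keystream bytes are `EB 9F 77 81`, matching the widely published RC4 test vector; with `drop = 256` the generator simply starts 256 bytes further into the same stream.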
What are your thoughts on this? @PartialVolume
I don't have a problem with assembler unless it reduces the number of CPUs that nwipe can be built for, which I guess is very likely. Nwipe must be able to run on Intel or AMD processors that predate the AVX2 instructions, right back to circa 2000 and Pentium 4s. Nwipe also currently runs on ARM processors, so anything that makes it less compatible with a wide range of systems is not something I would want. I'd rather it was slightly slower but could be used on any computer than tuned for and compatible with only a subset of systems.

As for Rust, there is nothing stopping anybody from writing wipe software in Rust, but I don't really see any point in rewriting an application like nwipe in Rust. It seems to me it would be a whole load of work, possibly resulting in a buggy implementation; after all, memory safety is not the only place bugs might exist. Anybody wanting to rewrite nwipe in Rust can go for it, but it's highly unlikely I would accept Rust code, so they would need to run their own separate project.

The same goes for C++. Once I might have accepted C++ code, and I have written a few Qt-based C++ programs in the past, but nwipe is C and that's the way it will probably stay, at least while I'm still working on it.
I agree with this stance. Nwipe needs to interface and/or borrow from low-level code, usually written in C, and portability across CPUs is a must as servers are adding ARM processors.
Mike
This RC4 PRNG also provides very high measured entropy, thanks to dropping the first 256 biased bytes of output and introducing the CTR mode.
Compared with the other algorithms, it still has considerable room for optimization.
0 0x0 Rising entropy edge (0.999979)