Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OsRng crate? (or optional PRNG dependencies) #648

Closed
tarcieri opened this issue Nov 21, 2018 · 40 comments
Closed

OsRng crate? (or optional PRNG dependencies) #648

tarcieri opened this issue Nov 21, 2018 · 40 comments
Labels
E-question Participation: opinions wanted

Comments

@tarcieri
Copy link

rand 0.6 seems to have split all of the PRNGs into their own separate crates.

However, these crates are mandatory dependencies of the rand crate, which is still home to OsRng.

I think it'd be great if one of two things happened:

  • OsRng were split into its own crate which is only dependent on rand_core
  • The rand crate provided cargo features for the numerous PRNGs so I don't need to pull them into my project when all I want is OsRng.

Sidebar: I'm a little confused why rand and rand_core both exist. It would personally make more sense to me if rand contained the core traits, and anyone wanting an RNG can pick a crate with the one they want which only depends on rand. It seems like right now rand is both a facade and the sole home of OsRng

@naftulikay
Copy link

Just my two cents on this issue: I think a safe default would be for rand to only provide CSPRNGs by default. If you need an insecure PRNG, it should be really explicit.

I don't know about the current crate hierarchy, but if it's not secure by default, if it's not blatantly obvious when using an insecure PRNG, we should move in that direction. My opinion, of course, but I have laid out why I feel this way.

A crate name like insecure_random would be nice :)

Security is one of the many selling points about Rust, so we should try to make it hard to mess up.

@dhardy
Copy link
Member

dhardy commented Nov 21, 2018

See #643. We may move forward with this but I'd like us to explore moving the OsRng interface itself into the std lib.

There's a lot of history involved and not wanting to break rand too much. But moving more functionality out and making rand a facade may be the way to go (though I suspect we'll keep things like the Distribution trait and Rng in rand).

@naftulikay there are so many ways to mess up security. We now have a CryptoRng marker trait and I regret it because that alone is not enough to guarantee security. rand is probably more about stats/sampling than anything else. But at least we make efforts to make thread_rng secure (though without forward secrecy, and possibly not using the most trusted algorithm; IIRC @tarcieri volunteered to write a faster ChaChaRng for us at one point).

@newpavlov
Copy link
Member

newpavlov commented Nov 21, 2018

I think we should move forward with OsRng crate. Even if std will include it later (which I doubt, because a simple getrandom-like function is a more suitable candidate for inclusion) we always can deprecate this crate.

We now have a CryptoRng marker trait and I regret it because that alone is not enough to guarantee security.

CryptoRng never was about guaranteeing security (after all you always can use wrapper with CryptoRng implementation), but about communicating expectations. In other words it's a way to indicate that application expect CSRNG and not a weak PRNG.

@naftulikay
Copy link

I'm 💯 percent behind OsRng being part of std.

On my i7 8650U laptop CPU on Linux 4.15 I can read /dev/urandom at around 112MB/s. As you described above, I do imagine that we could build some stream or block cipher based RNG which could be a lot faster; generate a large random key from /dev/urandom, use that with a symmetric cipher, and the upper bound on performance would then be the initialization cost (once) plus the speed that the cipher can run at. I could be misinformed (there are actually cryptographers around these parts @tarcieri), but using a symmetric cipher as a CSPRNG has been done in the past pretty well as long as your input key has a strong randomness guarantee.

The only use case that isn't covered here is if you are concerned about renewing randomness. IIRC the kernel is constantly resampling randomness sources and mixing them into the RNG, whereas this contract is dependent on /dev/urandom being secure at the time of key generation.

One thing I've always wondered is how fast PRNGs can be if they aren't CSPRNGs. ChaCha20 is pretty damn fast for a stream cipher, the original white paper claims that it's around one CPU clock cycle per byte, and I'm not sure if this is using SIMD.

I hope that my comment above didn't come across as accusatory or anything, I'm grateful for all the work the rand team does on this incredibly useful crate.

@dhardy
Copy link
Member

dhardy commented Nov 22, 2018

@naftulikay run cargo bench --bench generators to get some numbers 😃 (though apparently ChaCha should be faster; you can see though that CSPRNGs are not slow)

Our thread_rng does periodic reseeding, and also has fork protection on UNIX. As most things security, there are compromises to make, and we choose to compromise such that performance is barely affected.

@tarcieri
Copy link
Author

I'm sorry to say (and perhaps this is a bad forum to say it), but I'm kind of "losing faith" in this crate for the purposes I want it for, which is a thin abstraction over the OS-provided cryptographically secure random number generator.

There's an awful lot of complexity and code surface I don't want, which is pretty much anything besides the thinnest possible wrapper over the OS-provided cryptographically secure random number generator.

Looking over the comments in #643, and in this thread, I'm confused what plan there is, if any, to get any parts of this crate into std.

If this crate isn't prototyping functionality that is eventually destined for std, I feel like perhaps my particular problem would be better solved by a crate solely focused on providing the thinnest possible wrapper to the operating system's cryptographically secure random number generator, as opposed to an abstraction layer over both cryptographically secure and insecure random number generators, or bespoke userspace "CSPRNGs".

(As I write this, I'm presently investigating a critical RNG-related vulnerability in a project I contribute to, and the extant churn around rand is already making that difficult)

@dhardy
Copy link
Member

dhardy commented Nov 25, 2018

@tarcieri I appreciate the criticism. This project doesn't really get enough input from the security/crypto perspective, so your input is definitely useful.

Does #643 meet your requirements? Do you have other expectations?

I don't believe there has been any serious effort to get any more random-number functionality into std yet. There are several unanswered questions. It would be useful if you could write a design proposal for this (either an RFC or an issue here sketching the requirements). Questions like:

  • do we want an RNG (with trait like RngCore) or a simple "fill bytes" function?
  • do we want some kind of fallback like JitterRng or not at all?
  • what behaviour do we want if getrandom would block?
  • do we try to support no_std by allowing "lang items" to override the backend?

@tarcieri
Copy link
Author

So first, apologies, at the time I wrote my previous post I thought rand might be implicated in a critical bug, but that turned out to be a red herring. I've since fixed an unrelated problem elsewhere, so I'm in a better state now at least.

Regarding #643 (and perhaps we should just take the conversation back over there), I'm with @newpavlov on this comment:

If users want only the basic RNG functionality why make their live harder?

I would like for the API to be as straightforward as possible, so abstractions which aren't necessary for purely CSRNG-cases to stay out of the way. #643 mostly accomplishes that but per your first question, and the linked comment:

I think providing a direct interface to the OS's secure random facility is a reasonable primitive to provide in the standard library personally

This is what I want the overwhelming majority of the time personally, and would love to see it in std. It seems like rand_os could still exist in such a case, but would implement the relevant rand_core traits using this API.

There are a handful of cases where I do want the trait, namely when I'm testing cryptographic primitives which internally rely on randomness (e.g. ECDSA) and need to have something which implements CryptoRng but supplies a fixed test vector in lieu of true randomness.

do we want some kind of fallback like JitterRng or not at all?

I would personally prefer no fallback, at least for something called OsRng.

what behaviour do we want if getrandom would block?

Ideally I'd like to see the knobs getrandom() has exposed if possible, i.e. GRND_RANDOM vs GRND_NONBLOCK. You could imagine a blocking: bool option or thereabouts for that.

do we try to support no_std by allowing "lang items" to override the backend?

That would be ideal, yes. I say this having written a #![no_std] app which wrapped a proprietary/nonstandard RNG API.

@burdges
Copy link
Contributor

burdges commented Nov 25, 2018

There is an interface much like what you want internally to OsRng so maybe an rand_os crate should expose it along with defining OsRng? An option might be OS specific constructors for OsRng?

Just noticed clicking [src] from OsRng gives an empty file.

As an aside, we should probably adopt the BlockRng machinery for stream ciphers because it's much better than anything else currently available there. It'd suffice to add a RngCore::xor_bytes function really.

@newpavlov
Copy link
Member

I wonder if we should cover GRND_RANDOM + GRND_NONBLOCK combination. It will make safe API more complex, as we'll have to keep in mind that buffer can be filled partially, which AFAIK can not happen on other platforms.

@dhardy
Copy link
Member

dhardy commented Nov 26, 2018

Thanks @tarcieri for the feedback. Sounds like we should proceed with #643 as is, and possibly explore adding a function like pub fn secure_random(buf: &mut [u8], blocking: bool) -> Result<(), Error> (but behind a feature-gate because it probably would be removed after integration into std).

But @newpavlov has a point — a correct non-blocking API might partially fill a buffer. IIRC the current internal API may do this, but simply assumes no data was collected and retries?

@burdges adding RngCore::xor_bytes is an interesting idea, and probably deserves its own issue. I do have some reservations though. This is a primitive, not a complete encryption scheme, and DIY encryption is not generally encouraged (to put it mildly). E.g., a naive user might seed a CSPRNG directly from a key without nonce and use the same stream to encrypt multiple pieces of data — this is unsafe.

@burdges
Copy link
Contributor

burdges commented Nov 26, 2018

Agreed this is not the issue for that question, but.. I said "stream cipher", which is a low level notion. while normally "modes" deal with iv, macs, etc.

I mostly wanted to highlight the BlockRng machinery in the contest of stream ciphers. It's also likely the right way to approach what the RustCrypto Digest traits do.

@dhardy
Copy link
Member

dhardy commented Nov 26, 2018

Wouldn't xor_bytes just be a wrapper around fill_bytes anyway? Perhaps it could have a specialised implementation for BlockRng, but later we can achieve that with specialization.

There may still be an issue with using RngCore for ciphers though; the specification allows implementations to drop extra bytes as they see fit, e.g. if a byte-buffer does not use an exact number of words. This isn't really a problem in stochastic simulations since there are easier ways to affect reproducibility, but may be a problem for ciphers (if the text may be split into chunks arbitrarily).

@burdges
Copy link
Contributor

burdges commented Nov 26, 2018

We discussed this before, but even PRNGs should lend themselves to specification, meaning if you write typical code using rand then you can easily later write a specification, including any the corner cases of next_*, etc. It's okay that next_* pulls out aligned data. It's not really okay if fill_bytes provided by BlockRng throws away bytes, but individual PRNGs could specify that behavior.

I think BlockRngs could provide an xor_bytes that avoids the copy required by a wrapper around fill_bytes. If you are not using trait objects, then I'd expect LLVM to optimize this away anyways, so not really an issue.

We might eventuality use combinators on StreamingIterators for this stuff, but not today.

@naftulikay
Copy link

Is there an active RFC or RFCs that suggest moving things into std and core?

I'd propose putting an Rng trait into std and core and OsRng into std, something like:

  • std::random::Rng
  • core::random::Rng
  • std::random::OsRng

This puts the CSPRNG provided by the OS into the standard library, and traits defining what an Rng is into core and std.

As far as the actual methods exposed by Rng, the docs have a few. The methods I'd care about being in the Rng trait would be:

  • random float types between 0 and 1
  • random float types across entire possible float range (basically random bytes and 'instantiate' a float with it)
  • random signed and unsigned integer (u8/i8 .. u64/i64)
  • random bytes of an arbitrary length
  • the current choose implementation

I know that putting things into std and core means it's harder to iterate on them, but here's me wondering whether there's much else to put into the Rng trait.

Again, just thinking out loud. Getting a CSPRNG should be easy and possible from std, and building other Rng implementations should likewise be straightforward.

@newpavlov
Copy link
Member

@naftulikay
As was discussed earlier, a better choice for inclusion into std is probably a getrandom-like function which will be an overwritable weak lang item, and in turn it will be used by rand_os to add convenience wrapper with RngCore implementation. This way std RNG API will be really small and we will decouple std from the rest of rand infrastructure.

One of the questions to solve here is how to handle errors, we probably don't want to use io::Error, because we want it to be usable on no_std platforms, and we can't use rand_core::Error approach with std feature gate. I think the best solution will be to define non-exhaustive RngError enum with data-less variants. It will be somewhat limiting, but also the simplest solution.

@dhardy
Copy link
Member

dhardy commented Nov 28, 2018

I am not aware of an active RFC, but it sounds like we have enough to go on now to start one. Who would like to write it?

@tarcieri
Copy link
Author

@dhardy I can at least bring up the idea of an RFC for this to the Secure Code WG

@dhardy
Copy link
Member

dhardy commented Dec 21, 2018

I had a go at writing an RFC: https://github.com/dhardy/rfcs/blob/system-random/text/0000-system-random.md

No PR yet; I'll let the readers here make a first pass.

Personally I'm not fully convinced this is the best course of action; especially since supporting WASM properly may be hard (and maybe other platforms).

@naftulikay
Copy link

naftulikay commented Dec 21, 2018

Actually, WASM shouldn't be too hard, as IIRC there is a JavaScript API to a secure RNG: https://www.w3.org/TR/WebCryptoAPI/

Other embedded contexts may be difficult, but these environments usually don't have std either. If we provide a trait in core and an implementation in std, this would probably be the most flexible approach.

@tarcieri
Copy link
Author

I, for one, would really like a lang item for this (which is talked about in @dhardy's RFC). In the past I've used Rust in unusual embedded environments (HSMs) where I needed to call out to a hardware RNG through a C library. It'd be nice to be able to leverage the randomness it produces everywhere in a Rust program, including for things in std.

In my observation it's pretty common for embedded SDKs to support some sort of proprietary hardware-based RNG (often one which might require a little work to support a getrandom-ish API). In these cases, it'd be nice for crates wrapping those SDKs to be able to be able to export the RNG functionality in a way that is ubiquitous and even std can consume.

@dhardy
Copy link
Member

dhardy commented Dec 30, 2018

Actually, WASM shouldn't be too hard, as IIRC there is a JavaScript API to a secure RNG: https://www.w3.org/TR/WebCryptoAPI/

That's what both current impls use if used within a browser; otherwise they try Node's equivalent. The issue is simply requirement to use a JS bindings framework. @alexcrichton does std already have a solution for JS bindings when targetting WASM?

Edit: updated the WASM section

@alexcrichton
Copy link
Contributor

@dhardy unfortunately the standard library itself doesn't have the ability to acquire random numbers, only the ecosystem like wasm-bindgen has the ability to do that

@Pauan
Copy link

Pauan commented Jan 2, 2019

@alexcrichton I believe the suggestion was to add a new getrandom (or similar) function into Rust's stdlib.

Then each wasm runtime (wasm-bindgen, stdweb, etc.) would provide an implementation of getrandom (similar to how they provide implementations of round, acos, etc. right now).

@newpavlov
Copy link
Member

newpavlov commented Jan 2, 2019

I would say a much better example will be a core::alloc::GlobalAlloc. IIRC you can't overwrite acos implementation or add your own if target lacks it. In this regard allocations and platform "default" RNG are very similar to each other.

@dhardy
Copy link
Member

dhardy commented Jan 3, 2019

I know @alexcrichton. Sorry, I take it your answer means "no". Which is a problem for this PR — we don't have a solution to support WASM properly, unless std can depend on wasm-bindgen or equivalent.

Yes and no @newpavlov — the difference is that having an allocator is one of the defining features of std (vs core). Likely in any case there will be some std platforms without support for an RNG and I don't think we can say std should be unavailable in that case.

@alexcrichton
Copy link
Contributor

Sorry I didn't mean to answer the wrong question, this is a long thread and I just got back from holidays...

In any case we don't really have a process or way right now for std to require a new symbol on the wasm32-unknown-unknown target. Doing something like this would require broader discussion than me replying on an issue and may require a new target.

@dhardy
Copy link
Member

dhardy commented Jan 3, 2019

Thanks Alex. I guess that gives me enough to write an RFC at least.

@naftulikay
Copy link

@dhardy keep us posted, happy to contribute to that 👍

@tarcieri
Copy link
Author

tarcieri commented Jan 3, 2019

In any case we don't really have a process or way right now for std to require a new symbol on the wasm32-unknown-unknown target.

Would using a new lang item for a system CSRNG address this issue? (while also providing e.g. embedded platforms a way to leverage proprietary vendor-specific RNGs, possibly through closed-source SDKs)

@dhardy
Copy link
Member

dhardy commented Jan 3, 2019

Good question — can lang items be specified multiple times (with placeholder and full implementations) or do they need to be specified exactly once?

I don't fully understand how they work; it looks like they need hooks within the compiler, which isn't ideal just for a library function (though this was also part of the motivation for this RFC, since external linkage is only available for C FFI functions and all other options are run-time).

@dhardy
Copy link
Member

dhardy commented Jan 3, 2019

@naftulikay I updated the RFC text but am still not happy with the no_std / WASM / lang item part. Although it may be worth making the RFC PR soon anyway.

@naftulikay
Copy link

naftulikay commented Jan 3, 2019

@dhardy love it! I don't really have much to add, all of this seems perfectly reasonable, only to ask for some clarifications.

Before I dump this wall of text, would it be helpful for us to all get together on IRC or something so we can discuss in real time? I'm not sure what the process is around previous RFCs which are much bigger than this, and long issue comment histories on GitHub are hard to weed through. We could produce our findings after the meeting in a summary comment here or on the RFC. Just a thought.

API interface

As far as API naming is concerned, I'm ambivalent about the different options.

  • secure_random is nice as it serves as a reminder of the CSPRNG quality, but there are non-system CSPRNGs so IDK.
  • system_random is nice because it's the system's RNG, but it may not be immediately clear that its contract is a CSPRNG.
  • I don't favor os_random because system_random expresses this same idea.

Both secure_random and system_random are both good choices, but perhaps somebody has more insight on this than I do.

Error handling

The error enum is a great idea, returning ErrorKind::Unavailable when unimplemented or ErrorKind::NotReady when it would block.

  • Implementors should always choose a non-blocking backend if at all possible, as you have noted.
  • In the case where something would block (even though as above it really shouldn't), there's a race condition.

I'll be speaking to Linux because it's the system CSPRNG that I'm most familiar with.

NOTE: here is some background on Linux's cryptographically-secure random number generator (abbreviated as CSPRNG). Somewhat surprisingly, /dev/random and /dev/urandom both source their randomness from the same randomness pool, which is fed by a bunch of different sources, many around I/O and non-deterministic things like that. The difference between the two, besides /dev/random blocking and /dev/urandom not blocking, is that /dev/random reads an integer within the kernel which keeps track of how much "calculated entropy" in bits is available at the time. When the bits of calculated entropy are exhausted, it blocks. "Calculated entropy" increases over time as the randomness pool is sourced with new randomness data. They're both the same data, just with one being spread out over time.

So here's the race condition. Implementors, as noted above, should always choose a non-blocking randomness source, but if they don't, there's a problem. If for some reason, /dev/random on Linux is used as the randomness implementation (don't do it!), std::random::is_ready can return true and then std::random::{system,secure}_random is called and immediately returns ErrorKind::WouldBlock. This can happen because of other processes/threads reading from the device file and reducing the calculated amount of available entropy.

Let's say that we want to read two bytes of randomness:

let mut a = [0u8; 2];
let result = std::random::secure_random(&mut a);

What happens here if there is only one byte available of randomness? It returns ErrorKind::WouldBlock, yet std::random::is_ready will return true. It won't read that one byte, in fact it won't read anything until at least n bytes are available where n is the size of your array. Other processes/threads/things can keep exhausting the randomness and we don't have a way of preventing blocking.

Of course, the goal is to not have blocking implementations. If we put it in core::random though, we might have to account for blocking implementations at the API level. If we can guarantee non-blocking always for std implementations, then the race condition doesn't exist anymore.

Also, I hope I'm understanding properly, this is about /dev/random vs /dev/urandom as on Linux, not O_NONBLOCK, right?

This whole thing could be a non-issue, but it should at least be considered.

Core vs Std

The RFC does a great job of handling core::random not being implemented by returning ErrorKind::Unavailable. IIRC Linux at least does some interesting things where there's a point in time during boot where the RNG is considered "ready." I'm not sure how this actually works though; does the RNG just block until it's declared ready internally? Does the device file not exist until it's "ready?"


@dhardy excellent work on the RFC, seriously A+.

Again, if it'd be beneficial to get all of us in the same room (virtually of course, hangouts or IRC or something), it might save a lot of discussion time 👍

@briansmith
Copy link

  • secure_random is nice as it serves as a reminder of the CSPRNG quality, but there are non-system CSPRNGs so IDK.

I think this is clearly the better name because it emphasizes that if you swap out the PRNG for a different one, it better be secure.

  • system_random is nice because it's the system's RNG, but it may not be immediately clear that its contract is a CSPRNG.
  • I don't favor os_random because system_random expresses this same idea.

Neither of these names really make sense in the case of having a lang item that allows one to swap in a non-system allocator.

@briansmith
Copy link

We propose that libcore provide an implementation of secure_random which always fails, but that libstd uses lang items to override this with a real implementation, depending on the platform. Users of no_std must provide an implementation themselves (likely via a third-party crate) or avoid dependence on this function.

It would be nice to guarantee that one can only swap in a new implementation in the top-level crate, not in a library. Also it would be nice to guarantee that the built-in implementation for a given target will always be used when it is available, so that only platforms that don't have any system PRNG would be able to use the lang item.

@dhardy
Copy link
Member

dhardy commented Jan 4, 2019

@naftulikay please check the linux source: we use /dev/random only to check this has been seeded, since some versions of Linux may be deterministic during early boot. So for us it's either "blocking" or "never blocking again". I believe getrandom works the same way. So I don't care about the case when it might block mid-read.

As for other systems — I don't know; there are a lot and also embedded devices with some type of RNG.

We already have a Gitter room but don't use it much. Potentially we could piggyback https://rust-lang.zulipchat.com/# or make our own, or IRC....

It would be nice to guarantee that one can only swap in a new implementation in the top-level crate, not in a library. Also it would be nice to guarantee that the built-in implementation for a given target will always be used when it is available, so that only platforms that don't have any system PRNG would be able to use the lang item.

Stuff I'm not sure we can do with lang items... perhaps we should go back to @pitdicker's original idea of an extern(C) function as a fallback for EntropyRng (prioritised over JitterRng but under OsRng).

@newpavlov
Copy link
Member

I had a go at writing an RFC: https://github.com/dhardy/rfcs/blob/system-random/text/0000-system-random.md

How about creating an issue for it in the getrandom repo?

@dhardy
Copy link
Member

dhardy commented Mar 4, 2019

I don't see any point prioritising this until after getrandom is published and a dependency of Rand. Even after that, I'm not convinced whether "more code in std" is the right approach. std depends on several external crates now, and could potentially depend on getrandom.

As for no_std support, since we have such a simple function propotype now, using an extern "C" function is a very easy solution and I don't believe lang-items would help, even if neither solution is perfect. See here: rust-random/getrandom#4 (comment)

@newpavlov
Copy link
Member

newpavlov commented Mar 4, 2019

I still believe having getrandom in std/core is the right call. Not only it will allow HashMap and co to use the same entropy source as the rest of an application, but also will make switching source more straightforward and less hack-y. And Rust std already contains code very similar to getrandom (though it does not have to work with large buffers, so code a bit simpler), so including it into std will somewhat help with code de-duplication as well.

But I guess this discussion could indeed wait until the crate gets published and used a bit.

@dhardy
Copy link
Member

dhardy commented Apr 19, 2019

There is a lot of good discussion here, but this isn't the right place to continue it, and long GitHub threads are a pain to read. Lets move future discussion to rust-random/getrandom#21 for now.

@dhardy dhardy closed this as completed Apr 19, 2019
@dhardy dhardy added the E-question Participation: opinions wanted label Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-question Participation: opinions wanted
Projects
None yet
Development

No branches or pull requests

8 participants