Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should getrandom() really fall back to File I/O when the getrandom syscall returns EPERM? #229

Closed
briansmith opened this issue Oct 1, 2021 · 9 comments · Fixed by #396
Closed
Labels
bug Something isn't working

Comments

@briansmith
Copy link
Contributor

If the getrandom syscall returns EPERM then it's probably because (a) the OS is buggy, e.g. QNAP apparently, or (b) the syscall is blocked by seccomp filtering or similar.

In the case where the syscall is blocked by a sandbox, I think it doesn't make sense to fall back to File I/O; why would a sandbox intentionally block getrandom but also intentionally allow file I/O that/s equivalent or worse (w.r.t security)? This indicates to me that the sandbox is buggy and needs to be fixed.

In the case of the QNAP situation, it is purely a bug on their part and it seems like it should be fixed on their side, instead of making security worse on our side. Note that we don't fall back on EFAULT which is another buggy result that sometimes occurs for this syscall.

At the very least, I'd love to have a way to opt out of falling back to File I/O on EPERM, but ideally we'd only fall back on ENOSYS.

@josephlr
Copy link
Member

The problem here is that this crate is used in a very wide range of environments, and we don't want to break them. I also don't see how falling back to the /dev/urandom implementation hurts security. It might hurt reliability but if you've hit this codepath, then something is already unreliable.

Looking at other implementaitons, boringssl doesn't check for EPERM it just fails, openssl falls back on any sort of failure from the gerandom syscall, and the Rust standard library checks for EPERM.

So overall, I think we should either:

  • Only fallback on ENOSYS
  • Fallback on any sort of unexpected error from the getrandom syscall
  • Have some (off by default) control that just says "don't ever fall back to file-based interfaces". This is basically what BORINGSSL_FIPS does.

@briansmith
Copy link
Contributor Author

I also don't see how falling back to the /dev/urandom implementation hurts security. It might hurt reliability but if you've hit this codepath, then something is already unreliable.

All file I/O interfaces are risky to use because it is too easy to close the /dev/random. file descriptor, and then cause the random bytes to be read from whatever file happens to be opened after closing it. See the issues in Python that forced them to jump through a lot of hoops to try to mitigate those risks.

  • Have some (off by default) control that just says "don't ever fall back to file-based interfaces". This is basically what BORINGSSL_FIPS does.

That's probably because their FIPS validation requires an operating system that has getrandom, so perhaps not as relevant here.

Rather than having an off-by-default feature that says "don't ever fall back," getrandom could have a default feature that enables the fallback. This is what ring does with default feature dev_urandom_fallback.

Don't the newest versions of libstd now require a kernel that is new enough to have getrandom? So in the configuration that is being used by libstd, we should definitely not do the fallback. (This would then allow getrandom to use safe std::fs APIs instead of unsafe code to do the file I/O, IIUC.)

@newpavlov
Copy link
Member

Don't the newest versions of libstd now require a kernel that is new enough to have getrandom?

No. On x86 Rust supports kernel 3.2+, while getrandom was introduced only in 3.17.

Have some (off by default) control that just says "don't ever fall back to file-based interfaces".

If wea re to introduce such control, I think we should use configuration flags for it, not crate features, i.e. #[cfg(getrandom_only)] instead of #[cfg(feature = "getrandom_only")]. Such crate feature would subtract crate functionality and some argue that it's an incorrect use of crate features.

@briansmith
Copy link
Contributor Author

My goal here is to limit the fallback logic to "Use File I/O if and only if getrandom isn't implemented in the kernel," instead of "Use File I/O if there is any issue at all trying to use getrandom."

I believe the main concern here is that some devices shipped with broken kernels that return EPERM (or some other error) instead of ENOSYS. I think there was one specific brand of home NAS that was known to have this issue, for example.

I think that at least Android has not been known to have that problem though, so I think at least on Android we can be stricter and require that the error be ENOSYS. Further, using the same logic as #376, we can become stricter based on the target_arch based on the platform support policy. Taking these baby steps would be a material improvement.

@newpavlov
Copy link
Member

I think that at least Android has not been known to have that problem though, so I think at least on Android we can be stricter and require that the error be ENOSYS

Sounds good to me.

@briansmith
Copy link
Contributor Author

My goal here is to limit the fallback logic to "Use File I/O if and only if getrandom isn't implemented in the kernel," instead of "Use File I/O if there is any issue at all trying to use getrandom."

First of all, the above mischaracterizes the current fallback strategy. The fallback only happens on ENOSYS and EPERM, not "any issue at all." I am sorry about that exaggeration.

The product that seemed to force libraries to accept EPERM was QNAP Container Station, which was running applications in Docker Containers, using a seccomp policy that blocked getrandom (probably it used an allowlist approach and hadn't added getrandom to its allowlist). I looked at docker's default seccomp policies and for at least 9 years getrandom appears to have been in the default policy. So I think it is very likely that very few systems really need the EPERM support. And also the issues with QNAP Container Station were several years ago, so maybe that product doesn't even need this workaround anymore as I imagine by now it has been forced to update its allowlist of syscalls.

In PR #396 @newpavlov proposes to follow my suggestion above and remove the EPERM fallback on Android. I think that is a good change, whether it is in that PR or on a new one. I think the change should include the comment "// The fallback on EPERM is intentionally not done on Android since this workaround seems to be needed only for specific Linux-based products that aren't based on Android. See #229."

I also think we could have a feature flag that allows users to disable the fallback on PERM for target_os = "linux" as well, to allow for experimentation with the goal of making that behavior the default. Especially if we could find somebody from QNAP to help us.

@briansmith
Copy link
Contributor Author

Another minor detail: As https://www.qnap.com/en/how-to/faq/article/how-do-i-find-out-if-my-nas-uses-an-arm-or-x86-processor indicates, perhaps the EPERM fallback is only needed for aarch64 and x86_64 (and arm and/or x86?) target architectures.

@newpavlov
Copy link
Member

perhaps the EPERM fallback is only needed for aarch64 and x86_64 (and arm and/or x86?) target architectures.

I would prefer to keep it simple and more or less consistent. I don't think a new feature flag would pull its weight (especially considering introduction of the disable_urandom_fallback flag). Hopefully, Rust will bump min kernel versions to 3.17+ for all targets relatively soon and we will be able to drop the file fallback completely.

@the8472
Copy link

the8472 commented Mar 7, 2024

why would a sandbox intentionally block getrandom but also intentionally allow file I/O that/s equivalent or worse (w.r.t security)?

Because syscall filters are often written as whitelists. If the whitelist was written at a time where getrandom didn't exist or was still new then it wouldn't have been included.
This is often a problem with new syscalls and old docker installations, though I don't know if it affects getrandom specifically

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants