Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Error handling #54

Merged
merged 7 commits into from
Jul 27, 2019
Merged

Improve Error handling #54

merged 7 commits into from
Jul 27, 2019

Conversation

josephlr
Copy link
Member

@josephlr josephlr commented Jul 5, 2019

Edit: Significantly revised to account for earlier feedback. Depends on #57 (for fill_exact)

The PR allows for error handling to work without depending on libstd, it also cleans up the Error type. Specifically:

  • The Error type maintains its NonZeroU32 representation. But now instead of holding a single code, it holds one of two values:
    • An OS Error: represented by a positive i32 value
    • A custom Error: represented by a u32 value larger than i32::MAX.
  • We improve the Debug and Display implementations for OS Errors (on UNIX) by using libc::strerror_r.
  • We improve the Debug and Display implementations for Custom Errors (on some platforms) by defining additional error constants (instead of just returning Error::UNKNOWN)
  • util_libc::last_os_error() replaces std::io::Error::last_os_error().

Now the only things that depend on libstd are:

  • Conversions enabled by the std feature
  • File-based implementations (to be fixed in a followup PR).

@dhardy
Copy link
Member

dhardy commented Jul 5, 2019

This change to the error type looks like it would interfere with rand_core. Specifically we still need a NonZeroU32 code, unless you want to chip in here with another idea.

@newpavlov
Copy link
Member

We issue a compiler_error! on unsupported platforms

I think we should not rush this decision and possibly postpone it. (ideally we would make this change together with dropping wasm32-unknown-unknown support)

The public Error type is now an opaque struct, wrapping a implementation-specific data structure.

What is rationale behind this change?

@josephlr
Copy link
Member Author

josephlr commented Jul 5, 2019

I think we should not rush this decision and possibly postpone it. (ideally we would make this change together with dropping wasm32-unknown-unknown support)

Agreed, I put this change in this PR, but it really should be in a separate PR. My initial thought was to put all of the breaking changes together, but I think it makes sense for them to be split up.

What is rationale behind this change?

My basic reasoning was that currently getrandom::Error initially looks like it has values can be matched on. However, when I was trying to use it, I quickly realized that the only useful thing I can do with these types is print them out with Display or Debug. The main reason for this problem is that (depending on the target) the reasons for failure can be wildly different. This change just makes this implicit behavior explicit.

This change to the error type looks like it would interfere with rand_core. Specifically we still need a NonZeroU32 code, unless you want to chip in here with another idea.

I guess I'm surprised that rand_core::Error even exists. I has just assumed that the RngCore trait would just have a type Error parameter, and each generator would chose its own type. is there a reason things are not done that way?

@newpavlov I saw that you were against such a change in rust-random/rand#837 (adding an associated type). But if getrandom implemented this change, and rand_core implemented the associated types change, we would get no_std printable errors, as each generator would just specify it's own type.

@newpavlov
Copy link
Member

newpavlov commented Jul 5, 2019

My basic reasoning was that currently getrandom::Error initially looks like it has values can be matched on. However, when I was trying to use it, I quickly realized that the only useful thing I can do with these types is print them out with Display or Debug.

But you can retrieve NonZeroU32 error code via code method and match on it. This will allow you to trap some platform-specific error codes if needed, so I think the current design is better than the completely opaque type.

I will answer regarding rand_core::Error in the dedicated thread.

@josephlr
Copy link
Member Author

josephlr commented Jul 5, 2019

This will allow you to trap some platform-specific error codes if needed

While this is possible, there's not a good way to actually go about doing it (as it's not clear what a user would even match on, or what types of values can be returned). This PR tries to address this by implementing From<Error> for std::io::Error which seems better suited for matching, printing, etc... This requires the std feature, but even without it, you still get good messages.

There's also the bigger issue that certain implementations can't return a meaningful error code.
Windows and iOS really shouldn't use std::io::Error::last_os_error() or errno; their implementations don't specify how those functions interact with errno. But by ditching error codes for an opaque type, we actually get better error messages and a more efficient implementation (as the Error types can be smaller). It also lets certain platforms (like Fuchsia) have an Error that's Infallible.

@newpavlov
Copy link
Member

newpavlov commented Jul 5, 2019

This PR tries to address this by implementing From for std::io::Error

Nothing forbids us from adding conversion to io::Error for current design as well and use it in Debug/Display impls, so I don't think it's a valid argument in favor of making error type opaque.

There's also the bigger issue that certain implementations can't return a meaningful error code.
Windows and iOS really shouldn't use std::io::Error::last_os_error() or errno

While it's indeed may be a valid critique of the current implementation (though it also applies to the current std implementation as well). But it's not an argument in favor of your error type design.

But by ditching error codes for an opaque type, we actually get better error messages and a more efficient implementation (as the Error types can be smaller).

But for most platforms you use io::Error which is 16 bytes on 64-bit platforms instead of the current 4. I don't see how it's more efficient. While an ability to indicate infallibility is indeed nice, I don't think it will matter performance wise. The current Result<(), Error> fits into a single register, so in the most cases it will not even hit memory! (well, assuming we use 32 or 64 bit CPU) So 1 byte vs 4 will not make any difference here and infallibility can be optimized with LTO or by adding #[inline] attribute. Thus efficiency argument is also weak in my opinion

@josephlr josephlr force-pushed the error branch 2 times, most recently from b9cca59 to 84c5782 Compare July 5, 2019 13:18
@josephlr
Copy link
Member Author

josephlr commented Jul 5, 2019

I agree with most of the above, as the changes I made here could be made with the old implementation. I've also updated this PR to make a few of those changes (specifically fixing Windows and iOS). Most the the efficiency arguments don't matter as long as Result<(), Error> fits in a usize (which should be possible for all platforms).

If a user needs raw OS error codes, the current implementation allows that by converting to std::io::Error and calling raw_os_error(). However, this doesn't work on no_std.

So @newpavlov what about a mix of what we've been talking about? Each target can still define the type of error that makes sense, essentially one of:

  • io::Error (soon to be just an errno implementation inside of getrandom)
  • Infallible
  • StaticError (if all you can indicate is just a static str)
  • WebError (for stdweb)

This Error type implements Debug, Display, and can be converted into a io::Error. However, it also implements a raw_os_error() method returning an Option<i32> that gives the raw OS error code. This would seem to give us the best of everything:

  • We don't have to deal with the weird artificial error codes that currently exist for the Error type
  • We get better error messages with simpler code.
  • Users now know if they hit an OS error or not.
  • no_std users can use raw_os_error().
  • std users can just use io::Error (useful for composing getrandom with other IO operations).

@josephlr josephlr force-pushed the error branch 2 times, most recently from a4fc20f to 6b07e4f Compare July 5, 2019 13:46
@newpavlov
Copy link
Member

newpavlov commented Jul 5, 2019

I am certainly against using io::Error instead of a simple error code. Not only it ties us to std which goes contrary to the std integration work, but also simply feels wasteful. Swapping io::Error with error code without enabled std feature looks like a needless complication. If you want reuse formatting functionality you can do it simply inside cfg'd Debug and Display impls. And I don't see much uses for matching on io::ErrorKind, especially considering that you propose using io::Error only for some platforms.

Swapping error type based on the target does not look like a simplification to me, even worse I have a hunch that it will only cause weird issues in future. Also it looks like you assume that rand_core will use associated error type, as you can see I am not sure if it's a better design compared to the current one.

Overall, motivation of this rework looks a bit weak to me. Looks like the main driving factor is desire to improve Debug and Display impls, but you already can improve them via error_msg_inner.

@dhardy
What do you think?

@newpavlov
Copy link
Member

Meanwhile can you please submit cfg-if and Windows/iOS changes as a separate PR?

@josephlr
Copy link
Member Author

josephlr commented Jul 5, 2019

I am certainly against using io::Error instead of a simple error code.

So there are two issues here:

  • Using io::Error as the error type on Unix targets.
    - This is just a temporary artifact until my next PR removes the std::io reqirement for all targets (by just using libc errno). Then it will be an i32 or something.
  • Having io::Error: From<getrandom::Error>. This is very valuable as it lets you call getrandom and then other io functions, and easily use ? to have your parent function return a io::Result.

The switching on io::Error type is not ideal I agree. This is why I proposed having raw_os_error() that allows users on all platforms to see if/what the underlying OS error was.

Also it looks like you assume that rand_core will use associated error type, as you can see I am not sure if it's a better design compared to the current one.

So this is actually why I think having a raw_os_error() method directly on getrandom::Error for all platforms is a good idea. It allows getrandom to be updated without resolving the issue of associated types in rng_core. The rng_core crate can just store the i32 returned by raw_os_error, and it even works on no_std.

@dhardy
Copy link
Member

dhardy commented Jul 5, 2019

I agree with most of what @newpavlov says; from the API perspective this isn't a simplification, and you haven't attempted to explain how this works with rand_core (which currently supports getrandom in no_std mode for all platforms due to the weird way feature flags work).

This will allow you to trap some platform-specific error codes if needed

While this is possible, there's not a good way to actually go about doing it (as it's not clear what a user would even match on, or what types of values can be returned).

Nor do I see why we should do so — the source code is public, and the only reason I can think of why anyone would need to do this is to work around some weird issue. (We don't even report "transient" failures any more.)

@josephlr
Copy link
Member Author

josephlr commented Jul 5, 2019

Meanwhile can you please submit cfg-if and Windows/iOS changes as a separate PR?

Sure!

After that, should I put that changes that call errno directly (and remove the std::io dependancy) in this PR or another one.

@josephlr
Copy link
Member Author

josephlr commented Jul 5, 2019

Nor do I see why we should do so — the source code is public, and the only reason I can think of why anyone would need to do this is to work around some weird issue. (We don't even report "transient" failures any more.)

That's fair, but if we don't want users to match on the type, then shouldn't the return type be more opaque, not less?

@dhardy
Copy link
Member

dhardy commented Jul 5, 2019

So this is actually why I think having a raw_os_error() method directly on getrandom::Error for all platforms is a good idea

How does this work on platforms not able to use io::Error directly?

After that, should I put that changes that call errno directly (and remove the std::io dependancy) in this PR or another one.

We can't accept this PR as is for a number of reasons, one of which is the additional dependency on std, so if it helps resolve that it makes sense to do so in this PR or before this one.

@josephlr
Copy link
Member Author

josephlr commented Jul 5, 2019

How does this work on platforms not able to use io::Error directly?

So like the one's that use RDRAND? Then raw_os_error() would return None indicating that the failure was not due to an OS error. This is essentially what we're already doing with Error::UNKNOWN just more explicit.

@newpavlov
Copy link
Member

newpavlov commented Jul 5, 2019

Another argument against using target-dependent type is potential future addition of getrandom to Rust in the form of lang item à la #[global_allocator]. Having a concrete error type shared across all targets will make it significantly easier.

Why users need to know that it was an OS error? Assuming we will have a good error description, I don't think it should matter.

@josephlr
Copy link
Member Author

josephlr commented Jul 5, 2019

Another argument against using target-dependent type is potential future addition of getrandom to Rust in the form of lang item à la #[global_allocator]. Having a concrete error type shared across all targets will make it significantly easier.

That's a really good point I had not thought of.

If that's the case then it might be even easier to just ditch having any information at all in the error type. Error's in getrandom would just be a single value indicating that the OS could not provide randomness (and maybe a TARGET_UNSUPPORTED value, if we want to not use compiler_error). This is how ring handles this sort of problem.

Edit: Additional error context would be provided via the log crate with error!. We sort of do this currently.

It also makes it so much easier to remove std dependancies. Otherwise we have to essentially re-implement std::io::Error::last_os_error.

@newpavlov
Copy link
Member

newpavlov commented Jul 5, 2019

I think ring approach is a bit too extreme and in our case will make debugging of potential errors much harder. Was it descriptor exhaustion? Incorrectly implemented syscall? Did someone screw with /dev/urandom? How exactly did hardware RNG misbehave? Having an error code will significantly reduce scope of potential problems.

None of the 3 reasons stated in the Unspecified documentation apply to our case in my opinion.

@josephlr
Copy link
Member Author

josephlr commented Jul 7, 2019

@newpavlov @dhardy I significantly reworked this PR to incorporate your recommendations. The PR description has been updated. PTAL

This change now focuses more on removing libstd deps for error handling, and only makes minor changes to Error.

@josephlr
Copy link
Member Author

@dhardy @newpavlov I've rebased this and changed util_libc.rs to use the changes from rust-lang/libc#1432, so that no raw extern declarations are needed anymore. Is there anything else you wanted me to do on this PR?

@dhardy
Copy link
Member

dhardy commented Jul 16, 2019

Sorry for the delay — I'd like to review again, but I think I'm happy with it. I am however waiting for @newpavlov to reply to my previous comment (about reserving blocks).

@newpavlov
Copy link
Member

newpavlov commented Jul 17, 2019

Sorry for the late reply!

I still don't see raison d'etre for from_os_error, custom_error, internal, raw_os_error and try_os_error instead of the simple From<NonZeroU32> impl and code method. I think specifying in documentation that errors with code smaller than 1 << 31 should be interpreted as OS errors will be good enough. Even worse, you add a potential panic point by using assert in from_os_error. Compiler can't prove that you will not get a negative errno, so this check will be always present, while the current NonZeroU32::new-based code is totally panic-free.

What exact problem are you trying to solve here? Formatting can be improved without any issues separately from these API changes.

@dhardy
Copy link
Member

dhardy commented Jul 17, 2019

The way I see it, @newpavlov has a good point about avoiding the need for panic and eliminating from_os_error and internal in favour of From<NonZeroU32>, however there is value in having structure here: at least documentation suggesting what kind of errors RNGs should return and how error codes can (if necessary) be interpreted. custom_error is useful sugar for documenting this and making usage convenient, so I think it might as well stay.

The raw_os_error and try_os_error functions only really serve one use: converting an error code to the typical i32. raw_os_error may be useful to to hook into strerror / OS-specific variants, but I think try_os_error has no other use — we should handle formatting of "custom" errors ourselves (probably something like 0x2F09-3).

@josephlr
Copy link
Member Author

I still don't see raison d'etre for from_os_error, custom_error, internal, raw_os_error and try_os_error instead of the simple From<NonZeroU32> impl and code method. I think specifying in documentation that errors with code smaller than 1 << 31 should be interpreted as OS errors will be good enough

This makes sense, I removed from_os_error, custom_error, internal, and try_os_error. This makes the diff smaller and the code cleaner. I left in raw_os_error as it's a useful helper and could be used to disambiguate error types (as @dhardy mentioned).

I also just did away with the whole block nonsense. After looking at it with a fresh set of eyes it seems unnecessarily complex. The reservations are now just:

  • [0, 1 << 31) - positive i32s - OS Errors
  • [1 << 31, 1 << 31 + 1 << 30) - codes for getrandom, rand, etc...
  • [1 << 31 + 1 << 30, 1 << 32) - user definable codes

The PR description and docs comments have also been updated/clarified. The Debug and Display implementations are slightly different, and the RTL_GEN_RANDOM_FAILED constants are now just error constants.

src/wasi.rs Outdated Show resolved Hide resolved
@josephlr
Copy link
Member Author

Note that adding in the Error constants required bumping the min rustc version to 1.33. Is that OK?

@dhardy
Copy link
Member

dhardy commented Jul 26, 2019

Is there an easy work-around for rustc 1.32? While most Rand users are happy to use the latest stable, there are a few less happy about it (for versions < 6 months old — 1.33 is almost that old now).

@newpavlov
Copy link
Member

newpavlov commented Jul 26, 2019

What feature do we need from Rust 1.33? Is it about making internal_error a const function? In that case we can use explicit error codes for the time being, i.e. instead of const UNKNOWN_IO_ERROR: Error = internal_error(1) you will write:

const UNKNOWN_IO_ERROR: Error = Error(unsafe {
    NonZeroU32::new_unchecked(Error::INTERNAL_START + 1)
});

Also I think you could use a macro if you don't like such explicitness.

I will try to review changes on this weekend.

@dhardy
Copy link
Member

dhardy commented Jul 26, 2019

I won't do anything before @newpavlov has a chance to review, but mostly I'm happy about this. Integration into rand_core looks fairly easy, but I have one question: which code range should we use for Rand errors?

Perhaps we should consider them internal and use from 0xD000_0000 = 1<<31 + 1<<30 + 1<<28 for Rand libs; this leaves plenty for getrandom usage and avoids confusion (since getrandom is still a rust-random lib for now).

@josephlr
Copy link
Member Author

I won't do anything before @newpavlov has a chance to review, but mostly I'm happy about this. Integration into rand_core looks fairly easy, but I have one question: which code range should we use for Rand errors?

Perhaps we should consider them internal and use from 0xD000_0000 = 1<<31 + 1<<30 + 1<<28 for Rand libs; this leaves plenty for getrandom usage and avoids confusion (since getrandom is still a rust-random lib for now).

My thought was to have all the codes in [1 << 31, 1 << 31 + 1 << 30) (i.e. between Error::INTERNAL_START and Error::CUSTOM_START) to be used by all of the Rand libs, getrandom, rand-core, rand-*, etc...

I don't know how explicit you want to make the split between getrandom error code ranges and rand error code ranges. If you want to make it explicit, we can just split the "Internal" range at 1 << 31 + 1 << 29, and say 0x8000_0000 - 0xA000_0000 is for getrandom and 0xA000_0000 - 0xC000_0000 is for the rand libs.

@newpavlov good idea, I'll just turn the function into a macro. The unsafe in const fn is the only thing this needs from 1.33.

Copy link
Member

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have several nitpicks in addition to the MSRV issue, but otherwise looks great!

src/openbsd.rs Show resolved Hide resolved
src/util_libc.rs Outdated Show resolved Hide resolved
src/error.rs Outdated Show resolved Hide resolved
src/error.rs Show resolved Hide resolved
@newpavlov
Copy link
Member

Great! Thank you!

@newpavlov newpavlov merged commit 00c3cff into rust-random:master Jul 27, 2019
@newpavlov
Copy link
Member

newpavlov commented Jul 27, 2019

One small thing which bothers me a bit is $n as u16 as u32 cast in the internal_error macro. Is it intentional or just a leftover from the old code?

@dhardy
Copy link
Member

dhardy commented Jul 27, 2019

I guess it's just to ensure than no constants are too large? It's harmless anyway.

@josephlr josephlr deleted the error branch July 28, 2019 04:47
@josephlr
Copy link
Member Author

josephlr commented Jul 28, 2019

I guess it's just to ensure than no constants are too large? It's harmless anyway.

Yes, before the function just took a u16, to ensure that the NonZeroU32::new_unchecked call is sound. This was just the easiest way to do the same thing in a macro.

zrj-rimwis added a commit to DragonFlyBSD/DeltaPorts that referenced this pull request Sep 6, 2019
It looks like that rust-random/getrandom#54 has
wrongly assigned us to Darwin case.  The __error() is not libc callable
function, but rather inline helper implemented at headers level to assist
code that was using errno incorrectly.
It seems that thread_local has still not become stable feature in rust?
If this very useful language feature does not get official support soon
enough in rust, it would be trivial for us to add simple helper function
in libc using the most common __errno_location() variant out there.

Since I could not figure out how to disable -Werror on this warning when
using "extern { #[thread_local] static errno: c_int; }", just bringing
back the previous crate helper that worked (using google search).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants