Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: Provide try_get_* methods #254

Open
oblique opened this issue Apr 14, 2019 · 14 comments · May be fixed by #277
Open

Request: Provide try_get_* methods #254

oblique opened this issue Apr 14, 2019 · 14 comments · May be fixed by #277

Comments

@oblique
Copy link

oblique commented Apr 14, 2019

Some network protocols have dynamic length packets, so we can not have a single buf.remaining() check before starting parsing them. In that case before get_* we need to check the remaining length to avoid assertions.

I believe try_get_* methods that do not assert they will help us write cleaner code.

@caelunshun caelunshun linked a pull request Jul 30, 2019 that will close this issue
@carllerche carllerche added this to the v0.6 milestone Oct 16, 2020
@camshaft
Copy link
Contributor

camshaft commented Oct 16, 2020

I agree this would be helpful to have. If you look at the rust-fuzz trophy case, many of the bugs in the oor category could have been prevented with this type of interface.

I think we need to answer a couple of questions, first.

advance and copy_to_slice

Ideally, we should include a non-panicking version advance and copy_to_slice. This would keep things consistent with not wanting to panic and require explicitly opting in to panicking, rather than opting out.

Change existing APIs

Would it be better, instead, to change the existing APIs to return Result<T, Error>?

Pros

  • Considering a lot of the use cases for Buf handle untrusted input, I think this is the best direction long term, as it forces implementations to consider the possibility of a EOF.

  • Adding a try_ method per T would essentially double the already-lengthy trait API and docs page.

  • It would still be possible to panic, if desired:

    buf.get_u8().expect("i've already checked the length");

    Compared to the amount of boilerplate it currently requires to not panic:

    // this also isn't guaranteed to be the length of `T` - easy to drift from the actual impl
    if buf.remaining() < 4 {
        return Err(EOF);
    }
    let v = buf.get_u32();

Cons

  • It would most likely break a lot of existing usage of Buf (but now would be the time to do it before 1.0)
  • Panicking would require an additional .unwrap()

Return type

In my mind, we have 3 options:

  • Option<T>
    This option is attractive, as it doesn't introduce any new public types. My biggest problem with it, though, is it becomes kind of tedious to use in a function that returns a Result:

    fn decode<B: Buf>(buf: B) -> Result<(), MyError> {
        let a = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?;
        let b = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?;
        let c = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?;
        Ok(())
    }

    whereas it's a bit easier to deal with the inverse:

    fn decode<B: Buf>(buf: B) -> Option<()> {
        let a = buf.try_get_u8().ok()?;
        let b = buf.try_get_u8().ok()?;
        let c = buf.try_get_u8().ok()?;
        Some(())
    }

    You could implement From<NoneError> for MyError but that could happen anywhere for various reasons.

  • Result<T, struct UnexpectedEnd>
    Assuming all of the methods only check that buf.remaining() > size_of::<T>() then this is a fine option. If at some point in the future, there's another error condition we want to signal, it would be a breaking change to use an enum instead.

  • Result<T, #[non_exhaustive] enum TryGetError>
    This is probably the most stable option long term, although may be overkill if it'll only ever be a single variant.

@Matthias247
Copy link
Contributor

Result<T, struct UnexpectedEnd>

That's what I would lean towards. Since the APIs here are just about reading, I don't see a need for more error variants. Buf doesn't cover serialization/deserialization of higher level data structures which could produce additional errors. Code which does that can define their own errors, and implement From<UnexpectedEnd> for call to Buf

@Ralith
Copy link

Ralith commented Oct 17, 2020

Result<T, struct UnexpectedEnd>

We use this pattern heavily in quinn and it works great. We even have an extension trait for all Buf implementers to provide methods of that form for various integer types, plus other assorted trivially-represented stuff.

Would it be better, instead, to change the existing APIs to return Result<T, Error>?

I'm a fan of this approach, since it's overwhelmingly the way we use the Buf API.

@carllerche
Copy link
Member

After much bikeshedding in Discord, I'm leaning back towards providing try_get_* variants.

The reasoning is, while it makes a lot of sense for Buf to provide getters that return Result, it makes far less sense for BufMut to provide setters to return Result. The buffer is usually pre-allocated to contain the full message.

It also doesn't feel great for BufMut to provide setters that panic and Buf getters return Result. To maintain symmetry, Buf::get_[T] can panic and we can add Buf::try_get_[T] to return a result.

@carllerche carllerche removed this from the v0.6 milestone Dec 15, 2020
@KizzyCode
Copy link

KizzyCode commented Dec 20, 2020

I'm also a big fan "no-panic by default". IMO, out-of-bounds panicking is one of the biggest design mistakes in std; similar to Java's NullPointerExceptions.
There are situations where panicking is the only reasonable default to handle errors, but OOB is usually none of them.

As far as I can tell, bytes is often used in crates with highly untrusted input where you almost never want to panic just because some foreign input data is truncated. So the default API should be failsafe, because you can always still expect a value if appropriate – but explicit is better than implicit here.

A try_-API has two disadvantages:

First, it feels like it's a bit against Rust's design philosophy IMO? Currently most of Rust is like: "Ah nope, this is dangerous… You sure you wanna do this?", whereas an x and try_x dualism is more like: "Ok, will do… (Oh, and if you want me to perform some checks – tell me, ok?)".
I'm a big fan of types like Option<T> or Result<T, E> for fallible operations, because they allow/force me to notice that something can go wrong here (saved my ass numerous times 😅).

Also in this case, try_get_*-APIs feel like 2nd-class citizens and are easy to forget – if you accidentally use get_* instead of try_get_*?, there is no compiler warning or lint etc., so it's easy to miss. Furthermore, there is already a get and get_mut-API for slices/sliceables which returns an Option<T>, and it's pretty easy to confuse it with get_* if you work with both types.


When it comes to the specific API, I'm in favor of returning an Option<T>:

  1. It is consistent with std's slice::get and slice::get_mut.
  2. Not having any bytes left may not even be an error (e.g. if I parse an indefinite list of concatenated u32s).
  3. If it is an error, you'd probably want to use your own error type anyway, because the errors are usually context specific (i.e. in some parts I'd e.g. want to return InvalidHeader whereas in other parts I might want sth. like NeedMoreInput as error).
    So something like impl From<UnexpectedEnd> for MyError is not very useful except in some edge cases IMO?

@camshaft
Copy link
Contributor

As mentioned in discord, I think something like this could work pretty well: https://godbolt.org/z/hjYbeEf5P.

We could also do a blanket impl: impl<B: TryBuf> Buf for B { ... } and delegate calls with an expect(...). It does mean implementations would only be able to pick one trait to implement and if they've implemented Buf, they wouldn't be able to implement TryBuf, at least until specialization is stabilized.

@danieleades
Copy link

in the meantime, this might do what you're looking for - https://github.com/danieleades/safer-bytes

@danieleades
Copy link

@danburkert Your crate is exactly what I was looking for! Thank you so much for making it public.

i'm please to hear it might help. That inspired me to tidy it up a little

@ghost
Copy link

ghost commented Mar 1, 2022

Currently I'm having to use two dependencies for working with bytes, bytes for writing and byteorder for reading, so I'm hoping this request gets fulfilled because it would simplify things :) byteorder returns io::Results containing ErrorKind::UnexpectedEof, which comes from Read::read_exact.

@wheird-lee
Copy link

wheird-lee commented Aug 13, 2022

Another implementation is

use bytes::Buf;
use std::mem::size_of();

#[derive(Clone, Copy, Debug)]
pub enum ErrorKind {
    EOF
}

pub trait TryBuf {
    fn try_get_u8(&mut self) -> Result<u8,ErrorKind>;
    // ...other try_get APIs
}

impl<T: Buf> TryBuf for T {
    fn try_get_u8(&mut self) -> Result<u8,ErrorKind> {
        if (size_of::<u8>() <= self.remaining()) {
            Ok(self.get_u8())
        } else {
            Err(ErrorKind)
        }
    }
    // ...other impls
}

This is efficient and will never break any existing usage of Buf.

I implement this in try_buf

@ten3roberts
Copy link

ten3roberts commented May 4, 2023

The issue with first checking that it is enough, and then reading, is that the reading part can very easily be changed by yourself or someone without updating the above check, causing a possible panic point. It is also quite error prone and unclear by hand calculating the size, E.g; having a literal 4 or 32 and not knowing why, which will be outdated if someone comes aloing and changes a get_u8 to a get_u16 or similar.

In other words, panics become a surprise because some code changed further down, which goes against the general design of Rust of not upholding invariants yourself, E.g parse and not validate, or using types rather than checks to commicate impossibilites and guarantees.

Furthermore, the manual method makes it very hard to validate if the size depends on data that has already been read, like branching code, which requires making each "section" do its own validating and manual calculation of what the code already knowns and encode. Similar in a way to hand calculating stack offsets based on function bodies, which we all know how well that goes as son as there is a branch.

Though this is just my take on the matter when writing networking code and attempting to uphold and remember implicit assumptions as functions may panic otherwise with no clear indication that they will based on signature.

@tombanksme
Copy link

Hello,
Pretty new to Rust and I just ran into this issue in late 2023; would be great to get these methods into the bytes crate. For now I've added the trait suggested by @wheird-lee into my project.

It's worked a treat, thanks 👍

Happy to help with this where I can

@hpenne
Copy link

hpenne commented Oct 22, 2024

I think this is really needed. If you're decoding a non-trivial protocol that takes untrusted input, then it will be far superior to today's API. The type system will in effect end up enforcing safe and correct behaviour, while today's API is error prone and can lead to code that panics on bad input. You will perhaps loose a little speed, but on the other hand modern optimizing compilers are amazing at removing unnecessary checks, so this might not actually be the case.

This issue has been open for some time. If I just went ahead and wrote a "TryBuf" as a blanket implementation for Buf, what are the chances that such a PR would get accepted?

@hpenne
Copy link

hpenne commented Oct 25, 2024

This issue has been open for some time. If I just went ahead and wrote a "TryBuf" as a blanket implementation for Buf, what are the chances that such a PR would get accepted?

I missed the open PR on this. Is there any chance of that getting merged? It looks like just what many of us have wished for. It looks complete at first glance (with the possible exception of tests), but I'll be happy to help out if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

12 participants