Request: Provide try_get_* methods #254

oblique · 2019-04-14T12:55:17Z

Some network protocols have dynamic length packets, so we can not have a single buf.remaining() check before starting parsing them. In that case before get_* we need to check the remaining length to avoid assertions.

I believe try_get_* methods that do not assert they will help us write cleaner code.

The text was updated successfully, but these errors were encountered:

camshaft · 2020-10-16T20:49:34Z

I agree this would be helpful to have. If you look at the rust-fuzz trophy case, many of the bugs in the oor category could have been prevented with this type of interface.

I think we need to answer a couple of questions, first.

`advance` and `copy_to_slice`

Ideally, we should include a non-panicking version advance and copy_to_slice. This would keep things consistent with not wanting to panic and require explicitly opting in to panicking, rather than opting out.

Change existing APIs

Would it be better, instead, to change the existing APIs to return Result<T, Error>?

Pros

Considering a lot of the use cases for Buf handle untrusted input, I think this is the best direction long term, as it forces implementations to consider the possibility of a EOF.
Adding a try_ method per T would essentially double the already-lengthy trait API and docs page.

It would still be possible to panic, if desired:

buf.get_u8().expect("i've already checked the length");

Compared to the amount of boilerplate it currently requires to not panic:

// this also isn't guaranteed to be the length of `T` - easy to drift from the actual impl
if buf.remaining() < 4 {
    return Err(EOF);
}
let v = buf.get_u32();

Cons

It would most likely break a lot of existing usage of Buf (but now would be the time to do it before 1.0)
Panicking would require an additional .unwrap()

Return type

In my mind, we have 3 options:

Option<T>
This option is attractive, as it doesn't introduce any new public types. My biggest problem with it, though, is it becomes kind of tedious to use in a function that returns a Result:

fn decode<B: Buf>(buf: B) -> Result<(), MyError> {
    let a = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?;
    let b = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?;
    let c = buf.try_get_u8().ok_or(MyError::UnexpectedEnd)?;
    Ok(())
}

whereas it's a bit easier to deal with the inverse:

fn decode<B: Buf>(buf: B) -> Option<()> {
    let a = buf.try_get_u8().ok()?;
    let b = buf.try_get_u8().ok()?;
    let c = buf.try_get_u8().ok()?;
    Some(())
}

You could implement From<NoneError> for MyError but that could happen anywhere for various reasons.

Result<T, struct UnexpectedEnd>
Assuming all of the methods only check that buf.remaining() > size_of::<T>() then this is a fine option. If at some point in the future, there's another error condition we want to signal, it would be a breaking change to use an enum instead.
Result<T, #[non_exhaustive] enum TryGetError>
This is probably the most stable option long term, although may be overkill if it'll only ever be a single variant.

Matthias247 · 2020-10-16T23:54:47Z

Result<T, struct UnexpectedEnd>

That's what I would lean towards. Since the APIs here are just about reading, I don't see a need for more error variants. Buf doesn't cover serialization/deserialization of higher level data structures which could produce additional errors. Code which does that can define their own errors, and implement From<UnexpectedEnd> for call to Buf

Ralith · 2020-10-17T01:07:13Z

Result<T, struct UnexpectedEnd>

We use this pattern heavily in quinn and it works great. We even have an extension trait for all Buf implementers to provide methods of that form for various integer types, plus other assorted trivially-represented stuff.

Would it be better, instead, to change the existing APIs to return Result<T, Error>?

I'm a fan of this approach, since it's overwhelmingly the way we use the Buf API.

carllerche · 2020-10-20T17:00:01Z

After much bikeshedding in Discord, I'm leaning back towards providing try_get_* variants.

The reasoning is, while it makes a lot of sense for Buf to provide getters that return Result, it makes far less sense for BufMut to provide setters to return Result. The buffer is usually pre-allocated to contain the full message.

It also doesn't feel great for BufMut to provide setters that panic and Buf getters return Result. To maintain symmetry, Buf::get_[T] can panic and we can add Buf::try_get_[T] to return a result.

KizzyCode · 2020-12-20T02:55:37Z

I'm also a big fan "no-panic by default". IMO, out-of-bounds panicking is one of the biggest design mistakes in std; similar to Java's NullPointerExceptions.
There are situations where panicking is the only reasonable default to handle errors, but OOB is usually none of them.

As far as I can tell, bytes is often used in crates with highly untrusted input where you almost never want to panic just because some foreign input data is truncated. So the default API should be failsafe, because you can always still expect a value if appropriate – but explicit is better than implicit here.

A try_-API has two disadvantages:

First, it feels like it's a bit against Rust's design philosophy IMO? Currently most of Rust is like: "Ah nope, this is dangerous… You sure you wanna do this?", whereas an x and try_x dualism is more like: "Ok, will do… (Oh, and if you want me to perform some checks – tell me, ok?)".
I'm a big fan of types like Option<T> or Result<T, E> for fallible operations, because they allow/force me to notice that something can go wrong here (saved my ass numerous times 😅).

Also in this case, try_get_*-APIs feel like 2nd-class citizens and are easy to forget – if you accidentally use get_* instead of try_get_*?, there is no compiler warning or lint etc., so it's easy to miss. Furthermore, there is already a get and get_mut-API for slices/sliceables which returns an Option<T>, and it's pretty easy to confuse it with get_* if you work with both types.

When it comes to the specific API, I'm in favor of returning an Option<T>:

It is consistent with std's slice::get and slice::get_mut.
Not having any bytes left may not even be an error (e.g. if I parse an indefinite list of concatenated u32s).
If it is an error, you'd probably want to use your own error type anyway, because the errors are usually context specific (i.e. in some parts I'd e.g. want to return InvalidHeader whereas in other parts I might want sth. like NeedMoreInput as error).
So something like impl From<UnexpectedEnd> for MyError is not very useful except in some edge cases IMO?

camshaft · 2021-09-20T23:40:49Z

As mentioned in discord, I think something like this could work pretty well: https://godbolt.org/z/hjYbeEf5P.

We could also do a blanket impl: impl<B: TryBuf> Buf for B { ... } and delegate calls with an expect(...). It does mean implementations would only be able to pick one trait to implement and if they've implemented Buf, they wouldn't be able to implement TryBuf, at least until specialization is stabilized.

danieleades · 2021-10-19T10:54:36Z

in the meantime, this might do what you're looking for - https://github.com/danieleades/safer-bytes

danieleades · 2021-11-29T17:40:03Z

@danburkert Your crate is exactly what I was looking for! Thank you so much for making it public.

i'm please to hear it might help. That inspired me to tidy it up a little

ghost · 2022-03-01T03:05:58Z

Currently I'm having to use two dependencies for working with bytes, bytes for writing and byteorder for reading, so I'm hoping this request gets fulfilled because it would simplify things :) byteorder returns io::Results containing ErrorKind::UnexpectedEof, which comes from Read::read_exact.

wheird-lee · 2022-08-13T10:54:30Z

Another implementation is

use bytes::Buf;
use std::mem::size_of();

#[derive(Clone, Copy, Debug)]
pub enum ErrorKind {
    EOF
}

pub trait TryBuf {
    fn try_get_u8(&mut self) -> Result<u8,ErrorKind>;
    // ...other try_get APIs
}

impl<T: Buf> TryBuf for T {
    fn try_get_u8(&mut self) -> Result<u8,ErrorKind> {
        if (size_of::<u8>() <= self.remaining()) {
            Ok(self.get_u8())
        } else {
            Err(ErrorKind)
        }
    }
    // ...other impls
}

This is efficient and will never break any existing usage of Buf.

I implement this in try_buf

ten3roberts · 2023-05-04T13:55:32Z

The issue with first checking that it is enough, and then reading, is that the reading part can very easily be changed by yourself or someone without updating the above check, causing a possible panic point. It is also quite error prone and unclear by hand calculating the size, E.g; having a literal 4 or 32 and not knowing why, which will be outdated if someone comes aloing and changes a get_u8 to a get_u16 or similar.

In other words, panics become a surprise because some code changed further down, which goes against the general design of Rust of not upholding invariants yourself, E.g parse and not validate, or using types rather than checks to commicate impossibilites and guarantees.

Furthermore, the manual method makes it very hard to validate if the size depends on data that has already been read, like branching code, which requires making each "section" do its own validating and manual calculation of what the code already knowns and encode. Similar in a way to hand calculating stack offsets based on function bodies, which we all know how well that goes as son as there is a branch.

Though this is just my take on the matter when writing networking code and attempting to uphold and remember implicit assumptions as functions may panic otherwise with no clear indication that they will based on signature.

tombanksme · 2023-10-17T21:08:14Z

Hello,
Pretty new to Rust and I just ran into this issue in late 2023; would be great to get these methods into the bytes crate. For now I've added the trait suggested by @wheird-lee into my project.

It's worked a treat, thanks 👍

Happy to help with this where I can

hpenne · 2024-10-22T17:02:50Z

I think this is really needed. If you're decoding a non-trivial protocol that takes untrusted input, then it will be far superior to today's API. The type system will in effect end up enforcing safe and correct behaviour, while today's API is error prone and can lead to code that panics on bad input. You will perhaps loose a little speed, but on the other hand modern optimizing compilers are amazing at removing unnecessary checks, so this might not actually be the case.

This issue has been open for some time. If I just went ahead and wrote a "TryBuf" as a blanket implementation for Buf, what are the chances that such a PR would get accepted?

hpenne · 2024-10-25T17:47:11Z

This issue has been open for some time. If I just went ahead and wrote a "TryBuf" as a blanket implementation for Buf, what are the chances that such a PR would get accepted?

I missed the open PR on this. Is there any chance of that getting merged? It looks like just what many of us have wished for. It looks complete at first glance (with the possible exception of tests), but I'll be happy to help out if needed.

caelunshun linked a pull request Jul 30, 2019 that will close this issue

Provide try_get_* methods for Buf #277

Open

carllerche added this to the v0.6 milestone Oct 16, 2020

carllerche removed this from the v0.6 milestone Dec 15, 2020

taiki-e mentioned this issue Apr 24, 2023

Faillably get data from the Buf #537

Closed

ctz mentioned this issue Oct 9, 2023

Has there been consideration to use bytes::Bytes without std feature for no-std support? rustls/rustls#1530

Closed

stackinspector mentioned this issue Dec 25, 2023

Provide APIs or mode that have similar non-panic guarantees to the untrusted crate #645

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Request: Provide try_get_* methods #254

Request: Provide try_get_* methods #254

oblique commented Apr 14, 2019 •

edited

Loading

camshaft commented Oct 16, 2020 •

edited

Loading

Matthias247 commented Oct 16, 2020

Ralith commented Oct 17, 2020 •

edited

Loading

carllerche commented Oct 20, 2020

KizzyCode commented Dec 20, 2020 •

edited

Loading

camshaft commented Sep 20, 2021

danieleades commented Oct 19, 2021

danieleades commented Nov 29, 2021

ghost commented Mar 1, 2022

wheird-lee commented Aug 13, 2022 •

edited

Loading

ten3roberts commented May 4, 2023 •

edited

Loading

tombanksme commented Oct 17, 2023

hpenne commented Oct 22, 2024 •

edited

Loading

hpenne commented Oct 25, 2024 •

edited

Loading

Request: Provide try_get_* methods #254

Request: Provide try_get_* methods #254

Comments

oblique commented Apr 14, 2019 • edited Loading

camshaft commented Oct 16, 2020 • edited Loading

advance and copy_to_slice

Change existing APIs

Pros

Cons

Return type

Matthias247 commented Oct 16, 2020

Ralith commented Oct 17, 2020 • edited Loading

carllerche commented Oct 20, 2020

KizzyCode commented Dec 20, 2020 • edited Loading

camshaft commented Sep 20, 2021

danieleades commented Oct 19, 2021

danieleades commented Nov 29, 2021

ghost commented Mar 1, 2022

wheird-lee commented Aug 13, 2022 • edited Loading

ten3roberts commented May 4, 2023 • edited Loading

tombanksme commented Oct 17, 2023

hpenne commented Oct 22, 2024 • edited Loading

hpenne commented Oct 25, 2024 • edited Loading

oblique commented Apr 14, 2019 •

edited

Loading

camshaft commented Oct 16, 2020 •

edited

Loading

`advance` and `copy_to_slice`

Ralith commented Oct 17, 2020 •

edited

Loading

KizzyCode commented Dec 20, 2020 •

edited

Loading

wheird-lee commented Aug 13, 2022 •

edited

Loading

ten3roberts commented May 4, 2023 •

edited

Loading

hpenne commented Oct 22, 2024 •

edited

Loading

hpenne commented Oct 25, 2024 •

edited

Loading