Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frame::parse() input type is confusing #41

Open
Shnatsel opened this issue Jul 10, 2018 · 7 comments
Open

Frame::parse() input type is confusing #41

Shnatsel opened this issue Jul 10, 2018 · 7 comments

Comments

@Shnatsel
Copy link
Contributor

Frame::parse() accepts the input to be parsed as the following type: &mut Cursor<Vec<u8>>

First, the use of vector instead of slice is confusing. A slice should be sufficient since the function is not expected to append anything to the input. Unless it has an undocumented side effect, this should be a slice. The use of vector actually creates a superfluous copy in something I'm currently writing.

Second, the use of &mut Cursor is not obvious to me. If the Cursor is there to mark some bytes as already processed, then the function should just accept &mut [u8] and reduce the length of the input slice, like write! macro from the standard library does.

@agalakhov
Copy link
Member

agalakhov commented Jul 10, 2018

It is not that simple. The input comes from a network buffer that has to receive proper feedback. Please take a look at input_buffer crate.

Looks like the proper input type for it is Input from the untrusted crate and not Cursor or slice.

Probably the Frame just should be made completely private to avoid confusion. It is only intended to be reused in implementation that don't want to use the high-level WebSocket for some reason, but it seems that nobody really needs this.

@Shnatsel
Copy link
Contributor Author

FWIW I'm happily using Frame::parse() for fuzzing the frame parser right now. https://github.com/Shnatsel/tungstenite-afl

I've found no crashes or even debug mode panics, but it did autogenerate 15 frames that cover just about every possible path it can take. They're committed to the repo, you might want to use them as unit tests or some such.

I'll see if I can use cargo-fuzz instead, it might or might not fail because of linked C dependencies. If it works, I'll probably open a pull request with several different fuzzing targets. For one, equivalence fuzzing for masking would be interesting - both to verify that naive and optimized implementation return the same result, and that the unsafe code doesn't do horribly wrong things to memory.

@Shnatsel
Copy link
Contributor Author

The input comes from a network buffer that has to receive proper feedback.

After reading the implementation of parse() again I see it doing two things: advancing the cursor forward and rewinding it on any error. The cursor communicates how much data has been actually used.

At no point this function extends the underlying input vector. So at the very least it should not be using &mut Cursor<Vec<u8>>, it should be using &mut Cursor<&[u8]>.

I see you went with &mut Cursor<impl AsRef<[u8]>> in the rewrite, is there any particular reason for that? AFAIK AsRef<[u8]> is implemented only so that compiler would auto-dereference things like Vec<u8> for functions that accept &[u8]; the proper way to specify type here is &mut Cursor<&[u8]> but that would require changes to the API of input_buffer crate, where as_cursor_mut() returns &mut Cursor<Vec<u8>> when it probably should be returning &mut Cursor<&[u8]>.

You could also omit Cursor because you can communicate how much of the input has been consumed by accepting a mutable slice and shrinking it to only contain unprocessed input; I've coded an example from the standard library in Rust playground. But hey, if Cursor is more convenient for whatever reason - go for it. Unlike the vec/slice situation I'm not aware of any cases where mutating a Cursor instead of mutating a slice is a problem.

Looks like the proper input type for it is Input from the untrusted crate and not Cursor or slice.

The problem with exposing types from other crates API of which has not been stabilized (untrusted currently has version 0.6.2) is that when the crate bumps its API version to e.g. 0.7.0 all the types will automatically become incompatible with such types from 0.6.2, requiring a synchronized upgrade throughout the ecosystem - unless special care is taken. So I'd refrain from doing that if possible.

In other news, cargo-fuzz works with tungstenite just fine, so I'll probably open a PR for it I see the function has changed a lot and my fuzzing harness is no longer applicable, I'll rebase it upon devel branch.

@bluetech
Copy link
Contributor

Probably the Frame just should be made completely private to avoid confusion. It is only intended to be reused in implementation that don't want to use the high-level WebSocket for some reason, but it seems that nobody really needs this.

FWIW I agree with this - I think tungstenite should expose public API is there is actual demand for it, the default approach should be to keep the API minimal.

@Shnatsel Very nice work with the fuzzing. I think it'd be even more interesting to fuzz at a higher level than the Frame level, basically fuzzing at the WebSocket level should be possible; this way you catch more code paths. Also, I don't know if Rust works with ubsan, msan etc. but it'd be interesting to fuzz with these enabled as well.

@Shnatsel
Copy link
Contributor Author

Address, memory, leak and thread sanitizers all work. Undefined behavior sanitizer does not, it seems to be too closely coupled with C/C++. See https://github.com/japaric/rust-san

I have left this target to fuzz overnight with address sanitizer just in case, it found no issues. Fuzzing mask.rs under sanitizers could be interesting, I'll see if I can get a harness for that going.

Do you have any specific functions to fuzz in mind? The fuzzing targets are fairly trivial, all they do is accept &[u8] as input and pass it to the function you want to fuzz. I'll also need at least one working input in a file to pass to the fuzzer as the initial sample. That's basically it.

@bluetech
Copy link
Contributor

Assuming Rust is good on its guarantees, the first thing to concentrate on is unsafe code. In tungstenite I believe there 2 places with unsafe (ignoring 3rd party libraries): in mask.rs and in InputBuffer (here).

For the masking code, I think the 3 main things to vary during fuzzing are:

  1. The size of the input buffer.
  2. The alignment of the input buffer.
  3. The alignment of the mask array.

and the result should be compared to what apply_mask_fallback gives.

Afterwards, should concentrate on panics. Integer overflows, out-of-bounds access, etc. For this I would fuzz WebSocket::read_message, write_message and friends.

Note: I personally only use the protocol code from this library so I always somewhat ignore the handshake parts.

@Shnatsel
Copy link
Contributor Author

I've fuzzed WebSocket::read_message in both client and server modes with afl.rs and cargo-fuzz, and found no crashes. cargo-fuzz was also run with address sanitizer.

This is literally the first crate I ever see that didn't turn up at least panics when fuzzed for the first time. Kudos to the authors!

I will contribute the fuzzing harness and the generated inputs to tungstenite repository shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants