Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lint: dangerous use of vec.set_len() #4483

Closed
Shnatsel opened this issue Sep 1, 2019 · 19 comments
Closed

New lint: dangerous use of vec.set_len() #4483

Shnatsel opened this issue Sep 1, 2019 · 19 comments
Labels
A-lint Area: New lints E-medium Call for participation: Medium difficulty level problem and requires some initial experience. L-correctness Lint: Belongs in the correctness lint group

Comments

@Shnatsel
Copy link
Member

Shnatsel commented Sep 1, 2019

I commonly see people call vec.set_len() to grow a vector and then fill the uninitialized data as a slice. This is dangerous because it exposes uninitialized memory to be read and dropped on panic. See Here's My Type, So Initialize Me Maybe for more details - anything that applies to mem::uninitialized also applies to regions exposed via vec.set_len().

Here are some problematic examples:

Creating Vec to create a slice

Wrong way:

let mut v: Vec<u8> = Vec::with_capacity(32);
unsafe {v.set_len(32)}; // definitely UB for most types, possibly UB for u8
write_something_to_slice(v.as_mut_slice());

Right way:

let mut v: Vec<u8> = vec![0; 32];
write_something_to_slice(v.as_mut_slice());

Doing this for any type where not every bit pattern is valid (like bool) or for Drop types is instant UB.

For types with any valid bit pattern it's less clear-cut, but still very dangerous and has already led to security vulnerabilites in real code. It's also a common pattern to pass such slices to std::io::Read::read(), which is unsound.

Note that for u8 and friends the use of vec![0; len] is already encouraged through some lints, see #3237, but there are currently no lints for unsafe variants of the code. Using vec![0; len] and reusing the vector for subsequent operations should be as fast as using uninit, but safe.

Appending to vector while borrowing its contents

Wrong way:

let mut v: Vec<bool> = vec![false, false, true, true];
v.reserve(4);
unsafe {v.set_len(8)};
for i in 0..4 {
    v[i+4] = ! v[i];
}

Right way:

let mut v: Vec<bool> = vec![false, false, true, true];
v.reserve(4);
let v_ptr = v.as_mut_ptr();
unsafe {
    for i in 0..4 {
        std::ptr::write(v_ptr.add(i+4), ! v[i]);
    }
    v.set_len(8);
}

Here's a real-world example of such code, albeit with u8: https://github.com/image-rs/inflate/blob/1a810dba8467fc9cf1e8c60155d3266790072220/src/lib.rs#L663

This is instant UB for types where not all bit patterns are valid. This is also an exploitable security vulnerability if this is done for Drop types, or if *v_ptr = value is used instead of ptr::write(v_ptr, value)

Sadly the wrong pattern is the best available pattern for types such as u8 (at least in the stdlib). Using ptr::write is not needed for such types, and you lose the bounds checks too if you use pointers. I've opened an RFC to address this, but it's not getting much attention.

Other patterns

I'm probably missing something, so please add other problematic examples.

cc @RalfJung

@RalfJung
Copy link
Member

RalfJung commented Sep 1, 2019

unsafe {v.set_len(32)}; // definitely UB for most types, possibly UB for u8

This is not UB. You are not producing a value of said types at this point.
All the stuff about slices and drops and panics still is a problem, though.

@Shnatsel
Copy link
Member Author

Shnatsel commented Sep 1, 2019

So it only becomes UB when we create a slice that points to it, or start reading/writing by index?

@Lokathor
Copy link

Lokathor commented Sep 1, 2019

@RalfJung in the example for "adding to a vector while borrowing its contents", in the "right way" example, isn't that still UB under stacked borrows?

You have a *mut to some vec contents, but then you index the vec (which makes a shared reference), but doesn't that invalidate the *mut from further use?

Or maybe my brain has become too defensive about pointer invalidation

@Manishearth
Copy link
Member

Are we planning to have raw pointer vec APIs so that deferred initialization can be done safely here without requiring us to start with a zeroed buffer?

@Shnatsel
Copy link
Member Author

Shnatsel commented Sep 1, 2019

An abstraction that's like a Vec but over a fixed-size backing storage is sorely needed for cases where passing a fixed-size buffer to be filled. Particularly for Read trait which currently requires unbounded storage (Vec) or bounded initialized storage (slice), but this crops up all over the place.

I'm not aware of any RFCs working in that direction, but there's been some experimentation in https://crates.io/crates/uninit recently. Something very close to that is also implemented in https://crates.io/crates/buffer, but I cannot vouch for the quality of the crate.

@Manishearth
Copy link
Member

Cool, the lint should then suggest using the uninit crate here.

@Manishearth
Copy link
Member

Basically, for patterns like this there's little point to having a lint if there is no alternative for people to switch to.

@Shnatsel
Copy link
Member Author

Shnatsel commented Sep 1, 2019

Until we have a proper solution for all types, can we make a lint asking for proper initialization for generic types, reference types and types where some bit patterns are invalid, like bool or char?

Having uninitialized memory of those types is certainly undefined behavior and almost certainly an exploitable security vulnerability if anything can panic while uninitialized memory is exposed.

...nevermind, that would probably result in false positives on FFI calls like:

let v = Vec::with_capacity(10);
unsafe {
    let used = foreign_call(v.as_ptr(), v.capacity());
    v.set_len(used);
}

😞

@Lokathor
Copy link

Lokathor commented Sep 1, 2019

Actually when i typed that example on my phone earlier I left off that you should absolutely be using
let v: Vec<i32> // or whatever type and not letting inference do anything there.

@RalfJung
Copy link
Member

RalfJung commented Sep 2, 2019

(I'll come back to you after my vacation.)

@RalfJung
Copy link
Member

in the example for "adding to a vector while borrowing its contents", in the "right way" example, isn't that still UB under stacked borrows?

You have a *mut to some vec contents, but then you index the vec (which makes a shared reference), but doesn't that invalidate the *mut from further use?

Looks like the example got changed. But I think that's fine; taking a shared ref to v does not take a shared ref to the heap-allocate buffer backing v.

So it only becomes UB when we create a slice that points to it, or start reading/writing by index?

Basically, yes. Though even reading can be fine if this is a type that permits being uninitialized (like, imagine a Vec<MaybeUninit<u32>>). And writing is fine for non-dropping types.

Having uninitialized memory of those types is certainly undefined behavior and almost certainly an exploitable security vulnerability if anything can panic while uninitialized memory is exposed.

Producing values of such type form uninitialized memory is UB. The thing is, set_len does not produce anything.

Maybe "producing" is not the best terminology either. :/ I adapted that one from the Nomicon. My personal terminology is "typed copy": whenever you do a "typed copy", the value must be "valid" (which for the types you listed means in particular "must not be uninitialized memory"). Typed copies occuring on assignment (=), function call (for arguments) and function return (for the return value).

@nbraud
Copy link

nbraud commented Nov 26, 2019

I ran into that case in rust-secure-code/safety-dance#55 / hucsmn/qbsdiff#3, and it seems to be a somewhat-common pattern.

Could we at least lint on the following?

let v = Vec::with_capacity(n);
// Not doing anything with v before the next call:
unsafe { v.set_len(n); }
// Taking afterwards a slice from v

Longer-term, it might be good to characterise which parts of the Vec API can be used with uninitialised Vecs without resulting in UB, and add a corresponding lint, but this would already be super-useful to have IMO.

@flip1995 flip1995 added L-correctness Lint: Belongs in the correctness lint group E-medium Call for participation: Medium difficulty level problem and requires some initial experience. A-lint Area: New lints labels Nov 26, 2019
@Kixunil
Copy link

Kixunil commented Dec 10, 2019

FYI, the author of uninit seems to be busy, you might want to consider possibly_uninit or some other approach. Also uninit has soundness bug.

@RalfJung
Copy link
Member

@Kixunil is that soundness bug described somewhere? Unfortunately the crates.io page for uninit does not link to a repository / issue tracker.

@Kixunil
Copy link

Kixunil commented Dec 11, 2019

Yep: rust-lang/rust#66699 (comment) casting &mut [T] to &mut [MaybeUninit<T>]

@Shnatsel
Copy link
Member Author

Speaking of which, it would be great to have a lint for casting &mut T to &mut MaybeUninit<T>

@Kixunil
Copy link

Kixunil commented Dec 11, 2019

Good idea, filed #4896

@GnomedDev
Copy link
Contributor

Shouldn't this have been closed by #7681? The first example in the issue description now lints on the playground.

@Manishearth
Copy link
Member

#7682

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lint Area: New lints E-medium Call for participation: Medium difficulty level problem and requires some initial experience. L-correctness Lint: Belongs in the correctness lint group
Projects
None yet
Development

No branches or pull requests

8 participants