Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add default_union_representation lint #8289

Merged
merged 1 commit into from
Jan 29, 2022

Conversation

jubnzv
Copy link
Contributor

@jubnzv jubnzv commented Jan 15, 2022

Closes #8235

changelog: Added a new lint [default_union_representation]

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @camsteffen (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Jan 15, 2022
@camsteffen
Copy link
Contributor

I think if we have a lint that unilaterally enforces repr(C) for unions (or other types), it would have to be restriction.

@jubnzv
Copy link
Contributor Author

jubnzv commented Jan 15, 2022

Well, this lint is implemented as suggested in #8235. It enforces the user to explicitly set any #[repr] attribute for the unions, to make sure he understands their memory layout.

I personally don't like in this check that the user will have to set #[allow(unspecified_layout_union)] on each union which have the default layout, because it is not possible to explicitly set #[repr(Rust) in the code. But how can it be implemented better?

@camsteffen
Copy link
Contributor

I personally don't like in this check that the user will have to set #[allow(unspecified_layout_union)] on each union which have the default layout

Agreed.

But how can it be implemented better?

Other than making it a restriction lint, I don't know.

Do you have a particular motivation for adding this lint? Maybe that would help us decide how the lint should work. Or, if you're just looking for an issue to get started with contributing to Clippy, maybe I could suggest a different issue to start with?

@jubnzv
Copy link
Contributor Author

jubnzv commented Jan 16, 2022

Or, if you're just looking for an issue to get started with contributing to Clippy, maybe I could suggest a different issue to start with?

Yes, I'm just looking for tasks to start contributing to Clippy. I would really appreciate, if you can advice some issues.
Maybe there is something that would be more useful for the project right now?

Do you have a particular motivation for adding this lint? Maybe that would help us decide how the lint should work.

I think, this lint could be useful in two kinds of projects:

  • Embedded systems, in particular, the implementation of various communication protocols. If they use structures to describe packages, that are transmitted in the binary format, we always need to know their exact layout.
  • Something with FFI and interaction with C code.

Most likely, in other projects, this lint will almost always be globally suppressed.

I haven't written such things on Rust yet, so this is just my guess. Probably, @5225225 can provide a better rationale for this.

@5225225
Copy link
Contributor

5225225 commented Jan 16, 2022

My reasoning behind it is I see people assuming that unions are an okay way to transmute between two types, even without a #[repr(C)] annotation.

Which is wrong, you must never read from a variant that hasn't been written to at least once (and if your type has validity invariants, this statement might be stronger, you must never read from a variant that isn't the most recently written one, if the two variants overlap but not fully).

Though now that I look at it, simply warning on all unions that have no #[repr(C)] might be wrong (Since that would call MaybeUninit and other unions like it wrong).

What might be a more useful lint is to look for all unions that have at least 2 non-ZST variants. Since that would allow the definition of MaybeUninit, but not allow

union FloatOrInt {
    a: f64,
    b: u64,
}

(which is not correct)

One bug that this lint would have caught is https://github.com/ImageOptim/libimagequant/blob/e99d92ad9abc08c8c912f8d3dc19bc054afb468e/src/hist.rs#L78-L82 , where the union here must be laid out overlapping, and if you don't, it's UB.

@camsteffen
Copy link
Contributor

camsteffen commented Jan 16, 2022

I'm generally wary of lints along the lines of "there's nothing wrong with this code, but did you realize...". I just don't think that's what lints are for. Lints should detect code that is deterministically wrong or sub-optimal.

What might be a more useful lint is to look for all unions that have at least 2 non-ZST variants.

I think you mean "fields" instead of "variants"? This would eliminate some cases from the lint, but largely the problem still remains.

Yes, I'm just looking for tasks to start contributing to Clippy. I would really appreciate, if you can advice some issues.
Maybe there is something that would be more useful for the project right now?

In general the ones labelled good-first-issue are good (but I know you already tried that 😄). Some specific ones that look good to me: #8282, #8212, #8215.

Maybe others will have a different view. @rust-lang/clippy?

@5225225
Copy link
Contributor

5225225 commented Jan 16, 2022

I'm generally wary of lints along the lines of "there's nothing wrong with this code, but did you realize...". I just don't think that's what lints are for. Lints should detect code that is deterministically wrong or sub-optimal.

Well, there could be something wrong with the code. It depends how the union is used.

In the PR above for libimagequant, there was definitely a bug there. I'm not sure clippy is powerful enough to determine stuff like "Oh, you wrote to this field of the union, and then you read from a different one, that's UB" (especially if it was cross function, as it might be).

It could maybe detect union fields that are never written to/have a pointer taken, but are read from? That might catch some cases, but feels like it might have too many false negatives.

And in any case, there's plenty of existing lints for "there might not be something wrong with this code, but there probably is something wrong here.", as I understand it, that's what suspicious is for. Though this lint might belong in pedantic.

Searching down the list of non-repr(C) unions found by doing https://github.com/search?o=desc&p=1&q=%22pub+union%22+language%3Arust&s=indexed&type=Code , it does find a few cases that are definitely unsound, and some that are probably fine

Non-repr(C) unions that have multiple non-ZST fields are rare, but I found these 4 cases

  1. https://github.com/Pathal/mips_rust_int/blob/e90d86d3979c83a2907bd993954989388678592a/src/register.rs#L267
  2. https://github.com/unic0rn9k/slas/blob/006d2184dcd2b05330e8872358f5c221c411c92e/src/lib.rs#L185
  3. https://github.com/dgoodlad/crabdac/blob/bd51ea819e8b163cc6e4efff223907598bb93cc6/stm32f4/src/stm32f413/mod.rs#L111
  4. https://github.com/mathias234/gray/blob/0db68d62288a5ef087eced535ff504cd9ed53425/crates/gray/src/compiler/compiler.rs#L41

1 is definitely unsound, simply create a register with the int field init, and then call .get_f32()
2 looks sound, it's externally tagged as to what variant is active. So this would be a false positive.
3.... looks unsound but not for this reason, they're using a non-#[repr(C)] thing in what looks to be FFI. So.... half positive? We're right to lint here, but for the wrong reasons.
4 looks like a similar case to 2, externally tagged.

So 1.5/4 on being correct, but also this lint really wouldn't fire that often. 2.5/5 if we count the issue I fixed.

@jubnzv
Copy link
Contributor Author

jubnzv commented Jan 17, 2022

What might be a more useful lint is to look for all unions that have at least 2 non-ZST variants.

I agree with this point. Unions with a single non-ZST field have layout the same as the layout of that field. I updated PR to make the lint only checks unions with at least two non-ZST fields.

I also improved the lint to make it warn about any repr other than C, because we don't actually know the layout of the union, if it uses the default (#[repr(Rust)]) layout.

In general the ones labelled good-first-issue are good (but I know you already tried that smile). Some specific ones that look good to me:

Thank you! I managed to localize #8282 in the Clippy code. It takes some time to fix it, because I'm new to the rustc internals.

@5225225
Copy link
Contributor

5225225 commented Jan 17, 2022

By the way, the unsafe code guidelines is not what is promised.

It's what they think should be promised.

It says this both at the top of that page, and in the introduction.

It is currently legal for a compiler to add padding into a single field type of any kind, including unions.

@camsteffen camsteffen added the I-nominated Issue: Nominated to be discussed at the next Clippy meeting label Jan 17, 2022
@5225225
Copy link
Contributor

5225225 commented Jan 19, 2022

I'd also add some tests for repr(transparent) unions (both with / without ZST fields, as well as multiple non-ZSTs)

granted, you can't have a repr transparent union with multiple non-ZST fields, but we shouldn't lint if they do that.

@flip1995
Copy link
Member

Maybe others will have a different view.

I didn't read the whole thread, but I see parallels with the recently discussed non_send_fields_in_send_ty lint. You have to write unsafe code in order to do the thing the lint complains about.

In order to get the UB described in the lint description, you have to read from the union. And in order to do that you have to write unsafe code, which already is a barrier.

So all this lint would do is pointing to the documentation of unions in Rust. I agree with @camsteffen that this is not really the point of lints. If we would add this, it would have to be restriction, because having a union without a repr attr is totally fine by itself, which means that this lint would restrict the language.

@camsteffen camsteffen removed the I-nominated Issue: Nominated to be discussed at the next Clippy meeting label Jan 19, 2022
@camsteffen
Copy link
Contributor

which means that this lint would restrict the language.

Oh! 🤯

@5225225
Copy link
Contributor

5225225 commented Jan 19, 2022

I disagree with putting this lint in restriction, seeing as it does find soundness bugs.

And you probably wouldn't have written the bugs if you knew to look out for them, and in order to look out for them using this lint if it was in restriction, you'd have to explicitly name this lint. (As in, the only case the lint would be useful is if you're in a union heavy crate and you know you messed this up in the past, but even then, simply being aware of it makes it pretty easy to spot)

After all, other lints for things that are often perfectly sound but might indicate a bug exist in pedantic, like cast_ptr_alignment

@camsteffen
Copy link
Contributor

I disagree with putting this lint in restriction, seeing as it does find soundness bugs.

Soundness bugs are indeed serious bugs. But severity is not enough. It has to be a likely bug in order to be pedantic or suspicious. To say that union by itself is "likely a bug" is a far reach.

We get a lot of opportunities to compromise on "drawing the line" like this. If we don't take a hard stance, Clippy gets very noisy very quickly.

And you probably wouldn't have written the bugs if you knew to look out for them

This is what documentations is for, not lints. Lints are for telling you when you've already made a detectable mistake. And like flip1995 said the coder is already given a warning with unsafe blocks.

After all, other lints for things that are often perfectly sound but might indicate a bug exist in pedantic, like cast_ptr_alignment

That lint detects a code pattern that is suspicious in its own right. But if the lint isn't correct at least for a majority of cases (preferably a very large majority) in practice, then it shouldn't be pedantic or suspicious.

@5225225
Copy link
Contributor

5225225 commented Jan 20, 2022

Okay, suggestion:

We land this lint as-is, in restriction. Still only warning on more likely to be incorrect usages (not linting for literally "you used a union", but a union with multiple non-ZST fields which aren't forced to overlap by a repr(C))

And then a new lint is made, with correctness level, looking for non-repr C unions that have a field being read from, that definitely couldn't have been init at that point in the code. Probably leaving it open ended as to exactly what it lints on, but only linting on stuff that's definitely UB (depending on layout)

(Say, you make a union specifying field a, you pass a pointer to field b to some FFI function, and then you read field c, all in the same fn. Or you have a union that had any field that is not public, is never written to/has an offset taken/has a pointer taken, but is read from)


And in the mean time, I go fix the union docs warning people about this, because their example code has UB, and is doing the exact thing this lint complains about.

@flip1995
Copy link
Member

I disagree with putting this lint in restriction, seeing as it does find soundness bugs.

My issue with this lint is, that it doesn't find unsoundness bugs, but rather tells you that there might be or might not be unsoundness in your code. It doesn't even tell you where there might be unsoundness, it just spells out the documentation for you again and sends you off on a journey to find the unsoundness yourself.

And then a new lint is made, with correctness level, looking for non-repr C unions that have a field being read from, that definitely couldn't have been init at that point in the code. Probably leaving it open ended as to exactly what it lints on, but only linting on stuff that's definitely UB (depending on layout)

That would also be my suggestion. We shouldn't lint on the union type, but on the union read. Only linting on something that is "definitely UB" might be very hard or even impossible (if it were possible, shouldn't the compiler error on this?). But I can see a lint in pedantic that lints all reads of non-repr(C) unions.

@jubnzv
Copy link
Contributor Author

jubnzv commented Jan 21, 2022

I'd also add some tests for repr(transparent) unions (both with / without ZST fields, as well as multiple non-ZSTs)
granted, you can't have a repr transparent union with multiple non-ZST fields, but we shouldn't lint if they do that.

Thank you, that's a good idea. I added tests for #[repr(transparent)].

If we would add this, it would have to be restriction, because having a union without a repr attr is totally fine by itself, which means that this lint would restrict the language.

Well, that sounds reasonable, because we don't actually check if we the use of these unions in unsafe blocks. I updated the level of the lint to restriction.

The only thing that confuses me, is why the #[repr(Rust)] is present in the documentation, but cannot be explicitly specified by the user. Is this an intentional design decision?

@jubnzv jubnzv force-pushed the unspecified-layout-union branch from 9de9caa to 75c919d Compare January 21, 2022 04:00
@camsteffen
Copy link
Contributor

Okay so I think we're on the same page that this lint would have to be restriction. But now I have to ask is this lint worth it as a restriction lint? If the motivation for the lint were "I want to enforce that all unions in my crate have a non-default repr", then this lint would make perfect sense. But rather the motivation seems to be "let's make an attempt to catch unsoundness bugs". And we know that restriction lints are not so good for catching bugs. The former motivation sounds plausible, but I don't want to blindly assume that use case exists.

The only thing that confuses me, is why the #[repr(Rust)] is present in the documentation, but cannot be explicitly specified by the user. Is this an intentional design decision?

That confused me too so it should probably be clarified. :) But yes I'm pretty sure it's intentional - it's like pseudo code for documentation purposes only.

@jubnzv
Copy link
Contributor Author

jubnzv commented Jan 26, 2022

But now I have to ask is this lint worth it as a restriction lint? If the motivation for the lint were "I want to enforce that all unions in my crate have a non-default repr", then this lint would make perfect sense.

I see the motivation for this lint as: "I want to ensure that all unions have a defined layout, so I can't get unexpected behavior in unsafe blocks". We have already seen the examples of potential errors that could be fixed with this lint.

@flip1995
Copy link
Member

I have a pretty low bar for restriction lints like this. However, the lint documentation should not claim, that it finds unsoundness bugs, but rather that it enforces a repr for unions (which side effect may be to fix unsoundness bugs). I think the current documentation conveys this quite well.

If some people want a lint that restricts the language for them, like this lint does AND the lint helps with finding real world problems, I think it is fine to include it in Clippy.

Copy link
Contributor

@camsteffen camsteffen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go ahead with this then. Just some minor feedback.

clippy_lints/src/unspecified_layout_union.rs Outdated Show resolved Hide resolved
clippy_lints/src/unspecified_layout_union.rs Outdated Show resolved Hide resolved
clippy_lints/src/unspecified_layout_union.rs Outdated Show resolved Hide resolved
clippy_lints/src/unspecified_layout_union.rs Outdated Show resolved Hide resolved
clippy_lints/src/unspecified_layout_union.rs Outdated Show resolved Hide resolved
clippy_lints/src/unspecified_layout_union.rs Outdated Show resolved Hide resolved
clippy_lints/src/unspecified_layout_union.rs Outdated Show resolved Hide resolved
clippy_lints/src/unspecified_layout_union.rs Outdated Show resolved Hide resolved
}
}

fn has_c_repr_attr(cx: &LateContext<'_>, hir_id: HirId) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the lint should allow any repr, not just repr(C).

Copy link
Contributor Author

@jubnzv jubnzv Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the union has some alignment attribute (like align or packed), it still has no any guarantees about the layout, if the default Rust representation is used. I think, those cases more likely will contain bugs, because the user may want to set a specific layout using only these attributes.

Maybe we should leave this as it is in order to find such errors?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Let's just add a comment somewhere that C is the only non-default representation for unions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the following description:

What it does

Displays a warning when a union is declared with the default representation (without a #[repr(C)] attribute)

Should we explain this in more detail?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay but you're welcome to expand that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm not sure how best to expand this so as not to annoy the user with unnecessary repetitions. I think, it's enough, because if the user wants to restrict the language, they at least understand which layouts are possible in Rust.

@jubnzv jubnzv changed the title Add unspecified_layout_union lint Add default_union_representation lint Jan 26, 2022
Copy link
Contributor

@camsteffen camsteffen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Please squash commits.

@jubnzv jubnzv force-pushed the unspecified-layout-union branch from c07dfd8 to b7000b2 Compare January 29, 2022 03:59
@camsteffen
Copy link
Contributor

camsteffen commented Jan 29, 2022

@jubnzv and @5225225 thank you for your patience as we worked through this.

@bors r+

(edit: oops I had tagged the wrong person)

@bors
Copy link
Contributor

bors commented Jan 29, 2022

📌 Commit b7000b2 has been approved by camsteffen

@bors
Copy link
Contributor

bors commented Jan 29, 2022

⌛ Testing commit b7000b2 with merge 7ceffde...

@bors
Copy link
Contributor

bors commented Jan 29, 2022

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: camsteffen
Pushing 7ceffde to master...

@bors bors merged commit 7ceffde into rust-lang:master Jan 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn on unions with no #[repr] attribute
6 participants