Policy for assumptions about the size of usize #1748

Open
durka opened this issue Sep 12, 2016 · 31 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@durka
Contributor

durka commented Sep 12, 2016

When in the course of human rusty events, something in core or std depends on the actual width of usize/isize, there are currently (at least) two policies in place:

  1. Conservatively assume that usize may be as narrow as 8 bits.
    • example: usize: From<u8> + !From<u16>
  2. Liberally assume that usize is at least 32 bits wide (as it is on all current officially supported platforms).
    • example: Range<u32>: ExactSizeIterator
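
A minimal sketch of how the two policies show up in user code, assuming a current toolchain on a 32-bit or 64-bit host. The function names are illustrative only and do not come from the standard library.

```rust
// Policy 1: only conversions that are lossless for every conceivable usize
// width are provided. `From<u8> for usize` holds even for an 8-bit usize.
fn widen_byte(b: u8) -> usize {
    usize::from(b)
}

// Policy 2: `Range<u32>` implements `ExactSizeIterator`, so `len()` returns
// a usize. That is only lossless when usize has at least 32 bits.
fn range_len(start: u32, end: u32) -> usize {
    (start..end).len()
}

fn main() {
    println!("{}", widen_byte(200));
    println!("{}", range_len(3, 10));
}
```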

Let me know if I missed any other corners of the standard library which make assumptions (identical to one of these or not).

As these policies are in conflict, it seems like one or both of them should be changed. In principle, we can't remove trait implementations from Range<u32> and the like, so we could just declare target_pointer_width-liberalism to be the law of the land. However, this will make it difficult to port Rust to a 16-bit system. In doing such porting, trait implementations like From<u32> for usize and ExactSizeIterator for Range<u32> would need to be gated by a #[cfg]. But, this would make it difficult to port Rust code from, say, a 32-bit target to a 16-bit target, because some code would stop compiling (N.B. this is already potentially the case, because literals given for enum variants are interpreted as isize literals).

So, what should we do?

@briansmith

Let's see if we can narrow the bounds just a little.

  • I propose that we at least assume that usize/isize are no larger than u64/i64. This implies that we should impl From<usize> for u64 and impl From<isize> for i64.
  • I propose that we at least assume that usize/isize are no smaller than u16/i16. Note that this is true, in particular, for 8-bit AVR (Arduino). This implies that we should impl From<u16> for usize and impl From<i16> for isize.
  • I propose that there should be a goal, which we don't know how to achieve yet, that libcore and libstd MUST NOT use as casts for integer conversions, but instead must use only From, Into, TryFrom, and TryInto, etc. for such conversions. The achievement of this goal can then guide the rest of the decision making process.
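
To illustrate the difference the third bullet is aiming at, here is a rough sketch contrasting trait-based conversions with an as cast. The function names are made up for the example, and it assumes a toolchain where From<u16> for usize and TryFrom are available.

```rust
use std::convert::TryFrom;

// Lossless under the proposed "usize is at least 16 bits" assumption.
fn index_from_u16(x: u16) -> usize {
    usize::from(x)
}

// Fallible conversion: reports failure instead of silently truncating.
fn index_from_u64(x: u64) -> Option<usize> {
    usize::try_from(x).ok()
}

// What an `as` cast does: it keeps the low bits and discards the rest.
fn truncate_to_u8(x: u64) -> u8 {
    x as u8
}

fn main() {
    println!("{}", index_from_u16(65535));
    println!("{:?}", index_from_u64(42));
    println!("{}", truncate_to_u8(300)); // 300 mod 256
}
```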

@durka
Contributor Author

durka commented Sep 13, 2016

Makes good sense to me. Those proposals still leave the question of what to do about impl ExactSizeIterator for Range<i32>. Options are:

  • check crater and attempt to phase it out
  • make it conditional on #[cfg(target_pointer_width >= 32)] (pretend that syntax works)
  • leave it in and allow (0..u32::max_value()).len() to panic on 16-bit systems
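
For the second option, cfg has no >= comparison today, so a gated impl would have to list each supported width. The sketch below uses hypothetical stand-ins (U32Range for Range<u32>, ExactLen for ExactSizeIterator) because the real impls live in core; it is an illustration of the gating pattern, not the actual standard library code.

```rust
use std::convert::TryFrom;

// Hypothetical stand-in for `Range<u32>`.
struct U32Range {
    start: u32,
    end: u32,
}

// Hypothetical stand-in for `ExactSizeIterator`.
trait ExactLen {
    fn exact_len(&self) -> usize;
}

// The impl only exists on targets where a u32 always fits in a usize.
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
impl ExactLen for U32Range {
    fn exact_len(&self) -> usize {
        let len = self.end.saturating_sub(self.start);
        // Cannot fail under the cfg above.
        usize::try_from(len).expect("usize is at least 32 bits under this cfg")
    }
}

fn main() {
    // On a 16-bit target this call would be a compile error, not a panic.
    let r = U32Range { start: 3, end: 10 };
    println!("{}", r.exact_len());
}
```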

@nrc added the T-lang label Sep 13, 2016
@petrochenkov
Contributor

So, what should we do?

Gate impls on target_pointer_width for all currently supported values of target_pointer_width.
When a target with a new value of target_pointer_width is added (16 bit, 128 bit, 8 bit, whatever), a new set of cfgs is added as well.

But, this would make it difficult to port Rust code from, say, a 32-bit target to a 16-bit target, because some code would stop compiling

It would make porting simpler because incorrect range assumptions and overflows will be caught at compile time.

@durka
Contributor Author

durka commented Sep 13, 2016

Caught at compile time when you're porting. If we put in #[cfg(target_pointer_width = "64")] impl ExactSizeIterator for Range<u64> {} then people will be confused when they release a crate, someone downloads it on a 32-bit machine, and Iterator::rposition randomly stops working.

@petrochenkov
Contributor

petrochenkov commented Sep 14, 2016

@durka
This is a real problem: 32-bit and 64-bit targets are equally common and code is often ported between them, unlike 16-bit targets, which are now used only by very specialized hardware.
@aturon (IIRC) suggested adding a special lint to avoid these 32-bit <-> 64-bit portability problems.

Impls like From<u64> for usize still need to conditionally exist because a lot of software is supposed to run, for example, on very specific 64-bit server hardware under some enterprise Linux and is not going to be ported anywhere.

@durka
Contributor Author

durka commented Sep 14, 2016

I like the idea of having a lint if an impl is selected that's tagged with #[cfg(target_pointer_width)] (or other target attributes maybe).

@oyvindln

I propose that we at least assume that usize/isize are no smaller than u16/i16. Note that this is true, in particular, for 8-bit AVR (Arduino). This implies that we should impl From<u16> for usize and impl From<i16> for isize.

I don't know about wider types, but From<u16> for usize sounds reasonable. C99 and newer recommend that the closest equivalent (size_t) be at least 16 bits wide (C99 Standard, see page 259). I would think a system where usize is less than 16 bits (as @briansmith noted, a processor being 8-bit doesn't imply usize being that small) would require rather specialised code anyhow.

@comex

comex commented Sep 14, 2016

Maybe a set of special purpose lints?

#[allow(assume_usize_ge_32_bits)]
#[allow(assume_usize_le_64_bits)]

The standard library really should provide some way to safely cast under such assumptions, whether From or something else. If it doesn't, most people won't avoid making them; they'll just hide them in casts written with the as keyword, which are evil.
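
The lints named above are proposals, not existing rustc lints. As a rough approximation available today, a crate can state the same assumptions as compile-time assertions and keep its casts behind small helpers. The helper names below are hypothetical, and the sketch assumes a toolchain with usize::BITS and const assertions.

```rust
// The build fails on any target where these assumptions do not hold.
const _: () = assert!(usize::BITS >= 32, "assumes usize is at least 32 bits");
const _: () = assert!(usize::BITS <= 64, "assumes usize is at most 64 bits");

// Lossless given the first assertion.
fn u32_to_usize(x: u32) -> usize {
    x as usize
}

// Lossless given the second assertion.
fn usize_to_u64(x: usize) -> u64 {
    x as u64
}

fn main() {
    println!("{}", u32_to_usize(u32::MAX));
    println!("{}", usize_to_u64(7));
}
```

The casts still use as, but each one sits next to a stated, checked assumption rather than being scattered through the code.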

@withoutboats
Contributor

I propose that we at least assume that usize/isize are no larger than u64/i64. This implies that we should impl From<usize> for u64 and impl From<isize> for i64.

Are we actually confident this is a reasonable assumption over the next 50 years? I guess if it becomes untrue we can make a breaking change.

@aturon
Member

aturon commented Sep 27, 2016

Nominated for lang team discussion.

@nikomatsakis
Contributor

I wrote up the @rust-lang/lang team discussion in this internals thread.

@petrochenkov
Contributor

cc #1868

@SimonSapin
Contributor

  1. Conservatively assume that usize may be as narrow as 8 bits.

https://en.wikibooks.org/wiki/C_Programming/stdint.h#Integers_wide_enough_to_hold_pointers claims that uintptr_t is at least 16 bits.

@eternaleye

@SimonSapin: I checked the C standards, because the linked page cites the manpage, which might have been overconstrained (both C and POSIX apply constraints to some types and constants).

  • C89 lacks intptr_t entirely
  • C99 section 7.18.2.4, "Limits of integer types capable of holding object pointers"
    • minimum value of pointer-holding signed integer type
      • INTPTR_MIN -(2¹⁵ - 1)
    • maximum value of pointer-holding signed integer type
      • INTPTR_MAX 2¹⁵ - 1
    • maximum value of pointer-holding unsigned integer type
      • UINTPTR_MAX 2¹⁶ - 1
  • C11 section 7.20.2.4, "Limits of integer types capable of holding object pointers"
    • minimum value of pointer-holding signed integer type
      • INTPTR_MIN -(2¹⁵ - 1)
    • maximum value of pointer-holding signed integer type
      • INTPTR_MAX 2¹⁵ - 1
    • maximum value of pointer-holding unsigned integer type
      • UINTPTR_MAX 2¹⁶ - 1

So yes, C's uintptr_t is at least 16 bits, as is its intptr_t. (Though it is legal for it to be unable to represent -2¹⁵, this is presumably as a concession to one's-complement machines, which I don't think Rust supports anyway.)
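
As a quick sanity check of those magnitudes, the sketch below writes the C minimum limits out as Rust constants (the constant names are invented for this example) and compares them with i16 and u16. The standard's required UINTPTR_MAX is at least 2¹⁶ - 1, i.e. 65535.

```rust
// Minimum magnitudes required by C99 7.18.2.4 / C11 7.20.2.4.
const MIN_INTPTR_MAX: i32 = (1 << 15) - 1; // 32767
const MAX_INTPTR_MIN: i32 = -((1 << 15) - 1); // -32767
const MIN_UINTPTR_MAX: u32 = (1 << 16) - 1; // 65535

fn main() {
    // The positive limits match the 16-bit two's-complement types exactly.
    assert_eq!(MIN_INTPTR_MAX, i32::from(i16::MAX));
    assert_eq!(MIN_UINTPTR_MAX, u32::from(u16::MAX));
    // The negative limit is one short of i16::MIN, which is why -2^15 need
    // not be representable.
    assert_eq!(MAX_INTPTR_MIN, i32::from(i16::MIN) + 1);
}
```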

@SimonSapin
Contributor

PR rust-lang/rust#49305 includes:

@scottjmaddox

Perhaps all From and TryFrom impls could be conditionally compiled with #[cfg(target_pointer_width=*)], and then some mechanism could be added to cargo check that verifies type checking for the desired supported pointer widths, as configured in Cargo.toml (and defaulting to 16, 32, and 64 bit)?

Making this work (or at least work efficiently) might require an extension to rustc, in order to override the target pointer width during a check pass.

@briansmith

A possible way forward:

Define some new submodules, e.g. std::arch::at_least_32_bits and std::arch::at_most_64_bits. These modules would define the implementations of the u32 -> usize and usize -> u64 conversions. A program that needs these conversions must explicitly import those modules to get them. The modules aren't available when the target platform doesn't meet their requirements. When a crate that makes assumptions about conversions to/from usize is compiled for a target on which those assumptions are invalid, the build fails, pointing directly at the use std::arch::at_least_32_bits; or use std::arch::at_most_64_bits; (or whatever) statements, which makes the problem obvious.

No new language features would be required.

@durka
Contributor Author

durka commented Dec 13, 2018

Unfortunately, the idea doesn't work because impls don't respect module scope like that. A portability lint is the way to go.
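
A small sketch of why module scoping does not help, using a hypothetical Index type and a local module named after the proposal. A trait impl written inside a module applies crate-wide, whether or not anything imports that module.

```rust
// Hypothetical index type standing in for usize in this illustration.
pub struct Index(pub usize);

mod at_least_32_bits {
    use std::convert::TryFrom;

    // The impl lives inside the module, but trait impls are not scoped.
    impl From<u32> for super::Index {
        fn from(x: u32) -> Self {
            super::Index(usize::try_from(x).expect("usize narrower than 32 bits"))
        }
    }
}

// Note: there is no `use at_least_32_bits;` anywhere in this file.
fn convert(x: u32) -> Index {
    Index::from(x)
}

fn main() {
    println!("{}", convert(5).0);
}
```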

@briansmith

Unfortunately, the idea doesn't work because impls don't respect module scope like that. A portability lint is the way to go.

Keep in mind that those modules wouldn't exist for targets that don't meet the limits.

@briansmith

Oh, I see, you're saying that the conversions would still be possible even if the program didn't have the use statements. That's right. :(

@durka
Contributor Author

durka commented Dec 13, 2018 via email

@briansmith

I see that libc::size_t is defined as type size_t = usize; which allows implicit conversions between size_t and usize, which is an even bigger hazard than explicit conversions between usize and size_t. It's been argued that usize is defined to be equivalent to uintptr_t and not necessarily equivalent to size_t. I think we should have impl From<libc::size_t> for usize and impl From<usize> for libc::uintptr_t at least. However, I think we also need at least impl From<usize> for libc::size_t which, in the case where usize is larger than size_t, somehow knows how to truncate a usize that actually represents a size (vs one that represents a pointer) to a size_t losslessly.

Also note that there are attempts to define a "maximum object size" and so far many people have suggested that isize::max_value() or usize::max_value() are appropriate limits there. That would usually be incorrect in the case where uintptr_t is larger than size_t. Probably such limits need to be defined relative to ssize_t and size_t.

@SimonSapin
Contributor

type size_t = usize; which allows implicit conversions between size_t and usize

There is no conversion here, even implicit. A type item gives another name to a type. The two names refer to the same type. As far as I know there is no difference with a pub use reexport.

@briansmith

There is no conversion here, even implicit. A type item gives another name to a type. The two names refer to the same type. As far as I know there is no difference with a pub use reexport.

You and I are saying the same thing in different ways. The point is that this works for most, but not all, platforms:

fn foo(n: usize) -> libc::size_t { n }

In rust-lang/unsafe-code-guidelines#99 at least one person claimed that that code isn't guaranteed to work for all targets because sometimes size_t will not be an alias for usize. That we can use usize interchangeably with libc::size_t on some platforms but not on every platform is in conflict with the trend of the discussion in this issue above, where we don't even allow explicit conversions Into/From usize unless the conversion would work on every platform. It doesn't seem right that we are rejecting some explicit conversions to/from usize while refusing to provide similar explicit conversions. We should find some way to resolve that inconsistency. My preferred way of removing the inconsistency is to drop the requirement that usize is the same as uintptr_t and instead require usize is the same as size_t, which is a breaking change that's unlikely to happen. A more realistic change would be to replace type size_t = usize; with #[repr(transparent)] struct size_t(usize); in a new major version of libc.
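
The difference between the alias and the newtype can be sketched without the libc crate. SizeAlias and SizeNewtype below are hypothetical local stand-ins for the two possible definitions of size_t; this is an illustration of the language behaviour, not libc's actual code.

```rust
use std::any::TypeId;

// Stand-in for `type size_t = usize;`: a second name for the same type.
type SizeAlias = usize;

// Stand-in for the suggested `#[repr(transparent)] struct size_t(usize);`.
#[repr(transparent)]
struct SizeNewtype(usize);

impl From<usize> for SizeNewtype {
    fn from(n: usize) -> Self {
        SizeNewtype(n)
    }
}

// Compiles with no conversion at all, like `foo` above.
fn through_alias(n: usize) -> SizeAlias {
    n
}

// Returning `n` directly would be a type error; a conversion is required.
fn through_newtype(n: usize) -> SizeNewtype {
    SizeNewtype::from(n)
}

// The alias and usize have the same TypeId because they are one type.
fn alias_is_usize() -> bool {
    TypeId::of::<SizeAlias>() == TypeId::of::<usize>()
}

fn main() {
    println!("{}", through_alias(3));
    println!("{}", through_newtype(3).0);
    println!("{}", alias_is_usize());
}
```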

@SimonSapin
Contributor

sometimes size_t will not be an alias for usize

I agree that this is incompatible with the way the libc crate is currently defined.

(This is somewhat beside the point, but what are some platforms where size_t is not uintptr_t?)

@briansmith

(This is somewhat beside the point, but what are some platforms where size_t is not uintptr_t?)

A 64-bit CHERI-based platform will have 256-bit or 128-bit pointers and 64-bit usize. Pointers are a composite of security information and the address. Similarly, any ABI that requires pointers to be represented as (&[T], size_t i) or equivalent would have uintptr_t different than usize.

(Also potentially the ordering of uintptr_t and usize is different for the same bit pattern even when they are the same size, because some new security technologies put authentication information in the high bits of pointers.)

I am particularly interested in Rust supporting these security-oriented ABIs in the future as they become practical.

@gnzlbg
Contributor

gnzlbg commented Mar 25, 2019

@briansmith

Note that we can only control the maximum allowed size of Rust objects (repr(Rust)). The maximum allowed size of C objects, which repr(C) types have to respect, is fixed by the C platform, and is outside our control.

That would usually be incorrect in the case where uintptr_t is larger than size_t.

AFAICT this would only mean that the maximum allowed size of repr(Rust) values can be greater than or equal to the maximum allowed size of repr(C) values, which is perfectly fine. So what do you mean by "incorrect"?

@briansmith

So what do you mean by "incorrect" ?

Sure, in theory you could define the maximum object size to be 2**256 - 1 bytes if you want (if uintptr_t is 256 bits). But I doubt anybody wants that.

@gnzlbg
Contributor

gnzlbg commented Mar 25, 2019

Sure, in theory you could define the maximum object size to be 2**256 - 1 bytes if you want (if uintptr_t is 256 bits). But I doubt anybody wants that.

The exact same can be argued of 2**64 - 1, right? AFAICT these limits only matter if they are small enough for normal Rust code to run into them (e.g. on 8, 16, 32 bit platforms). Once the limits become high enough (e.g. 48-bit or larger), do they still matter? For example, there is unsafe code in std that ensures that these limits aren't reached on 32-bit platforms, but for 64-bit targets it is essentially dead code that will never be reached in practice (EDIT: not only essentially, libstd just assumes it does not happen: https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L735).
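
For a concrete feel of where such a limit becomes observable, here is a small sketch that assumes a toolchain with Vec::try_reserve. Vec documents its capacity as limited to isize::MAX bytes, so a request for usize::MAX bytes is rejected without aborting. The helper names are invented for this example.

```rust
// True if a byte count is above the isize::MAX limit discussed in this thread.
fn exceeds_isize_max(bytes: usize) -> bool {
    bytes > isize::MAX as usize
}

// Asks a Vec<u8> for usize::MAX bytes; the request fails with an error
// instead of panicking or aborting.
fn huge_reserve_fails() -> bool {
    let mut v: Vec<u8> = Vec::new();
    v.try_reserve(usize::MAX).is_err()
}

fn main() {
    println!("{}", exceeds_isize_max(usize::MAX));
    println!("{}", huge_reserve_fails());
}
```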

@daira

daira commented Jan 6, 2021

I propose that Rust code that is targeting std (i.e. does not use #![no_std]) should be able to assume that usize is at least 32 bits.
