Policy for assumptions about the size of `usize` #1748

durka · 2016-09-12T20:27:51Z

When in the course of ~~human~~ rusty events, something in core or std depends on the actual width of usize/isize, there are currently (at least) two policies in place:

Conservatively assume that usize may be as narrow as 8 bits.
- example: usize: From<u8> + !From<u16>
Liberally assume that usize is at least 32 bits wide (as it is on all current officially supported platforms).
- example: Range<u32>: ExactSizeIterator

Let me know if I missed any other corners of the standard library which make assumptions (identical to one of these or not).

As these policies are in conflict, it seems like one or both of them should be changed. In principle, we can't remove trait implementations from Range<u32> and the like, so we could just declare target_pointer_width-liberalism to be the law of the land. However, this will make it difficult to port Rust to a 16-bit system. In doing such porting, trait implementations like From<u32> for usize and ExactSizeIterator for Range<u32> would need to be gated by a #[cfg]. But, this would make it difficult to port Rust code from, say, a 32-bit target to a 16-bit target, because some code would stop compiling (N.B. this is already potentially the case, because literals given for enum variants are interpreted as isize literals).
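To make the two policies concrete, here is a minimal sketch of how they look from user code. The function names `widen_u8` and `range_len` are made up for illustration; the trait impls they rely on are the ones named above.

```rust
// Conservative policy: `usize: From<u8>` exists on every target,
// so this widening conversion can never fail.
fn widen_u8(x: u8) -> usize {
    usize::from(x)
}

// Liberal policy: `Range<u32>: ExactSizeIterator` means `len()` returns a
// `usize`, which quietly assumes that `usize` can hold any `u32` length.
fn range_len(start: u32, end: u32) -> usize {
    (start..end).len()
}
```

Calling `range_len(0, 1_000)` is fine anywhere, but a call like `range_len(0, u32::MAX)` only has a representable answer when `usize` is at least 32 bits wide.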

So, what should we do?


briansmith · 2016-09-13T19:58:08Z

Let's see if we can narrow the bounds just a little.

I propose that we at least assume that usize/isize are no larger than u64/i64. This implies that we should impl From<usize> for u64 and impl From<isize> for i64.
I propose that we at least assume that usize/isize are no smaller than u16/i16. Note that this is true, in particular, for 8-bit AVR (Arduino). This implies that we should impl From<u16> for usize and impl From<i16> for isize.
I propose that there should be a goal, which we don't know how to achieve yet, that libcore and libstd MUST NOT use `as` for integer conversions, but instead must use only From, Into, TryFrom, TryInto, etc. for such conversions. The achievement of this goal can then guide the rest of the decision making process.
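A small sketch of why `as` is the thing to avoid, using a `u32` to `u8` conversion so that the behaviour is the same on every target. The helper names are hypothetical.

```rust
use std::convert::TryFrom;

// `as` never fails: an out-of-range value is silently truncated.
fn narrow_with_as(x: u32) -> u8 {
    x as u8
}

// `TryFrom` surfaces the out-of-range case instead of hiding it.
fn narrow_checked(x: u32) -> Option<u8> {
    u8::try_from(x).ok()
}
```

With an input of 300, the `as` version returns 44 (300 modulo 256), while the checked version returns `None`. The same hazard applies to `u32 as usize` on a target where `usize` is narrower than 32 bits.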

durka · 2016-09-13T21:52:08Z

Makes good sense to me. Those proposals still leave the question of what to do about impl ExactSizeIterator for Range<i32>. Options are:

check crater and attempt to phase it out
make it conditional on #[cfg(target_pointer_width >= 32)] (pretend that syntax works)
leave it in and allow (0..u32::max_value()).len() to panic on 16-bit systems
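For the second option, the `>=` syntax indeed does not exist: `target_pointer_width` can only be compared for equality, so the gate has to list the qualifying widths. A minimal sketch, using a hypothetical `ExactLen` trait as a stand-in for `ExactSizeIterator` (user code cannot add or remove std's own impls):

```rust
use std::ops::Range;

// Hypothetical stand-in for `ExactSizeIterator`, used only in this sketch.
trait ExactLen {
    fn exact_len(&self) -> usize;
}

// ">= 32" spelled out as the list of pointer widths that satisfy it.
// On a 16-bit target this impl simply would not exist.
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
impl ExactLen for Range<u32> {
    fn exact_len(&self) -> usize {
        // Delegates to the existing std impl of ExactSizeIterator for Range<u32>.
        self.len()
    }
}
```

Code that calls `(0u32..10).exact_len()` would then fail to compile on a 16-bit target instead of panicking at run time, which is the trade-off the options above are weighing.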

petrochenkov · 2016-09-13T22:01:04Z

So, what should we do?

Gate impls on target_pointer_width for all currently supported values of target_pointer_width.
When a target with a new value of target_pointer_width is added (16 bit, 128 bit, 8 bit, whatever), then a new set of cfgs is added as well.

But, this would make it difficult to port Rust code from, say, a 32-bit target to a 16-bit target, because some code would stop compiling

It would make porting simpler because incorrect range assumptions and overflows will be caught at compile time.

durka · 2016-09-13T22:20:02Z

Caught at compile time when you're porting. If we put in #[cfg(target_pointer_width = "64")] impl ExactSizeIterator for Range<u64> {} then people will be confused when they release a crate, someone downloads it on a 32-bit machine, and Iterator::rposition randomly stops working.
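To illustrate the dependency: `Iterator::rposition` is only available when the iterator is an `ExactSizeIterator` (and a `DoubleEndedIterator`). A minimal sketch with `Range<u32>`, where the function name is made up:

```rust
// Compiles because std implements `ExactSizeIterator` for `Range<u32>`.
// Returns the index of the last element divisible by three, if any.
fn last_multiple_of_three(limit: u32) -> Option<usize> {
    (0..limit).rposition(|x| x % 3 == 0)
}
```

If the same function were written over `Range<u64>` and the impl were gated on `target_pointer_width = "64"`, it would build on the author's 64-bit machine and stop building for a 32-bit user of the crate.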

petrochenkov · 2016-09-14T07:00:07Z

@durka
This is a real problem: 32-bit and 64-bit targets are equally common and code is often ported between them, unlike 16-bit targets, which are used only by very specialized hardware now.
@aturon (IIRC) suggested to add a special lint to avoid these 32-bit <-> 64-bit portability problems.

Impls like From<u64> for usize still need to conditionally exist because a lot of software is supposed to run, for example, on very specific 64-bit server hardware under some enterprise Linux and is not going to be ported anywhere.

durka · 2016-09-14T15:31:49Z

I like the idea of having a lint if an impl is selected that's tagged with #[cfg(target_pointer_width)] (or other target attributes maybe).

oyvindln · 2016-09-14T18:52:07Z

I propose that we at least assume that usize/isize are no smaller than u16/i16. Note that this is true, in particular, for 8-bit AVR (Arduino). This implies that we should impl From<u16> for usize and impl From<i16> for isize.

I don't know about wider types, but From<u16> for usize sounds reasonable. C99 and newer recommend that the closest equivalent (size_t) be at least 16 bits (C99 Standard, see page 259). I would think a system where usize would be less than 16 bits (as @briansmith noted, a processor being 8-bit doesn't imply usize being that small) would require rather specialised code anyhow.

comex · 2016-09-14T20:15:04Z

Maybe a set of special purpose lints?

#[allow(assume_usize_ge_32_bits)]
#[allow(assume_usize_le_64_bits)]

The standard library really should provide some way to safely cast under such assumptions, whether From or something else. If it doesn't, most people won't avoid making them; they'll just hide them in `as` casts, which are evil.
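The lints above do not exist. As a rough approximation in user code, a crate can state the assumption once as a compile-time assertion and keep the cast behind a single named helper. This is a different technique from a lint, sketched here with a hypothetical helper name:

```rust
// Fails the build, with a pointed message, on targets narrower than 32 bits.
const _: () = assert!(
    usize::BITS >= 32,
    "this crate assumes usize is at least 32 bits"
);

// Lossless given the assertion above. Keeping the `as` in one place makes
// the assumption easy to find and audit.
fn u32_to_usize(x: u32) -> usize {
    x as usize
}
```

The assumption becomes visible and checked instead of being scattered across ad hoc casts, though unlike a lint it cannot be allowed or denied per call site.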

withoutboats · 2016-09-23T03:45:31Z

I propose that we at least assume that usize/isize are no larger than u64/i64. This implies that we should impl From<usize> for u64 and impl From<isize> for i64.

Are we actually confident this is a reasonable assumption over the next 50 years? I guess if it becomes untrue we can make a breaking change.

aturon · 2016-09-27T22:46:41Z

Nominated for lang team discussion.

nikomatsakis · 2016-10-07T12:05:29Z

I wrote up the @rust-lang/lang team discussion in this internals thread.

petrochenkov · 2017-02-19T21:12:02Z

cc #1868

SimonSapin · 2017-07-08T18:31:57Z

CC rust-lang/rust#43086 (comment)

SimonSapin · 2017-07-08T19:31:00Z

Conservatively assume that usize may be as narrow as 8 bits.

https://en.wikibooks.org/wiki/C_Programming/stdint.h#Integers_wide_enough_to_hold_pointers claims that uintptr_t is at least 16 bits.

eternaleye · 2017-07-09T05:57:49Z

@SimonSapin: I checked the C standards, because the linked page cites the manpage, which might have been overconstrained (both C and POSIX apply constraints to some types and constants).

C89 lacks intptr_t entirely
C99 section 7.18.2.4, "Limits of integer types capable of holding object pointers"
- minimum value of pointer-holding signed integer type
  - INTPTR_MIN -(2¹⁵ - 1)
- maximum value of pointer-holding signed integer type
  - INTPTR_MAX 2¹⁵ - 1
- maximum value of pointer-holding unsigned integer type
  - UINTPTR_MAX 2¹⁶ - 1
C11 section 7.20.2.4, "Limits of integer types capable of holding object pointers"
- minimum value of pointer-holding signed integer type
  - INTPTR_MIN -(2¹⁵ - 1)
- maximum value of pointer-holding signed integer type
  - INTPTR_MAX 2¹⁵ - 1
- maximum value of pointer-holding unsigned integer type
  - UINTPTR_MAX 2¹⁶ - 1

So yes, C's uintptr_t is at least 16 bits, as is its intptr_t. (Though it is legal for it to be unable to represent -2¹⁵, this is presumably as a concession to one's-complement machines, which I don't think Rust supports anyway.)

SimonSapin · 2018-03-28T11:08:34Z

PR rust-lang/rust#49305 includes:

Addition of a couple From impls that assume that usize and isize are always at least 16 bits, on the basis that Rust doesn’t need to be more portable than C99.
Removal of fallible TryFrom impls that could be infallible From impls on only some platforms, with a portability lint. Adding these impls back (one way or another) is tracked at rust-lang/rust#49415 (Tracking issue for platform-dependent-API TryFrom impls involving usize/isize).
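A short sketch of what the 16-bit floor looks like from user code. The wrapper names are hypothetical; the `From` impls are the ones the PR describes. The `TryFrom<u32>` line reflects current stable Rust as far as I know, which is an assumption about the later outcome and not something stated in this thread.

```rust
use std::convert::TryFrom;

// Infallible: usize/isize are assumed to be at least 16 bits wide.
fn from_u16(x: u16) -> usize {
    usize::from(x)
}

fn from_i16(x: i16) -> isize {
    isize::from(x)
}

// Anything wider has to go through a fallible conversion, because it could
// overflow on a 16-bit target.
fn from_u32(x: u32) -> Option<usize> {
    usize::try_from(x).ok()
}
```

On a 32-bit or 64-bit host `from_u32` always returns `Some`, but the caller still has to handle the `None` case, which is what keeps the code portable to narrower targets.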

scottjmaddox · 2018-12-13T03:05:00Z

Perhaps all From and TryFrom impls could be conditionally compiled with #[cfg(target_pointer_width=*)], and then some mechanism could be added to cargo check that verifies type checking for the desired supported pointer widths, as configured in Cargo.toml (and defaulting to 16, 32, and 64 bit)?

Making this work (or at least work efficiently) might require an extension to rustc, in order to override the target pointer width during a check pass.

briansmith · 2018-12-13T03:26:20Z

A possible way forward:

Define some new submodules, e.g. std::arch::at_least_32_bits, std::arch::at_most_64_bits. These modules would define the implementations of the u32 -> usize and usize -> u64 conversions. A program that needs these conversions must explicitly import those modules to get them. Those modules aren't available when the target platform doesn't meet the requirements for them. When compiling a crate that makes assumptions about conversions to/from usize, on a target for which those assumptions are invalid, the build will fail pointing directly to the `use std::arch::at_least_32_bits;` or `use std::arch::at_most_64_bits;` (or whatever) statements, which will make it obvious what the problem is.

No new language features would be required.

durka · 2018-12-13T04:47:14Z

Unfortunately, the idea doesn't work because impls don't respect module scope like that. A portability lint is the way to go.

briansmith · 2018-12-13T05:10:58Z

Unfortunately, the idea doesn't work because impls don't respect module scope like that. A portability lint is the way to go.

Keep in mind that those modules wouldn't exist for targets that don't meet the limits.

briansmith · 2018-12-13T05:12:18Z

Oh, I see, you're saying that the conversions would still be possible even if the program didn't have the use statements. That's right. :(

durka · 2018-12-13T05:25:30Z

But when they do there's no way to enforce the requirement to import them. The impls are visible regardless. I can't think of a way to do this with imports, but maybe there is some hack with generics and specialization or something.
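A minimal sketch of the problem, using made-up types `Narrow` and `Wide` in place of `u32` and `usize` (the orphan rules prevent writing such an impl for the real integer types outside std). The module name mirrors the proposal and is hypothetical.

```rust
mod demo {
    pub struct Narrow(pub u32);
    pub struct Wide(pub u64);

    // Hypothetical opt-in module, mirroring the proposal.
    pub mod at_least_32_bits {
        use super::{Narrow, Wide};

        impl From<Narrow> for Wide {
            fn from(n: Narrow) -> Wide {
                Wide(u64::from(n.0))
            }
        }
    }

    // There is no `use at_least_32_bits::*;` here, yet the impl is found:
    // trait impls are visible everywhere once they exist in the crate graph.
    pub fn convert(n: Narrow) -> Wide {
        Wide::from(n)
    }
}
```

`demo::convert` compiles without ever naming the module, so the import cannot serve as the opt-in marker the proposal needs.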


briansmith · 2019-03-21T14:31:24Z

I see that libc::size_t is defined as type size_t = usize; which allows implicit conversions between size_t and usize, which is an even bigger hazard than explicit conversions between usize and size_t. It's been argued that usize is defined to be equivalent to uintptr_t and not necessarily equivalent to size_t. I think we should have impl From<libc::size_t> for usize and impl From<usize> for libc::uintptr_t at least. However, I think we also need at least impl From<usize> for libc::size_t which, in the case where usize is larger than size_t, somehow knows how to truncate a usize that actually represents a size (vs one that represents a pointer) to a size_t losslessly.

Also note that there are attempts to define a "maximum object size" and so far many people have suggested that isize::max_value() or usize::max_value() are appropriate limits there. That would usually be incorrect in the case where uintptr_t is larger than size_t. Probably such limits need to be defined relative to ssize_t and size_t.

SimonSapin · 2019-03-21T19:02:43Z

type size_t = usize; which allows implicit conversions between size_t and usize

There is no conversion here, even implicit. A type item gives another name to a type. The two names refer to the same type. As far as I know there is no difference with a pub use reexport.

briansmith · 2019-03-21T19:21:09Z

There is no conversion here, even implicit. A type item gives another name to a type. The two names refer to the same type. As far as I know there is no difference with a pub use reexport.

You and I are saying the same thing in different ways. The point is that this works for most, but not all, platforms:

fn foo(n: usize) -> libc::size_t { n }

In rust-lang/unsafe-code-guidelines#99 at least one person claimed that that code isn't guaranteed to work for all targets because sometimes size_t will not be an alias for usize. That we can use usize interchangeably with libc::size_t on some platforms but not every platform is in conflict with the trend of the discussion in this issue above, where we don't even allow explicit conversions Into/From usize unless the conversion would work on every platform. It doesn't seem right that we are rejecting some explicit conversions to/from usize while refusing to provide similar explicit conversions. We should find some way to resolve that inconsistency. My preferred way of removing the inconsistency is to drop the requirement that usize is the same as uintptr_t and instead require usize is the same as size_t, which is a breaking change that's unlikely to happen. A more realistic change would be to replace type size_t = usize; with #[repr(transparent)] struct size_t(usize); in a new major version of libc.
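The difference between the alias and the newtype can be sketched without depending on the libc crate. Below, `size_t` is a local stand-in for the alias and `SizeT` is a hypothetical newtype; neither is the actual libc definition.

```rust
// Local stand-in for `type size_t = usize;` as described above.
#[allow(non_camel_case_types)]
type size_t = usize;

// With an alias the two names are one type, so no conversion is written.
fn alias_passthrough(n: usize) -> size_t {
    n
}

// Hypothetical newtype: same layout as usize, but a distinct type.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SizeT(usize);

impl From<usize> for SizeT {
    fn from(n: usize) -> SizeT {
        SizeT(n)
    }
}

impl From<SizeT> for usize {
    fn from(s: SizeT) -> usize {
        s.0
    }
}

// With the newtype every crossing is explicit and visible in the source.
fn newtype_roundtrip(n: usize) -> usize {
    let s = SizeT::from(n);
    usize::from(s)
}
```

With the newtype, a function returning `SizeT` would reject a bare `usize` at compile time on every platform, so the platform-dependent behaviour of the alias goes away at the cost of explicit conversions.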

SimonSapin · 2019-03-21T19:25:11Z

sometimes size_t will not be an alias for usize

I agree that this is incompatible with the way the libc crate is currently defined.

(This is somewhat beside the point, but what are some platforms where size_t is not uintptr_t?)

briansmith · 2019-03-21T19:35:33Z

(This is somewhat beside the point, but what are some platforms where size_t is not uintptr_t?)

A 64-bit CHERI-based platform will have 256-bit or 128-bit pointers and 64-bit usize. Pointers are a composite of security information and the address. Similarly, any ABI that requires pointers to be represented as (&[T], size_t i) or equivalent would have uintptr_t different than usize.

(Also potentially the ordering of uintptr_t and usize is different for the same bit pattern even when they are the same size, because some new security technologies put authentication information in the high bits of pointers.)

I am particularly interested in Rust supporting these security-oriented ABIs in the future as they become practical.

gnzlbg · 2019-03-25T09:06:51Z

@briansmith

Note that we can only control the maximum allowed size of Rust objects (repr(Rust)). The maximum allowed size of C objects, which repr(C) types have to respect, is fixed by the C platform, and is outside our control.

That would usually be incorrect in the case where uintptr_t is larger than size_t.

AFAICT this would only mean that the maximum allowed size of repr(Rust) values can be greater or equal to the maximum allowed size of repr(C) values, which is perfectly fine. So what do you mean by "incorrect" ?

briansmith · 2019-03-25T17:40:14Z

So what do you mean by "incorrect" ?

Sure, in theory you could define the maximum object size to be 2**256 - 1 bytes if you want (if uintptr_t is 256 bits). But I doubt anybody wants that.

gnzlbg · 2019-03-25T17:56:40Z

Sure, in theory you could define the maximum object size to be 2**256 - 1 bytes if you want (if uintptr_t is 256 bits). But I doubt anybody wants that.

The exact same can be argued of 2**64 - 1, right? AFAICT these limits only matter if they are small enough for normal Rust code to run into them (e.g. on 8, 16, 32 bit platforms). Once the limits become high enough (e.g. 48-bit or larger), do they still matter ? For example, there is unsafe code in std that ensures that these limits aren't reached on 32-bit platforms, but for 64-bit targets it is essentially dead-code that will never be reached in practice (EDIT: not only essentially, libstd just assumes it does not happen: https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L735).
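The kind of guard being described can be sketched roughly as follows. This is loosely modeled on the idea, not copied from liballoc; the function name and error string are assumptions.

```rust
// On targets narrower than 64 bits, reject allocation sizes above
// isize::MAX. On 64-bit targets the first condition is a compile-time
// constant `false`, so the check is effectively dead code.
fn alloc_guard(alloc_size: usize) -> Result<(), &'static str> {
    if usize::BITS < 64 && alloc_size > isize::MAX as usize {
        Err("capacity overflow")
    } else {
        Ok(())
    }
}
```

On a 64-bit target the function always returns `Ok`, which matches the point that the limit only matters in practice where it is small enough to be reached.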

daira · 2021-01-06T17:51:27Z

I propose that Rust code that is targeting std (i.e. does not use #![no_std]) should be able to assume that usize is at least 32 bits.

durka mentioned this issue Sep 12, 2016

RangeInclusive<usize> shouldn't impl ExactSizeIterator rust-lang/rust#36386

Closed

nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Sep 13, 2016

aturon added the I-nominated label Sep 27, 2016

nikomatsakis removed the I-nominated label Oct 6, 2016

durka mentioned this issue Jul 8, 2017

Stabilize the inclusive_range lib feature rust-lang/rust#43086

Closed

durka mentioned this issue Jul 19, 2017

time::Instant is a different size of different platforms rust-lang/rust#43332

Closed

pitdicker mentioned this issue Dec 14, 2017

Select "SimpleRand" over generic "Rand" for backwards compatibility dhardy/rand#71

Merged

durka mentioned this issue Mar 1, 2018

Document minimum size for usize and isize rust-lang/rust#48593

Open

SimonSapin mentioned this issue Mar 28, 2018

Tracking issue for platform-dependent-API TryFrom impls involving usize/isize rust-lang/rust#49415

Closed

briansmith mentioned this issue Mar 21, 2019

Are raw pointers to sized types usable in C FFI ? rust-lang/unsafe-code-guidelines#99

Closed

nw0 mentioned this issue Oct 16, 2019

Support index size != pointer width rust-lang/rust#65473

Open
