Skip to content

Commit

Permalink
Rollup merge of #89114 - dequbed:c-char, r=yaahc
Browse files Browse the repository at this point in the history
Fixes a technicality regarding the size of C's `char` type

Specifically, ISO/IEC 9899:2018 — better known as "C18" — (and at least
C11, C99 and C89) do not specify the size of `byte` in bits.
Section 3.6 defines "byte" as "addressable unit of data storage" while
section 6.2.5 ("Types") only defines "char" as "large enough to store
any member of the basic execution set" giving it a lower bound of 7 bit
(since there are 96 characters in the basic execution set).
With section 6.5.3.4 paragraph 4 "When sizeof is applied to an operant
that has type char […] the result is 1" you could read this as the size
of `char` in bits being defined as exactly the same as the number of
bits in a byte but it's also valid to read that as an exception.

In general implementations take `char` as the smallest unit of
addressable memory, which for modern byte-addressed architectures is
overwhelmingly 8 bits to the point of this convention being completely
cemented into just about all of our software.

So is any of this actually relevant at all? I hope not. I sincerely hope
that this never, ever comes up.
But if for some reason a poor rustacean is having to interface with C
code running on a Cray X1 that in 2003 is still doing word-addressed
memory with 64-bit chars and they trust the docs here blindly it will
blow up in her face. And I'll be truly sorry for her to have to deal
with … all of that.
  • Loading branch information
the8472 authored Sep 21, 2021
2 parents ecfdadc + 23c608f commit 8a6e9cf
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion library/std/src/os/raw/char.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Equivalent to C's `char` type.

[C's `char` type] is completely unlike [Rust's `char` type]; while Rust's type represents a unicode scalar value, C's `char` type is just an ordinary integer. This type will always be either [`i8`] or [`u8`], as the type is defined as being one byte long.
[C's `char` type] is completely unlike [Rust's `char` type]; while Rust's type represents a unicode scalar value, C's `char` type is just an ordinary integer. On modern architectures this type will always be either [`i8`] or [`u8`], as they use byte-addresses memory with 8-bit bytes.

C chars are most commonly used to make C strings. Unlike Rust, where the length of a string is included alongside the string, C strings mark the end of a string with the character `'\0'`. See [`CStr`] for more information.

Expand Down

0 comments on commit 8a6e9cf

Please sign in to comment.