From 5559913ccacbe1e6c6c5d26d33a08f9b09641c5e Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Mon, 13 Jan 2020 08:27:49 +0000
Subject: [PATCH 01/68] Add inline asm RFC

---
 text/0000-inline-asm.md | 869 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 869 insertions(+)
 create mode 100644 text/0000-inline-asm.md

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
new file mode 100644
index 00000000000..4489ee9aa89
--- /dev/null
+++ b/text/0000-inline-asm.md
@@ -0,0 +1,869 @@
+- Feature Name: `asm`
+- Start Date: (fill me in with today's date, YYYY-MM-DD)
+- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
+
+# Summary
+[summary]: #summary
+
+This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.
+
+The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.
+
+The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However, `llvm_asm!` is not intended to ever be stabilized.
+
+# Motivation
+[motivation]: #motivation
+
+In systems programming some tasks require dropping down to the assembly level. The primary reasons are for performance, precise timing, and low level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.
+
+The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+Rust provides support for inline assembly via the `asm!` macro.
+It can be used to embed handwritten assembly in the assembly output generated by the compiler.
+Generally this should not be necessary, but might be where the required performance or timing
+cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.
+
+> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.
+
+## Basic usage
+
+Let us start with the simplest possible example:
+
+```rust
+unsafe {
+    asm!("nop");
+}
+```
+
+This will insert a NOP (no operation) instruction into the assembly generated by the compiler.
+Note that all `asm!` invocations have to be inside an `unsafe` block, as they could insert
+arbitrary instructions and break various invariants. The instructions to be inserted are listed
+in the first argument of the `asm!` macro as a string literal.
+
+## Inputs and outputs
+
+Now inserting an instruction that does nothing is rather boring. Let us do something that
+actually acts on data:
+
+```rust
+let x: u32;
+unsafe {
+    asm!("mov {}, 5", out(reg) x);
+}
+assert_eq!(x, 5);
+```
+
+This will write the value `5` into the `u32` variable `x`.
+You can see that the string literal we use to specify instructions is actually a template string.
+It is governed by the same rules as Rust [format strings][format-syntax].
+The arguments that are inserted into the template, however, look a bit different than you may
+be familiar with. First we need to specify if the variable is an input or an output of the
+inline assembly. In this case it is an output. We declared this by writing `out`.
+We also need to specify in what kind of register the assembly expects the variable.
+In this case we put it in an arbitrary general purpose register by specifying `reg`.
+The compiler will choose an appropriate register to insert into
+the template and will read the variable from there after the inline assembly finishes executing.
+
+Let us see another example that also uses an input:
+
+```rust
+let i: u32 = 3;
+let o: u32;
+unsafe {
+    asm!("
+        mov {0}, {1}
+        add {0}, {number}
+    ", out(reg) o, in(reg) i, number = imm 5);
+}
+assert_eq!(o, 8);
+```
+
+This will add `5` to the input in variable `i` and write the result to variable `o`.
+The particular way this assembly does this is first copying the value from `i` to the output,
+and then adding `5` to it.
+
+The example shows a few things:
+
+First, we can see that inputs are declared by writing `in` instead of `out`.
+
+Second, one of our operands has a type we haven't seen yet, `imm`.
+This tells the compiler to expand this argument to an immediate inside the assembly template.
+This is only possible for constants and literals.
+
+Third, we can see that we can specify an argument number or name, as in any format string.
+For inline assembly templates this is particularly useful as arguments are often used more than once.
+For more complex inline assembly using this facility is generally recommended, as it improves
+readability, and allows reordering instructions without changing the argument order.
+
+We can further refine the above example to avoid the `mov` instruction:
+
+```rust
+let mut x: u32 = 3;
+unsafe {
+    asm!("add {0}, {number}", inout(reg) x, number = imm 5);
+}
+assert_eq!(x, 8);
+```
+
+We can see that `inout` is used to specify an argument that is both input and output.
+This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.
+
+It is also possible to specify different variables for the input and output parts of an `inout` operand:
+
+```rust
+let x: u32 = 3;
+let y: u32;
+unsafe {
+    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);
+}
+assert_eq!(y, 8);
+```
+
+## Late output operands
+
+The Rust compiler is conservative with its allocation of operands. It is assumed that an `out`
+can be written at any time, and can therefore not share its location with any other argument.
+However, to guarantee optimal performance it is important to use as few registers as possible,
+so they won't have to be saved and reloaded around the inline assembly block.
+To achieve this Rust provides a `lateout` specifier. This can be used on any output that is
+guaranteed to be written only after all inputs have been consumed.
+There is also an `inlateout` variant of this specifier.
+
+Here is an example where `inlateout` *cannot* be used:
+
+```rust
+let mut a = 4;
+let b = 4;
+let c = 4;
+unsafe {
+    asm!("
+        add {0}, {1}
+        add {0}, {2}
+    ", inout(reg) a, in(reg) b, in(reg) c);
+}
+assert_eq!(a, 12);
+```
+
+Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.
+
+However the following example can use `inlateout` since the output is only modified after all input registers have been read:
+
+```rust
+let mut a = 4;
+let b = 4;
+unsafe {
+    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);
+}
+assert_eq!(a, 8);
+```
+
+As you can see, this assembly fragment will still work correctly if `a` and `b` are assigned to the same register.
+
+## Explicit register operands
+
+Some instructions require that the operands be in a specific register.
+Therefore, Rust inline assembly provides some more specific constraint specifiers.
+While `reg` is generally available on any architecture, these are highly architecture specific. E.g. for x86 the general purpose registers `eax`, `ebx`, `ecx`, `edx`, `ebp`, `esi`, and `edi`
+among others can be addressed by their name.
+
+```rust
+unsafe {
+    asm!("out 0x64, {}", in("eax") cmd);
+}
+```
+
+In this example we call the `out` instruction to output the content of the `cmd` variable
+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand
+we had to use the `eax` constraint specifier.
+
+It is somewhat common that instructions have operands that are not explicitly listed in the
+assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:
+
+```rust
+fn mul(a: u32, b: u32) -> u64 {
+    let lo: u32;
+    let hi: u32;
+
+    unsafe {
+        asm!(
+            // The x86 mul instruction takes eax as an implicit input and writes
+            // the 64-bit result of the multiplication to eax:edx.
+            "mul {}",
+            in(reg) a, in("eax") b,
+            lateout("eax") lo, lateout("edx") hi
+        );
+    }
+
+    ((hi as u64) << 32) + (lo as u64)
+}
+```
+
+This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.
+The only explicit operand is a register that we fill from the variable `a`.
+The second implicit operand is the `eax` register which we fill from the variable `b`.
+The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.
+The higher 32 bits are stored in `edx` from which we fill the variable `hi`.
+
+Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.
+
+## Clobbered registers
+
+In many cases inline assembly will modify state that is not needed as an output.
+Usually this is either because we have to use a scratch register in the assembly,
+or instructions modify state that we don't need to further examine.
+This state is generally referred to as being "clobbered".
+We need to tell the compiler about this since it may need to save and restore this state
+around the inline assembly block.
+
+```rust
+let ebx: u32;
+let ecx: u32;
+
+unsafe {
+    asm!(
+        "cpuid",
+        in("eax") 4, in("ecx") 0,
+        lateout("ebx") ebx, lateout("ecx") ecx,
+        lateout("eax") _, lateout("edx") _
+    );
+}
+
+println!(
+    "L1 Cache: {}",
+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)
+);
+```
+
+In the example above we use the `cpuid` instruction to get the L1 cache size.
+This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache size we only care about the contents of `ebx` and `ecx`.
+
+However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.
+
+This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code.
+
+## Register template modifiers
+
+In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).
+
+```rust
+let mut x: u16 = 0xab;
+
+unsafe {
+    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);
+}
+
+assert_eq!(x, 0xabab);
+```
+
+In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 registers (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.
+
+Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.
+The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.
+
+## Flags
+
+By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.
+
+Let's take our previous example of an `add` instruction:
+
+```rust
+let mut a = 4;
+let b = 4;
+unsafe {
+    asm!(
+        "add {0}, {1}",
+        inlateout(reg) a, in(reg) b,
+        flags(pure, nomem, nostack)
+    );
+}
+assert_eq!(a, 8);
+```
+
+Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:
+- `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
+- `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
+- `nostack` means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.
+
+These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.
+
+See the reference for the full list of available flags and their effects.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+Inline assembler is implemented as an unsafe macro `asm!()`.
+The first argument to this macro is a template string literal used to build the final assembly.
+The following arguments specify input and output operands.
+When required, flags are specified as the final argument.
+
+The following ABNF specifies the general syntax:
+
+```
+dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"
+reg_spec := <arch specific register class> / "<arch specific register name>"
+operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
+reg_operand := dir_spec "(" reg_spec ")" operand_expr
+operand := reg_operand / "imm" const_expr / "sym" path
+flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack"
+flags := "flags(" flag *("," flag) ")"
+asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"
+```
+
+[format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax
+
+## Template string
+
+The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.
+
+The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.
+
+This RFC only specifies how operands are substituted into the template string. Actual interpretation of the final asm string is left to the assembler.
+
+However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.
+
+The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.
+
+[rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795
+
+## Operand type
+
+Several types of operands are supported:
+
+* `in(<reg>) <expr>`
+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
+  - The allocated register will contain the value of `<expr>` at the start of the asm code.
+  - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).
+* `out(<reg>) <expr>`
+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
+  - The allocated register will contain an unknown value at the start of the asm code.
+  - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.
+  - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).
+* `lateout(<reg>) <expr>`
+  - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.
+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.
+  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.
+* `inout(<reg>) <expr>`
+  - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
+  - The allocated register will contain the value of `<expr>` at the start of the asm code.
+  - `<expr>` must be an initialized place expression, to which the contents of the allocated register are written at the end of the asm code.
+* `inout(<reg>) <in expr> => <out expr>`
+  - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.
+  - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register are written at the end of the asm code.
+  - An underscore (`_`) may be specified instead of an expression for `<out expr>`, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).
+  - `<in expr>` and `<out expr>` may have different types.
+* `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`
+  - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).
+  - You should only write to the register after all inputs are read, otherwise you may clobber an input.
+* `imm <expr>`
+  - `<expr>` must be an integer or floating-point constant expression.
+  - The value of the expression is formatted as a string and substituted directly into the asm template string.
+* `sym <path>`
+  - `<path>` must refer to a `fn` or `static` defined in the current crate.
+  - A mangled symbol name referring to the item is substituted into the asm template string.
+  - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).
+
+## Register operands
+
+Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as raw identifiers (e.g. `reg`).
+
+Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register in two input operands or in two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.
+
+Different register classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.
+
+If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.
+
+Here is the list of currently supported register classes:
+
+| Architecture | Register class | Registers | LLVM constraint code | Allowed types |
+| ------------ | -------------- | --------- | ----- | ------------- |
+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |
+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |
+| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |
+| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |
+| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |
+| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |
+| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
+| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
+| AArch64 | `vreg_low8` | `v[0-7]` | `y` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
+| ARM | `reg` | `r[0-10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |
+| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |
+| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |
+| ARM | `vreg_low8` | `s[0-15]`, `d[0-7]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |
+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |
+| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |
+
+> Notes on allowed types:
+> - Pointers and references are allowed where the equivalent integer type is allowed.
+> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.
+> - Fat pointers are not allowed.
+> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.
+
+Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc).
+
+Some registers have multiple names. These are all treated by the compiler as identical to the base register name. Here is the list of all supported register aliases:
+
+| Architecture | Base register | Aliases |
+| ------------ | ------------- | ------- |
+| x86 | `ax` | `al`, `eax`, `rax` |
+| x86 | `bx` | `bl`, `ebx`, `rbx` |
+| x86 | `cx` | `cl`, `ecx`, `rcx` |
+| x86 | `dx` | `dl`, `edx`, `rdx` |
+| x86 | `si` | `sil`, `esi`, `rsi` |
+| x86 | `di` | `dil`, `edi`, `rdi` |
+| x86 | `bp` | `bpl`, `ebp`, `rbp` |
+| x86 | `sp` | `spl`, `esp`, `rsp` |
+| x86 | `ip` | `eip`, `rip` |
+| x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |
+| x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |
+| AArch64 | `x[0-30]` | `w[0-30]` |
+| AArch64 | `x29` | `fp` |
+| AArch64 | `x30` | `lr` |
+| AArch64 | `sp` | `wsp` |
+| AArch64 | `xzr` | `wzr` |
+| AArch64 | `v[0-31]` | `b[0-31]`, `h[0-31]`, `s[0-31]`, `d[0-31]`, `q[0-31]` |
+| ARM | `r[0-3]` | `a[1-4]` |
+| ARM | `r[4-9]` | `v[1-6]` |
+| ARM | `r9` | `rfp` |
+| ARM | `r10` | `sl` |
+| ARM | `r11` | `fp` |
+| ARM | `r12` | `ip` |
+| ARM | `r13` | `sp` |
+| ARM | `r14` | `lr` |
+| ARM | `r15` | `pc` |
+| RISC-V | `x0` | `zero` |
+| RISC-V | `x1` | `ra` |
+| RISC-V | `x2` | `sp` |
+| RISC-V | `x3` | `gp` |
+| RISC-V | `x4` | `tp` |
+| RISC-V | `x[5-7]` | `t[0-2]` |
+| RISC-V | `x8` | `fp`, `s0` |
+| RISC-V | `x9` | `s1` |
+| RISC-V | `x[10-17]` | `a[0-7]` |
+| RISC-V | `x[18-27]` | `s[2-11]` |
+| RISC-V | `x[28-31]` | `t[3-6]` |
+| RISC-V | `f[0-7]` | `ft[0-7]` |
+| RISC-V | `f[8-9]` | `fs[0-1]` |
+| RISC-V | `f[10-17]` | `fa[0-7]` |
+| RISC-V | `f[18-27]` | `fs[2-11]` |
+| RISC-V | `f[28-31]` | `ft[8-11]` |
+
+Some registers cannot be used for input or output operands:
+
+| Architecture | Unsupported register | Reason |
+| ------------ | -------------------- | ------ |
+| All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |
+| All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. |
+| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |
+| x86 | `k0` | This is a constant zero register which can't be modified. |
+| x86 | `ip` | This is the program counter, not a real register. |
+| AArch64 | `xzr` | This is a constant zero register which can't be modified. |
+| ARM | `pc` | This is the program counter, not a real register. |
+| RISC-V | `x0` | This is a constant zero register which can't be modified. |
+| RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. |
+
+## Template modifiers
+
+The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. Only one modifier is allowed per template placeholder.
+
+The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].
+
+| Architecture | Register class | Modifier | Input type | Example output |
+| ------------ | -------------- | -------- | ---------- | -------------- |
+| x86 | `reg` | None | `i8` | `al` |
+| x86 | `reg` | None | `i16` | `ax` |
+| x86 | `reg` | None | `i32` | `eax` |
+| x86 | `reg` | None | `i64` | `rax` |
+| x86-32 | `reg_abcd` | `b` | Any | `al` |
+| x86-64 | `reg` | `b` | Any | `al` |
+| x86 | `reg_abcd` | `h` | Any | `ah` |
+| x86 | `reg` | `w` | Any | `ax` |
+| x86 | `reg` | `k` | Any | `eax` |
+| x86-64 | `reg` | `q` | Any | `rax` |
+| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |
+| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |
+| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |
+| x86 (AVX-512) | `kreg` | None | Any | `k1` |
+| AArch64 | `reg` | None | Any | `x0` |
+| AArch64 | `reg` | `w` | Any | `w0` |
+| AArch64 | `reg` | `x` | Any | `x0` |
+| AArch64 | `vreg` | None | Any | `v0` |
+| AArch64 | `vreg` | `b` | Any | `b0` |
+| AArch64 | `vreg` | `h` | Any | `h0` |
+| AArch64 | `vreg` | `s` | Any | `s0` |
+| AArch64 | `vreg` | `d` | Any | `d0` |
+| AArch64 | `vreg` | `q` | Any | `q0` |
+| ARM | `reg` | None | Any | `r0` |
+| ARM | `vreg` | None | `f32` | `s0` |
+| ARM | `vreg` | None | `f64`, `v64` | `d0` |
+| ARM | `vreg` | None | `v128` | `q0` |
+| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |
+| RISC-V | `reg` | None | Any | `x1` |
+| RISC-V | `vreg` | None | Any | `f0` |
+
+> Notes:
+> - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.
+> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. The warning can be suppressed by explicitly using the `x` modifier.
+
+[llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers
+
+## Flags
+
+Flags are used to further influence the behavior of the inline assembly block.
+Currently the following flags are defined:
+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.
+- `nomem`: The `asm` block does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.
+- `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.
+- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.
+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.
+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
+
+The `nomem` and `readonly` flags are mutually exclusive: it is an error to specify both. Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.
+
+The following flags registers must be preserved if `preserves_flags` is set:
+- x86
+  - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).
+  - Direction flag in `EFLAGS` (DF).
+  - Floating-point status word (all).
+  - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).
+- ARM
+  - Condition flags in `CPSR` (N, Z, C, V).
+  - Saturation flag in `CPSR` (Q).
+  - Greater than or equal flags in `CPSR` (GE).
+  - Condition flags in `FPSCR` (N, Z, C, V).
+  - Saturation flag in `FPSCR` (QC).
+  - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).
+- AArch64
+  - Condition flags (`NZCV` register).
+  - Floating-point status (`FPSR` register).
+- RISC-V
+  - Floating-point exception flags in `fcsr` (`fflags`).
+
+> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.
+
+## Mapping to LLVM IR
+
+The direction specification maps to an LLVM constraint specification as follows (using a `reg` operand as an example):
+
+* `in(reg)` => `r`
+* `out(reg)` => `=&r` (Rust's outputs are early-clobber outputs in LLVM/GCC terminology)
+* `inout(reg)` => `=&r,0` (an early-clobber output with an input tied to it, `0` here is a placeholder for the position of the output)
+* `lateout(reg)` => `=r` (Rust's late outputs are regular outputs in LLVM/GCC terminology)
+* `inlateout(reg)` => `=r,0` (cf. `inout` and `lateout`)
+
+If an `inout` is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See [this bug][issue-65452].
+
+As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. This is in part for better readability on Rust's side and in part for independence of the backend:
+
+* Register classes are mapped to the appropriate constraint code as per the table above.
+* `imm` operands are formatted and injected directly into the asm string.
+* `sym` is mapped to `s` for statics and `X` for functions.
+* a register name `r1` is mapped to `{r1}`
+* additional mappings for register classes are added as appropriate (cf. [llvm-constraint])
+* `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).
+* If the `nomem` flag is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)
+* If the `preserves_flags` flag is not set then the following are added to the clobber list:
+  - (x86) `~{dirflag}~{flags}~{fpsr}`
+  - (ARM/AArch64) `~{cc}`
+
+For some operand types, we will automatically insert some modifiers into the template string.
+* For `sym` and `imm` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).
+* On AArch64, we will warn if a value smaller than 64 bits is used without a modifier since this is likely a bug (it will produce `x*` instead of `w*`). Clang has this same warning.
+* On ARM, we will automatically add the `P` or `q` LLVM modifier for `f64`, `v64` and `v128` passed into a `vreg`. This will cause those registers to be formatted as `d*` and `q*` respectively.
+
+Additionally, the following attributes are added to the LLVM `asm` statement:
+
+* The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyways).
+* If the `nomem` flag is set then the `readnone` attribute is added to the LLVM `asm` statement.
+* If the `readonly` flag is set then the `readonly` attribute is added to the LLVM `asm` statement.
+* If the `pure` flag is not set then the `sideeffect` flag is added to the LLVM `asm` statement.
+* If the `nostack` flag is not set then the `alignstack` flag is added to the LLVM `asm` statement.
+* On x86 the `inteldialect` flag is added to the LLVM `asm` statement so that the Intel syntax is used instead of the AT&T syntax.
+
+If the `noreturn` flag is set then an `unreachable` LLVM instruction is inserted after the asm invocation.
+
+> Note that `alignstack` is not currently supported by GCC, so we will need to implement support in GCC if Rust ever gets a GCC back-end.
+
+[llvm-constraint]: http://llvm.org/docs/LangRef.html#supported-constraint-code-list
+[llvm-clobber]: http://llvm.org/docs/LangRef.html#clobber-constraints
+[issue-65452]: https://github.com/rust-lang/rust/issues/65452
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+## Unfamiliarity
+
+This RFC proposes a completely new inline assembly format.
+It is not possible to just copy examples of GCC-style inline assembly and re-use them.
+There is however a fairly trivial mapping between the GCC-style and this format that could be documented to alleviate this.
+
+Additionally, this RFC proposes using the Intel asm syntax on x86 instead of the AT&T syntax. We believe this syntax will be more familiar to most users, but may be surprising for users used to GCC-style asm.
+
+The `cpuid` example above would look like this in GCC-style inline assembly:
+
+```C
+// GCC doesn't allow directly clobbering an input, we need
+// to use a dummy output instead.
+int ebx, ecx, discard;
+asm (
+    "cpuid"
+    : "=a"(discard), "=b"(ebx), "=c"(ecx) // outputs
+    : "a"(4), "c"(0) // inputs
+    : "edx" // clobbers
+);
+printf("L1 Cache: %i\n", ((ebx >> 22) + 1)
+    * (((ebx >> 12) & 0x3ff) + 1)
+    * ((ebx & 0xfff) + 1)
+    * (ecx + 1));
+```
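+
+The arithmetic in the `printf` call is independent of the asm syntax and can be checked on its own. The following is a minimal sketch in plain Rust with no inline assembly: the function name `l1_cache_size` is hypothetical, and the register values in `main` are made up for illustration rather than read from a real CPU.
+
+```rust
+/// Computes the cache size in bytes from the `ebx` and `ecx` values
+/// returned by `cpuid` leaf 4, using the same formula as the C example above.
+fn l1_cache_size(ebx: u32, ecx: u32) -> u32 {
+    let ways = (ebx >> 22) + 1;
+    let partitions = ((ebx >> 12) & 0x3ff) + 1;
+    let line_size = (ebx & 0xfff) + 1;
+    let sets = ecx + 1;
+    ways * partitions * line_size * sets
+}
+
+fn main() {
+    // Illustrative values: 8 ways, 1 partition, 64-byte lines, 64 sets.
+    let ebx = (7 << 22) | 63;
+    let ecx = 63;
+    // 8 * 1 * 64 * 64 = 32768
+    println!("L1 Cache: {}", l1_cache_size(ebx, ecx));
+}
+```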
+
+## Limited set of operand types
+
+The proposed set of operand types is much smaller than that which is available through GCC-style inline assembly. In particular, the proposed syntax does not include any form of memory operands and is missing many register classes.
+
+We chose to keep operand constraints as simple as possible, and in particular memory operands introduce a lot of complexity since different instructions support different addressing modes. At the same time, the exact rules for memory operands are not very well known (you are only allowed to access the data directly pointed to by the constraint) and are often gotten wrong.
+
+If we discover that there is a demand for a new register class or special operand type, we can always add it later.
+
+## Difficulty of support
+
+Inline assembly is a difficult feature to implement in a compiler backend. While LLVM does support it, this may not be the case for alternative backends such as [Cranelift][cranelift] (see [this issue][cranelift-asm]).
+
+However it is possible to implement support for inline assembly without support from the compiler backend by using an external assembler instead. Take the following (AArch64) asm block as an example:
+
+```rust
+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
+{
+    let c;
+    asm!("<some asm code>", inout(reg) a, in("x0") b, out("x20") c);
+    (a, c)
+}
+```
+
+This could be expanded to an external asm file with the following contents:
+
+```
+# Function prefix directives
+.section ".text.foo_inline_asm"
+.globl foo_inline_asm
+.p2align 2
+.type foo_inline_asm, @function
+foo_inline_asm:
+
+// If necessary, save callee-saved registers to the stack here.
+str x20, [sp, #-16]!
+
+// Move the pointer to the argument out of the way since x0 is used.
+mov x1, x0
+
+// Load input values
+ldr w2, [x1, #0]
+ldr w0, [x1, #4]
+
+<some asm code>
+
+// Store output values
+str w2, [x1, #0]
+str w20, [x1, #8]
+
+// If necessary, restore callee-saved registers here.
+ldr x20, [sp], #16
+
+ret
+
+# Function suffix directives
+.size foo_inline_asm, . - foo_inline_asm
+```
+
+And the following Rust code:
+
+```rust
+unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
+{
+    let c;
+    {
+        #[repr(C)]
+        struct foo_inline_asm_args {
+            a: i32,
+            b: i32,
+            c: i32,
+        }
+        extern "C" {
+            fn foo_inline_asm(args: *mut foo_inline_asm_args);
+        }
+        let mut args = foo_inline_asm_args {
+            a: a,
+            b: b,
+            c: 0, // placeholder: overwritten by `foo_inline_asm`
+        };
+        foo_inline_asm(&mut args);
+        a = args.a;
+        c = args.c;
+    }
+    (a, c)
+}
+```
+
+[cranelift]: https://cranelift.readthedocs.io/
+[cranelift-asm]: https://github.com/bytecodealliance/cranelift/issues/444
+
+## Use of double braces in the template string
+
+Because `{}` are used to denote operand placeholders in the template string, actual uses of braces in the assembly code need to be escaped with `{{` and `}}`. This is needed for AVX-512 mask registers and ARM register lists.
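+
+The escaping rule is the same one used by Rust's `format!` macro, so its effect can be previewed without any assembly. The sketch below uses `format!` as a stand-in for template expansion, which is an assumption made for illustration. The helper names `masked_add` and `push_with` are hypothetical and the instructions are only examples.
+
+```rust
+/// Hypothetical helper: an AVX-512 instruction with a mask register,
+/// which is written in braces in Intel syntax.
+fn masked_add(src: &str) -> String {
+    // `{{` and `}}` produce literal braces, while `{}` is a placeholder.
+    format!("vaddps zmm0 {{k1}}, zmm1, {}", src)
+}
+
+/// Hypothetical helper: an ARM register list, also written in braces.
+fn push_with(reg: &str) -> String {
+    format!("push {{r4, {}}}", reg)
+}
+
+fn main() {
+    println!("{}", masked_add("zmm2")); // vaddps zmm0 {k1}, zmm1, zmm2
+    println!("{}", push_with("lr")); // push {r4, lr}
+}
+```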
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+## Implement an embedded DSL
+[dsl]: #dsl
+
+Both MSVC and D provide what is best described as an embedded DSL for inline assembly.
+It is generally close to the system assembler's syntax, but augmented with the ability to directly access variables that are in scope.
+
+```D
+// This is D code
+int ebx, ecx;
+asm {
+    mov EAX, 4;
+    xor ECX, ECX;
+    cpuid;
+    mov ebx, EBX;
+    mov ecx, ECX;
+}
+writefln("L1 Cache: %s",
+    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1)
+    * ((ebx & 0xfff) + 1) * (ecx + 1));
+```
+
+```C++
+// This is MSVC C++
+int ebx_v, ecx_v;
+__asm {
+    mov eax, 4
+    xor ecx, ecx
+    cpuid
+    mov ebx_v, ebx
+    mov ecx_v, ecx
+}
+std::cout << "L1 Cache: "
+    << ((ebx_v >> 22) + 1) * (((ebx_v >> 12) & 0x3ff) + 1)
+        * ((ebx_v & 0xfff) + 1) * (ecx_v + 1)
+    << '\n';
+```
+
+While this is very convenient on the user side in that it requires no specification of inputs,
+outputs, or clobbers, it puts a major burden on the implementation.
+The DSL needs to be implemented for each supported architecture, and full knowledge of the
+side-effect of every instruction is required.
+
+This huge implementation overhead is likely one of the reasons MSVC only
+provides this capability for x86, while D at least provides it for x86 and x86-64.
+It should also be noted that the D reference implementation falls slightly short of supporting
+arbitrary assembly. E.g. the lack of access to the `RIP` register makes certain techniques for
+writing position independent code impossible.
+
+As a stop-gap the LDC implementation of D provides a `llvmasm` feature that binds it closely
+to LLVM IR's inline assembly.
+
+We believe it would be unfortunate to put Rust into a similar situation, making certain
+architectures second-class citizens with respect to inline assembly.
+
+## Provide intrinsics for each instruction
+
+In discussions it is often postulated that providing intrinsics is a better solution to the
+problems at hand.
+However, intrinsics fall short where precise timing or full control over the number of
+generated instructions is required.
+
+Intrinsics are of course still useful and have their place for inserting specific instructions.
+E.g. making sure a loop uses vector instructions, rather than relying on auto-vectorization.
+
+However, inline assembly is specifically designed for cases where more control is required.
+Also providing an intrinsic for every (potentially obscure) instruction that is needed
+e.g. during early system boot in kernel code is unlikely to scale.
+
+## Make the `asm!` macro return outputs
+
+It has been suggested that the `asm!` macro could return its outputs like the LLVM statement does.
+The benefit is that it is clearer to see that variables are being modified.
+Particularly in the case of initialization it becomes more obvious what is happening.
+On the other hand by necessity this splits the direction and constraint specification from
+the variable name, which makes this syntax overall harder to read.
+
+```rust
+fn mul(a: u32, b: u32) -> u64 {
+    let (lo, hi) = unsafe {
+        asm!("mul {}", in(reg) a, in("eax") b, lateout("eax"), lateout("edx"))
+    };
+
+    ((hi as u64) << 32) + lo as u64
+}
+```
+
+# Prior art
+[prior-art]: #prior-art
+
+## GCC inline assembly
+
+The proposed syntax is very similar to GCC's inline assembly in that it is based on string substitution while leaving actual interpretation of the final string to the assembler. However GCC uses poorly documented single-letter constraint codes and template modifiers. Clang tries to emulate GCC's behavior, but there are still several cases where its behavior differs from GCC's.
+
+The main reason why this is so complicated is that GCC's inline assembly basically exports the raw internals of GCC's register allocator. This has resulted in many internal constraint codes and modifiers being widely used, despite them being completely undocumented.
+
+## D & MSVC inline assembly
+
+See the section [above][dsl].
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+None
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+## Flag outputs
+
+GCC supports a special type of output which allows an asm block to return a `bool` encoded in the condition flags register. This allows the compiler to branch directly on the condition flag instead of materializing the condition as a `bool`.
+
+We can support this in the future with a special output operand type.
+
+## `asm goto`
+
+GCC supports passing C labels (the ones used with `goto`) to an inline asm block, with an indication that the asm code may jump directly to one of these labels instead of leaving the asm block normally.
+
+This could be supported by allowing code blocks to be specified as operand types. The following code will print `a` if the input value is `42`, or print `b` otherwise.
+
+```rust
+asm!("cmp {}, 42; je {}",
+    in(reg) val,
+    label { println!("a"); },
+    fallthrough { println!("b"); }
+);
+```
+
+## Unique ID per `asm`
+
+GCC supports `%=` which generates a unique identifier per instance of an asm block. This is guaranteed to be unique even if the asm block is duplicated (e.g. because of inlining).
+
+We can support this in the future with a special operand type.
+
+## `imm` and `sym` for `global_asm!`
+
+The `global_asm!` macro could be extended to support `imm` and `sym` operands since those can be resolved by simple string substitution. Symbols used in `global_asm!` will be marked as `#[used]` to ensure that they are not optimized away by the compiler.
+
+## Memory operands
+
+We could support `mem` as an alternative to specifying a register class which would leave the operand in memory and instead produce a memory address when inserted into the asm string. This would allow generating more efficient code by taking advantage of addressing modes instead of using an intermediate register to hold the computed address.
+
+## Shorthand notation for operand names
+
+We could support some sort of shorthand notation for operand names to avoid needing to write `blah = out(reg) blah`. For example, if the expression is just a single identifier, we could implicitly allow that operand to be referred to using that identifier.
+
+## Clobbers for function calls
+
+Sometimes it can be difficult to specify the necessary clobbers for an asm block which performs a function call. In particular, it is difficult for such code to be forward-compatible if the architecture adds new registers in a future revision, which the compiler may use but will be missing from the `asm!` clobber list.
+
+One possible solution to this would be to add a `clobber(<abi>)` operand where `<abi>` is a calling convention such as `"C"` or `"stdcall"`. The compiler would then automatically insert the necessary clobbers for a function call to that ABI. Also, `clobber(all)` could be used to indicate that all registers are clobbered by the `asm!`.

From 4edab7677dca8806a1315e614ad96403158d9b26 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 14 Jan 2020 14:20:42 +0000
Subject: [PATCH 02/68] Minor corrections

---
 text/0000-inline-asm.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 4489ee9aa89..1061ace46dd 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -19,6 +19,10 @@ In systems programming some tasks require dropping down to the assembly level. T
 
 The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.
 
+A collection of use cases for inline asm can be found in [this repository][catalogue].
+
+[catalogue]: https://github.com/bjorn3/inline_asm_catalogue/
+
 # Guide-level explanation
 [guide-level-explanation]: #guide-level-explanation
 
@@ -79,7 +83,7 @@ unsafe {
         add {0}, {number}
     ", out(reg) o, in(reg) i, number = imm 5);
 }
-assert_eq!(i, 8);
+assert_eq!(o, 8);
 ```
 
 This will add `5` to the input in variable `i` and write the result to variable `o`.
@@ -130,7 +134,7 @@ can be written at any time, and can therefore not share its location with any ot
 However, to guarantee optimal performance it is important to use as few registers as possible,
 so they won't have to be saved and reloaded around the inline assembly block.
 To achieve this Rust provides a `lateout` specifier. This can be used on any output that is
-guaranteed to be written only after all inputs have been consumed.
+written only after all inputs have been consumed.
 There is also a `inlateout` variant of this specifier.
 
 Here is an example where `inlateout` *cannot* be used:
@@ -204,7 +208,7 @@ fn mul(a: u32, b: u32) -> u64 {
 
 This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.
 The only explicit operand is a register, that we fill from the variable `a`.
-The second implicit operand is the `eax` register which we fill from the variable `b`.
+The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.
 The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.
 The higher 32 bits are stored in `edx` from which we fill the variable `hi`.
 
@@ -368,7 +372,7 @@ Several types of operands are supported:
 
 ## Register operands
 
-Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as raw identifiers (e.g. `reg`).
+Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as identifiers (e.g. `reg`).
 
 Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register two input operand or two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.
 
@@ -521,7 +525,7 @@ Currently the following flags are defined:
 - `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.
 - `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
 
-The `nomem` and `readonly` flags are mutually exclusive: it is an error to specify both. Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.
+The `nomem` and `readonly` flags are mutually exclusive: it is a compile-time error to specify both. Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.
 
 These flag registers which must be preserved if `preserves_flags` is set:
 - x86

From 03f22fe32fd74cecbdb4f93dbef4391c6ce0915a Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 14 Jan 2020 22:12:43 +0000
Subject: [PATCH 03/68] More minor corrections

---
 text/0000-inline-asm.md | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 1061ace46dd..d29ba1b79b6 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -10,7 +10,9 @@ This RFC specifies a new syntax for inline assembly which is suitable for eventu
 
 The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.
 
-The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.
+The transition from the existing `asm!` macro is described in RFC [2843][rfc-llvm-asm]. The existing `asm!` macro will be renamed to `llvm_asm!` to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However `llvm_asm!` is not intended to ever be stabilized.
+
+[rfc-llvm-asm]: https://github.com/rust-lang/rfcs/pull/2843
 
 # Motivation
 [motivation]: #motivation
@@ -247,7 +249,21 @@ This instruction writes to `eax`, `ebx`, `ecx`, and `edx`, but for the cache siz
 
 However we still need to tell the compiler that `eax` and `edx` have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with `_` instead of a variable name, which indicates that the output value is to be discarded.
 
-This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code.
+This can also be used with a general register class (e.g. `reg`) to obtain a scratch register for use inside the asm code:
+
+```rust
+// Multiply x by 6 using shifts and adds
+let mut x = 4;
+unsafe {
+    asm!("
+        mov {tmp}, {x}
+        shl {tmp}, 1
+        shl {x}, 2
+        add {x}, {tmp}
+    ", x = inout(reg) x, tmp = out(reg) _);
+}
+assert_eq!(x, 4 * 6);
+```
 
 ## Register template modifiers
 
@@ -402,7 +418,7 @@ Here is the list of currently supported register classes:
 
 > Notes on allowed types:
 > - Pointers and references are allowed where the equivalent integer type is allowed.
-> - `iLEN` refers to both sized and unsized integer types. It also implicitly includes `isize` and `usize` where the length matches.
+> - `iLEN` refers to both sized and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.
 > - Fat pointers are not allowed.
 > - `vLEN` refers to a SIMD vector that is `LEN` bits wide.
 
@@ -421,6 +437,7 @@ Some registers have multiple names. These are all treated by the compiler as ide
 | x86 | `bp` | `bpl`, `ebp`, `rbp` |
 | x86 | `sp` | `spl`, `esp`, `rsp` |
 | x86 | `ip` | `eip`, `rip` |
+| x86 | `st(0)` | `st` |
 | x86 | `r[8-15]` | `r[8-15]b`, `r[8-15]w`, `r[8-15]d` |
 | x86 | `xmm[0-31]` | `ymm[0-31]`, `zmm[0-31]` |
 | AArch64 | `x[0-30]` | `w[0-30]` |
@@ -464,6 +481,8 @@ Some registers cannot be used for input or output operands:
 | x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |
 | x86 | `k0` | This is a constant zero register which can't be modified. |
 | x86 | `ip` | This is the program counter, not a real register. |
+| x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |
+| x86 | `st([0-7])` | x87 registers are not currently supported (but may be in the future). |
 | AArch64 | `xzr` | This is a constant zero register which can't be modified. |
 | ARM | `pc` | This is the program counter, not a real register. |
 | RISC-V | `x0` | This is a constant zero register which can't be modified. |

From 31cab5e47d44e01b4ef4ffe74ac0bf4f8f1723d5 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 14 Jan 2020 23:01:45 +0000
Subject: [PATCH 04/68] Oops

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index d29ba1b79b6..b72ee346fc7 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -418,7 +418,7 @@ Here is the list of currently supported register classes:
 
 > Notes on allowed types:
 > - Pointers and references are allowed where the equivalent integer type is allowed.
-> - `iLEN` refers to both sized and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.
+> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.
 > - Fat pointers are not allowed.
 > - `vLEN` refers to a SIMD vector that is `LEN` bits wide.
 

From add2123e24ee38924aa4dc78268535f747eb0edc Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 21 Jan 2020 15:54:25 +0000
Subject: [PATCH 05/68] Add a section on rules for inline asm

---
 text/0000-inline-asm.md | 59 ++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 22 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index b72ee346fc7..dfd0a4f20fb 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -540,33 +540,12 @@ Currently the following flags are defined:
 - `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.
 - `nomem`: The `asm` blocks does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.
 - `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.
-- `preserves_flags`: The `asm` block does not modify the flags register (defined below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.
+- `preserves_flags`: The `asm` block does not modify the flags register (defined in the [rules][rules] below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.
 - `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.
 - `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
 
 The `nomem` and `readonly` flags are mutually exclusive: it is a compile-time error to specify both. Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.
 
-These flag registers which must be preserved if `preserves_flags` is set:
-- x86
-  - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).
-  - Direction flag in `EFLAGS` (DF).
-  - Floating-point status word (all).
-  - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).
-- ARM
-  - Condition flags in `CPSR` (N, Z, C, V)
-  - Saturation flag in `CPSR` (Q)
-  - Greater than or equal flags in `CPSR` (GE).
-  - Condition flags in `FPSCR` (N, Z, C, V)
-  - Saturation flag in `FPSCR` (QC)
-  - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).
-- AArch64
-  - Condition flags (`NZCV` register).
-  - Floating-point status (`FPSR` register).
-- RISC-V
-  - Floating-point exception flags in `fcsr` (`fflags`).
-
-> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.
-
 ## Mapping to LLVM IR
 
 The direction specification maps to a LLVM constraint specification as follows (using a `reg` operand as an example):
@@ -614,6 +593,42 @@ If the `noreturn` flag is set then an `unreachable` LLVM instruction is inserted
 [llvm-clobber]: http://llvm.org/docs/LangRef.html#clobber-constraints
 [issue-65452]: https://github.com/rust-lang/rust/issues/65452
 
+## Rules for inline assembly
+[rules]: #rules
+
+- Any registers not specified as inputs will contain an undefined value on entry to the asm block.
+- Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry.
+- Behavior is undefined if execution unwinds out of an asm block.
+- Any memory reads/writes performed by the asm code follow the same rules as `read_volatile` and `write_volatile`.
+  - Refer to the unsafe code guidelines for the exact rules.
+  - If the `readonly` flag is set, then only memory reads (with the same rules as `read_volatile`) are allowed.
+  - If the `nomem` flag is set then no reads or writes to memory are allowed.
+- Unless the `nostack` flag is set, asm code is allowed to use stack space below the stack pointer.
+  - On entry to the asm block the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
+  - You are responsible for making sure you don't overflow the stack (e.g. use stack probing to ensure you hit a guard page).
+  - You should adjust the stack pointer when allocating stack memory as required by the target ABI.
+- If the `noreturn` flag is set then behavior is undefined if execution falls through to the end of the asm block.
+- These flag registers must be restored upon exiting the asm block if `preserves_flags` is set:
+  - x86
+    - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).
+    - Direction flag in `EFLAGS` (DF).
+    - Floating-point status word (all).
+    - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).
+  - ARM
+    - Condition flags in `CPSR` (N, Z, C, V)
+    - Saturation flag in `CPSR` (Q)
+    - Greater than or equal flags in `CPSR` (GE).
+    - Condition flags in `FPSCR` (N, Z, C, V)
+    - Saturation flag in `FPSCR` (QC)
+    - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).
+  - AArch64
+    - Condition flags (`NZCV` register).
+    - Floating-point status (`FPSR` register).
+  - RISC-V
+    - Floating-point exception flags in `fcsr` (`fflags`).
+
+> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.
+
 # Drawbacks
 [drawbacks]: #drawbacks
 

From fad338afed816329e8a6110510d65b8648f0a021 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 21 Jan 2020 15:58:33 +0000
Subject: [PATCH 06/68] Rename flags() to options()

---
 text/0000-inline-asm.md | 50 ++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index dfd0a4f20fb..5ac1cb26c4d 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -284,7 +284,7 @@ In this example, we use the `reg_abcd` register class to restrict the register a
 Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.
 The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.
 
-## Flags
+## Options
 
 By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.
 
@@ -297,20 +297,20 @@ unsafe {
     asm!(
         "add {0}, {1}",
         inlateout(reg) a, in(reg) b,
-        flags(pure, nomem, nostack)
+        options(pure, nomem, nostack)
     );
 }
 assert_eq!(a, 8);
 ```
 
-Flags can be provided as an optional final argument to the `asm!` macro. We specified three flags here:
+Options can be provided as an optional final argument to the `asm!` macro. We specified three options here:
 - `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
 - `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
 - `nostack` means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.
 
 These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.
 
-See the reference for the full list of available flags and their effects.
+See the reference for the full list of available options and their effects.
 
 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation
@@ -318,7 +318,7 @@ See the reference for the full list of available flags and their effects.
 Inline assembler is implemented as an unsafe macro `asm!()`.
 The first argument to this macro is a template string literal used to build the final assembly.
 The following arguments specify input and output operands.
-When required, flags are specified as the final argument.
+When required, options are specified as the final argument.
 
 The following ABNF specifies the general syntax:
 
@@ -328,9 +328,9 @@ reg_spec := <arch specific register class> / "<arch specific register name>"
 operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
 reg_operand := dir_spec "(" reg_spec ")" operand_expr
 operand := reg_operand / "imm" const_expr / "sym" path
-flag := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn"
-flags := "flags(" flag *["," flag] ")"
-asm := "asm!(" format_string *("," [ident "="] operand) ["," flags] ")"
+option := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn"
+options := "options(" option *["," option] ")"
+asm := "asm!(" format_string *("," [ident "="] operand) ["," options] ")"
 ```
 
 [format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax
@@ -533,18 +533,18 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen
 
 [llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers
 
-## Flags
+## Options
 
 Flags are used to further influence the behavior of the inline assembly block.
-Currently the following flags are defined:
-- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this flag is used on an `asm` with no outputs.
+Currently the following options are defined:
+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this option is used on an `asm` with no outputs.
 - `nomem`: The `asm` blocks does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.
 - `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.
 - `preserves_flags`: The `asm` block does not modify the flags register (defined in the [rules][rules] below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.
 - `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.
-- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this flag is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
+- `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this option is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
 
-The `nomem` and `readonly` flags are mutually exclusive: it is a compile-time error to specify both. Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.
+The `nomem` and `readonly` options are mutually exclusive: it is a compile-time error to specify both. Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.
 
 ## Mapping to LLVM IR
 
@@ -566,8 +566,8 @@ As written this RFC requires architectures to map from Rust constraint specifica
 * a register name `r1` is mapped to `{r1}`
 * additionally mappings for register classes are added as appropriate (cf. [llvm-constraint])
 * `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).
-* If the `nomem` flag is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)
-* If the `preserves_flags` flag is not set then the following are added to the clobber list:
+* If the `nomem` option is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)
+* If the `preserves_flags` option is not set then the following are added to the clobber list:
   - (x86) `~{dirflag}~{flags}~{fpsr}`
   - (ARM/AArch64) `~{cc}`
 
@@ -579,13 +579,13 @@ For some operand types, we will automatically insert some modifiers into the tem
 Additionally, the following attributes are added to the LLVM `asm` statement:
 
 * The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyways).
-* If the `nomem` flag is set then the `readnone` attribute is added to the LLVM `asm` statement.
-* If the `readonly` flag is set then the `readonly` attribute is added to the LLVM `asm` statement.
-* If the `pure` flag is not set then the `sideffect` flag is added the LLVM `asm` statement.
-* If the `nostack` flag is not set then the `alignstack` flag is added the LLVM `asm` statement.
+* If the `nomem` option is set then the `readnone` attribute is added to the LLVM `asm` statement.
+* If the `readonly` option is set then the `readonly` attribute is added to the LLVM `asm` statement.
+* If the `pure` option is not set then the `sideeffect` flag is added to the LLVM `asm` statement.
+* If the `nostack` option is not set then the `alignstack` flag is added to the LLVM `asm` statement.
 * On x86 the `inteldialect` flag is added the LLVM `asm` statement so that the Intel syntax is used instead of the AT&T syntax.
 
-If the `noreturn` flag is set then an `unreachable` LLVM instruction is inserted after the asm invocation.
+If the `noreturn` option is set then an `unreachable` LLVM instruction is inserted after the asm invocation.
 
 > Note that `alignstack` is not currently supported by GCC, so we will need to implement support in GCC if Rust ever gets a GCC back-end.
 
@@ -601,14 +601,14 @@ If the `noreturn` flag is set then an `unreachable` LLVM instruction is inserted
 - Behavior is undefined if execution unwinds out of an asm block.
 - Any memory reads/writes performed by the asm code follow the same rules as `volatile_read` and `volatile_write`.
   - Refer to the unsafe code guidelines for the exact rules.
-  - If the `readonly` flag is set, then only memory reads (with the same rules as `volatile_read`) are allowed.
-  - If the `nomem` flag is set then no reads or write to memory are allowed.
-- Unless the `nostack` flag is set, asm code is allowed to use stack space below the stack pointer.
+  - If the `readonly` option is set, then only memory reads (with the same rules as `volatile_read`) are allowed.
+  - If the `nomem` option is set then no reads or write to memory are allowed.
+- Unless the `nostack` option is set, asm code is allowed to use stack space below the stack pointer.
   - On entry to the asm block the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
   - You are responsible for making sure you don't overflow the stack (e.g. use stack probing to ensure you hit a guard page).
   - You should adjust the stack pointer when allocating stack memory as required by the target ABI.
-- If the `noreturn` flag is set then behavior is undefined if execution falls through to the end of the asm block.
-- These flags registers must be restored upon exiting the asm block if `preserves_flags` is set:
+- If the `noreturn` option is set then behavior is undefined if execution falls through to the end of the asm block.
+- These flags registers must be restored upon exiting the asm block if the `preserves_flags` option is set:
   - x86
     - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).
     - Direction flag in `EFLAGS` (DF).

From 806228fb38aacfe466be89ffd45e311fb9e9d0d3 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 21 Jan 2020 16:43:37 +0000
Subject: [PATCH 07/68] Clarify support for non-LLVM backends

---
 text/0000-inline-asm.md | 156 ++++++++++++++++++++--------------------
 1 file changed, 80 insertions(+), 76 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 5ac1cb26c4d..b8f78d8e9a3 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -593,84 +593,11 @@ If the `noreturn` option is set then an `unreachable` LLVM instruction is insert
 [llvm-clobber]: http://llvm.org/docs/LangRef.html#clobber-constraints
 [issue-65452]: https://github.com/rust-lang/rust/issues/65452
 
-## Rules for inline assembly
-[rules]: #rules
-
-- Any registers not specified as inputs will contain an undefined value on entry to the asm block.
-- Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry.
-- Behavior is undefined if execution unwinds out of an asm block.
-- Any memory reads/writes performed by the asm code follow the same rules as `volatile_read` and `volatile_write`.
-  - Refer to the unsafe code guidelines for the exact rules.
-  - If the `readonly` option is set, then only memory reads (with the same rules as `volatile_read`) are allowed.
-  - If the `nomem` option is set then no reads or write to memory are allowed.
-- Unless the `nostack` option is set, asm code is allowed to use stack space below the stack pointer.
-  - On entry to the asm block the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
-  - You are responsible for making sure you don't overflow the stack (e.g. use stack probing to ensure you hit a guard page).
-  - You should adjust the stack pointer when allocating stack memory as required by the target ABI.
-- If the `noreturn` option is set then behavior is undefined if execution falls through to the end of the asm block.
-- These flags registers must be restored upon exiting the asm block if the `preserves_flags` option is set:
-  - x86
-    - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).
-    - Direction flag in `EFLAGS` (DF).
-    - Floating-point status word (all).
-    - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).
-  - ARM
-    - Condition flags in `CPSR` (N, Z, C, V)
-    - Saturation flag in `CPSR` (Q)
-    - Greater than or equal flags in `CPSR` (GE).
-    - Condition flags in `FPSCR` (N, Z, C, V)
-    - Saturation flag in `FPSCR` (QC)
-    - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).
-  - AArch64
-    - Condition flags (`NZCV` register).
-    - Floating-point status (`FPSR` register).
-  - RISC-V
-    - Floating-point exception flags in `fcsr` (`fflags`).
+## Supporting back-ends without inline assembly
 
-> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.
+While LLVM supports inline assembly, rustc may gain alternative back-ends such as Cranelift or GCC. If a back-end does not support inline assembly natively then we can fall back to invoking an external assembler. The intent is that support for `asm!` should be independent of the rustc back-end used: it should always work, but with lower performance if the back-end does not support inline assembly.
 
-# Drawbacks
-[drawbacks]: #drawbacks
-
-## Unfamiliarity
-
-This RFC proposes a completely new inline assembly format.
-It is not possible to just copy examples of GCC-style inline assembly and re-use them.
-There is however a fairly trivial mapping between the GCC-style and this format that could be documented to alleviate this.
-
-Additionally, this RFC proposes using the Intel asm syntax on x86 instead of the AT&T syntax. We believe this syntax will be more familiar to most users, but may be surprising for users used to GCC-style asm.
-
-The `cpuid` example above would look like this in GCC-sytle inline assembly:
-
-```C
-// GCC doesn't allow directly clobbering an input, we need
-// to use a dummy output instead.
-int ebx, ecx, discard;
-asm (
-    "cpuid"
-    : "=a"(discard), "=b"(ebx), "=c"(ecx) // outputs
-    : "a"(4), "c"(0) // inputs
-    : "edx" // clobbers
-);
-printf("L1 Cache: %i\n", ((ebx >> 22) + 1)
-    * (((ebx >> 12) & 0x3ff) + 1)
-    * ((ebx & 0xfff) + 1)
-    * (ecx + 1));
-```
-
-## Limited set of operand types
-
-The proposed set of operand types is much smaller than that which is available through GCC-style inline assembly. In particular, the proposed syntax does not include any form of memory operands and is missing many register classes.
-
-We chose to keep operand constraints as simple as possible, and in particular memory operands introduce a lot of complexity since different instruction support different addressing modes. At the same time, the exact rules for memory operands are not very well known (you are only allowed to access the data directly pointed to by the constraint) and are often gotten wrong.
-
-If we discover that there is a demand for a new register class or special operand type, we can always add it later.
-
-## Difficulty of support
-
-Inline assembly is a difficult feature to implement in a compiler backend. While LLVM does support it, this may not be the case for alternative backends such as [Cranelift][cranelift] (see [this issue][cranelift-asm]).
-
-However it is possible to implement support for inline assembly without support from the compiler backend by using an external assembler instead. Take the following (AArch64) asm block as an example:
+Take the following (AArch64) asm block as an example:
 
 ```rust
 unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
@@ -745,6 +672,83 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
 }
 ```
 
+## Rules for inline assembly
+[rules]: #rules
+
+- Any registers not specified as inputs will contain an undefined value on entry to the asm block.
+- Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry.
+- Behavior is undefined if execution unwinds out of an asm block.
+- Any memory reads/writes performed by the asm code follow the same rules as `volatile_read` and `volatile_write`.
+  - Refer to the unsafe code guidelines for the exact rules.
+  - If the `readonly` option is set, then only memory reads (with the same rules as `volatile_read`) are allowed.
+  - If the `nomem` option is set then no reads or writes to memory are allowed.
+- Unless the `nostack` option is set, asm code is allowed to use stack space below the stack pointer.
+  - On entry to the asm block the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
+  - You are responsible for making sure you don't overflow the stack (e.g. use stack probing to ensure you hit a guard page).
+  - You should adjust the stack pointer when allocating stack memory as required by the target ABI.
+- If the `noreturn` option is set then behavior is undefined if execution falls through to the end of the asm block.
+- These flags registers must be restored upon exiting the asm block if the `preserves_flags` option is set:
+  - x86
+    - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).
+    - Direction flag in `EFLAGS` (DF).
+    - Floating-point status word (all).
+    - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).
+  - ARM
+    - Condition flags in `CPSR` (N, Z, C, V)
+    - Saturation flag in `CPSR` (Q)
+    - Greater than or equal flags in `CPSR` (GE).
+    - Condition flags in `FPSCR` (N, Z, C, V)
+    - Saturation flag in `FPSCR` (QC)
+    - Floating-point exception flags in `FPSCR` (IDC, IXC, UFC, OFC, DZC, IOC).
+  - AArch64
+    - Condition flags (`NZCV` register).
+    - Floating-point status (`FPSR` register).
+  - RISC-V
+    - Floating-point exception flags in `fcsr` (`fflags`).
+
+> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+## Unfamiliarity
+
+This RFC proposes a completely new inline assembly format.
+It is not possible to just copy examples of GCC-style inline assembly and re-use them.
+There is however a fairly trivial mapping between the GCC-style and this format that could be documented to alleviate this.
+
+Additionally, this RFC proposes using the Intel asm syntax on x86 instead of the AT&T syntax. We believe this syntax will be more familiar to most users, but may be surprising for users used to GCC-style asm.
+
+The `cpuid` example above would look like this in GCC-style inline assembly:
+
+```C
+// GCC doesn't allow directly clobbering an input, we need
+// to use a dummy output instead.
+int ebx, ecx, discard;
+asm (
+    "cpuid"
+    : "=a"(discard), "=b"(ebx), "=c"(ecx) // outputs
+    : "a"(4), "c"(0) // inputs
+    : "edx" // clobbers
+);
+printf("L1 Cache: %i\n", ((ebx >> 22) + 1)
+    * (((ebx >> 12) & 0x3ff) + 1)
+    * ((ebx & 0xfff) + 1)
+    * (ecx + 1));
+```
+
+## Limited set of operand types
+
+The proposed set of operand types is much smaller than that which is available through GCC-style inline assembly. In particular, the proposed syntax does not include any form of memory operands and is missing many register classes.
+
+We chose to keep operand constraints as simple as possible, and in particular memory operands introduce a lot of complexity since different instructions support different addressing modes. At the same time, the exact rules for memory operands are not very well known (you are only allowed to access the data directly pointed to by the constraint) and are often gotten wrong.
+
+If we discover that there is a demand for a new register class or special operand type, we can always add it later.
+
+## Difficulty of support
+
+Inline assembly is a difficult feature to implement in a compiler backend. While LLVM does support it, this may not be the case for alternative backends such as [Cranelift][cranelift] (see [this issue][cranelift-asm]). We provide a fallback implementation using an external assembler for such backends.
+
 [cranelift]: https://cranelift.readthedocs.io/
 [cranelift-asm]: https://github.com/bytecodealliance/cranelift/issues/444
 

From fa90ad263cb96b549586bc9d98733b30ad672d97 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 22 Jan 2020 14:42:14 +0000
Subject: [PATCH 08/68] Clarify reg class under ARM Thumb1

---
 text/0000-inline-asm.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index b8f78d8e9a3..9e311d85e06 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -409,10 +409,11 @@ Here is the list of currently supported register classes:
 | AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
 | AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
 | AArch64 | `vreg_low8` | `v[0-7]` | `y` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
-| ARM | `reg` | `r[0-r10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |
+| ARM (ARM/Thumb2) | `reg` | `r[0-r10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |
+| ARM (Thumb1) | `reg` | `r[0-r7]` | `r` | `i8`, `i16`, `i32` |
 | ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |
 | ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |
-| ARM | `vreg_low8` | `s[0-15]`, `d[0-d]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |
+| ARM | `vreg_low8` | `s[0-15]`, `d[0-8]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |
 | RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |
 | RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |
 

From fc99a8cdf5d0f3b7ffc5aa22c576027c4fcc23ad Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 24 Jan 2020 17:34:01 +0000
Subject: [PATCH 09/68] Clarify rules

---
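
Note (below the `---`, so not part of the commit message): a minimal sketch of the stack-pointer rule added here. It assumes an x86-64 target and the `asm!` syntax as it was later stabilized in `std::arch`, which may differ from this draft. The function name `roundtrip_through_stack` is hypothetical, and the non-x86-64 fallback only exists so the sketch compiles everywhere.

```rust
/// Hypothetical helper: pushes `value` onto the stack and pops it back
/// into a different register.
#[cfg(target_arch = "x86_64")]
fn roundtrip_through_stack(value: u64) -> u64 {
    use std::arch::asm;

    let result: u64;
    unsafe {
        // `nostack` is deliberately NOT set, so the block may use stack space.
        // The `push` is balanced by the `pop`, so the stack pointer holds its
        // original value again when the block ends, as the new rule requires.
        asm!(
            "push {input}",
            "pop {output}",
            input = in(reg) value,
            output = out(reg) result,
        );
    }
    result
}

/// Portable stand-in (assumption: only used so the sketch builds on
/// targets other than x86-64).
#[cfg(not(target_arch = "x86_64"))]
fn roundtrip_through_stack(value: u64) -> u64 {
    value
}
```

Dropping the `pop` would leave the stack pointer 8 bytes lower on exit, which is exactly the situation the added rule declares invalid.
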
 text/0000-inline-asm.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 9e311d85e06..21617f06574 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -678,6 +678,7 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
 
 - Any registers not specified as inputs will contain an undefined value on entry to the asm block.
 - Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry.
+  - This only applies to registers which can be specified as an input or output.
 - Behavior is undefined if execution unwinds out of an asm block.
 - Any memory reads/writes performed by the asm code follow the same rules as `volatile_read` and `volatile_write`.
   - Refer to the unsafe code guidelines for the exact rules.
@@ -687,6 +688,7 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
   - On entry to the asm block the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
   - You are responsible for making sure you don't overflow the stack (e.g. use stack probing to ensure you hit a guard page).
   - You should adjust the stack pointer when allocating stack memory as required by the target ABI.
+  - The stack pointer must be restored to its original value before leaving the asm block.
 - If the `noreturn` option is set then behavior is undefined if execution falls through to the end of the asm block.
 - These flags registers must be restored upon exiting the asm block if the `preserves_flags` option is set:
   - x86

From 2572476dfbf3675ece3a3e4f9b6c72863dd229e0 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 25 Jan 2020 01:09:56 +0000
Subject: [PATCH 10/68] Rename imm to const

---
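
Note (below the `---`, so not part of the commit message): a minimal sketch of the renamed operand in use. It assumes an x86-64 target and a toolchain where `const` operands are available in `std::arch::asm!`; the stabilized syntax may differ from this draft. The function name `add_five` is hypothetical, and the fallback branch only exists so the sketch builds on other targets.

```rust
/// Hypothetical helper: adds the constant 5 to `x` using a `const` operand.
#[cfg(target_arch = "x86_64")]
fn add_five(x: u64) -> u64 {
    use std::arch::asm;

    let mut x = x;
    unsafe {
        // `number = const 5` is formatted as text and substituted into the
        // template, so the assembler sees an instruction like `add rax, 5`.
        asm!("add {0}, {number}", inout(reg) x, number = const 5);
    }
    x
}

/// Portable stand-in (assumption: same wrapping behaviour as the `add`
/// instruction).
#[cfg(not(target_arch = "x86_64"))]
fn add_five(x: u64) -> u64 {
    x.wrapping_add(5)
}
```

Because the value is pasted into the template rather than passed in a register, the operand has to be a compile-time constant, which is what the new keyword is meant to convey.
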
 text/0000-inline-asm.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 21617f06574..f235bf886bf 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -83,7 +83,7 @@ unsafe {
     asm!("
         mov {0}, {1}
         add {0}, {number}
-    ", out(reg) o, in(reg) i, number = imm 5);
+    ", out(reg) o, in(reg) i, number = const 5);
 }
 assert_eq!(o, 8);
 ```
@@ -96,8 +96,8 @@ The example shows a few things:
 
 First we can see that inputs are declared by writing `in` instead of `out`.
 
-Second one of our operands has a type we haven't seen yet, `imm`.
-This tells the compiler to expand this argument to an immediate inside the assembly template.
+Second one of our operands has a type we haven't seen yet, `const`.
+This tells the compiler to expand this argument to a value directly inside the assembly template.
 This is only possible for constants and literals.
 
 Third we can see that we can specify an argument number, or name as in any format string.
@@ -110,7 +110,7 @@ We can further refine the above example to avoid the `mov` instruction:
 ```rust
 let mut x: u32 = 3;
 unsafe {
-    asm!("add {0}, {number}", inout(reg) x, number = imm 5);
+    asm!("add {0}, {number}", inout(reg) x, number = const 5);
 }
 assert_eq!(x, 8);
 ```
@@ -124,7 +124,7 @@ It is also possible to specify different variables for the input and output part
 let x: u32 = 3;
 let y: u32;
 unsafe {
-    asm!("add {0}, {number}", inout(reg) x => y, number = imm 5);
+    asm!("add {0}, {number}", inout(reg) x => y, number = const 5);
 }
 assert_eq!(y, 8);
 ```
@@ -327,7 +327,7 @@ dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"
 reg_spec := <arch specific register class> / "<arch specific register name>"
 operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
 reg_operand := dir_spec "(" reg_spec ")" operand_expr
-operand := reg_operand / "imm" const_expr / "sym" path
+operand := reg_operand / "const" const_expr / "sym" path
 option := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn"
 options := "options(" option *["," option] ")"
 asm := "asm!(" format_string *("," [ident "="] operand) ["," options] ")"
@@ -378,7 +378,7 @@ Several types of operands are supported:
 * `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`
   - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).
   - You should only write to the register after all inputs are read, otherwise you may clobber an input.
-* `imm <expr>`
+* `const <expr>`
   - `<expr>` must be an integer or floating-point constant expression.
   - The value of the expression is formatted as a string and substituted directly into the asm template string.
 * `sym <path>`
@@ -562,7 +562,7 @@ If an `inout` is used where the output type is smaller than the input type then
 As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. This is in part for better readability on Rust's side and in part for independence of the backend:
 
 * Register classes are mapped to the appropriate constraint code as per the table above.
-* `imm` operands are formatted and injected directly into the asm string.
+* `const` operands are formatted and injected directly into the asm string.
 * `sym` is mapped to `s` for statics and `X` for functions.
 * a register name `r1` is mapped to `{r1}`
 * additionally mappings for register classes are added as appropriate (cf. [llvm-constraint])
@@ -573,7 +573,7 @@ As written this RFC requires architectures to map from Rust constraint specifica
   - (ARM/AArch64) `~{cc}`
 
 For some operand types, we will automatically insert some modifiers into the template string.
-* For `sym` and `imm` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).
+* For `sym` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).
 * On AArch64, we will warn if a value smaller than 64 bits is used without a modifier since this is likely a bug (it will produce `x*` instead of `w*`). Clang has this same warning.
 * On ARM, we will automatically add the `P` or `q` LLVM modifier for `f64`, `v64` and `v128` passed into a `vreg`. This will cause those registers to be formatted as `d*` and `q*` respectively.
 
@@ -895,9 +895,9 @@ GCC supports `%=` which generates a unique identifier per instance of an asm blo
 
 We can support this in the future with a special operand type.
 
-## `imm` and `sym` for `global_asm!`
+## `const` and `sym` for `global_asm!`
 
-The `global_asm!` macro could be extended to support `imm` and `sym` operands since those can be resolved by simple string substitution. Symbols used in `global_asm!` will be marked as `#[used]` to ensure that they are not optimized away by the compiler.
+The `global_asm!` macro could be extended to support `const` and `sym` operands since those can be resolved by simple string substitution. Symbols used in `global_asm!` will be marked as `#[used]` to ensure that they are not optimized away by the compiler.
 
 ## Memory operands
 

From 93b179cb1a56e0d284ec398d13225137b47e6b61 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Thu, 30 Jan 2020 15:14:50 +0000
Subject: [PATCH 11/68] Rework register classes and modifiers

---
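
Note (below the `---`, so not part of the commit message): a minimal sketch contrasting a register class (`reg`, written as an identifier) with explicit registers (`"rax"`, `"rdx"`, written as string literals), combined with the `pure`, `nomem` and `nostack` options. It assumes an x86-64 target and the `asm!` syntax as later stabilized in `std::arch`, which may differ from this draft. The function name `widening_mul` is hypothetical.

```rust
/// Hypothetical helper: 64x64 -> 128 bit multiply using the x86-64 `mul`
/// instruction, which always reads `rax` and writes `rdx:rax`.
#[cfg(target_arch = "x86_64")]
fn widening_mul(a: u64, b: u64) -> u128 {
    use std::arch::asm;

    let lo: u64;
    let hi: u64;
    unsafe {
        asm!(
            "mul {multiplier}",
            // Register class: the allocator picks any general-purpose register.
            multiplier = in(reg) b,
            // Explicit registers: fixed by the instruction's definition.
            inlateout("rax") a => lo,
            lateout("rdx") hi,
            // No side effects, no memory access, no stack use.
            options(pure, nomem, nostack),
        );
    }
    ((hi as u128) << 64) | (lo as u128)
}

/// Portable stand-in (assumption: only used on targets other than x86-64).
#[cfg(not(target_arch = "x86_64"))]
fn widening_mul(a: u64, b: u64) -> u128 {
    (a as u128) * (b as u128)
}
```

The explicit registers never appear in the template string; they only tell the compiler where the instruction expects its operands and leaves its results.
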
 text/0000-inline-asm.md | 152 +++++++++++++++++++++-------------------
 1 file changed, 78 insertions(+), 74 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index f235bf886bf..cea8e6a8438 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -306,7 +306,7 @@ assert_eq!(a, 8);
 Options can be provided as an optional final argument to the `asm!` macro. We specified three options here:
 - `pure` means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
 - `nomem` means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
-- `nostack` means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86_64 to avoid stack pointer adjustments.
+- `nostack` means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86-64 to avoid stack pointer adjustments.
 
 These allow the compiler to better optimize code using `asm!`, for example by eliminating pure `asm!` blocks whose outputs are not needed.
 
@@ -324,7 +324,7 @@ The following ABNF specifies the general syntax:
 
 ```
 dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"
-reg_spec := <arch specific register class> / "<arch specific register name>"
+reg_spec := <register class> / "<explicit register>"
 operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
 reg_operand := dir_spec "(" reg_spec ")" operand_expr
 operand := reg_operand / "const" const_expr / "sym" path
@@ -390,40 +390,45 @@ Several types of operands are supported:
 
 Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as identifiers (e.g. `reg`).
 
-Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register two input operand or two output operands. Additionally on ARM, it is a compile-time error to use overlapping VFP registers in input operands or in output operands.
+Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operands or two output operands. It is also a compile-time error to use overlapping registers (e.g. ARM VFP) in input operands or in output operands.
 
-Different registers classes have different constraints on which Rust types they allow. For example, `reg` generally only allows integers and pointers, but not floats or SIMD vectors.
+Only the following types are allowed as operands for inline assembly:
+- Integers (signed and unsigned)
+- Floating-point numbers
+- Pointers and references (thin only)
+- Function pointers
+- SIMD vectors (structs defined with `#[repr(simd)]` and which implement `Copy`)
 
-If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.
+Each register class has a width which limits the size of operands that can be passed through that register class. If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.
 
 Here is the list of currently supported register classes:
 
-| Architecture | Register class | Registers | LLVM constraint code | Allowed types |
-| ------------ | -------------- | --------- | ----- | ------------- |
-| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |
-| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` | `i8`, `i16`, `i32`, `i64` (x86-64 only) |
-| x86 | `vreg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |
-| x86 | `vreg_evex` | `xmm[0-31]` (AVX-512, otherwise same as `vreg`) | `v` | `i32`, `i64`, `f32`, `f64`, `v128`, `v256`, `v512` |
-| x86 (AVX-512) | `kreg` | `k[1-7]` | `Yk` | `i16`, `i32`, `i64` |
-| AArch64 | `reg` | `x[0-28]`, `x30` | `r` | `i8`, `i16`, `i32`, `i64` |
-| AArch64 | `vreg` | `v[0-31]` | `w` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
-| AArch64 | `vreg_low` | `v[0-15]` | `x` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
-| AArch64 | `vreg_low8` | `v[0-7]` | `y` | `i8`, `i16`, `i32`, `i64`, `f32`, `f64`, `v64`, `v128` |
-| ARM (ARM/Thumb2) | `reg` | `r[0-r10]`, `r12`, `r14` | `r` | `i8`, `i16`, `i32` |
-| ARM (Thumb1) | `reg` | `r[0-r7]` | `r` | `i8`, `i16`, `i32` |
-| ARM | `vreg` | `s[0-31]`, `d[0-31]`, `q[0-15]` | `w` | `f32`, `f64`, `v64`, `v128` |
-| ARM | `vreg_low` | `s[0-31]`, `d[0-15]`, `q[0-7]` | `t` | `f32`, `f64`, `v64`, `v128` |
-| ARM | `vreg_low8` | `s[0-15]`, `d[0-8]`, `q[0-3]` | `x` | `f32`, `f64`, `v64`, `v128` |
-| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` | `i8`, `i16`, `i32`, `i64` (RV64 only) |
-| RISC-V | `vreg` | `f[0-31]` | `f` | `f32`, `f64` |
-
-> Notes on allowed types:
-> - Pointers and references are allowed where the equivalent integer type is allowed.
-> - `iLEN` refers to both signed and unsigned integer types. It also implicitly includes `isize` and `usize` where the length matches.
-> - Fat pointers are not allowed.
-> - `vLEN` refers to a SIMD vector that is `LEN` bits wide.
-
-Additional constraint specifications may be added in the future based on demand for additional register classes (e.g. MMX, x87, etc).
+| Architecture | Register class | Register width | Registers | LLVM constraint code |
+| ------------ | -------------- | -------------- | --------- | -------------------- |
+| x86 | `reg` | 32 / 64 | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` |
+| x86 | `reg_abcd` | 32 / 64 | `ax`, `bx`, `cx`, `dx` | `Q` |
+| x86 (SSE) | `xmm_reg` | 128 | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` |
+| x86 (AVX2) | `ymm_reg` | 256 | `ymm[0-7]` (x86) `ymm[0-15]` (x86-64) | `x` |
+| x86 (AVX-512) | `zmm_reg` | 512 | `zmm[0-7]` (x86) `zmm[0-31]` (x86-64) | `v` |
+| x86 (AVX-512) | `kreg` | 64 | `k[1-7]` | `Yk` |
+| AArch64 | `reg` | 64 | `x[0-28]`, `x30` | `r` |
+| AArch64 | `vreg` | 128 | `v[0-31]` | `w` |
+| AArch64 | `vreg_low16` | 128 | `v[0-15]` | `x` |
+| ARM | `reg` | 32 | `r[0-10]`, `r12`, `r14` | `r` |
+| ARM (Thumb) | `reg_thumb` | 32 | `r[0-7]` | `l` |
+| ARM (ARM) | `reg_thumb` | 32 | `r[0-10]`, `r12`, `r14` | `l` |
+| ARM | `sreg` | 32 | `s[0-31]` | `t` |
+| ARM | `sreg_low16` | 32 | `s[0-15]` | `x` |
+| ARM | `dreg` | 64 | `d[0-31]` | `w` |
+| ARM | `dreg_low16` | 64 | `d[0-15]` | `t` |
+| ARM | `dreg_low8` | 64 | `d[0-7]` | `x` |
+| ARM | `qreg` | 128 | `q[0-15]` | `w` |
+| ARM | `qreg_low8` | 128 | `q[0-7]` | `t` |
+| ARM | `qreg_low4` | 128 | `q[0-3]` | `x` |
+| RISC-V | `reg` | 32 / 64 | `x1`, `x[5-7]`, `x[9-31]` | `r` |
+| RISC-V | `freg` | 64 | `f[0-31]` | `f` |
+
+Additional register classes may be added in the future based on demand (e.g. MMX, x87, etc).
 
 Some registers have multiple names. These are all treated by the compiler as identical to the base register name. Here is the list of all supported register aliases:
 
@@ -493,44 +498,49 @@ Some registers cannot be used for input or output operands:
 
 The placeholders can be augmented by modifiers which are specified after the `:` in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. Only one modifier is allowed per template placeholder.
 
-The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod].
-
-| Architecture | Register class | Modifier | Input type | Example output |
-| ------------ | -------------- | -------- | ---------- | -------------- |
-| x86 | `reg` | None | `i8` | `al` |
-| x86 | `reg` | None | `i16` | `ax` |
-| x86 | `reg` | None | `i32` | `eax` |
-| x86 | `reg` | None | `i64` | `rax` |
-| x86-32 | `reg_abcd` | `b` | Any | `al` |
-| x86-64 | `reg` | `b` | Any | `al` |
-| x86 | `reg_abcd` | `h` | Any | `ah` |
-| x86 | `reg` | `w` | Any | `ax` |
-| x86 | `reg` | `k` | Any | `eax` |
-| x86-64 | `reg` | `q` | Any | `rax` |
-| x86 | `vreg` | None | `i32`, `i64`, `f32`, `f64`, `v128` | `xmm0` |
-| x86 (AVX) | `vreg` | None | `v256` | `ymm0` |
-| x86 (AVX-512) | `vreg` | None | `v512` | `zmm0` |
-| x86 (AVX-512) | `kreg` | None | Any | `k1` |
-| AArch64 | `reg` | None | Any | `x0` |
-| AArch64 | `reg` | `w` | Any | `w0` |
-| AArch64 | `reg` | `x` | Any | `x0` |
-| AArch64 | `vreg` | None | Any | `v0` |
-| AArch64 | `vreg` | `b` | Any | `b0` |
-| AArch64 | `vreg` | `h` | Any | `h0` |
-| AArch64 | `vreg` | `s` | Any | `s0` |
-| AArch64 | `vreg` | `d` | Any | `d0` |
-| AArch64 | `vreg` | `q` | Any | `q0` |
-| ARM | `reg` | None | Any | `r0` |
-| ARM | `vreg` | None | `f32` | `s0` |
-| ARM | `vreg` | None | `f64`, `v64` | `d0` |
-| ARM | `vreg` | None | `v128` | `q0` |
-| ARM | `vreg` | `e` / `f` | `v128` | `d0` / `d1` |
-| RISC-V | `reg` | None | Any | `x1` |
-| RISC-V | `vreg` | None | Any | `f0` |
+The supported modifiers are a subset of LLVM's (and GCC's) [asm template argument modifiers][llvm-argmod], but do not use the same letter codes.
+
+| Architecture | Register class | Modifier | Example output | LLVM modifier |
+| ------------ | -------------- | -------- | -------------- | ------------- |
+| x86-32 | `reg` | None | `eax` | `w` |
+| x86-64 | `reg` | None | `rax` | `q` |
+| x86-32 | `reg_abcd` | `l` | `al` | `b` |
+| x86-64 | `reg` | `l` | `al` | `b` |
+| x86 | `reg_abcd` | `h` | `ah` | `h` |
+| x86 | `reg` | `x` | `ax` | `h` |
+| x86 | `reg` | `e` | `eax` | `w` |
+| x86-64 | `reg` | `r` | `rax` | `q` |
+| x86 | `xmm_reg` | None | `xmm0` | `x` |
+| x86 | `ymm_reg` | None | `ymm0` | `t` |
+| x86 | `zmm_reg` | None | `zmm0` | `g` |
+| x86 | `*mm_reg` | `x` | `xmm0` | `x` |
+| x86 | `*mm_reg` | `y` | `ymm0` | `t` |
+| x86 | `*mm_reg` | `z` | `zmm0` | `g` |
+| x86 | `kreg` | None | `k1` | None |
+| AArch64 | `reg` | None | `x0` | `x` |
+| AArch64 | `reg` | `w` | `w0` | `w` |
+| AArch64 | `reg` | `x` | `x0` | `x` |
+| AArch64 | `vreg` | None | `v0` | None |
+| AArch64 | `vreg` | `v` | `v0` | None |
+| AArch64 | `vreg` | `b` | `b0` | `b` |
+| AArch64 | `vreg` | `h` | `h0` | `h` |
+| AArch64 | `vreg` | `s` | `s0` | `s` |
+| AArch64 | `vreg` | `d` | `d0` | `d` |
+| AArch64 | `vreg` | `q` | `q0` | `q` |
+| ARM | `reg` | None | `r0` | None |
+| ARM | `sreg` | None | `s0` | None |
+| ARM | `dreg` | None | `d0` | `P` |
+| ARM | `qreg` | None | `q0` | `q` |
+| ARM | `qreg` | `e` / `f` | `d0` / `d1` | `e` / `f` |
+| RISC-V | `reg` | None | `x1` | None |
+| RISC-V | `freg` | None | `f0` | None |
 
 > Notes:
 > - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.
-> - on AArch64 `reg`: a warning is emitted if the input type is smaller than 64 bits, suggesting to use the `w` modifier. The warning can be suppressed by explicitly using the `x` modifier.
+> - on x86: our behavior for `reg` with no modifiers differs from what GCC does. GCC will infer the modifier based on the operand value type, while we default to the largest size.
+> - on x86 `xmm_reg`: the `x`, `t` and `g` LLVM modifiers are not yet implemented in LLVM (they are supported by GCC only), but this should be a simple change.
+
+As stated in the previous section, passing an input value smaller than the register width will result in the upper bits of the register containing undefined values. This is not a problem if the inline asm only accesses the lower bits of the register, which can be done using template modifiers. Since this is an easy pitfall, the compiler will warn if a value smaller than the register width is used as an input or output. However, this warning is suppressed if all uses of the operand in the template string explicitly specify a modifier, even if this modifier is already the default.
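
As a rough illustration of the pitfall described above, the following sketch passes values narrower than the register and uses modifiers so that only the low bits are accessed. It assumes an x86-64 target and the `asm!` macro as later stabilized in `std::arch`, not the nightly macro this RFC revision describes. The function names `add_u32` and `duplicate_low_byte` are hypothetical, and the plain-Rust fallbacks exist only so the block compiles on other targets.

```rust
/// Adds two `u32` values that live in 64-bit-wide registers.
/// The `e` modifier formats each register as its 32-bit view (e.g. `eax`),
/// so the undefined upper 32 bits of the inputs are never read.
#[cfg(target_arch = "x86_64")]
pub fn add_u32(mut a: u32, b: u32) -> u32 {
    use std::arch::asm;
    unsafe {
        asm!(
            "add {0:e}, {1:e}",
            inout(reg) a,
            in(reg) b,
            options(pure, nomem, nostack)
        );
    }
    a
}

/// Copies the low byte of `x` into its high byte.
/// `reg_abcd` limits allocation to ax/bx/cx/dx, the only registers
/// that have a high-byte view (`ah`, `bh`, ...).
#[cfg(target_arch = "x86_64")]
pub fn duplicate_low_byte(mut x: u16) -> u16 {
    use std::arch::asm;
    unsafe {
        asm!(
            "mov {0:h}, {0:l}",
            inout(reg_abcd) x,
            options(pure, nomem, nostack, preserves_flags)
        );
    }
    x
}

// Portable fallbacks (an assumption for illustration, not part of the RFC).
#[cfg(not(target_arch = "x86_64"))]
pub fn add_u32(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn duplicate_low_byte(x: u16) -> u16 {
    (x & 0x00ff) | (x << 8)
}
```

Because every placeholder carries an explicit modifier, this is the situation in which the warning described above would be suppressed.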
 
 [llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers
 
@@ -559,24 +569,18 @@ The direction specification maps to a LLVM constraint specification as follows (
 
 If an `inout` is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See [this bug][issue-65452].
 
-As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. This is in part for better readability on Rust's side and in part for independence of the backend:
+As written this RFC requires architectures to map from Rust constraint specifications to LLVM [constraint codes][llvm-constraint]. This is in part for better readability on Rust's side and in part for independence of the backend:
 
 * Register classes are mapped to the appropriate constraint code as per the table above.
 * `const` operands are formatted and injected directly into the asm string.
-* `sym` is mapped to `s` for statics and `X` for functions.
+* `sym` is mapped to `s` for statics and `X` for functions. We automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).
 * a register name `r1` is mapped to `{r1}`
-* additionally mappings for register classes are added as appropriate (cf. [llvm-constraint])
 * `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).
 * If the `nomem` option is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)
 * If the `preserves_flags` option is not set then the following are added to the clobber list:
   - (x86) `~{dirflag}~{flags}~{fpsr}`
   - (ARM/AArch64) `~{cc}`
 
-For some operand types, we will automatically insert some modifiers into the template string.
-* For `sym` operands, we automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).
-* On AArch64, we will warn if a value smaller than 64 bits is used without a modifier since this is likely a bug (it will produce `x*` instead of `w*`). Clang has this same warning.
-* On ARM, we will automatically add the `P` or `q` LLVM modifier for `f64`, `v64` and `v128` passed into a `vreg`. This will cause those registers to be formatted as `d*` and `q*` respectively.
-
 Additionally, the following attributes are added to the LLVM `asm` statement:
 
 * The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyways).

From d82b70db5ea4e0478f33c517cf34a0b1410f3627 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sun, 9 Feb 2020 15:34:04 +0100
Subject: [PATCH 12/68] Fix minor typos

---
 text/0000-inline-asm.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index cea8e6a8438..0fe693d0954 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -74,7 +74,7 @@ In this case we put it in an arbitrary general purpose register by specifying `r
 The compiler will choose an appropriate register to insert into
 the template and will read the variable from there after the inline assembly finishes executing.
 
-Let see another example that also uses an input:
+Let us see another example that also uses an input:
 
 ```rust
 let i: u32 = 3;
@@ -273,7 +273,7 @@ In some cases, fine control is needed over the way a register name is formatted
 let mut x: u16 = 0xab;
 
 unsafe {
-    asm!("mov {0:h} {0:b}", inout(reg_abcd) x);
+    asm!("mov {0:h}, {0:l}", inout(reg_abcd) x);
 }
 
 assert_eq!(x, 0xabab);
@@ -390,7 +390,7 @@ Several types of operands are supported:
 
 Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as identifiers (e.g. `reg`).
 
-Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register two input operand or two output operands. Additionally, it is also a compile-time error to use overlapping registers (e.g. ARM VFP) in input operands or in output operands.
+Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operand or two output operands. Additionally, it is also a compile-time error to use overlapping registers (e.g. ARM VFP) in input operands or in output operands.
 
 Only the following types are allowed as operands for inline assembly:
 - Integers (signed and unsigned)

From 2adfde84446e3dff236eeb65665c7ebe86efbcdc Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 11 Feb 2020 15:13:39 +0100
Subject: [PATCH 13/68] Disallow specifying the same register as both an input
 and an output (use inout instead)

---
 text/0000-inline-asm.md | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 0fe693d0954..8c24693c7c0 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -199,8 +199,9 @@ fn mul(a: u32, b: u32) -> u64 {
             // The x86 mul instruction takes eax as an implicit input and writes
             // the 64-bit result of the multiplication to eax:edx.
             "mul {}",
-            in(reg) a, in("eax") b,
-            lateout("eax") lo, lateout("edx") hi
+            in(reg) a,
+            inlateout("eax") b => lo,
+            lateout("edx") hi
         );
     }
 
@@ -214,8 +215,6 @@ The second operand is implicit, and must be the `eax` register, which we fill fr
 The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.
 The higher 32 bits are stored in `edx` from which we fill the variable `hi`.
 
-Note that `lateout` must be used for `eax` here since we are specifying the same register as both an input and an output.
-
 ## Clobbered registers
 
 In many cases inline assembly will modify state that is not needed as an output.
@@ -232,9 +231,8 @@ let ecx: u32;
 unsafe {
     asm!(
         "cpuid",
-        in("eax") 4, in("ecx") 0,
-        lateout("ebx") ebx, lateout("ecx") ecx,
-        lateout("eax") _, lateout("edx") _
+        inout("eax") 4 => _, inout("ecx") 0 => ecx,
+        lateout("ebx") ebx, lateout("edx") _
     );
 }
 
@@ -365,7 +363,6 @@ Several types of operands are supported:
 * `lateout(<reg>) <expr>`
   - Identical to `out` except that the register allocator can reuse a register allocated to an `in`.
   - You should only write to the register after all inputs are read, otherwise you may clobber an input.
-  - `lateout` must be used instead of `out` if you are specifying the same explicit register as an `in`.
 * `inout(<reg>) <expr>`
   - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
   - The allocated register will contain the value of `<expr>` at the start of the asm code.
@@ -390,7 +387,7 @@ Several types of operands are supported:
 
 Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as identifiers (e.g. `reg`).
 
-Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operand or two output operands. Additionally, it is also a compile-time error to use overlapping registers (e.g. ARM VFP) in input operands or in output operands.
+Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two different operands. Additionally, it is also a compile-time error to use overlapping registers (e.g. ARM VFP) in different operands.
 
 Only the following types are allowed as operands for inline assembly:
 - Integers (signed and unsigned)

From 89cbf28c2470ca386f9263e6f93029f7d445bdd4 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 11 Feb 2020 15:30:30 +0100
Subject: [PATCH 14/68] Clarify implicit operands that come after named
 operands

---
 text/0000-inline-asm.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 8c24693c7c0..ab5025cbda9 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -337,14 +337,16 @@ asm := "asm!(" format_string *("," [ident "="] operand) ["," options] ")"
 
 The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.
 
+As with format strings, named arguments must appear after positional arguments. However additional unnamed arguments may appear after named arguments: these are implicit arguments, which cannot be addressed using template placeholders but may be used to specify fixed register inputs or outputs.
+
+The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.
+
 The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.
 
 This RFC only specifies how operands are substituted into the template string. Actual interpretation of the final asm string is left to the assembler.
 
 However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.
 
-The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.
-
 [rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795
 
 ## Operand type

From 5884b7fbdbe86fff1f694d948cca35d98a2aeffc Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 15 Feb 2020 20:51:01 +0000
Subject: [PATCH 15/68] Expand on default modifier used for registers

---
 text/0000-inline-asm.md | 76 ++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 36 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index ab5025cbda9..4c805652c25 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -56,14 +56,14 @@ Now inserting an instruction that does nothing is rather boring. Let us do somet
 actually acts on data:
 
 ```rust
-let x: u32;
+let x: u64;
 unsafe {
     asm!("mov {}, 5", out(reg) x);
 }
 assert_eq!(x, 5);
 ```
 
-This will write the value `5` into the `u32` variable `x`.
+This will write the value `5` into the `u64` variable `x`.
 You can see that the string literal we use to specify instructions is actually a template string.
 It is governed by the same rules as Rust [format strings][format-syntax].
 The arguments that are inserted into the template however look a bit different then you may
@@ -77,8 +77,8 @@ the template and will read the variable from there after the inline assembly fin
 Let us see another example that also uses an input:
 
 ```rust
-let i: u32 = 3;
-let o: u32;
+let i: u64 = 3;
+let o: u64;
 unsafe {
     asm!("
         mov {0}, {1}
@@ -108,7 +108,7 @@ readability, and allows reordering instructions without changing the argument or
 We can further refine the above example to avoid the `mov` instruction:
 
 ```rust
-let mut x: u32 = 3;
+let mut x: u64 = 3;
 unsafe {
     asm!("add {0}, {number}", inout(reg) x, number = const 5);
 }
@@ -121,8 +121,8 @@ This is different from specifying an input and output separately in that it is g
 It is also possible to specify different variables for the input and output parts of an `inout` operand:
 
 ```rust
-let x: u32 = 3;
-let y: u32;
+let x: u64 = 3;
+let y: u64;
 unsafe {
     asm!("add {0}, {number}", inout(reg) x => y, number = const 5);
 }
@@ -142,9 +142,9 @@ There is also a `inlateout` variant of this specifier.
 Here is an example where `inlateout` *cannot* be used:
 
 ```rust
-let mut a = 4;
-let b = 4;
-let c = 4;
+let mut a: u64 = 4;
+let b: u64 = 4;
+let c: u64 = 4;
 unsafe {
     asm!("
         add {0}, {1}
@@ -159,8 +159,8 @@ Here the compiler is free to allocate the same register for inputs `b` and `c` s
 However the following example can use `inlateout` since the output is only modified after all input registers have been read:
 
 ```rust
-let mut a = 4;
-let b = 4;
+let mut a: u64 = 4;
+let b: u64 = 4;
 unsafe {
     asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);
 }
@@ -178,42 +178,42 @@ among others can be addressed by their name.
 
 ```rust
 unsafe {
-    asm!("out 0x64, {}", in("eax") cmd);
+    asm!("out 0x64, {}", in("rax") cmd);
 }
 ```
 
 In this example we call the `out` instruction to output the content of the `cmd` variable
-to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand
-we had to use the `eax` constraint specifier.
+to port `0x64`. Since the `out` instruction only accepts `rax` (and its sub registers) as operand
+we had to use the `rax` constraint specifier.
 
 It is somewhat common that instructions have operands that are not explicitly listed in the
 assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:
 
 ```rust
-fn mul(a: u32, b: u32) -> u64 {
-    let lo: u32;
-    let hi: u32;
+fn mul(a: u64, b: u64) -> u128 {
+    let lo: u64;
+    let hi: u64;
 
     unsafe {
         asm!(
-            // The x86 mul instruction takes eax as an implicit input and writes
-            // the 64-bit result of the multiplication to eax:edx.
+            // The x86 mul instruction takes rax as an implicit input and writes
+            // the 128-bit result of the multiplication to rax:rdx.
             "mul {}",
             in(reg) a,
-            inlateout("eax") b => lo,
-            lateout("edx") hi
+            inlateout("rax") b => lo,
+            lateout("rdx") hi
         );
     }
 
-    hi as u64 << 32 + lo as u64
+    ((hi as u128) << 64) + lo as u128
 }
 ```
 
-This uses the `mul` instruction to multiply two 32-bit inputs with a 64-bit result.
+This uses the `mul` instruction to multiply two 64-bit inputs with a 128-bit result.
 The only explicit operand is a register, that we fill from the variable `a`.
-The second operand is implicit, and must be the `eax` register, which we fill from the variable `b`.
-The lower 32 bits of the result are stored in `eax` from which we fill the variable `lo`.
-The higher 32 bits are stored in `edx` from which we fill the variable `hi`.
+The second operand is implicit, and must be the `rax` register, which we fill from the variable `b`.
+The lower 64 bits of the result are stored in `rax` from which we fill the variable `lo`.
+The higher 64 bits are stored in `rdx` from which we fill the variable `hi`.
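
The walkthrough above can be sketched as a complete function. This assumes an x86-64 target and the `asm!` macro as later stabilized in `std::arch`; the `options(...)` list and the non-x86-64 fallback are additions for illustration. Note that the halves are combined as `((hi as u128) << 64) + lo as u128`: in Rust `+` binds more tightly than `<<`, so without the parentheses the shift amount would become `64 + lo`.

```rust
/// 64x64 -> 128-bit multiplication using the x86-64 `mul` instruction.
#[cfg(target_arch = "x86_64")]
pub fn mul(a: u64, b: u64) -> u128 {
    use std::arch::asm;
    let lo: u64;
    let hi: u64;
    unsafe {
        asm!(
            // `mul` takes rax as an implicit input and writes the
            // 128-bit product to rdx:rax (high half in rdx).
            "mul {}",
            in(reg) a,
            inlateout("rax") b => lo,
            lateout("rdx") hi,
            options(pure, nomem, nostack)
        );
    }
    // Shift the high half first, then add the low half.
    ((hi as u128) << 64) + lo as u128
}

// Portable fallback so the sketch also compiles on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn mul(a: u64, b: u64) -> u128 {
    (a as u128) * (b as u128)
}
```

A product that does not fit in 64 bits, such as `mul(1 << 63, 2)`, is a quick way to check that the two halves are combined correctly.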
 
 ## Clobbered registers
 
@@ -225,8 +225,8 @@ We need to tell the compiler about this since it may need to save and restore th
 around the inline assembly block.
 
 ```rust
-let ebx: u32;
-let ecx: u32;
+let ebx: u64;
+let ecx: u64;
 
 unsafe {
     asm!(
@@ -251,7 +251,7 @@ This can also be used with a general register class (e.g. `reg`) to obtain a scr
 
 ```rust
 // Multiply x by 6 using shifts and adds
-let mut x = 4;
+let mut x: u64 = 4;
 unsafe {
     asm!("
         mov {tmp}, {x}
@@ -267,6 +267,10 @@ assert_eq!(x, 4 * 6);
 
 In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).
 
+By default the compiler will always choose the name that refers to the full register size (e.g. `rax` on x86-64, `eax` on x86, etc). This is the case even if you pass in a smaller data type (e.g. `u16`) or if you explicitly specify a register (e.g. `in("cx")` will be rendered as `rcx` by default).
+
+This default can be overridden by using modifiers on the template string operands, just like you would with format strings:
+
 ```rust
 let mut x: u16 = 0xab;
 
@@ -289,8 +293,8 @@ By default, an inline assembly block is treated the same way as an external FFI
 Let's take our previous example of an `add` instruction:
 
 ```rust
-let mut a = 4;
-let b = 4;
+let mut a: u64 = 4;
+let b: u64 = 4;
 unsafe {
     asm!(
         "add {0}, {1}",
@@ -842,12 +846,12 @@ On the other hand by necessity this splits the direction and constraint specific
 the variable name, which makes this syntax overall harder to read.
 
 ```rust
-fn mul(a: u32, b: u32) -> u64 {
-    let (lo, hi) = unsafe {
-        asm!("mul {}", in(reg) a, in("eax") b, lateout("eax"), lateout("edx"))
+fn mul(a: u64, b: u64) -> u128 {
+    let (lo, hi): (u64, u64) = unsafe {
+        asm!("mul {}", in(reg) a, in("rax") b, lateout("rax"), lateout("rdx"))
     };
 
-    hi as u64 << 32 + lo as u64
+    ((hi as u128) << 64) + lo as u128
 }
 ```
 

From f427d175a56a2b42191e334fe17c10b5309c4112 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 15 Feb 2020 13:13:46 +0000
Subject: [PATCH 16/68] Explicit register operands can't be used in the
 template

---
 text/0000-inline-asm.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 4c805652c25..bb6d9a84e59 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -178,7 +178,7 @@ among others can be addressed by their name.
 
 ```rust
 unsafe {
-    asm!("out 0x64, {}", in("rax") cmd);
+    asm!("out 0x64, rax", in("rax") cmd);
 }
 ```
 
@@ -186,8 +186,11 @@ In this example we call the `out` instruction to output the content of the `cmd`
 to port `0x64`. Since the `out` instruction only accepts `rax` (and its sub registers) as operand
 we had to use the `rax` constraint specifier.
 
+Note that unlike other operand types, explicit register operands cannot be used in the template string: you can't use `{}` and should write the register name directly instead. Also, they must appear at the end of the operand list after all other operand types.
+
 It is somewhat common that instructions have operands that are not explicitly listed in the
-assembly (template). Hence, unlike in regular formatting macros, we support excess arguments:
+assembly (template). By default all operands must be used in the template string, but it is possible
+to opt-out of this by giving an operand the name `_`:
 
 ```rust
 fn mul(a: u64, b: u64) -> u128 {
@@ -231,8 +234,10 @@ let ecx: u64;
 unsafe {
     asm!(
         "cpuid",
-        inout("eax") 4 => _, inout("ecx") 0 => ecx,
-        lateout("ebx") ebx, lateout("edx") _
+        inout("eax") 4 => _,
+        inout("ecx") 0 => ecx,
+        lateout("ebx") ebx,
+        lateout("edx") _
     );
 }
 
@@ -341,9 +346,7 @@ asm := "asm!(" format_string *("," [ident "="] operand) ["," options] ")"
 
 The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.
 
-As with format strings, named arguments must appear after positional arguments. However additional unnamed arguments may appear after named arguments: these are implicit arguments, which cannot be addressed using template placeholders but may be used to specify fixed register inputs or outputs.
-
-The compiler will lint against any operands that are not used in the template string, except for operands that specify an explicit register.
+As with format strings, named arguments must appear after positional arguments. Explicit register operands must appear at the end of the operand list, after named arguments, if any. Explicit register operands cannot be used by placeholders in the template string. All other operands must appear at least once in the template string, otherwise a compiler error is generated.
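
A minimal sketch of this operand ordering, assuming an x86-64 target and the `asm!` macro as later stabilized in `std::arch`: one positional operand, then one named operand, then two explicit register operands that the template refers to only by writing the register name directly. The function name `mul_add_u64` and the fallback are hypothetical. `rdx` is declared with `out` rather than `lateout` so that the register allocator cannot place `addend` in it, since `mul` overwrites `rdx` before `addend` is read.

```rust
/// Computes `a * b + c` (wrapping to 64 bits) with x86-64 `mul` and `add`.
#[cfg(target_arch = "x86_64")]
pub fn mul_add_u64(a: u64, b: u64, c: u64) -> u64 {
    use std::arch::asm;
    let result: u64;
    unsafe {
        asm!(
            // rdx:rax = rax * {0}, then rax += {addend}
            "mul {0}\nadd rax, {addend}",
            // 1. positional operand
            in(reg) a,
            // 2. named operand
            addend = in(reg) c,
            // 3. explicit register operands, not addressable via `{}`
            inlateout("rax") b => result,
            out("rdx") _,
            options(pure, nomem, nostack)
        );
    }
    result
}

// Portable fallback so the sketch also compiles on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn mul_add_u64(a: u64, b: u64, c: u64) -> u64 {
    a.wrapping_mul(b).wrapping_add(c)
}
```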
 
 The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.
 

From feecfa197b19f141bb50d221401255ea8ea9b4c5 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sun, 16 Feb 2020 10:26:27 +0000
Subject: [PATCH 17/68] Add rationale for not supporting AT&T syntax

---
 text/0000-inline-asm.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index bb6d9a84e59..f31658daf13 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -858,6 +858,17 @@ fn mul(a: u64, b: u64) -> u128 {
 }
 ```
 
+## Use AT&T syntax on x86
+
+x86 is particular in that there are [two widely used dialects] for its assembly code: Intel syntax, which is the official syntax for x86 assembly, and AT&T syntax, which is used by GCC (via GAS). There is no functional difference between the two dialects: both support the same functionality, but with a [different syntax][gas-syntax]. This RFC chooses to use Intel syntax since it is more widely used and users generally find it easier to read and write.
+
+Note however that it is relatively easy to add support for AT&T using a proc macro (e.g. `asm_att!()`) which wraps around `asm!`. Only three transformations are needed:
+- A `%` needs to be added in front of register operands in the template string.
+- The `.att_syntax prefix` directive should be inserted at the start of the template string to switch the assembler to AT&T mode.
+- The `.intel_syntax noprefix` directive should be inserted at the end of the template string to restore the assembler to Intel mode.
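
To make the difference between the dialects concrete, here is a small sketch of the same addition written both ways. It does not use the proc-macro approach described above. Instead it assumes the `att_syntax` option that the later-stabilized `asm!` in `std::arch` accepts on x86, which is not part of this RFC revision. The function names and the fallbacks are hypothetical.

```rust
/// Intel syntax (the default): destination first, bare register names.
#[cfg(target_arch = "x86_64")]
pub fn add_intel(mut x: u64, y: u64) -> u64 {
    use std::arch::asm;
    unsafe {
        asm!(
            "add {0}, {1}",
            inout(reg) x,
            in(reg) y,
            options(pure, nomem, nostack)
        );
    }
    x
}

/// AT&T syntax: source first, and substituted registers carry a `%` prefix.
#[cfg(target_arch = "x86_64")]
pub fn add_att(mut x: u64, y: u64) -> u64 {
    use std::arch::asm;
    unsafe {
        asm!(
            "addq {1}, {0}",
            inout(reg) x,
            in(reg) y,
            options(att_syntax, pure, nomem, nostack)
        );
    }
    x
}

// Portable fallbacks so the sketch also compiles on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_intel(x: u64, y: u64) -> u64 {
    x.wrapping_add(y)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn add_att(x: u64, y: u64) -> u64 {
    x.wrapping_add(y)
}
```

Both functions compute the same wrapping sum; only the operand order and register spelling in the template differ.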
+
+[gas-syntax]: https://sourceware.org/binutils/docs/as/i386_002dVariations.html
+
 # Prior art
 [prior-art]: #prior-art
 

From 15c7aa88819baf2c794ec3e7d658f9e0cda4c8c4 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sun, 16 Feb 2020 17:06:35 +0000
Subject: [PATCH 18/68] Add rationale for not validating the generating asm in
 rustc.

---
 text/0000-inline-asm.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index f31658daf13..b31eb440221 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -869,6 +869,18 @@ Note however that it is relatively easy to add support for AT&T using a proc mac
 
 [gas-syntax]: https://sourceware.org/binutils/docs/as/i386_002dVariations.html
 
+## Validate the assembly code in rustc
+
+There may be some slight differences in the set of assembly code that is accepted by different compiler back-ends (e.g. LLVM's integrated assembler vs using GAS as an external assembler). Examples of such differences are:
+
+- LLVM's [assembly extensions][llvm-asm-ext]
+- Linking against the system LLVM instead of rustc's, which may or may not support some newer instructions.
+- GAS or LLVM introducing new assembler directives.
+
+While it might be possible for rustc to verify that inline assembly code conforms to a minimal stable subset of the assembly syntax supported by LLVM and GAS, doing so would effectively require rustc to parse the assembly code itself. Implementing a full assembler for all target architectures supported by this RFC is a huge amount of work, most of which is redundant with the work that LLVM has already done in implementing an assembler. As such, this RFC does not propose that rustc perform any validation of the generated assembly code.
+
+[llvm-asm-ext]: https://llvm.org/docs/Extensions.html#machine-specific-assembly-syntax
+
 # Prior art
 [prior-art]: #prior-art
 

From 7d066ce32cee39d3f756906df3cec8ae299407da Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Mon, 24 Feb 2020 23:23:20 +0000
Subject: [PATCH 19/68] Remove restriction on sym needing to be from current
 crate.

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index b31eb440221..935af58689a 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -388,7 +388,7 @@ Several types of operands are supported:
   - `<expr>` must be an integer or floating-point constant expression.
   - The value of the expression is formatted as a string and substituted directly into the asm template string.
 * `sym <path>`
-  - `<path>` must refer to a `fn` or `static` defined in the current crate.
+  - `<path>` must refer to a `fn` or `static`.
   - A mangled symbol name referring to the item is substituted into the asm template string.
   - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).
 

From 44853afbee70f7c99db800b0287447a4617f65e0 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Mon, 24 Feb 2020 23:24:02 +0000
Subject: [PATCH 20/68] Fix typos

---
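Note: the comma fixes in this patch matter because LLVM constraint strings are comma-separated lists. The following is only a sketch of how the pieces listed in the RFC could be joined for a single `reg` operand on x86. The names `Direction`, `llvm_constraint` and `constraint_string` are hypothetical and are not part of rustc or of the RFC.

```rust
/// Output-like operand directions from the RFC (register class `reg` only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Direction {
    Out,
    InOut,
    LateOut,
    InLateOut,
}

/// Hypothetical helper: the LLVM constraint codes the RFC lists for each
/// direction. In the tied forms, `0` is the position of the output operand,
/// which is only correct here because a single operand is modelled.
fn llvm_constraint(dir: Direction) -> &'static str {
    match dir {
        Direction::Out => "=&r",        // early-clobber output
        Direction::InOut => "=&r,0",    // early-clobber output + tied input
        Direction::LateOut => "=r",     // regular output
        Direction::InLateOut => "=r,0", // regular output + tied input
    }
}

/// Hypothetical helper: builds the full constraint string for one operand,
/// appending the x86 clobbers that the RFC adds when `nomem` or
/// `preserves_flags` are not set.
fn constraint_string(dir: Direction, nomem: bool, preserves_flags: bool) -> String {
    let mut parts = vec![llvm_constraint(dir)];
    if !nomem {
        parts.push("~{memory}");
    }
    if !preserves_flags {
        parts.extend(["~{dirflag}", "~{flags}", "~{fpsr}"]);
    }
    // Every entry is separated by a comma, including the clobbers.
    parts.join(",")
}
```

With `nomem` set and `preserves_flags` unset, an `inlateout(reg)` operand yields `=r,0,~{dirflag},~{flags},~{fpsr}`.
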
 text/0000-inline-asm.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 935af58689a..2dbafa48b3d 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -508,13 +508,13 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen
 
 | Architecture | Register class | Modifier | Example output | LLVM modifier |
 | ------------ | -------------- | -------- | -------------- | ------------- |
-| x86-32 | `reg` | None | `eax` | `w` |
+| x86-32 | `reg` | None | `eax` | `k` |
 | x86-64 | `reg` | None | `rax` | `q` |
 | x86-32 | `reg_abcd` | `l` | `al` | `b` |
 | x86-64 | `reg` | `l` | `al` | `b` |
 | x86 | `reg_abcd` | `h` | `ah` | `h` |
-| x86 | `reg` | `x` | `ax` | `h` |
-| x86 | `reg` | `e` | `eax` | `w` |
+| x86 | `reg` | `x` | `ax` | `w` |
+| x86 | `reg` | `e` | `eax` | `k` |
 | x86-64 | `reg` | `r` | `rax` | `q` |
 | x86 | `xmm_reg` | None | `xmm0` | `x` |
 | x86 | `ymm_reg` | None | `ymm0` | `t` |
@@ -543,7 +543,7 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen
 
 > Notes:
 > - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.
-> - on x86: our behavior for `reg` with no modifiers differs from what GCC does. GCC will infer the modifier based on the operand value type, while we default to the largest size.
+> - on x86: our behavior for `reg` with no modifiers differs from what GCC does. GCC will infer the modifier based on the operand value type, while we default to the full register size.
 > - on x86 `xmm_reg`: the `x`, `t` and `g` LLVM modifiers are not yet implemented in LLVM (they are supported by GCC only), but this should be a simple change.
 
 As stated in the previous section, passing an input value smaller than the register width will result in the upper bits of the register containing undefined values. This is not a problem if the inline asm only accesses the lower bits of the register, which can be done using template modifiers. Since this an easy pitfall, the compiler will warn if a value smaller than the register width is used as an input or output. However this warning is suppressed if all uses of the operand in the template string explicitly specify a modifier, even if this modifier is already the default.
@@ -571,7 +571,7 @@ The direction specification maps to a LLVM constraint specification as follows (
 * `out(reg)` => `=&r` (Rust's outputs are early-clobber outputs in LLVM/GCC terminology)
 * `inout(reg)` => `=&r,0` (an early-clobber output with an input tied to it, `0` here is a placeholder for the position of the output)
 * `lateout(reg)` => `=r` (Rust's late outputs are regular outputs in LLVM/GCC terminology)
-* `inlateout(reg)` => `=r, 0` (cf. `inout` and `lateout`)
+* `inlateout(reg)` => `=r,0` (cf. `inout` and `lateout`)
 
 If an `inout` is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See [this bug][issue-65452].
 
@@ -584,7 +584,7 @@ As written this RFC requires architectures to map from Rust constraint specifica
 * `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).
 * If the `nomem` option is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)
 * If the `preserves_flags` option is not set then the following are added to the clobber list:
-  - (x86) `~{dirflag}~{flags}~{fpsr}`
+  - (x86) `~{dirflag},~{flags},~{fpsr}`
   - (ARM/AArch64) `~{cc}`
 
 Additionally, the following attributes are added to the LLVM `asm` statement:

From b5d854d69cd4050924552972755a085154d8e618 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Mon, 24 Feb 2020 23:24:12 +0000
Subject: [PATCH 21/68] Fix some details to match WIP implementation

---
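Note: the relaxed explicit-register rule in this patch (conflicts are checked among inputs and among outputs, not across them) can be modelled roughly as below. This is a sketch under stated assumptions: the alias table only covers the two examples named in the RFC text, overlapping registers such as ARM VFP are not modelled, and `base_register`, `first_conflict` and `check_explicit_registers` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical alias table: maps a register name to its base register.
/// Only the two examples mentioned in the RFC text are covered.
fn base_register(name: &str) -> &str {
    match name {
        "lr" => "r14",  // ARM alias
        "eax" => "rax", // smaller view of the same x86-64 register
        other => other,
    }
}

/// Returns the first base register named twice within one operand list,
/// or `None` if the list has no conflict.
fn first_conflict<'a>(regs: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    for &reg in regs {
        let base = base_register(reg);
        if !seen.insert(base) {
            return Some(base);
        }
    }
    None
}

/// Inputs and outputs are checked separately, so the same register may
/// appear once as an input and once as an output.
fn check_explicit_registers(inputs: &[&str], outputs: &[&str]) -> Result<(), String> {
    if let Some(reg) = first_conflict(inputs) {
        return Err(format!("register `{}` is used for two input operands", reg));
    }
    if let Some(reg) = first_conflict(outputs) {
        return Err(format!("register `{}` is used for two output operands", reg));
    }
    Ok(())
}
```

Under this model `in("eax")` together with `out("rax")` is accepted, while `in("r14")` together with `in("lr")` is rejected.
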
 text/0000-inline-asm.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 2dbafa48b3d..7aa70ddf5e2 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -391,17 +391,18 @@ Several types of operands are supported:
   - `<path>` must refer to a `fn` or `static`.
   - A mangled symbol name referring to the item is substituted into the asm template string.
   - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).
+  - `<path>` is allowed to point to a `#[thread_local]` static, in which case the asm code can combine the symbol with relocations (e.g. `@TPOFF`) to read from thread-local data.
 
 ## Register operands
 
 Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as identifiers (e.g. `reg`).
 
-Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two different operands. Additionally, it is also a compile-time error to use overlapping registers (e.g. ARM VFP) in different operands.
+Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operands or two output operands. It is also a compile-time error to use overlapping registers (e.g. ARM VFP) in input operands or in output operands.
 
 Only the following types are allowed as operands for inline assembly:
 - Integers (signed and unsigned)
 - Floating-point numbers
-- Pointers and references (thin only)
+- Pointers (thin only)
 - Function pointers
 - SIMD vectors (structs defined with `#[repr(simd)]` and which implement `Copy`)
 
@@ -554,14 +555,17 @@ As stated in the previous section, passing an input value smaller than the regis
 
 Flags are used to further influence the behavior of the inline assembly block.
 Currently the following options are defined:
-- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used. A warning is emitted if this option is used on an `asm` with no outputs.
+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used.
 - `nomem`: The `asm` blocks does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.
 - `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.
 - `preserves_flags`: The `asm` block does not modify the flags register (defined in the [rules][rules] below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.
 - `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.
 - `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this option is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
 
-The `nomem` and `readonly` options are mutually exclusive: it is a compile-time error to specify both. Specifying `pure` on an asm block with no outputs is linted against since such a block will be optimized away to nothing.
+The compiler performs some additional checks on options:
+- The `nomem` and `readonly` options are mutually exclusive: it is a compile-time error to specify both.
+- It is a compile-time error to specify `pure` on an asm block with no outputs or only discarded outputs (`_`).
+- It is a compile-time error to specify `noreturn` on an asm block with outputs.
 
 ## Mapping to LLVM IR
 
@@ -579,9 +583,8 @@ As written this RFC requires architectures to map from Rust constraint specifica
 
 * Register classes are mapped to the appropriate constraint code as per the table above.
 * `const` operands are formatted and injected directly into the asm string.
-* `sym` is mapped to `s` for statics and `X` for functions. We automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).
+* `sym` is mapped to the `s` constraint code. We automatically insert the `c` modifier which removes target-specific modifiers from the value (e.g. `#` on ARM).
 * a register name `r1` is mapped to `{r1}`
-* `lateout` operands with an `_` expression that are specified as an explicit register are converted to LLVM clobber constraints. For example, `lateout("r1") _` is mapped to `~{r1}` (cf. [llvm-clobber]).
 * If the `nomem` option is not set then `~{memory}` is added to the clobber list. (Although this is currently ignored by LLVM)
 * If the `preserves_flags` option is not set then the following are added to the clobber list:
   - (x86) `~{dirflag},~{flags},~{fpsr}`
@@ -601,7 +604,6 @@ If the `noreturn` option is set then an `unreachable` LLVM instruction is insert
 > Note that `alignstack` is not currently supported by GCC, so we will need to implement support in GCC if Rust ever gets a GCC back-end.
 
 [llvm-constraint]: http://llvm.org/docs/LangRef.html#supported-constraint-code-list
-[llvm-clobber]: http://llvm.org/docs/LangRef.html#clobber-constraints
 [issue-65452]: https://github.com/rust-lang/rust/issues/65452
 
 ## Supporting back-ends without inline assembly

From 6b3a12938a22e840a064619ac10f78e92a57f0b2 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 25 Feb 2020 08:15:15 +0000
Subject: [PATCH 22/68] Clarify that the compiler must treat asm! as a black
 box

---
 text/0000-inline-asm.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 7aa70ddf5e2..f6933e0c5e9 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -696,6 +696,9 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
   - Refer to the unsafe code guidelines for the exact rules.
   - If the `readonly` option is set, then only memory reads (with the same rules as `volatile_read`) are allowed.
   - If the `nomem` option is set then no reads or write to memory are allowed.
+- On targets that support modifying code at runtime, the compiler cannot assume that the instructions in the asm are the ones that will actually end up executed.
+  - This effectively means that the compiler must treat the `asm!` as a black box and only take the interface specification into account, not the instructions themselves.
+  - Runtime code patch is allowed, via target-specific mechanisms (outside the scope of this RFC).
 - Unless the `nostack` option is set, asm code is allowed to use stack space below the stack pointer.
   - On entry to the asm block the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
   - You are responsible for making sure you don't overflow the stack (e.g. use stack probing to ensure you hit a guard page).

From d83a1e10886a8898a81bf31e97b41ec74a171f17 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 25 Feb 2020 12:59:19 +0000
Subject: [PATCH 23/68] Add more examples of asm! usage to motivation

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index f6933e0c5e9..79b54cc4852 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -21,7 +21,7 @@ In systems programming some tasks require dropping down to the assembly level. T
 
 The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.
 
-A collection of use cases for inline asm can be found in [this repository][catalogue].
+Inline assembly is widely used in the Rust community and is one of the top reasons keeping people on the nightly toolchain. Examples of crates using inline assembly include `cortex-m`, `x86`, `riscv`, `parking_lot`, `libprobe`, `msp430`, etc. A collection of use cases for inline asm can also be found in [this repository][catalogue].
 
 [catalogue]: https://github.com/bjorn3/inline_asm_catalogue/
 

From 904b71e334d9906d07b2de44158b3b852bd598eb Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 28 Feb 2020 21:43:28 +0000
Subject: [PATCH 24/68] Add type whitelist for each register class

---
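Note: this patch mentions that `f32` values in the RISC-V `freg` class are NaN-boxed in an `f64`. As an illustration only, the bit layout can be reproduced in portable Rust: the upper 32 bits of the 64-bit register image are set to all ones, which makes the image a NaN when read as an `f64`. The function names are hypothetical.

```rust
/// NaN-boxes an `f32` into a 64-bit register image: the low 32 bits hold
/// the value and the upper 32 bits are all ones.
fn nan_box_f32(value: f32) -> u64 {
    0xFFFF_FFFF_0000_0000 | u64::from(value.to_bits())
}

/// Recovers the `f32` from the low 32 bits of a NaN-boxed register image.
fn unbox_f32(bits: u64) -> f32 {
    f32::from_bits(bits as u32)
}
```

For example, `nan_box_f32(1.0)` is `0xFFFF_FFFF_3F80_0000`, and `unbox_f32` returns the original value.
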
 text/0000-inline-asm.md | 82 +++++++++++++++++++++++++++--------------
 1 file changed, 55 insertions(+), 27 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 79b54cc4852..e6ba6bc2f85 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -406,37 +406,65 @@ Only the following types are allowed as operands for inline assembly:
 - Function pointers
 - SIMD vectors (structs defined with `#[repr(simd)]` and which implement `Copy`)
 
-Each register class has a width which limits the size of operands that can be passed through that register class. If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. It is a compile-time error for a value to be of a larger size than the register it is allocated in.
-
 Here is the list of currently supported register classes:
 
-| Architecture | Register class | Register width | Registers | LLVM constraint code |
-| ------------ | -------------- | -------------- | --------- | -------------------- |
-| x86 | `reg` | 32 / 64 | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` |
-| x86 | `reg_abcd` | 32 / 64 | `ax`, `bx`, `cx`, `dx` | `Q` |
-| x86 (SSE) | `xmm_reg` | 128 | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` |
-| x86 (AVX2) | `ymm_reg` | 256 | `ymm[0-7]` (x86) `ymm[0-15]` (x86-64) | `x` |
-| x86 (AVX-512) | `zmm_reg` | 512 | `zmm[0-7]` (x86) `zmm[0-31]` (x86-64) | `v` |
-| x86 (AVX-512) | `kreg` | 64 | `k[1-7]` | `Yk` |
-| AArch64 | `reg` | 64 | `x[0-28]`, `x30` | `r` |
-| AArch64 | `vreg` | 128 | `v[0-31]` | `w` |
-| AArch64 | `vreg_low16` | 128 | `v[0-15]` | `x` |
-| ARM | `reg` | 32 | `r[0-r10]`, `r12`, `r14` | `r` |
-| ARM (Thumb) | `reg_thumb` | 32 | `r[0-r7]` | `l` |
-| ARM (ARM) | `reg_thumb` | 32 | `r[0-r10]`, `r12`, `r14` | `l` |
-| ARM | `sreg` | 32 | `s[0-31]` | `t` |
-| ARM | `sreg_low16` | 32 | `s[0-15]` | `x` |
-| ARM | `dreg` | 64 | `d[0-31]` | `w` |
-| ARM | `dreg_low16` | 64 | `d[0-15]` | `t` |
-| ARM | `dreg_low8` | 64 | `d[0-8]` | `x` |
-| ARM | `qreg` | 128 | `q[0-15]` | `w` |
-| ARM | `qreg_low8` | 128 | `q[0-7]` | `t` |
-| ARM | `qreg_low4` | 128 | `q[0-3]` | `x` |
-| RISC-V | `reg` | 32 / 64 | `x1`, `x[5-7]`, `x[9-31]` | `r` |
-| RISC-V | `freg` | 64 | `f[0-31]` | `f` |
+| Architecture | Register class | Registers | LLVM constraint code |
+| ------------ | -------------- | --------- | -------------------- |
+| x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` |
+| x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` |
+| x86 | `xmm_reg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` |
+| x86 | `ymm_reg` | `ymm[0-7]` (x86) `ymm[0-15]` (x86-64) | `x` |
+| x86 | `zmm_reg` | `zmm[0-7]` (x86) `zmm[0-31]` (x86-64) | `v` |
+| x86 | `kreg` | `k[1-7]` | `Yk` |
+| AArch64 | `reg` | `x[0-28]`, `x30` | `r` |
+| AArch64 | `vreg` | `v[0-31]` | `w` |
+| AArch64 | `vreg_low16` | `v[0-15]` | `x` |
+| ARM | `reg` | `r[0-10]`, `r12`, `r14` | `r` |
+| ARM (Thumb) | `reg_thumb` | `r[0-7]` | `l` |
+| ARM (ARM) | `reg_thumb` | `r[0-10]`, `r12`, `r14` | `l` |
+| ARM | `sreg` | `s[0-31]` | `t` |
+| ARM | `sreg_low16` | `s[0-15]` | `x` |
+| ARM | `dreg` | `d[0-31]` | `w` |
+| ARM | `dreg_low16` | `d[0-15]` | `t` |
+| ARM | `dreg_low8` | `d[0-7]` | `x` |
+| ARM | `qreg` | `q[0-15]` | `w` |
+| ARM | `qreg_low8` | `q[0-7]` | `t` |
+| ARM | `qreg_low4` | `q[0-3]` | `x` |
+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` |
+| RISC-V | `freg` | `f[0-31]` | `f` |
 
 Additional register classes may be added in the future based on demand (e.g. MMX, x87, etc).
 
+Each register class has constraints on which value types they can be used with. This is necessary because the way a value is loaded into a register depends on its type. For example, on big-endian systems, loading a `i32x4` and a `i8x16` into a SIMD register may result in different register contents even if the byte-wise memory representation of both values is identical. The availability of supported types for a particular register class may depend on what target features are currently enabled.
+
+| Architecture | Register class | Target feature | Allowed types |
+| ------------ | -------------- | -------------- | ------------- |
+| x86-32 | `reg` | None | `i8`, `i16`, `i32`, `f32` |
+| x86-64 | `reg` | None | `i8`, `i16`, `i32`, `f32`, `i64`, `f64` |
+| x86 | `xmm_reg` | `sse` | `i32`, `f32`, `i64`, `f64`, <br> `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` |
+| x86 | `ymm_reg` | `avx` | `i32`, `f32`, `i64`, `f64`, <br> `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` <br> `i8x32`, `i16x16`, `i32x8`, `i64x4`, `f32x8`, `f64x4` |
+| x86 | `zmm_reg` | `avx512f` | `i32`, `f32`, `i64`, `f64`, <br> `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` <br> `i8x32`, `i16x16`, `i32x8`, `i64x4`, `f32x8`, `f64x4` <br> `i8x64`, `i16x32`, `i32x16`, `i64x8`, `f32x16`, `f64x8` |
+| x86 | `kreg` | `avx512f` | `i8`, `i16` |
+| x86 | `kreg` | `avx512bw` | `i32`, `i64` |
+| AArch64 | `reg` | None | `i8`, `i16`, `i32`, `f32`, `i64`, `f64` |
+| AArch64 | `vreg` | `fp` | `i8`, `i16`, `i32`, `f32`, `i64`, `f64`, <br> `i8x8`, `i16x4`, `i32x2`, `i64x1`, `f32x2`, `f64x1`, <br> `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` |
+| ARM | `reg` | None | `i8`, `i16`, `i32`, `f32` |
+| ARM | `sreg` | `vfp2` | `i32`, `f32` |
+| ARM | `dreg` | `vfp2` | `i64`, `f64`, `i8x8`, `i16x4`, `i32x2`, `i64x1`, `f32x2` |
+| ARM | `qreg` | `neon` | `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4` |
+| RISC-V32 | `reg` | None | `i8`, `i16`, `i32`, `f32` |
+| RISC-V64 | `reg` | None | `i8`, `i16`, `i32`, `f32`, `i64`, `f64` |
+| RISC-V | `freg` | `f` | `f32` |
+| RISC-V | `freg` | `d` | `f64` |
+
+> Note: For the purposes of the above table pointers, function pointers and `isize`/`usize` are treated as the equivalent integer type (`i32` or `i64`).
+
+If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. The only exception is the `freg` register class on RISC-V where `f32` values are NaN-boxed in a `f64` as required by the RISC-V architecture.
+
+When separate input and output expressions are specified for an `inout` operand, both expressions must have the same type. The only exception is if both operands are pointers or integers, in which case they are only required to have the same size. This restriction exists register allocators in LLVM and GCC cannot handle tied operands with different types in some situations.
+
+## Register names
+
 Some registers have multiple names. These are all treated by the compiler as identical to the base register name. Here is the list of all supported register aliases:
 
 | Architecture | Base register | Aliases |
@@ -547,7 +575,7 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen
 > - on x86: our behavior for `reg` with no modifiers differs from what GCC does. GCC will infer the modifier based on the operand value type, while we default to the full register size.
 > - on x86 `xmm_reg`: the `x`, `t` and `g` LLVM modifiers are not yet implemented in LLVM (they are supported by GCC only), but this should be a simple change.
 
-As stated in the previous section, passing an input value smaller than the register width will result in the upper bits of the register containing undefined values. This is not a problem if the inline asm only accesses the lower bits of the register, which can be done using template modifiers. Since this an easy pitfall, the compiler will warn if a value smaller than the register width is used as an input or output. However this warning is suppressed if all uses of the operand in the template string explicitly specify a modifier, even if this modifier is already the default.
+As stated in the previous section, passing an input value smaller than the register width will result in the upper bits of the register containing undefined values. This is not a problem if the inline asm only accesses the lower bits of the register, which can be done by using a template modifier to use a subregister name in the asm code (e.g. `al` instead of `rax`). Since this is an easy pitfall, the compiler will suggest a template modifier to use where appropriate given the input type. If all references to an operand already have modifiers then the warning is suppressed for that operand.
 
 [llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers
 

From a48259d979b877445da961bc5523726184df6642 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 28 Feb 2020 23:56:07 +0000
Subject: [PATCH 25/68] Fix typo

---
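Note: the sentence corrected by this patch states a type rule for `inout` operands with separate input and output expressions. A rough model of that rule is sketched below. `OperandType` and `inout_types_compatible` are hypothetical names, integer signedness is ignored, and SIMD vector types are left out for brevity.

```rust
/// A simplified view of operand types, sufficient for the `inout` rule.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OperandType {
    Int { bits: u32 },
    Pointer { bits: u32 },
    Float { bits: u32 },
}

/// Both expressions must have the same type, except that pointers and
/// integers only need to have the same size.
fn inout_types_compatible(input: OperandType, output: OperandType) -> bool {
    use OperandType::{Int, Pointer};
    match (input, output) {
        (Int { bits: a } | Pointer { bits: a }, Int { bits: b } | Pointer { bits: b }) => a == b,
        _ => input == output,
    }
}
```

A 64-bit pointer paired with a 64-bit integer passes, while a 32-bit integer paired with a 32-bit float does not.
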
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index e6ba6bc2f85..2d2c4dbb19b 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -461,7 +461,7 @@ Each register class has constraints on which value types they can be used with.
 
 If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. The only exception is the `freg` register class on RISC-V where `f32` values are NaN-boxed in a `f64` as required by the RISC-V architecture.
 
-When separate input and output expressions are specified for an `inout` operand, both expressions must have the same type. The only exception is if both operands are pointers or integers, in which case they are only required to have the same size. This restriction exists register allocators in LLVM and GCC cannot handle tied operands with different types in some situations.
+When separate input and output expressions are specified for an `inout` operand, both expressions must have the same type. The only exception is if both operands are pointers or integers, in which case they are only required to have the same size. This restriction exists because the register allocators in LLVM and GCC sometimes cannot handle tied operands with different types.
 
 ## Register names
 

From 843b6cf8904c290ff39230ccf758a31573b5b90a Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 29 Feb 2020 16:41:29 +0000
Subject: [PATCH 26/68] x[16-31] don't exist on RV32E

---
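Note: the updated table row can be read as the following register list. This is just an illustration of the row's contents. `riscv_reg_class` is a hypothetical helper and the `rv32e` flag stands in for whatever mechanism the compiler uses to detect the RV32E base ISA.

```rust
/// Register numbers of the RISC-V `x` registers available in the `reg`
/// class, following the table row in this patch.
fn riscv_reg_class(rv32e: bool) -> Vec<u8> {
    let mut regs = vec![1]; // x1
    regs.extend(5..=7);     // x[5-7]
    regs.extend(9..=15);    // x[9-15]
    if !rv32e {
        regs.extend(16..=31); // x[16-31] do not exist on RV32E
    }
    regs
}
```

This gives 11 allocatable registers on RV32E and 27 on the other RISC-V targets.
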
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 2d2c4dbb19b..810958a13a8 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -430,7 +430,7 @@ Here is the list of currently supported register classes:
 | ARM | `qreg` | `q[0-15]` | `w` |
 | ARM | `qreg_low8` | `q[0-7]` | `t` |
 | ARM | `qreg_low4` | `q[0-3]` | `x` |
-| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-31]` | `r` |
+| RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-15]`, `x[16-31]` (non-RV32E) | `r` |
 | RISC-V | `freg` | `f[0-31]` | `f` |
 
 Additional register classes may be added in the future based on demand (e.g. MMX, x87, etc).

From a1467e2f5a8f5daf80c3c5c60e23332a5f99c8a0 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sun, 1 Mar 2020 00:15:38 +0000
Subject: [PATCH 27/68] Clarify rules

---
 text/0000-inline-asm.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 810958a13a8..1eed905bcfb 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -366,7 +366,7 @@ Several types of operands are supported:
   - The allocated register must contain the same value at the end of the asm code (except if a `lateout` is allocated to the same register).
 * `out(<reg>) <expr>`
   - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
-  - The allocated register will contain an unknown value at the start of the asm code.
+  - The allocated register will contain an undefined value at the start of the asm code.
   - `<expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.
   - An underscore (`_`) may be specified instead of an expression, which will cause the contents of the register to be discarded at the end of the asm code (effectively acting as a clobber).
 * `lateout(<reg>) <expr>`
@@ -717,16 +717,18 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
 [rules]: #rules
 
 - Any registers not specified as inputs will contain an undefined value on entry to the asm block.
-- Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry.
-  - This only applies to registers which can be specified as an input or output.
+- Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry, otherwise behavior is undefined.
+  - This only applies to registers which can be specified as an input or output. Other registers follow target-specific rules and are outside the scope of this RFC.
+  - Note that a `lateout` may be allocated to the same register as an `in`, in which case this rule does not apply. Code should not rely on this, however, since it depends on the results of register allocation.
 - Behavior is undefined if execution unwinds out of an asm block.
 - Any memory reads/writes performed by the asm code follow the same rules as `volatile_read` and `volatile_write`.
   - Refer to the unsafe code guidelines for the exact rules.
   - If the `readonly` option is set, then only memory reads (with the same rules as `volatile_read`) are allowed.
   - If the `nomem` option is set then no reads or write to memory are allowed.
-- On targets that support modifying code at runtime, the compiler cannot assume that the instructions in the asm are the ones that will actually end up executed.
+  - These rules do not apply to memory which is private to the asm code, such as stack space allocated within the asm block.
+- The compiler cannot assume that the instructions in the asm are the ones that will actually end up executed.
   - This effectively means that the compiler must treat the `asm!` as a black box and only take the interface specification into account, not the instructions themselves.
-  - Runtime code patch is allowed, via target-specific mechanisms (outside the scope of this RFC).
+  - Runtime code patching is allowed, via target-specific mechanisms (outside the scope of this RFC).
 - Unless the `nostack` option is set, asm code is allowed to use stack space below the stack pointer.
   - On entry to the asm block the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
   - You are responsible for making sure you don't overflow the stack (e.g. use stack probing to ensure you hit a guard page).

From 0f4ffff427643b299e285f7b4f82fc2ba8ee6a87 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sun, 1 Mar 2020 00:18:40 +0000
Subject: [PATCH 28/68] Pointers are i16 on some targets

---
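Note: which integer type a pointer is treated as depends on the target's pointer width. A small sketch of that lookup follows. `pointer_equivalent_integer` is a hypothetical helper, and it returns `None` for any width the note in this patch does not list.

```rust
/// Returns the integer type that pointers, function pointers and
/// `isize`/`usize` are treated as on the current target.
fn pointer_equivalent_integer() -> Option<&'static str> {
    match std::mem::size_of::<*const u8>() * 8 {
        16 => Some("i16"),
        32 => Some("i32"),
        64 => Some("i64"),
        _ => None, // width not covered by the RFC note
    }
}
```
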
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 1eed905bcfb..2a1a86ac12a 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -457,7 +457,7 @@ Each register class has constraints on which value types they can be used with.
 | RISC-V | `freg` | `f` | `f32` |
 | RISC-V | `freg` | `d` | `f64` |
 
-> Note: For the purposes of the above table pointers, function pointers and `isize`/`usize` are treated as the equivalent integer type (`i32` or `i64`).
+> Note: For the purposes of the above table, pointers, function pointers and `isize`/`usize` are treated as the equivalent integer type (`i16`/`i32`/`i64` depending on the target).
 
 If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. The only exception is the `freg` register class on RISC-V where `f32` values are NaN-boxed in a `f64` as required by the RISC-V architecture.
 

From 8c547380e09101a92eb8a38c72a0f8e00b0682ee Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sun, 1 Mar 2020 19:52:36 +0000
Subject: [PATCH 29/68] Update semantics of "pure" option

---
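Note: with this patch there are four compile-time checks on options. They can be summarized by the sketch below, which is a model of the rules and not the rustc implementation. `AsmOptions` and `check_options` are hypothetical names. One assumption is labeled in the code: the `noreturn` rule is read literally, so discarded outputs (`_`) also count as outputs.

```rust
/// The subset of `asm!` options that the checks refer to.
#[derive(Clone, Copy, Debug, Default)]
struct AsmOptions {
    is_pure: bool,
    nomem: bool,
    readonly: bool,
    noreturn: bool,
}

/// `outputs` counts all output operands and `discarded` counts those
/// written as `_`.
fn check_options(opts: AsmOptions, outputs: usize, discarded: usize) -> Result<(), &'static str> {
    if opts.nomem && opts.readonly {
        return Err("`nomem` and `readonly` are mutually exclusive");
    }
    if opts.is_pure && !(opts.nomem || opts.readonly) {
        return Err("`pure` must be combined with `nomem` or `readonly`");
    }
    if opts.is_pure && outputs == discarded {
        // Covers both "no outputs" and "only discarded outputs".
        return Err("`pure` requires at least one output that is not discarded");
    }
    // Assumption: "with outputs" is read literally, so `_` outputs count too.
    if opts.noreturn && outputs > 0 {
        return Err("`noreturn` cannot be used with outputs");
    }
    Ok(())
}
```

For instance, `pure` together with `nomem` and one real output passes, while `pure` on its own is rejected by the rule added in this patch.
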
 text/0000-inline-asm.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 2a1a86ac12a..00385714135 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -583,7 +583,7 @@ As stated in the previous section, passing an input value smaller than the regis
 
 Flags are used to further influence the behavior of the inline assembly block.
 Currently the following options are defined:
-- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used.
+- `pure`: The `asm` block has no side effects, and its outputs depend only on its direct inputs (i.e. the values themselves, not what they point to) or values read from memory (unless the `nomem` option is also set). This allows the compiler to execute the `asm` block fewer times than specified in the program (e.g. by hoisting it out of a loop) or even eliminate it entirely if the outputs are not used.
 - `nomem`: The `asm` blocks does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.
 - `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.
 - `preserves_flags`: The `asm` block does not modify the flags register (defined in the [rules][rules] below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.
@@ -592,6 +592,7 @@ Currently the following options are defined:
 
 The compiler performs some additional checks on options:
 - The `nomem` and `readonly` options are mutually exclusive: it is a compile-time error to specify both.
+- The `pure` option must be combined with either the `nomem` or `readonly` options, otherwise a compile-time error is emitted.
 - It is a compile-time error to specify `pure` on an asm block with no outputs or only discarded outputs (`_`).
 - It is a compile-time error to specify `noreturn` on an asm block with outputs.
 
@@ -621,8 +622,9 @@ As written this RFC requires architectures to map from Rust constraint specifica
 Additionally, the following attributes are added to the LLVM `asm` statement:
 
 * The `nounwind` attribute is always added: unwinding from an inline asm block is not allowed (and not supported by LLVM anyways).
-* If the `nomem` option is set then the `readnone` attribute is added to the LLVM `asm` statement.
-* If the `readonly` option is set then the `readonly` attribute is added to the LLVM `asm` statement.
+* If the `nomem` and `pure` options are both set then the `readnone` attribute is added to the LLVM `asm` statement.
+* If the `readonly` and `pure` options are both set then the `readonly` attribute is added to the LLVM `asm` statement.
+* If the `nomem` option is set without the `pure` option then the `inaccessiblememonly` attribute is added to the LLVM `asm` statement.
 * If the `pure` option is not set then the `sideffect` flag is added the LLVM `asm` statement.
 * If the `nostack` option is not set then the `alignstack` flag is added the LLVM `asm` statement.
 * On x86 the `inteldialect` flag is added the LLVM `asm` statement so that the Intel syntax is used instead of the AT&T syntax.
@@ -735,6 +737,9 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
   - You should adjust the stack pointer when allocating stack memory as required by the target ABI.
   - The stack pointer must be restored to its original value before leaving the asm block.
 - If the `noreturn` option is set then behavior is undefined if execution falls through to the end of the asm block.
+- If the `pure` option is set then behavior is undefined if the `asm` has side-effects other than its direct outputs. Behavior is also undefined if two executions of the `asm` code with the same inputs result in different outputs.
+  - When used with the `nomem` option, "inputs" are just the direct inputs of the `asm!`.
+  - When used with the `readonly` option, "inputs" comprise the direct inputs of the `asm!` and any memory that the `asm!` block is allowed to read.
 - These flags registers must be restored upon exiting the asm block if the `preserves_flags` option is set:
   - x86
     - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).

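Reviewer note on patch 29: the new rule that `pure` must be paired with `nomem` or `readonly` can be illustrated with a minimal sketch. It assumes the macro path `std::arch::asm!` from stabilized Rust rather than the prelude macro this RFC describes, and the function name `add_pure` is hypothetical. The block is only an x86-64 illustration with a plain-Rust fallback for other targets, not the RFC's own example.

```rust
/// Wrapping addition written as a `pure` + `nomem` asm block (x86-64 only).
#[cfg(target_arch = "x86_64")]
pub fn add_pure(a: u64, b: u64) -> u64 {
    let mut sum = a;
    // SAFETY: the block only touches its two register operands. It reads
    // and writes no memory (`nomem`), has no side effects besides its
    // output (`pure`), and does not use the stack (`nostack`).
    // `preserves_flags` is deliberately omitted because `add` modifies
    // the status flags.
    unsafe {
        std::arch::asm!(
            "add {0}, {1}",
            inout(reg) sum,
            in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    sum
}

/// Fallback so the sketch still builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_pure(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

With `nomem`, the "inputs" of the block are just `sum` and `b`, so two calls with equal arguments must produce equal results, and the compiler may merge them or drop an unused call. Writing `options(pure)` alone would be rejected under the check added in this patch.
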
From 2d431ab693750d7c6d5b19ff916e7d78eaaca584 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 4 Mar 2020 00:10:42 +0000
Subject: [PATCH 30/68] Clarify that locals are not dropped before entering a
 noreturn asm

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 00385714135..259f6a6b3e2 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -587,7 +587,7 @@ Currently the following options are defined:
 - `nomem`: The `asm` blocks does not read or write to any memory. This allows the compiler to cache the values of modified global variables in registers across the `asm` block since it knows that they are not read or written to by the `asm`.
 - `readonly`: The `asm` block does not write to any memory. This allows the compiler to cache the values of unmodified global variables in registers across the `asm` block since it knows that they are not written to by the `asm`.
 - `preserves_flags`: The `asm` block does not modify the flags register (defined in the [rules][rules] below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.
-- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code.
+- `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code. A `noreturn` asm block behaves just like a function which doesn't return; notably, local variables in scope are not dropped before it is invoked.
 - `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this option is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
 
 The compiler performs some additional checks on options:

From 7ddeedcb4b37f9a4d28fa91a0cd80e2948265515 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 4 Mar 2020 14:24:44 +0000
Subject: [PATCH 31/68] Clarify that x86 high byte registers are never
 allocated.

---
 text/0000-inline-asm.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 259f6a6b3e2..0c15feb7c7a 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -433,6 +433,8 @@ Here is the list of currently supported register classes:
 | RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-15]`, `x[16-31]` (non-RV32E) | `r` |
 | RISC-V | `freg` | `f[0-31]` | `f` |
 
+> Note: On x86 the `ah`, `bh`, `ch`, `dh` registers are never allocated for `i8` operands. This allows values allocated to e.g. `al` to use the full `rax` register.
+
 Additional register classes may be added in the future based on demand (e.g. MMX, x87, etc).
 
 Each register class has constraints on which value types they can be used with. This is necessary because the way a value is loaded into a register depends on its type. For example, on big-endian systems, loading a `i32x4` and a `i8x16` into a SIMD register may result in different register contents even if the byte-wise memory representation of both values is identical. The availability of supported types for a particular register class may depend on what target features are currently enabled.

From 571474cb886e8bf7e8a240043a96c101bac1d665 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 4 Mar 2020 19:51:03 +0000
Subject: [PATCH 32/68] Clarify the meaning of undefined values on entry to asm

---
 text/0000-inline-asm.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 0c15feb7c7a..50ff32dcc42 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -721,6 +721,7 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
 [rules]: #rules
 
 - Any registers not specified as inputs will contain an undefined value on entry to the asm block.
+  - An "undefined value" in this context means that the register can have any one of the possible value allowed by the architecture. Notably it is not the same as an LLVM `undef` which can have a different value every time you read it (since such a concept does not exist in assembly code).
 - Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry, otherwise behavior is undefined.
   - This only applies to registers which can be specified as an input or output. Other registers follow target-specific rules and are outside the scope of this RFC.
   - Note that a `lateout` may be allocated to the same register as an `in`, in which case this rule does not apply. Code should not rely on this however since it depends on the results of register allocation.

From d8fd770188beed42cf23ea9fd1303869c6ff02d5 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Thu, 5 Mar 2020 20:55:30 +0000
Subject: [PATCH 33/68] Fix outdated wording on template modifiers

---
 text/0000-inline-asm.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 50ff32dcc42..a76b8f02107 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -272,7 +272,7 @@ assert_eq!(x, 4 * 6);
 
 In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).
 
-By default the compiler will always choose the name that refers to the full register size (e.g. `rax` on x86-64, `eax` on x86, etc). This is the case even if you pass in a smaller data type (e.g. `u16`) or if you explicitly specify a register (e.g. `in("cx")` will be rendered as `rcx` by default).
+By default the compiler will always choose the name that refers to the full register size (e.g. `rax` on x86-64, `eax` on x86, etc).
 
 This default can be overriden by using modifiers on the template string operands, just like you would with format strings:
 
@@ -291,6 +291,8 @@ In this example, we use the `reg_abcd` register class to restrict the register a
 Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.
 The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.
 
+If you use a smaller data type (e.g. `u16`) with an operand and forget to use template modifiers, the compiler will emit a warning and suggest the correct modifier to use.
+
 ## Options
 
 By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However in many cases, it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.

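Reviewer note on patch 33: the sentence about `u16` operands can be made concrete with a small sketch. It assumes the stabilized `std::arch::asm!` path and the x86 `x` template modifier, which selects the 16-bit register name such as `ax`. The function name `copy_u16` is hypothetical, and non-x86-64 targets get a trivial fallback.

```rust
/// Copies a 16-bit value through registers, using the `x` modifier so the
/// template names the 16-bit view of each allocated register.
#[cfg(target_arch = "x86_64")]
pub fn copy_u16(src: u16) -> u16 {
    let dst: u16;
    // SAFETY: a register-to-register `mov` reads and writes no memory,
    // does not use the stack, and leaves the status flags untouched.
    unsafe {
        std::arch::asm!(
            "mov {dst:x}, {src:x}",
            dst = out(reg) dst,
            src = in(reg) src,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    dst
}

/// Fallback so the sketch still builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn copy_u16(src: u16) -> u16 {
    src
}
```

Without the `:x` modifiers the template would be rendered with the full 64-bit register names, which is the situation the warning described in this patch is meant to catch.
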
From 88d4f21c07e09b4f38963dd89579faea32613fee Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 6 Mar 2020 12:54:50 +0000
Subject: [PATCH 34/68] Add rationale for not including arch name in asm!

---
 text/0000-inline-asm.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index a76b8f02107..2a8c810e300 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -926,6 +926,14 @@ While it might be possible for rustc to verify that inline assembly code conform
 
 [llvm-asm-ext]: https://llvm.org/docs/Extensions.html#machine-specific-assembly-syntax
 
+## Include the target architecture name in `asm!`
+
+Including the name of the target architecture as part of the `asm!` invocation could allow IDEs to perform syntax highlighting on the assembly code. However this has several downsides:
+- It would add a significant amount of complexity to the `asm!` macro which already has many options.
+- Since assembly code is inherently target-specific, `asm!` is already going to be behind a `#[cfg]`. Repeating the architecture name would be redundant.
+- Most inline asm is small and wouldn't really benefit from syntax highlighting.
+- The `asm!` template isn't real assembly code (`{}` placeholders, `{` escaped to `{{`), which may confuse syntax highlighters.
+
 # Prior art
 [prior-art]: #prior-art
 

From df8388faad4ac3ff928960516009790aa9ac2b3f Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 6 Mar 2020 12:57:07 +0000
Subject: [PATCH 35/68] Add rationale for register names as string literals

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 2a8c810e300..b96913e5c64 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -397,7 +397,7 @@ Several types of operands are supported:
 
 ## Register operands
 
-Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as identifiers (e.g. `reg`).
+Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as identifiers (e.g. `reg`). Using string literals for register names enables support for architectures that use special characters in register names, such as MIPS (`$0`, `$1`, etc).
 
 Note that explicit registers treat register aliases (e.g. `r14` vs `lr` on ARM) and smaller views of a register (e.g. `eax` vs `rax`) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operands or two output operands. Additionally, it is also a compile-time error to use overlapping registers (e.g. ARM VFP) in input operands or in output operands.
 

From 5554950293bb175f942a12eaea9f3981284b701a Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 6 Mar 2020 12:58:01 +0000
Subject: [PATCH 36/68] Fix typo

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index b96913e5c64..7e63504d7f6 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -723,7 +723,7 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
 [rules]: #rules
 
 - Any registers not specified as inputs will contain an undefined value on entry to the asm block.
-  - An "undefined value" in this context means that the register can have any one of the possible value allowed by the architecture. Notably it is not the same as an LLVM `undef` which can have a different value every time you read it (since such a concept does not exist in assembly code).
+  - An "undefined value" in this context means that the register can have any one of the possible values allowed by the architecture. Notably it is not the same as an LLVM `undef` which can have a different value every time you read it (since such a concept does not exist in assembly code).
 - Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry, otherwise behavior is undefined.
   - This only applies to registers which can be specified as an input or output. Other registers follow target-specific rules and are outside the scope of this RFC.
   - Note that a `lateout` may be allocated to the same register as an `in`, in which case this rule does not apply. Code should not rely on this however since it depends on the results of register allocation.

From d0a743a99bf14d91a3d29e946aa7fa9af9af04ac Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 6 Mar 2020 13:05:01 +0000
Subject: [PATCH 37/68] Clarify rules for context switching and noreturn

---
 text/0000-inline-asm.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 7e63504d7f6..66f552da706 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -763,6 +763,12 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
     - Floating-point status (`FPSR` register).
   - RISC-V
     - Floating-point exception flags in `fcsr` (`fflags`).
+- The requirement of restoring the stack pointer and non-output registers to their original value only applies when exiting an `asm!` block.
+  - This means that `asm!` blocks that never return (even if not marked `noreturn`) don't need to preserve these registers.
+  - When returning to a different `asm!` block than you entered (e.g. for context switching), these registers must contain the value they had upon entering the `asm!` block that you are *exiting*.
+    - You cannot exit an `asm!` block that has not been entered. Neither can you exit an `asm!` block that has already been exited.
+    - You are responsible for switching any target-specific state (e.g. thread-local storage, stack bounds).
+    - The set of memory locations that you may access is the intersection of those allowed by the `asm!` blocks you entered and exited.
 
 > Note: As a general rule, these are the flags which are *not* preserved when performing a function call.
 

From 1738eecc0c21d2532b7ea8c14204772c519f3914 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 6 Mar 2020 13:14:30 +0000
Subject: [PATCH 38/68] Clarify that the semantics of the asm string are
 target-specific

---
 text/0000-inline-asm.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 66f552da706..5a67e75a5ff 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -350,11 +350,9 @@ The assembler template uses the same syntax as [format strings][format-syntax] (
 
 As with format strings, named arguments must appear after positional arguments. Explicit register operands must appear at the end of the operand list, after any named arguments if any. Explicit register operands cannot be used by placeholders in the template string. All other operands must appear at least once in the template string, otherwise a compiler error is generated.
 
-The assembly code syntax used is that of the GNU assembler (GAS). The only exception is on x86 where the Intel syntax is used instead of GCC's AT&T syntax.
+The exact assembly code syntax is target-specific and opaque to the compiler except for the way operands are substituted into the template string to form the code passed to the assembler.
 
-This RFC only specifies how operands are substituted into the template string. Actual interpretation of the final asm string is left to the assembler.
-
-However there is one restriction on the asm string: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.
+The 4 targets specified in this RFC (x86, ARM, AArch64, RISCV) all use the assembly code syntax of the GNU assembler (GAS). On x86, the `.intel_syntax noprefix` mode of GAS is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.
 
 [rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795
 

From 8f2871eca933c83895f83d70151b4469a46cb4f7 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 6 Mar 2020 13:17:32 +0000
Subject: [PATCH 39/68] Fix outdated text regarding _ labels

---
 text/0000-inline-asm.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 5a67e75a5ff..11c99682ced 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -188,9 +188,7 @@ we had to use the `rax` constraint specifier.
 
 Note that unlike other operand types, explicit register operands cannot be used in the template string: you can't use `{}` and should write the register name directly instead. Also, they must appear at the end of the operand list after all other operand types.
 
-It is somewhat common that instructions have operands that are not explicitly listed in the
-assembly (template). By default all operands must be used in the template string, but it is possible
-to opt-out of this by giving an operand the name `_`:
+Consider this example which uses the x86 `mul` instruction:
 
 ```rust
 fn mul(a: u64, b: u64) -> u128 {

From 651d152a86b7acf7c4007af0fbff5d3f025279af Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 7 Mar 2020 21:59:03 +0100
Subject: [PATCH 40/68] Allow trailing commas in syntax

---
 text/0000-inline-asm.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 11c99682ced..33e619f53a8 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -336,8 +336,8 @@ operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
 reg_operand := dir_spec "(" reg_spec ")" operand_expr
 operand := reg_operand / "const" const_expr / "sym" path
 option := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn"
-options := "options(" option *["," option] ")"
-asm := "asm!(" format_string *("," [ident "="] operand) ["," options] ")"
+options := "options(" option *["," option] [","] ")"
+asm := "asm!(" format_string *("," [ident "="] operand) ["," options] [","] ")"
 ```
 
 [format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax

From 766290ffcdf5375d9a8d30d5c503048fe7218911 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sun, 15 Mar 2020 12:20:49 +0000
Subject: [PATCH 41/68] Minor fixes

---
 text/0000-inline-asm.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 33e619f53a8..58d00d6b1b7 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -625,7 +625,7 @@ Additionally, the following attributes are added to the LLVM `asm` statement:
 * If the `nomem` and `pure` options are both set then the `readnone` attribute is added to the LLVM `asm` statement.
 * If the `readonly` and `pure` options are both set then the `readonly` attribute is added to the LLVM `asm` statement.
 * If the `nomem` option is set without the `pure` option then the `inaccessiblememonly` attribute is added to the LLVM `asm` statement.
-* If the `pure` option is not set then the `sideffect` flag is added the LLVM `asm` statement.
+* If the `pure` option is not set then the `sideeffect` flag is added the LLVM `asm` statement.
 * If the `nostack` option is not set then the `alignstack` flag is added the LLVM `asm` statement.
 * On x86 the `inteldialect` flag is added the LLVM `asm` statement so that the Intel syntax is used instead of the AT&T syntax.
 
@@ -719,7 +719,7 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
 [rules]: #rules
 
 - Any registers not specified as inputs will contain an undefined value on entry to the asm block.
-  - An "undefined value" in this context means that the register can have any one of the possible values allowed by the architecture. Notably it is not the same as an LLVM `undef` which can have a different value every time you read it (since such a concept does not exist in assembly code).
+  - An "undefined value" in the context of this RFC means that the register can (non-deterministically) have any one of the possible values allowed by the architecture. Notably it is not the same as an LLVM `undef` which can have a different value every time you read it (since such a concept does not exist in assembly code).
 - Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry, otherwise behavior is undefined.
   - This only applies to registers which can be specified as an input or output. Other registers follow target-specific rules and are outside the scope of this RFC.
   - Note that a `lateout` may be allocated to the same register as an `in`, in which case this rule does not apply. Code should not rely on this however since it depends on the results of register allocation.

From ff3400a20eb05ab6048293fe0422b201b755eda0 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sun, 15 Mar 2020 15:55:30 +0000
Subject: [PATCH 42/68] Add namespacing as an unresolved question

---
 text/0000-inline-asm.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 58d00d6b1b7..f400c600a2f 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -952,7 +952,9 @@ See the section [above][dsl].
 # Unresolved questions
 [unresolved-questions]: #unresolved-questions
 
-None
+## Namespacing the `asm!` macro
+
+Should the `asm!` macro be available directly from the prelude as it is now, or should it have to be imported from `std::arch::$ARCH::asm`? The advantage of the latter is that it would make it explicit that the `asm!` macro is target-specific, but it would make cross-platform code slightly longer to write.
 
 # Future possibilities
 [future-possibilities]: #future-possibilities

From db1743afc81c39ecc3375a8d3f7b522996f17534 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sun, 15 Mar 2020 22:00:15 +0000
Subject: [PATCH 43/68] Add drawback on post-monomorphization errors

---
 text/0000-inline-asm.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index f400c600a2f..b1f4a3e913e 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -816,6 +816,14 @@ Inline assembly is a difficult feature to implement in a compiler backend. While
 
 Because `{}` are used to denote operand placeholders in the template string, actual uses of braces in the assembly code need to be escaped with `{{` and `}}`. This is needed for AVX-512 mask registers and ARM register lists.
 
+## Post-monomorphization errors
+
+Since the code generated by `asm!` is only evaluated late in the compiler back-end, errors in the assembly code (e.g. invalid syntax, unrecognized instruction, etc) are reported during code generation unlike every other error generated by rustc. In particular this means that:
+- Since `cargo check` skips code generation, assembly code is not checked for errors.
+- `asm!` blocks that are determined to be unreachable are not checked for errors. This can even vary depending on the optimization level since inlining provides more opportunities for constant propagation.
+
+However there is a precedent in Rust for post-monomorphization errors: linker errors. Code which references a non-existent `extern` symbol will only cause an error at link-time, and this can also vary with optimization levels as dead code elimination may remove the reference to the symbol before it reaches the linker.
+
 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives
 

From 014624e16ad0b28e4aac4dab81ab062366a24e92 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 17 Mar 2020 09:12:00 +0000
Subject: [PATCH 44/68] Add an alternative about operands before the template
 string

---
 text/0000-inline-asm.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index b1f4a3e913e..7b0a51e97b2 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -944,6 +944,10 @@ Including the name of the target architecture as part of the `asm!` invocation c
 - Most inline asm is small and wouldn't really benefit from syntax highlighting.
 - The `asm!` template isn't real assembly code (`{}` placeholders, `{` escaped to `{{`), which may confuse syntax highlighters.
 
+## Operands before template string
+
+The operands could be placed before the template string, which could make the asm easier to read in some cases. However we decided against it because the benefits are small and the syntax would no longer mirror that of Rust format strings.
+
 # Prior art
 [prior-art]: #prior-art
 

From 784a13f24d1e76bbd76d00653f034416c9927b03 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 17 Mar 2020 09:18:40 +0000
Subject: [PATCH 45/68] Clarify that the compiler errors if the target doesn't
 support asm!

---
 text/0000-inline-asm.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 7b0a51e97b2..35386ff3188 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -340,6 +340,8 @@ options := "options(" option *["," option] [","] ")"
 asm := "asm!(" format_string *("," [ident "="] operand) ["," options] [","] ")"
 ```
 
+The macro will initially be supported only on ARM, AArch64, x86, x86-64 and RISC-V targets. Support for more targets may be added in the future. The compiler will emit an error if `asm!` is used on an unsupported target.
+
 [format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax
 
 ## Template string
@@ -350,7 +352,7 @@ As with format strings, named arguments must appear after positional arguments.
 
 The exact assembly code syntax is target-specific and opaque to the compiler except for the way operands are substituted into the template string to form the code passed to the assembler.
 
-The 4 targets specified in this RFC (x86, ARM, AArch64, RISCV) all use the assembly code syntax of the GNU assembler (GAS). On x86, the `.intel_syntax noprefix` mode of GAS is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.
+The 4 targets specified in this RFC (x86, ARM, AArch64, RISC-V) all use the assembly code syntax of the GNU assembler (GAS). On x86, the `.intel_syntax noprefix` mode of GAS is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.
 
 [rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795
 

From 14b47f4feb6e64a7126ae0f16fa47574f638e616 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 18 Mar 2020 09:18:45 +0000
Subject: [PATCH 46/68] Explain why inlateout can't be used in example

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 35386ff3188..93a68c98a84 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -154,7 +154,7 @@ unsafe {
 assert_eq!(a, 12);
 ```
 
-Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`.
+Here the compiler is free to allocate the same register for inputs `b` and `c` since it knows they have the same value. However it must allocate a separate register for `a` since it uses `inout` and not `inlateout`. If `inlateout` was used, then `a` and `c` could be allocated to the same register, in which case the first instruction would overwrite the value of `c` and cause the assembly code to produce the wrong result.
 
 However the following example can use `inlateout` since the output is only modified after all input registers have been read:
 

From 3490ce6db23557ebb979873480a0b3efa85140fa Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 18 Mar 2020 09:24:16 +0000
Subject: [PATCH 47/68] Add comments on CPUID arguments

---
 text/0000-inline-asm.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 93a68c98a84..e279ec6b980 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -232,7 +232,9 @@ let ecx: u64;
 unsafe {
     asm!(
         "cpuid",
+        // EAX 4 selects the "Deterministic Cache Parameters" CPUID leaf
         inout("eax") 4 => _,
+        // ECX 0 selects the L0 cache information.
         inout("ecx") 0 => ecx,
         lateout("ebx") ebx,
         lateout("edx") _

From 8d8b1d3e1a36f90e695b17e1be11ef5d070205b8 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 18 Mar 2020 09:34:00 +0000
Subject: [PATCH 48/68] Assembly code that does not conform to the GAS syntax
 will result in assembler-specific behavior.

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index e279ec6b980..cfcb92ffe21 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -354,7 +354,7 @@ As with format strings, named arguments must appear after positional arguments.
 
 The exact assembly code syntax is target-specific and opaque to the compiler except for the way operands are substituted into the template string to form the code passed to the assembler.
 
-The 4 targets specified in this RFC (x86, ARM, AArch64, RISC-V) all use the assembly code syntax of the GNU assembler (GAS). On x86, the `.intel_syntax noprefix` mode of GAS is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string.
+The 4 targets specified in this RFC (x86, ARM, AArch64, RISC-V) all use the assembly code syntax of the GNU assembler (GAS). On x86, the `.intel_syntax noprefix` mode of GAS is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string. Assembly code that does not conform to the GAS syntax will result in assembler-specific behavior.
 
 [rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795
 

From 66bb5f99706420efc7d8780a1ae5f21aa586a6c4 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 18 Mar 2020 09:39:58 +0000
Subject: [PATCH 49/68] Clarify nounwind rule

---
 text/0000-inline-asm.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index cfcb92ffe21..e1d57a49f7f 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -728,6 +728,7 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
   - This only applies to registers which can be specified as an input or output. Other registers follow target-specific rules and are outside the scope of this RFC.
   - Note that a `lateout` may be allocated to the same register as an `in`, in which case this rule does not apply. Code should not rely on this however since it depends on the results of register allocation.
 - Behavior is undefined if execution unwinds out of an asm block.
+  - This also applies if the assembly code calls a function which then unwinds.
 - Any memory reads/writes performed by the asm code follow the same rules as `volatile_read` and `volatile_write`.
   - Refer to the unsafe code guidelines for the exact rules.
   - If the `readonly` option is set, then only memory reads (with the same rules as `volatile_read`) are allowed.

From ec6a4ce68bfdf3440b452f84bcb3ae800a3e6bf9 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 18 Mar 2020 09:50:59 +0000
Subject: [PATCH 50/68] Remove references to volatile_read/volatile_write

---
 text/0000-inline-asm.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index e1d57a49f7f..427535d18e6 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -729,10 +729,10 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
   - Note that a `lateout` may be allocated to the same register as an `in`, in which case this rule does not apply. Code should not rely on this however since it depends on the results of register allocation.
 - Behavior is undefined if execution unwinds out of an asm block.
   - This also applies if the assembly code calls a function which then unwinds.
-- Any memory reads/writes performed by the asm code follow the same rules as `volatile_read` and `volatile_write`.
+- The set of memory locations that assembly code is allowed to read and write is the same as that allowed for an FFI function.
   - Refer to the unsafe code guidelines for the exact rules.
-  - If the `readonly` option is set, then only memory reads (with the same rules as `volatile_read`) are allowed.
-  - If the `nomem` option is set then no reads or write to memory are allowed.
+  - If the `readonly` option is set, then only memory reads are allowed.
+  - If the `nomem` option is set then no reads or writes to memory are allowed.
   - These rules do not apply to memory which is private to the asm code, such as stack space allocated within the asm block.
 - The compiler cannot assume that the instructions in the asm are the ones that will actually end up executed.
   - This effectively means that the compiler must treat the `asm!` as a black box and only take the interface specification into account, not the instructions themselves.

From 491d29f323c1bbb42a2f923cfa4ad3b7890e315f Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Thu, 23 Apr 2020 21:41:56 +0100
Subject: [PATCH 51/68] Specify that ARM uses UAL syntax

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 427535d18e6..e453bfda400 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -354,7 +354,7 @@ As with format strings, named arguments must appear after positional arguments.
 
 The exact assembly code syntax is target-specific and opaque to the compiler except for the way operands are substituted into the template string to form the code passed to the assembler.
 
-The 4 targets specified in this RFC (x86, ARM, AArch64, RISC-V) all use the assembly code syntax of the GNU assembler (GAS). On x86, the `.intel_syntax noprefix` mode of GAS is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string. Assembly code that does not conform to the GAS syntax will result in assembler-specific behavior.
+The 4 targets specified in this RFC (x86, ARM, AArch64, RISC-V) all use the assembly code syntax of the GNU assembler (GAS). On x86, the `.intel_syntax noprefix` mode of GAS is used. On ARM, the `.syntax unified` mode is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string. Assembly code that does not conform to the GAS syntax will result in assembler-specific behavior.
 
 [rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795
 

From 35db0a9b0c18d76d3d2e021664f0a5c41ba1f475 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 24 Apr 2020 00:33:53 +0100
Subject: [PATCH 52/68] Add sym to guide-level explanation

---
 text/0000-inline-asm.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index e453bfda400..b7da7446c85 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -268,6 +268,38 @@ unsafe {
 assert_eq!(x, 4 * 6);
 ```
 
+## Symbol operands
+
+A special operand type, `sym`, allows you to use the symbol name of a `fn` or `static` in inline assembly code.
+This allows you to call a function or access a global variable without needing to keep its address in a register.
+
+```rust
+extern "C" fn foo(arg: i32) {
+    println!("arg = {}", arg);
+}
+
+fn call_foo(arg: i32) {
+    unsafe {
+        asm!(
+            "call {}"
+            sym foo,
+            // 1st argument in rdi, which is caller-saved
+            inout("rdi") arg => _,
+            // All caller-saved registers must be marked as clobbered
+            out("rax") _, out("rcx") _, out("rdx") _, out("rsi") _,
+            out("r8") _, out("r9") _, out("r10") _, out("r11") _,
+            out("xmm0") _, out("xmm1") _, out("xmm2") _, out("xmm3") _,
+            out("xmm4") _, out("xmm5") _, out("xmm6") _, out("xmm7") _,
+            out("xmm8") _, out("xmm9") _, out("xmm10") _, out("xmm11") _,
+            out("xmm12") _, out("xmm13") _, out("xmm14") _, out("xmm15") _,
+        )
+    }
+}
+```
+
+Note that the `fn` or `static` item does not need to be public or `#[no_mangle]`:
+the compiler will automatically insert the appropriate mangled symbol name into the assembly code.
+
 ## Register template modifiers
 
 In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).

From f0cd80040d4536dc199f436bfc29c2a21d953e44 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Wed, 29 Apr 2020 00:46:20 +0100
Subject: [PATCH 53/68] Add support for high byte registers on x86

---
 text/0000-inline-asm.md | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)
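
Reviewer note: the reason `reg_byte` is split out from `reg` is that `al` and `ah` are two independent byte-sized views of the same 16-bit register. The following is a small software model of that layout, not real register access. The `Ax` type and its methods are hypothetical names chosen for this sketch.

```rust
/// A software model of the 16-bit `ax` register and its two byte views.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ax(u16);

impl Ax {
    /// Low byte view (`al`).
    fn al(self) -> u8 {
        self.0 as u8
    }
    /// High byte view (`ah`).
    fn ah(self) -> u8 {
        (self.0 >> 8) as u8
    }
    /// Writing `al` leaves `ah` untouched.
    fn set_al(&mut self, v: u8) {
        self.0 = (self.0 & 0xff00) | (v as u16);
    }
    /// Writing `ah` leaves `al` untouched.
    fn set_ah(&mut self, v: u8) {
        self.0 = (self.0 & 0x00ff) | ((v as u16) << 8);
    }
}

/// Models `mov ah, al`: copy the low byte into the high byte.
fn mov_ah_al(x: u16) -> u16 {
    let mut ax = Ax(x);
    let low = ax.al();
    ax.set_ah(low);
    ax.0
}
```

Because the two views do not overlap, two separate `i8` operands can live in `al` and `ah` at the same time, whereas a `reg` operand placed in `ax` occupies both bytes.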

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index b7da7446c85..b652571460c 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -446,6 +446,7 @@ Here is the list of currently supported register classes:
 | ------------ | -------------- | --------- | -------------------- |
 | x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` |
 | x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` |
+| x86 | `reg_byte` | `al`, `ah`, `bl`, `bh`, `cl`, `ch`, `dl`, `dh` <br> `sil` (x86-64 only), `dil`, (x86-64 only) `r[8-15]b` (x86-64 only) | `r` |
 | x86 | `xmm_reg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` |
 | x86 | `ymm_reg` | `ymm[0-7]` (x86) `ymm[0-15]` (x86-64) | `x` |
 | x86 | `zmm_reg` | `zmm[0-7]` (x86) `zmm[0-31]` (x86-64) | `v` |
@@ -467,7 +468,7 @@ Here is the list of currently supported register classes:
 | RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-15]`, `x[16-31]` (non-RV32E) | `r` |
 | RISC-V | `freg` | `f[0-31]` | `f` |
 
-> Note: On x86 the `ah`, `bh`, `ch`, `dh` register are never allocated for `i8` operands. This allows values allocated to e.g. `al` to use the full `rax` register.
+> Note: On x86 we treat `reg_byte` differently from `reg` because the compiler can allocate `al` and `ah` separately whereas `reg` reserves the whole register.
 
 Additional register classes may be added in the future based on demand (e.g. MMX, x87, etc).
 
@@ -475,8 +476,9 @@ Each register class has constraints on which value types they can be used with.
 
 | Architecture | Register class | Target feature | Allowed types |
 | ------------ | -------------- | -------------- | ------------- |
-| x86-32 | `reg` | None | `i8`, `i16`, `i32`, `f32` |
-| x86-64 | `reg` | None | `i8`, `i16`, `i32`, `f32`, `i64`, `f64` |
+| x86-32 | `reg` | None | `i16`, `i32`, `f32` |
+| x86-64 | `reg` | None | `i16`, `i32`, `f32`, `i64`, `f64` |
+| x86 | `reg_byte` | None | `i8` |
 | x86 | `xmm_reg` | `sse` | `i32`, `f32`, `i64`, `f64`, <br> `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` |
 | x86 | `ymm_reg` | `avx` | `i32`, `f32`, `i64`, `f64`, <br> `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` <br> `i8x32`, `i16x16`, `i32x8`, `i64x4`, `f32x8`, `f64x4` |
 | x86 | `zmm_reg` | `avx512f` | `i32`, `f32`, `i64`, `f64`, <br> `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2` <br> `i8x32`, `i16x16`, `i32x8`, `i64x4`, `f32x8`, `f64x4` <br> `i8x64`, `i16x32`, `i32x16`, `i64x8`, `f32x16`, `f64x8` |
@@ -505,12 +507,12 @@ Some registers have multiple names. These are all treated by the compiler as ide
 
 | Architecture | Base register | Aliases |
 | ------------ | ------------- | ------- |
-| x86 | `ax` | `al`, `eax`, `rax` |
-| x86 | `bx` | `bl`, `ebx`, `rbx` |
-| x86 | `cx` | `cl`, `ecx`, `rcx` |
-| x86 | `dx` | `dl`, `edx`, `rdx` |
-| x86 | `si` | `sil`, `esi`, `rsi` |
-| x86 | `di` | `dil`, `edi`, `rdi` |
+| x86 | `ax` | `eax`, `rax` |
+| x86 | `bx` | `ebx`, `rbx` |
+| x86 | `cx` | `ecx`, `rcx` |
+| x86 | `dx` | `edx`, `rdx` |
+| x86 | `si` | `esi`, `rsi` |
+| x86 | `di` | `edi`, `rdi` |
 | x86 | `bp` | `bpl`, `ebp`, `rbp` |
 | x86 | `sp` | `spl`, `esp`, `rsp` |
 | x86 | `ip` | `eip`, `rip` |
@@ -555,7 +557,6 @@ Some registers cannot be used for input or output operands:
 | ------------ | -------------------- | ------ |
 | All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. |
 | All | `bp` (x86), `r11` (ARM), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. |
-| x86 | `ah`, `bh`, `ch`, `dh` | These are poorly supported by compiler backends. Use 16-bit register views (e.g. `ax`) instead. |
 | x86 | `k0` | This is a constant zero register which can't be modified. |
 | x86 | `ip` | This is the program counter, not a real register. |
 | x86 | `mm[0-7]` | MMX registers are not currently supported (but may be in the future). |
@@ -581,6 +582,7 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen
 | x86 | `reg` | `x` | `ax` | `w` |
 | x86 | `reg` | `e` | `eax` | `k` |
 | x86-64 | `reg` | `r` | `rax` | `q` |
+| x86 | `reg_byte` | None | `al` / `ah` | None |
 | x86 | `xmm_reg` | None | `xmm0` | `x` |
 | x86 | `ymm_reg` | None | `ymm0` | `t` |
 | x86 | `zmm_reg` | None | `zmm0` | `g` |
@@ -611,7 +613,7 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen
 > - on x86: our behavior for `reg` with no modifiers differs from what GCC does. GCC will infer the modifier based on the operand value type, while we default to the full register size.
 > - on x86 `xmm_reg`: the `x`, `t` and `g` LLVM modifiers are not yet implemented in LLVM (they are supported by GCC only), but this should be a simple change.
 
-As stated in the previous section, passing an input value smaller than the register width will result in the upper bits of the register containing undefined values. This is not a problem if the inline asm only accesses the lower bits of the register, which can be done by using a template modifier to use a subregister name in the asm code (e.g. `al` instead of `rax`). Since this an easy pitfall, the compiler will suggest a template modifier to use where appropriate given the input type. If all references to an operand already have modifiers then the warning is suppressed for that operand.
+As stated in the previous section, passing an input value smaller than the register width will result in the upper bits of the register containing undefined values. This is not a problem if the inline asm only accesses the lower bits of the register, which can be done by using a template modifier to use a subregister name in the asm code (e.g. `ax` instead of `rax`). Since this is an easy pitfall, the compiler will suggest a template modifier to use where appropriate given the input type. If all references to an operand already have modifiers then the warning is suppressed for that operand.
 
 [llvm-argmod]: http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers
 

From cef9246826bdc3297b672f3334f152aacdbb348b Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 1 May 2020 20:02:49 +0100
Subject: [PATCH 54/68] Update wording on high byte registers

---
 text/0000-inline-asm.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index b652571460c..0d03505a80a 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -446,7 +446,8 @@ Here is the list of currently supported register classes:
 | ------------ | -------------- | --------- | -------------------- |
 | x86 | `reg` | `ax`, `bx`, `cx`, `dx`, `si`, `di`, `r[8-15]` (x86-64 only) | `r` |
 | x86 | `reg_abcd` | `ax`, `bx`, `cx`, `dx` | `Q` |
-| x86 | `reg_byte` | `al`, `ah`, `bl`, `bh`, `cl`, `ch`, `dl`, `dh` <br> `sil` (x86-64 only), `dil`, (x86-64 only) `r[8-15]b` (x86-64 only) | `r` |
+| x86-32 | `reg_byte` | `al`, `bl`, `cl`, `dl`, `ah`, `bh`, `ch`, `dh` | `q` |
+| x86-64 | `reg_byte` | `al`, `bl`, `cl`, `dl`, `sil`, `dil`, `r[8-15]b`, `ah`\*, `bh`\*, `ch`\*, `dh`\* | `q` |
 | x86 | `xmm_reg` | `xmm[0-7]` (x86) `xmm[0-15]` (x86-64) | `x` |
 | x86 | `ymm_reg` | `ymm[0-7]` (x86) `ymm[0-15]` (x86-64) | `x` |
 | x86 | `zmm_reg` | `zmm[0-7]` (x86) `zmm[0-31]` (x86-64) | `v` |
@@ -469,6 +470,8 @@ Here is the list of currently supported register classes:
 | RISC-V | `freg` | `f[0-31]` | `f` |
 
 > Note: On x86 we treat `reg_byte` differently from `reg` because the compiler can allocate `al` and `ah` separately whereas `reg` reserves the whole register.
+>
+> Note #2: On x86-64 the high byte registers (e.g. `ah`) are only available when used as an explicit register. Specifying the `reg_byte` register class for an operand will always allocate a low byte register.
 
 Additional register classes may be added in the future based on demand (e.g. MMX, x87, etc).
 

From 5dd81ea582b652cc09244cab26963663727839f3 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 1 May 2020 20:11:03 +0100
Subject: [PATCH 55/68] Add att_syntax option for AT&T syntax

---
 text/0000-inline-asm.md | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)
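
Reviewer note: a minimal sketch of how the new option reads next to the default, assuming the stabilized `std::arch::asm!` macro behaves as this patch specifies for `att_syntax`. The function names `copy_intel` and `copy_att` are hypothetical. On targets other than x86-64 the sketch falls back to a plain copy so that it still compiles.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Default (Intel) operand order: destination first, no register prefix.
#[cfg(target_arch = "x86_64")]
fn copy_intel(x: u64) -> u64 {
    let y: u64;
    unsafe {
        // Expands to something like `mov rax, rcx`.
        asm!(
            "mov {dst}, {src}",
            dst = out(reg) y,
            src = in(reg) x,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    y
}

/// With `att_syntax`: source first, registers substituted with a leading `%`.
#[cfg(target_arch = "x86_64")]
fn copy_att(x: u64) -> u64 {
    let y: u64;
    unsafe {
        // Expands to something like `mov %rcx, %rax`.
        asm!(
            "mov {src}, {dst}",
            dst = out(reg) y,
            src = in(reg) x,
            options(att_syntax, pure, nomem, nostack, preserves_flags),
        );
    }
    y
}

// Fallbacks for other targets (assumption: only here to keep the sketch portable).
#[cfg(not(target_arch = "x86_64"))]
fn copy_intel(x: u64) -> u64 {
    x
}

#[cfg(not(target_arch = "x86_64"))]
fn copy_att(x: u64) -> u64 {
    x
}
```

Both functions return their argument unchanged. The only difference is the operand order in the template string, which is why the option can replace the proc-macro wrapper that the removed section described.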

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 0d03505a80a..79532981021 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -369,7 +369,7 @@ reg_spec := <register class> / "<explicit register>"
 operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
 reg_operand := dir_spec "(" reg_spec ")" operand_expr
 operand := reg_operand / "const" const_expr / "sym" path
-option := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn"
+option := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "att_syntax"
 options := "options(" option *["," option] [","] ")"
 asm := "asm!(" format_string *("," [ident "="] operand) ["," options] [","] ")"
 ```
@@ -386,7 +386,7 @@ As with format strings, named arguments must appear after positional arguments.
 
 The exact assembly code syntax is target-specific and opaque to the compiler except for the way operands are substituted into the template string to form the code passed to the assembler.
 
-The 4 targets specified in this RFC (x86, ARM, AArch64, RISC-V) all use the assembly code syntax of the GNU assembler (GAS). On x86, the `.intel_syntax noprefix` mode of GAS is used. On ARM, the `.syntax unified` mode is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string. Assembly code that does not conform to the GAS syntax will result in assembler-specific behavior.
+The 4 targets specified in this RFC (x86, ARM, AArch64, RISC-V) all use the assembly code syntax of the GNU assembler (GAS). On x86, the `.intel_syntax noprefix` mode of GAS is used by default. On ARM, the `.syntax unified` mode is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with `.section`) must be restored to its original value at the end of the asm string. Assembly code that does not conform to the GAS syntax will result in assembler-specific behavior.
 
 [rfc-2795]: https://github.com/rust-lang/rfcs/pull/2795
 
@@ -425,7 +425,7 @@ Several types of operands are supported:
   - `<path>` must refer to a `fn` or `static`.
   - A mangled symbol name referring to the item is substituted into the asm template string.
   - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).
-  - `<path>` is allowed to point to a `#[thread_local]` static, in which case the asm code can combine the symbol with relocations (e.g. `@TPOFF`) to read from thread-local data.
+  - `<path>` is allowed to point to a `#[thread_local]` static, in which case the asm code can combine the symbol with relocations (e.g. `@plt`, `@TPOFF`) to read from thread-local data.
 
 ## Register operands
 
@@ -630,6 +630,7 @@ Currently the following options are defined:
 - `preserves_flags`: The `asm` block does not modify the flags register (defined in the [rules][rules] below). This allows the compiler to avoid recomputing the condition flags after the `asm` block.
 - `noreturn`: The `asm` block never returns, and its return type is defined as `!` (never). Behavior is undefined if execution falls through past the end of the asm code. A `noreturn` asm block behaves just like a function which doesn't return; notably, local variables in scope are not dropped before it is invoked.
 - `nostack`: The `asm` block does not push data to the stack, or write to the stack red-zone (if supported by the target). If this option is *not* used then the stack pointer is guaranteed to be suitably aligned (according to the target ABI) for a function call.
+- `att_syntax`: This option is only valid on x86, and causes the assembler to use the `.att_syntax prefix` mode of the GNU assembler. Register operands are substituted in with a leading `%`.
 
 The compiler performs some additional checks on options:
 - The `nomem` and `readonly` options are mutually exclusive: it is a compile-time error to specify both.
@@ -668,7 +669,7 @@ Additionally, the following attributes are added to the LLVM `asm` statement:
 * If the `nomem` option is set without the `pure` option then the `inaccessiblememonly` attribute is added to the LLVM `asm` statement.
 * If the `pure` option is not set then the `sideeffect` flag is added the LLVM `asm` statement.
 * If the `nostack` option is not set then the `alignstack` flag is added the LLVM `asm` statement.
-* On x86 the `inteldialect` flag is added the LLVM `asm` statement so that the Intel syntax is used instead of the AT&T syntax.
+* On x86, if the `att_syntax` option is not set then the `inteldialect` flag is added to the LLVM `asm` statement.
 
 If the `noreturn` option is set then an `unreachable` LLVM instruction is inserted after the asm invocation.
 
@@ -819,7 +820,7 @@ This RFC proposes a completely new inline assembly format.
 It is not possible to just copy examples of GCC-style inline assembly and re-use them.
 There is however a fairly trivial mapping between the GCC-style and this format that could be documented to alleviate this.
 
-Additionally, this RFC proposes using the Intel asm syntax on x86 instead of the AT&T syntax. We believe this syntax will be more familiar to most users, but may be surprising for users used to GCC-style asm.
+Additionally, this RFC proposes using the Intel asm syntax by default on x86 instead of the AT&T syntax. We believe this syntax will be more familiar to most users, but may be surprising for users used to GCC-style asm.
 
 The `cpuid` example above would look like this in GCC-sytle inline assembly:
 
@@ -955,17 +956,6 @@ fn mul(a: u64, b: u64) -> u128 {
 }
 ```
 
-## Use AT&T syntax on x86
-
-x86 is particular in that there are [two widely used dialects] for its assembly code: Intel syntax, which is the official syntax for x86 assembly, and AT&T syntax which is used by GCC (via GAS). There is no functional difference between those two dialects, they both support the same functionality but with a [different syntax][gas-syntax]. This RFC chooses to use Intel syntax since it is more widely used and users generally find it easier to read and write.
-
-Note however that it is relatively easy to add support for AT&T using a proc macro (e.g. `asm_att!()`) which wraps around `asm!`. Only two transformations are needed:
-- A `%` needs to be added in front of register operands in the template string.
-- The `.att_syntax prefix` directive should be inserted at the start of the template string to switch the assembler to AT&T mode.
-- The `.intel_syntax noprefix` directive should be inserted at the end of the template string to restore the assembler to Intel mode.
-
-[gas-syntax]: https://sourceware.org/binutils/docs/as/i386_002dVariations.html
-
 ## Validate the assembly code in rustc
 
 There may be some slight differences in the set of assembly code that is accepted by different compiler back-ends (e.g. LLVM's integrated assembler vs using GAS as an external assembler). Examples of such differences are:

From 52d00969433353dd4124283640b3e2b04bf7ca62 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 1 May 2020 20:29:55 +0100
Subject: [PATCH 56/68] Add rationale for defaulting to Intel syntax

---
 text/0000-inline-asm.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 79532981021..2a292dba50c 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -956,6 +956,12 @@ fn mul(a: u64, b: u64) -> u128 {
 }
 ```
 
+## Use AT&T syntax by default on x86
+
+x86 is particular in that there are [two widely used dialects] for its assembly code: Intel syntax, which is the official syntax for x86 assembly, and AT&T syntax, which is used by GCC (via GAS). There is no functional difference between the two dialects: both support the same functionality but with a [different syntax][gas-syntax]. This RFC chooses to use Intel syntax by default since it is more widely used and users generally find it easier to read and write.
+
+[gas-syntax]: https://sourceware.org/binutils/docs/as/i386_002dVariations.html
+
 ## Validate the assembly code in rustc
 
 There may be some slight differences in the set of assembly code that is accepted by different compiler back-ends (e.g. LLVM's integrated assembler vs using GAS as an external assembler). Examples of such differences are:

From a3c53ecbfd66cf4850e02fdeecc887064ba7be03 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Fri, 1 May 2020 20:29:32 +0100
Subject: [PATCH 57/68] Clarify that inout requires a mutable place

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 2a292dba50c..1382a05bb50 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -409,7 +409,7 @@ Several types of operands are supported:
 * `inout(<reg>) <expr>`
   - `<reg>` can refer to a register class or an explicit register. The allocated register name is substituted into the asm template string.
   - The allocated register will contain the value of `<expr>` at the start of the asm code.
-  - `<expr>` must be an initialized place expression, to which the contents of the allocated register is written to at the end of the asm code.
+  - `<expr>` must be a mutable initialized place expression, to which the contents of the allocated register are written at the end of the asm code.
 * `inout(<reg>) <in expr> => <out expr>`
   - Same as `inout` except that the initial value of the register is taken from the value of `<in expr>`.
   - `<out expr>` must be a (possibly uninitialized) place expression, to which the contents of the allocated register is written to at the end of the asm code.

From f97d53a8534bd614b4d0d4b973ea7051702d44a9 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 9 May 2020 22:10:36 +0100
Subject: [PATCH 58/68] More clarifications and formatting fixes

---
 text/0000-inline-asm.md | 48 +++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 18 deletions(-)
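
Reviewer note on the `mul` return expression touched by this patch: in Rust, `+` binds more tightly than `<<`, so the shifted high half needs its own parentheses before the low half is added. The sketch below is plain Rust with no assembly. The function names are hypothetical and only illustrate the two parses.

```rust
/// Combines the high and low 64-bit halves of a product into a `u128`.
fn combine(hi: u64, lo: u64) -> u128 {
    ((hi as u128) << 64) + lo as u128
}

/// What `(hi as u128) << 64 + lo as u128` means to the parser:
/// the shift amount becomes `64 + lo`. Returns `None` when that shift
/// amount is out of range for a 128-bit value.
fn combine_without_inner_parens(hi: u64, lo: u64) -> Option<u128> {
    let shift = 64 + lo as u128;
    if shift < 128 {
        Some((hi as u128) << shift)
    } else {
        None
    }
}
```

For `hi = 1, lo = 1` the first function yields `2^64 + 1`, while the second yields `2^65`. For `lo >= 64` the second form has no valid result at all.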

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 1382a05bb50..3afe13fa59c 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -33,7 +33,7 @@ It can be used to embed handwritten assembly in the assembly output generated by
 Generally this should not be necessary, but might be where the required performance or timing
 cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.
 
-> Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.
+> **Note**: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.
 
 ## Basic usage
 
@@ -177,14 +177,15 @@ While `reg` is generally available on any architecture, these are highly archite
 among others can be addressed by their name.
 
 ```rust
+let cmd = 0xd1;
 unsafe {
-    asm!("out 0x64, rax", in("rax") cmd);
+    asm!("out 0x64, eax", in("eax") cmd);
 }
 ```
 
 In this example we call the `out` instruction to output the content of the `cmd` variable
-to port `0x64`. Since the `out` instruction only accepts `rax` (and its sub registers) as operand
-we had to use the `rax` constraint specifier.
+to port `0x64`. Since the `out` instruction only accepts `eax` (and its sub registers) as operand
+we had to use the `eax` constraint specifier.
 
 Note that unlike other operand types, explicit register operands cannot be used in the template string: you can't use `{}` and should write the register name directly instead. Also, they must appear at the end of the operand list after all other operand types.
 
@@ -206,7 +207,7 @@ fn mul(a: u64, b: u64) -> u128 {
         );
     }
 
-    hi as u128 << 64 + lo as u128
+    ((hi as u128) << 64) + lo as u128
 }
 ```
 
@@ -233,9 +234,9 @@ unsafe {
     asm!(
         "cpuid",
         // EAX 4 selects the "Deterministic Cache Parameters" CPUID leaf
-        inout("eax") 4 => _,
+        inout("eax") 4u64 => _,
         // ECX 0 selects the L0 cache information.
-        inout("ecx") 0 => ecx,
+        inout("ecx") 0u64 => ecx,
         lateout("ebx") ebx,
         lateout("edx") _
     );
@@ -281,7 +282,7 @@ extern "C" fn foo(arg: i32) {
 fn call_foo(arg: i32) {
     unsafe {
         asm!(
-            "call {}"
+            "call {}",
             sym foo,
             // 1st argument in rdi, which is caller-saved
             inout("rdi") arg => _,
@@ -312,7 +313,7 @@ This default can be overriden by using modifiers on the template string operands
 let mut x: u16 = 0xab;
 
 unsafe {
-    asm!("mov {0:h}, {0:b}", inout(reg_abcd) x);
+    asm!("mov {0:h}, {0:l}", inout(reg_abcd) x);
 }
 
 assert_eq!(x, 0xabab);
@@ -321,7 +322,7 @@ assert_eq!(x, 0xabab);
 In this example, we use the `reg_abcd` register class to restrict the register allocator to the 4 legacy x86 register (`ax`, `bx`, `cx`, `dx`) of which the first two bytes can be addressed independently.
 
 Let us assume that the register allocator has chosen to allocate `x` in the `ax` register.
-The `h` modifier will emit the register name for the high byte of that register and the `b` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.
+The `h` modifier will emit the register name for the high byte of that register and the `l` modifier will emit the register name for the low byte. The asm code will therefore be expanded as `mov ah, al` which copies the low byte of the value into the high byte.
 
 If you use a smaller data type (e.g. `u16`) with an operand and forget the use template modifiers, the compiler will emit a warning and suggest the correct modifier to use.
 
@@ -418,6 +419,7 @@ Several types of operands are supported:
 * `inlateout(<reg>) <expr>` / `inlateout(<reg>) <in expr> => <out expr>`
   - Identical to `inout` except that the register allocator can reuse a register allocated to an `in` (this can happen if the compiler knows the `in` has the same initial value as the `inlateout`).
   - You should only write to the register after all inputs are read, otherwise you may clobber an input.
+  - As with `inout`, `<out expr>` is allowed to be an underscore (`_`) which discards the contents of the register at the end of the asm code.
 * `const <expr>`
   - `<expr>` must be an integer or floating-point constant expression.
   - The value of the expression is formatted as a string and substituted directly into the asm template string.
@@ -469,9 +471,9 @@ Here is the list of currently supported register classes:
 | RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-15]`, `x[16-31]` (non-RV32E) | `r` |
 | RISC-V | `freg` | `f[0-31]` | `f` |
 
-> Note: On x86 we treat `reg_byte` differently from `reg` because the compiler can allocate `al` and `ah` separately whereas `reg` reserves the whole register.
+> **Note**: On x86 we treat `reg_byte` differently from `reg` (and `reg_abcd`) because the compiler can allocate `al` and `ah` separately whereas `reg` reserves the whole register.
 >
-> Note #2: On x86-64 the high byte registers (e.g. `ah`) are only available when used as an explicit register. Specifying the `reg_byte` register class for an operand will always allocate a low byte register.
+> **Note #2**: On x86-64 the high byte registers (e.g. `ah`) are only available when used as an explicit register. Specifying the `reg_byte` register class for an operand will always allocate a low byte register.
 
 Additional register classes may be added in the future based on demand (e.g. MMX, x87, etc).
 
@@ -498,7 +500,9 @@ Each register class has constraints on which value types they can be used with.
 | RISC-V | `freg` | `f` | `f32` |
 | RISC-V | `freg` | `d` | `f64` |
 
-> Note: For the purposes of the above table pointers, function pointers and `isize`/`usize` are treated as the equivalent integer type (`i16`/`i32`/`i64` depending on the target).
+> **Note**: For the purposes of the above table, unsigned types `uN`, `isize`, pointers and function pointers are treated as the equivalent integer type (`i16`/`i32`/`i64` depending on the target).
+>
+> **Note #2**: Registers not listed in the table above cannot be used as operands for inline assembly.
 
 If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. The only exception is the `freg` register class on RISC-V where `f32` values are NaN-boxed in a `f64` as required by the RISC-V architecture.
 
@@ -554,7 +558,9 @@ Some registers have multiple names. These are all treated by the compiler as ide
 | RISC-V | `f[18-27]` | `fs[2-11]` |
 | RISC-V | `f[28-31]` | `ft[8-11]` |
 
-Some registers cannot be used for input or output operands:
+> **Note**: This table includes registers which are not usable as operands. They are listed here purely for the purposes of compiler diagnostics.
+
+Registers not listed in the table of register classes cannot be used as operands for inline assembly. This includes the following registers:
 
 | Architecture | Unsupported register | Reason |
 | ------------ | -------------------- | ------ |
@@ -579,12 +585,17 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen
 | ------------ | -------------- | -------- | -------------- | ------------- |
 | x86-32 | `reg` | None | `eax` | `k` |
 | x86-64 | `reg` | None | `rax` | `q` |
-| x86-32 | `reg_abcd` | `l` | `al` | `b` |
 | x86-64 | `reg` | `l` | `al` | `b` |
-| x86 | `reg_abcd` | `h` | `ah` | `h` |
 | x86 | `reg` | `x` | `ax` | `w` |
 | x86 | `reg` | `e` | `eax` | `k` |
 | x86-64 | `reg` | `r` | `rax` | `q` |
+| x86-32 | `reg_abcd` | None | `eax` | `k` |
+| x86-64 | `reg_abcd` | None | `rax` | `q` |
+| x86 | `reg_abcd` | `l` | `al` | `b` |
+| x86 | `reg_abcd` | `h` | `ah` | `h` |
+| x86 | `reg_abcd` | `x` | `ax` | `w` |
+| x86 | `reg_abcd` | `e` | `eax` | `k` |
+| x86-64 | `reg_abcd` | `r` | `rax` | `q` |
 | x86 | `reg_byte` | None | `al` / `ah` | None |
 | x86 | `xmm_reg` | None | `xmm0` | `x` |
 | x86 | `ymm_reg` | None | `ymm0` | `t` |
@@ -611,7 +622,8 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen
 | RISC-V | `reg` | None | `x1` | None |
 | RISC-V | `freg` | None | `f0` | None |
 
-> Notes:
+> **Notes**:
+> - on ARM and AArch64, the `*_low` register classes have the same modifiers as their base register class.
 > - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register.
 > - on x86: our behavior for `reg` with no modifiers differs from what GCC does. GCC will infer the modifier based on the operand value type, while we default to the full register size.
 > - on x86 `xmm_reg`: the `x`, `t` and `g` LLVM modifiers are not yet implemented in LLVM (they are supported by GCC only), but this should be a simple change.
@@ -809,7 +821,7 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
     - You are responsible for switching any target-specific state (e.g. thread-local storage, stack bounds).
     - The set of memory locations that you may access is the intersection of those allowed by the `asm!` blocks you entered and exited.
 
-> Note: As a general rule, these are the flags which are *not* preserved when performing a function call.
+> **Note**: As a general rule, these are the flags which are *not* preserved when performing a function call.
 
 # Drawbacks
 [drawbacks]: #drawbacks

From 62fe4e4acbc9bebd3c4cdf5ba7d24a1426068b7a Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Mon, 11 May 2020 01:30:48 +0100
Subject: [PATCH 59/68] Use u32 for CPUID example

---
 text/0000-inline-asm.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 3afe13fa59c..31adeba696a 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -227,16 +227,16 @@ We need to tell the compiler about this since it may need to save and restore th
 around the inline assembly block.
 
 ```rust
-let ebx: u64;
-let ecx: u64;
+let ebx: u32;
+let ecx: u32;
 
 unsafe {
     asm!(
         "cpuid",
         // EAX 4 selects the "Deterministic Cache Parameters" CPUID leaf
-        inout("eax") 4u64 => _,
+        inout("eax") 4 => _,
         // ECX 0 selects the L0 cache information.
-        inout("ecx") 0u64 => ecx,
+        inout("ecx") 0 => ecx,
         lateout("ebx") ebx,
         lateout("edx") _
     );

From f45ca4b7d521c314122c9bfb19412e981dec128c Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 16 May 2020 12:58:24 +0100
Subject: [PATCH 60/68] Clarify that stdarch vector types are usable with asm!

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 31adeba696a..02d5d4aba54 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -440,7 +440,7 @@ Only the following types are allowed as operands for inline assembly:
 - Floating-point numbers
 - Pointers (thin only)
 - Function pointers
-- SIMD vectors (structs defined with `#[repr(simd)]` and which implement `Copy`)
+- SIMD vectors (structs defined with `#[repr(simd)]` and which implement `Copy`). This includes archtecture-specific vector types defined in `std::arch` such as `__m128` (x86) or `int8x16_t` (ARM).
 
 Here is the list of currently supported register classes:
 

From 1dbeaf9afbcd92a5ede40fa97248316d342beabb Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 16 May 2020 14:03:52 +0100
Subject: [PATCH 61/68] Clarify the evaluation order of asm! operands

---
 text/0000-inline-asm.md | 2 ++
 1 file changed, 2 insertions(+)
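
Note (not part of the commit): the ordering rule added in this patch can be modelled in plain Rust without `asm!`. The sketch below is only an analogy, not the compiler's implementation. The names `operand`, `takes_two`, `evaluation_order` and `model_writeback` are hypothetical and exist purely for illustration. It shows that function call arguments are evaluated left to right, and that writing several outputs in left to right order to one place leaves the rightmost value there.

```rust
/// Records the order in which operand expressions are evaluated.
fn operand(log: &mut Vec<&'static str>, name: &'static str, value: u64) -> u64 {
    log.push(name);
    value
}

/// Stand-in for an `asm!` block with two input operands (hypothetical).
fn takes_two(a: u64, b: u64) -> u64 {
    a + b
}

/// Evaluates two operand expressions as call arguments and returns the
/// order in which they ran.
fn evaluation_order() -> Vec<&'static str> {
    let mut log = Vec::new();
    let _ = takes_two(operand(&mut log, "first", 1), operand(&mut log, "second", 2));
    log
}

/// Models the write-back step: outputs are stored one after another, left
/// to right, into the same place, so the last store is the one that remains.
fn model_writeback(outputs: &[u64]) -> Option<u64> {
    let mut place = None;
    for &value in outputs {
        place = Some(value);
    }
    place
}
```

Under this model, `evaluation_order()` yields `["first", "second"]` and `model_writeback(&[1, 2])` yields `Some(2)`, matching the "rightmost output" wording in the patch.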

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 02d5d4aba54..530eeeb055b 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -429,6 +429,8 @@ Several types of operands are supported:
   - The substituted string does not include any modifiers (e.g. GOT, PLT, relocations, etc).
   - `<path>` is allowed to point to a `#[thread_local]` static, in which case the asm code can combine the symbol with relocations (e.g. `@plt`, `@TPOFF`) to read from thread-local data.
 
+Operand expressions are evaluated from left to right, just like function call arguments. After the `asm!` has executed, outputs are written to in left to right order. This is significant if two outputs point to the same place: that place will contain the value of the rightmost output.
+
 ## Register operands
 
 Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. `"eax"`) while register classes are specified as identifiers (e.g. `reg`). Using string literals for register names enables support for architectures that use special characters in register names, such as MIPS (`$0`, `$1`, etc).

From 89f917e7825689c3dde7d7d7fb6190b5e78d09a3 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 16 May 2020 15:19:15 +0100
Subject: [PATCH 62/68] Update text/0000-inline-asm.md

Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 530eeeb055b..0b78ed50f16 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -442,7 +442,7 @@ Only the following types are allowed as operands for inline assembly:
 - Floating-point numbers
 - Pointers (thin only)
 - Function pointers
-- SIMD vectors (structs defined with `#[repr(simd)]` and which implement `Copy`). This includes archtecture-specific vector types defined in `std::arch` such as `__m128` (x86) or `int8x16_t` (ARM).
+- SIMD vectors (structs defined with `#[repr(simd)]` and which implement `Copy`). This includes architecture-specific vector types defined in `std::arch` such as `__m128` (x86) or `int8x16_t` (ARM).
 
 Here is the list of currently supported register classes:
 

From 3633849ed0d0bf3fe29e729d513d4e812aa1de53 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 16 May 2020 19:24:28 +0100
Subject: [PATCH 63/68] Clarify rules regarding the x86 direction flag

---
 text/0000-inline-asm.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 0b78ed50f16..45b0659c86b 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -801,7 +801,6 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
 - These flags registers must be restored upon exiting the asm block if the `preserves_flags` option is set:
   - x86
     - Status flags in `EFLAGS` (CF, PF, AF, ZF, SF, OF).
-    - Direction flag in `EFLAGS` (DF).
     - Floating-point status word (all).
     - Floating-point exception flags in `MXCSR` (PE, UE, OE, ZE, DE, IE).
   - ARM
@@ -816,6 +815,8 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
     - Floating-point status (`FPSR` register).
   - RISC-V
     - Floating-point exception flags in `fcsr` (`fflags`).
+- On x86, the direction flag (DF in `EFLAGS`) is clear on entry to an asm block and must be clear on exit.
+  - Behavior is undefined if the direction flag is set on exiting an asm block.
 - The requirement of restoring the stack pointer and non-output registers to their original value only applies when exiting an `asm!` block.
   - This means that `asm!` blocks that never return (even if not marked `noreturn`) don't need to preserve these registers.
   - When returning to a different `asm!` block than you entered (e.g. for context switching), these registers must contain the value they had upon entering the `asm!` block that you are *exiting*.

From ae298d843595b6f66cfbf9bbed9818b50c488bf1 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Sat, 16 May 2020 20:28:55 +0100
Subject: [PATCH 64/68] Clarify rules around the use of symbols in asm code

---
 text/0000-inline-asm.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
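
Note (not part of the commit): a minimal sketch of the local-label advice, assuming an x86-64 target and the `std::arch::asm!` macro as later stabilized, which may differ in detail from this draft. The function name `sum_to` is hypothetical. The numeric labels `2:` and `3:` are local labels, so the block can be duplicated by inlining without producing duplicate symbol definitions. A plain Rust fallback is provided for other targets.

```rust
/// Sums the integers 1..=n with a small assembly loop that uses only
/// numeric local labels (`2:` and `3:`), referenced as `2b` (backward)
/// and `3f` (forward).
#[cfg(target_arch = "x86_64")]
fn sum_to(n: u64) -> u64 {
    use std::arch::asm;

    let mut acc: u64 = 0;
    unsafe {
        asm!(
            "test {i}, {i}",   // skip the loop entirely when n == 0
            "jz 3f",
            "2:",
            "add {acc}, {i}",  // acc += i
            "dec {i}",         // i -= 1, sets ZF when i reaches 0
            "jnz 2b",
            "3:",
            i = inout(reg) n => _,
            acc = inout(reg) acc,
            options(nomem),
        );
    }
    acc
}

/// Fallback for targets other than x86-64: same result without assembly.
#[cfg(not(target_arch = "x86_64"))]
fn sum_to(n: u64) -> u64 {
    (1..=n).sum()
}
```

A named symbol such as `loop_start:` in place of `2:` is the case the new text warns about: it could be emitted once per inlined copy and then fail to assemble or link.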

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 45b0659c86b..fe8ee0a4aa9 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -823,8 +823,12 @@ unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
     - You cannot exit an `asm!` block that has not been entered. Neither can you exit an `asm!` block that has already been exited.
     - You are responsible for switching any target-specific state (e.g. thread-local storage, stack bounds).
     - The set of memory locations that you may access is the intersection of those allowed by the `asm!` blocks you entered and exited.
+- You cannot assume that an `asm!` block will appear exactly once in the output binary. The compiler is allowed to instantiate multiple copies of the `asm!` block, for example when the function containing it is inlined in multiple places.
+  - As a consequence, you should only use [local labels] inside inline assembly code. Defining symbols in assembly code may lead to assembler and/or linker errors due to duplicate symbol definitions.
 
-> **Note**: As a general rule, these are the flags which are *not* preserved when performing a function call.
+> **Note**: As a general rule, the flags covered by `preserves_flags` are those which are *not* preserved when performing a function call.
+
+[local labels]: https://sourceware.org/binutils/docs/as/Symbol-Names.html#Local-Labels
 
 # Drawbacks
 [drawbacks]: #drawbacks

From 65b1cf293adea1df812f138eb895f3f6e03cb64c Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Mon, 18 May 2020 21:41:21 +0100
Subject: [PATCH 65/68] Fix broken link

---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index fe8ee0a4aa9..4cd6e75150c 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -977,7 +977,7 @@ fn mul(a: u64, b: u64) -> u128 {
 
 ## Use AT&T syntax by default on x86
 
-x86 is particular in that there are [two widely used dialects] for its assembly code: Intel syntax, which is the official syntax for x86 assembly, and AT&T syntax which is used by GCC (via GAS). There is no functional difference between those two dialects, they both support the same functionality but with a [different syntax][gas-syntax]. This RFC chooses to use Intel syntax by default since it is more widely used and users generally find it easier to read and write.
+x86 is particular in that there are [two widely used dialects][gas-syntax] for its assembly code: Intel syntax, which is the official syntax for x86 assembly, and AT&T syntax which is used by GCC (via GAS). There is no functional difference between those two dialects, they both support the same functionality but with a different syntax. This RFC chooses to use Intel syntax by default since it is more widely used and users generally find it easier to read and write.
 
 [gas-syntax]: https://sourceware.org/binutils/docs/as/i386_002dVariations.html
 

From a03813df8944d0fd97da35097fe1dc317e71d322 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Tue, 2 Jun 2020 16:22:15 +0100
Subject: [PATCH 66/68] Update text/0000-inline-asm.md

Co-authored-by: laizy <laizy@users.noreply.github.com>
---
 text/0000-inline-asm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
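
Note (not part of the commit): the extra parentheses matter because `+` binds more tightly than `<<` in Rust, so the previous expression parsed as a shift by `64 + lo`. The sketch below contrasts the two parses; the function names `combine` and `combine_misparsed` are hypothetical.

```rust
/// Intended result: `hi` in the upper 64 bits, `lo` in the lower 64 bits.
fn combine(hi: u64, lo: u64) -> u128 {
    ((hi as u128) << 64) + lo as u128
}

/// What `(hi as u128) << 64 + lo as u128` means without the outer
/// parentheses: a shift by `64 + lo`. A shift amount of 128 or more
/// overflows, so this is only meaningful for small `lo`.
fn combine_misparsed(hi: u64, lo: u64) -> u128 {
    (hi as u128) << (64 + lo as u128)
}
```

For `hi = 1` and `lo = 3`, `combine` gives `(1 << 64) + 3`, whereas the old parse gives `1 << 67`.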

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index 4cd6e75150c..fbeefa6e62a 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -207,7 +207,7 @@ fn mul(a: u64, b: u64) -> u128 {
         );
     }
 
-    (hi as u128) << 64 + lo as u128
+    ((hi as u128) << 64) + lo as u128
 }
 ```
 

From 087ac5c8bdf3d7a99868a3e6a47e25d2fda37f83 Mon Sep 17 00:00:00 2001
From: Amanieu d'Antras <amanieu@gmail.com>
Date: Thu, 11 Jun 2020 13:42:47 +0100
Subject: [PATCH 67/68] Clarify wording around positional/named/register
 arguments

---
 text/0000-inline-asm.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index fbeefa6e62a..e1463eee287 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -383,7 +383,9 @@ The macro will initially be supported only on ARM, AArch64, x86, x86-64 and RISC
 
 The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.
 
-As with format strings, named arguments must appear after positional arguments. Explicit register operands must appear at the end of the operand list, after any named arguments if any. Explicit register operands cannot be used by placeholders in the template string. All other operands must appear at least once in the template string, otherwise a compiler error is generated.
+As with format strings, named arguments must appear after positional arguments. Explicit register operands must appear at the end of the operand list, after named arguments if any.
+
+Explicit register operands cannot be used by placeholders in the template string. All other named and positional operands must appear at least once in the template string, otherwise a compiler error is generated.
 
 The exact assembly code syntax is target-specific and opaque to the compiler except for the way operands are substituted into the template string to form the code passed to the assembler.
 

From 733869aa3bda434356030501fee2c23cc2f944f3 Mon Sep 17 00:00:00 2001
From: Josh Triplett <josh@joshtriplett.org>
Date: Mon, 15 Jun 2020 00:29:36 -0700
Subject: [PATCH 68/68] RFC 2873 (asm!): Allow multiple template string
 arguments

Interpret them as newline-separated.

Update examples and explanations for this change.
---
 text/0000-inline-asm.md | 68 ++++++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 22 deletions(-)
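
Note (not part of the commit): the joining rule described in this patch can be modelled with ordinary string handling. The helper `join_templates` below is hypothetical and only illustrates the stated semantics (template string arguments concatenated with `\n` between them). It says nothing about how the compiler implements the macro.

```rust
/// Models how multiple template string arguments are interpreted: as a
/// single template with a newline between consecutive arguments.
fn join_templates(templates: &[&str]) -> String {
    templates.join("\n")
}
```

Under this model, the two-argument form `"mov {0}, {1}", "add {0}, {number}"` is equivalent to the single template `"mov {0}, {1}\nadd {0}, {number}"`, and a single template string is passed through unchanged.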

diff --git a/text/0000-inline-asm.md b/text/0000-inline-asm.md
index e1463eee287..5e5d5b5c6a7 100644
--- a/text/0000-inline-asm.md
+++ b/text/0000-inline-asm.md
@@ -80,10 +80,13 @@ Let us see another example that also uses an input:
 let i: u64 = 3;
 let o: u64;
 unsafe {
-    asm!("
-        mov {0}, {1}
-        add {0}, {number}
-    ", out(reg) o, in(reg) i, number = const 5);
+    asm!(
+        "mov {0}, {1}",
+        "add {0}, {number}",
+        out(reg) o,
+        in(reg) i,
+        number = const 5,
+    );
 }
 assert_eq!(o, 8);
 ```
@@ -94,13 +97,18 @@ and then adding `5` to it.
 
 The example shows a few things:
 
-First we can see that inputs are declared by writing `in` instead of `out`.
+First, we can see that `asm!` allows multiple template string arguments; each
+one is treated as a separate line of assembly code, as if they were all joined
+together with newlines between them. This makes it easy to format assembly
+code.
+
+Second, we can see that inputs are declared by writing `in` instead of `out`.
 
-Second one of our operands has a type we haven't seen yet, `const`.
+Third, one of our operands has a type we haven't seen yet, `const`.
 This tells the compiler to expand this argument to value directly inside the assembly template.
 This is only possible for constants and literals.
 
-Third we can see that we can specify an argument number, or name as in any format string.
+Fourth, we can see that we can specify an argument number, or name as in any format string.
 For inline assembly templates this is particularly useful as arguments are often used more than once.
 For more complex inline assembly using this facility is generally recommended, as it improves
 readability, and allows reordering instructions without changing the argument order.
@@ -146,10 +154,13 @@ let mut a: u64 = 4;
 let b: u64 = 4;
 let c: u64 = 4;
 unsafe {
-    asm!("
-        add {0}, {1}
-        add {0}, {2}
-    ", inout(reg) a, in(reg) b, in(reg) c);
+    asm!(
+        "add {0}, {1}",
+        "add {0}, {2}",
+        inout(reg) a,
+        in(reg) b,
+        in(reg) c,
+    );
 }
 assert_eq!(a, 12);
 ```
@@ -203,7 +214,7 @@ fn mul(a: u64, b: u64) -> u128 {
             "mul {}",
             in(reg) a,
             inlateout("rax") b => lo,
-            lateout("rdx") hi
+            lateout("rdx") hi,
         );
     }
 
@@ -238,7 +249,7 @@ unsafe {
         // ECX 0 selects the L0 cache information.
         inout("ecx") 0 => ecx,
         lateout("ebx") ebx,
-        lateout("edx") _
+        lateout("edx") _,
     );
 }
 
@@ -259,12 +270,14 @@ This can also be used with a general register class (e.g. `reg`) to obtain a scr
 // Multiply x by 6 using shifts and adds
 let mut x: u64 = 4;
 unsafe {
-    asm!("
-        mov {tmp}, {x}
-        shl {tmp}, 1
-        shl {x}, 2
-        add {x}, {tmp}
-    ", x = inout(reg) x, tmp = out(reg) _);
+    asm!(
+        "mov {tmp}, {x}",
+        "shl {tmp}, 1",
+        "shl {x}, 2",
+        "add {x}, {tmp}",
+        x = inout(reg) x,
+        tmp = out(reg) _,
+    );
 }
 assert_eq!(x, 4 * 6);
 ```
@@ -359,6 +372,7 @@ See the reference for the full list of available options and their effects.
 
 Inline assembler is implemented as an unsafe macro `asm!()`.
 The first argument to this macro is a template string literal used to build the final assembly.
+Additional template string literal arguments may be provided; all of the template string arguments are interpreted as if concatenated into a single template string with `\n` between them.
 The following arguments specify input and output operands.
 When required, options are specified as the final argument.
 
@@ -372,17 +386,19 @@ reg_operand := dir_spec "(" reg_spec ")" operand_expr
 operand := reg_operand / "const" const_expr / "sym" path
 option := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "att_syntax"
 options := "options(" option *["," option] [","] ")"
-asm := "asm!(" format_string *("," [ident "="] operand) ["," options] [","] ")"
+asm := "asm!(" format_string *("," format_string) *("," [ident "="] operand) ["," options] [","] ")"
 ```
 
 The macro will initially be supported only on ARM, AArch64, x86, x86-64 and RISC-V targets. Support for more targets may be added in the future. The compiler will emit an error if `asm!` is used on an unsupported target.
 
 [format-syntax]: https://doc.rust-lang.org/std/fmt/#syntax
 
-## Template string
+## Template string arguments
 
 The assembler template uses the same syntax as [format strings][format-syntax] (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by [RFC #2795][rfc-2795]) are not supported.
 
+An `asm!` invocation may have one or more template string arguments; an `asm!` with multiple template string arguments is treated as if all the strings were concatenated with a `\n` between them. The expected usage is for each template string argument to correspond to a line of assembly code. All template string arguments must appear before any other arguments.
+
 As with format strings, named arguments must appear after positional arguments. Explicit register operands must appear at the end of the operand list, after named arguments if any.
 
 Explicit register operands cannot be used by placeholders in the template string. All other named and positional operands must appear at least once in the template string, otherwise a compiler error is generated.
@@ -1007,6 +1023,12 @@ Including the name of the target architecture as part of the `asm!` invocation c
 
 The operands could be placed before the template string, which could make the asm easier to read in some cases. However we decided against it because the benefits are small and the syntax would no longer mirror that of Rust format string.
 
+## Operands interleaved with template string arguments
+
+An asm directive could contain a series of template string arguments, each followed by the operands referenced in that template string argument. This could potentially simplify long blocks of assembly. However, this could introduce significant complexity and difficulty of reading, due to the numbering of positional arguments, and the possibility of referencing named or numbered arguments other than those that appear grouped with a given template string argument.
+
+Experimentation with such mechanisms could take place in wrapper macros around `asm!`, rather than in `asm!` itself.
+
 # Prior art
 [prior-art]: #prior-art
 
@@ -1043,7 +1065,9 @@ GCC supports passing C labels (the ones used with `goto`) to an inline asm block
 This could be supported by allowing code blocks to be specified as operand types. The following code will print `a` if the input value is `42`, or print `b` otherwise.
 
 ```rust
-asm!("cmp {}, 42; jeq {}",
+asm!(
+    "cmp {}, 42",
+    "jeq {}",
     in(reg) val,
     label { println!("a"); },
     fallthrough { println!("b"); }