Tracking issue for improving std::fmt::Arguments and format_args!() #99012
Most likely, the best alternative will look vaguely like this: That is, it tries to:
The most interesting question is what goes in the block with the question marks, the static data that encodes the format string. (Perhaps it could even include the three strings, to save the overhead of the pointers and indirection, at least in the case of tiny strings like these.)

There are many ways to represent a format string in a data structure. There's a trade-off to be made between the size of that structure, runtime performance, and complexity.

One of the ideas I want to explore more is to represent it as a single array of 'commands'. This direction is something I experimented a bit with in #84823, but it still requires a lot more exploration to get it to something that can be a good alternative.

Another option, which we discussed in the libs meeting earlier this year, is to represent it as compiled code instead of a regular data structure.

Unfortunately, it takes quite some time to build every single experiment, as it requires a re-implementation of not only fmt::Arguments, but also of the format_args!() macro, every time.
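To make the "array of commands" idea above a bit more concrete, here is a minimal sketch. The `Command` enum, `run`, and `render` are hypothetical names invented for illustration; the thread does not specify an encoding, and a real design would pack this far more tightly and support formatting options.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical command set (an assumption, not the encoding from the thread).
#[derive(Clone, Copy)]
enum Command {
    /// Write a literal piece of the format string.
    Str(&'static str),
    /// Write the argument at this index via `Display`, with default options.
    Arg(usize),
}

/// Executes the commands in order, writing into `out`.
/// Panics if a command refers to an argument index that is out of range.
fn run(out: &mut dyn Write, commands: &[Command], args: &[&dyn Display]) -> fmt::Result {
    for command in commands {
        match *command {
            Command::Str(s) => out.write_str(s)?,
            Command::Arg(i) => write!(out, "{}", args[i])?,
        }
    }
    Ok(())
}

/// Convenience wrapper that collects the output into a `String`.
fn render(commands: &[Command], args: &[&dyn Display]) -> String {
    let mut s = String::new();
    run(&mut s, commands, args).expect("writing to a String does not fail");
    s
}

fn main() {
    // The static part: roughly what `"a{}b{}c"` could be lowered to.
    static COMMANDS: [Command; 5] = [
        Command::Str("a"),
        Command::Arg(0),
        Command::Str("b"),
        Command::Arg(1),
        Command::Str("c"),
    ];
    // Only the argument slice has to be built at the call site.
    println!("{}", render(&COMMANDS, &[&1, &"x"]));
}
```

The point of the shape is that the command array can live in static storage, while the caller only fills in the argument slice.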
An interesting case is something like However, if that second argument is a dynamically chosen It means that the If we go for the closure idea, we could instead get something that looks like this: So,
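As a rough illustration of the closure idea (not the expansion proposed in the thread, whose code is not preserved here), the arguments object can be modelled as a borrowed closure that writes everything into an output. `ClosureArguments`, `render`, and `greet` are hypothetical names used only for this sketch.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for `fmt::Arguments`: just a borrowed closure.
struct ClosureArguments<'a> {
    f: &'a dyn Fn(&mut dyn Write) -> fmt::Result,
}

impl ClosureArguments<'_> {
    /// Runs the closure against the given output.
    fn write_to(&self, out: &mut dyn Write) -> fmt::Result {
        (self.f)(out)
    }
}

/// Collects the formatted output into a `String`.
fn render(args: ClosureArguments<'_>) -> String {
    let mut s = String::new();
    args.write_to(&mut s).expect("writing to a String does not fail");
    s
}

/// Roughly what a macro could generate for `"hello {} x{}"`:
/// the literal pieces and the arguments are captured by the closure body.
fn greet(name: &str, n: u32) -> String {
    render(ClosureArguments {
        f: &|out: &mut dyn Write| {
            out.write_str("hello ")?;
            write!(out, "{}", name)?;
            out.write_str(" x")?;
            write!(out, "{}", n)
        },
    })
}

fn main() {
    println!("{}", greet("world", 3));
}
```

In this model the compiler can see the whole formatting operation as ordinary code, which is what makes inlining and specialization possible, at the cost of generating code per call site.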
A tricky problem with the closure approach is how to implement
If we do not go for the closure approach, a tricky problem is how to make Currently we just make use of something like One possibility of handling this is described in #44343, but it's quite tricky to get that right. The last comment there is a comment from me from 2020 saying that I'd try to implement it. I tried it, but it turned out to be quite tricky to get right and make it compile in all cases, and it got messy quite fast. Specifically, creating just a
In the VM approach, the "opcodes" can be encoded together with the static string: since in UTF-8, byte values
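A small sketch of that encoding idea, under an explicit assumption: the byte `0xFF` never occurs in valid UTF-8, so it is used here as a hypothetical "insert the next argument" opcode inside the same byte blob as the literal text. The opcode value, `run`, and `render` are illustrative choices, not the scheme from the comment above.

```rust
use std::fmt::{self, Display, Write};

/// Assumed opcode: 0xFF cannot appear in valid UTF-8, so it cannot clash with text.
const OP_NEXT_ARG: u8 = 0xFF;

/// Interprets `template`: literal UTF-8 runs separated by `OP_NEXT_ARG` bytes.
fn run(out: &mut dyn Write, template: &[u8], args: &[&dyn Display]) -> fmt::Result {
    let mut args = args.iter();
    for (i, chunk) in template.split(|&b| b == OP_NEXT_ARG).enumerate() {
        if i > 0 {
            // Every separator byte consumes exactly one argument.
            let arg = args.next().ok_or(fmt::Error)?;
            write!(out, "{}", arg)?;
        }
        // Everything between opcodes must be valid UTF-8 text.
        let text = std::str::from_utf8(chunk).map_err(|_| fmt::Error)?;
        out.write_str(text)?;
    }
    Ok(())
}

/// Collects the output into a `String`, or returns the formatting error.
fn render(template: &[u8], args: &[&dyn Display]) -> Result<String, fmt::Error> {
    let mut s = String::new();
    run(&mut s, template, args)?;
    Ok(s)
}

fn main() {
    // One static blob stands in for `"a{}b{}c"`: no separate slice of pieces.
    let out = render(b"a\xFFb\xFFc", &[&1, &2]).unwrap();
    println!("{out}");
}
```

Compared with a slice of `&str` pieces, the single blob avoids one (ptr, size) pair per piece.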
What about putting all the This would almost cut the size of the current
Would using a custom trait instead of
I've opened a compiler MCP for changing how format_args!() expands, to make it easier to work on these improvements.
The implementation of the It's a lot of code. Any new implementation of I'm now working on splitting this code into two steps:
Between step 1 and 2, there will be a (relatively simple) intermediate representation. Then a new implementation of Currently, these two steps are quite interwoven, so it'll take some time and effort to separate. Additionally, I think it'd be nice if we could delay step 2 until later in the compilation process, making the intermediate representation part of the AST and HIR, which is what I opened an MCP for: rust-lang/compiler-team#541
I made a template PR for those who want to help out by experimenting with a new fmt::Arguments implementation: #101272

Unfortunately, it's a lot of work to try out a new idea for a The template PR provides a starting point that points out exactly what to implement.

I'll work on getting #100996 merged (which is for now included in the template PR), and will start on the closure approach right after. Please feel free to try out another approach (like that 'VM' idea or something else) and leave a comment here with what you're working on, and ping me when you have something ready to test. :)

There are a few benchmarks here thanks to @mojave2. It'd be nice if we could also include a representative performance benchmark for formatting in https://perf.rust-lang.org/ or at least as an optional
We're now in the process of adding an experimental version of runtime benchmarks into
Some very promising first results from the closure approach: #101568 (comment)
Update: The compiler MCP for making format_args!() its own AST node has been accepted, and #100996 has been reviewed and approved, and is about to be merged.

The closure approach produces some great results for small programs, but doesn't perform as well in all situations. It can also significantly increase compilation time.

Once #100996 is merged, it'll be much easier to make changes to fmt::Arguments. I'll start with a few small incremental changes that should be easy to review, before continuing with more experimental ideas again.

Now that the compiler MCP has been accepted, I'll also work on moving the data types from
Rewrite and refactor format_args!() builtin macro.

This is a near complete rewrite of `compiler/rustc_builtin_macros/src/format.rs`.

This gets rid of the massive unmaintainable [`Context` struct](https://github.com/rust-lang/rust/blob/76531befc4b0352247ada67bd225e8cf71ee5686/compiler/rustc_builtin_macros/src/format.rs#L176-L263), and splits the macro expansion into three parts:

1. First, `parse_args` will parse the `(literal, arg, arg, name=arg, name=arg)` syntax, but doesn't parse the template (the literal) itself.
2. Second, `make_format_args` will parse the template, the format options, resolve argument references, produce diagnostics, and turn the whole thing into a `FormatArgs` structure.
3. Finally, `expand_parsed_format_args` will turn that `FormatArgs` structure into the expression that the macro expands to.

In other words, the `format_args` builtin macro used to be a hard-to-maintain 'single pass compiler', which I've split into a three phase compiler with a parser/tokenizer (step 1), semantic analysis (step 2), and backend (step 3). (It's compilers all the way down. ^^)

This can serve as a great starting point for rust-lang#99012, which will only need to change the implementation of 3, while leaving step 1 and 2 unchanged.

It also makes rust-lang/compiler-team#541 easier, which could then upgrade the new `FormatArgs` struct to an `ast` node and remove step 3, moving that step to later in the compilation process.

It also fixes a few diagnostics bugs.

This also [significantly reduces](https://gist.github.com/m-ou-se/b67b2d54172c4837a5ab1b26fa3e5284) the amount of generated code for cases with arguments in non-default order without formatting options, like `"{1} {0}"` or `"{a} {}"`, etc.
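The value of separating the template analysis from the backend can be shown with a toy model. This is only a sketch: `Piece`, `parse_template`, and `expand` are hypothetical names, not the rustc functions listed above, and the toy parser handles just `{}` and `{N}` (no escapes, names, or format options).

```rust
use std::fmt::{Display, Write};

/// Toy intermediate representation: the output of the analysis step.
#[derive(Debug, PartialEq)]
enum Piece {
    Literal(String),
    Placeholder(usize),
}

/// Analysis step: turn the template into the intermediate representation.
fn parse_template(template: &str) -> Result<Vec<Piece>, String> {
    let mut pieces = Vec::new();
    let mut literal = String::new();
    let mut next_implicit = 0;
    let mut chars = template.chars();
    while let Some(c) = chars.next() {
        if c != '{' {
            literal.push(c);
            continue;
        }
        // Collect everything up to the closing brace.
        let mut spec = String::new();
        loop {
            match chars.next() {
                Some('}') => break,
                Some(d) => spec.push(d),
                None => return Err("unclosed placeholder".to_string()),
            }
        }
        if !literal.is_empty() {
            pieces.push(Piece::Literal(std::mem::take(&mut literal)));
        }
        // `{}` takes the next implicit index, `{N}` an explicit one.
        let index = if spec.is_empty() {
            next_implicit += 1;
            next_implicit - 1
        } else {
            spec.parse::<usize>().map_err(|e| e.to_string())?
        };
        pieces.push(Piece::Placeholder(index));
    }
    if !literal.is_empty() {
        pieces.push(Piece::Literal(literal));
    }
    Ok(pieces)
}

/// Backend step: only this function would change for a new representation.
/// Panics if a placeholder index is out of range.
fn expand(pieces: &[Piece], args: &[&dyn Display]) -> String {
    let mut out = String::new();
    for piece in pieces {
        match piece {
            Piece::Literal(s) => out.push_str(s),
            Piece::Placeholder(i) => write!(out, "{}", args[*i]).expect("String write"),
        }
    }
    out
}

fn main() {
    let pieces = parse_template("{1} {0}").unwrap();
    println!("{}", expand(&pieces, &[&"a", &"b"]));
}
```

Because `expand` only sees the intermediate representation, an experiment with a different `fmt::Arguments` layout would replace that one function and leave the parsing and diagnostics untouched.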
Not sure if someone has already shared a similar idea, but I want to share this idea that is somewhat a hybrid of the closure method and the list of instructions method from: https://blog.m-ou.se/format-args/. The basic idea is that:
The `Arguments` type:

```rust
pub struct Arguments<'a> {
    data: NonNull<()>, // Runtime data.
    f: unsafe fn(context: &mut dyn Write, data: NonNull<()>) -> fmt::Result, // Instructions.
    _phantom: PhantomData<&'a ()>,
}
```

Instruction list

For the instruction list type, each instruction could have a definition like:

```rust
struct ThisInstruction<..., RestInstructions>(..., RestInstructions);

impl<..., RestInstructions> ThisInstruction<..., RestInstructions> {
    unsafe fn call(output: &mut dyn Write, data: NonNull<()>) -> fmt::Result {
        // Execute the current instruction. If we need to access the runtime data, we can store the offset of the
        // runtime data in the generic arguments using const generics.
        ...
        // Execute the rest of the list of instructions.
        RestInstructions::call(output, data)
    }
}
```

In this way, we can chain the list of instructions together to form a single nested type with a `call` function, which we can store in `Arguments`:

```rust
type InstructionList = WriteStr<..., WriteDisplay<u32, ..., WriteStr<..., WriteDebug<bool, ..., WriteStr<..., End>>>>>;
//                     ^^^^^^^^      ^^^^^^^^^^^^           ^^^^^^^^      ^^^^^^^^^^            ^^^^^^^^
//                      "abc"            `{}`                "def"          `{:?}`               "ghi"
```

The advantage of this method is that it allows different format specifications that have the same structure to share the same instruction list, which reduces the binary size without sacrificing much runtime performance.

Of course, the detailed design could be a little more complicated; we may need to define some traits in order to get the idea to work. First, the `Context` trait:

```rust
trait Context: Write {
    // Creates the `Formatter` object if necessary.
    fn with_formatter(&mut self, f: impl FnOnce(&mut Formatter) -> fmt::Result) -> fmt::Result;
}

// The `Formatter` object has not been created.
impl<'a> Context for dyn Write + 'a {
    fn with_formatter(&mut self, f: impl FnOnce(&mut Formatter) -> fmt::Result) -> fmt::Result {
        f(&mut Formatter::new(self))
    }
}

// The `Formatter` object has already been created.
impl Context for Formatter<'_> {
    fn with_formatter(&mut self, f: impl FnOnce(&mut Formatter) -> fmt::Result) -> fmt::Result {
        f(self)
    }
}
```

Then, we can have the `Instruction` trait:

```rust
trait Instruction {
    // `context` is either `&mut dyn Write` or `&mut Formatter`, depending on whether a `Formatter` object
    // has been created.
    unsafe fn call(context: &mut (impl Context + ?Sized), data: NonNull<()>) -> fmt::Result;

    // Not sure why this wrapper is needed, but converting `call` into a function pointer directly causes a lifetime error.
    unsafe fn call_wrapper(output: &mut dyn Write, data: NonNull<()>) -> fmt::Result {
        Self::call(output, data)
    }
}

// An instruction that denotes the end of the instruction list.
struct End;

impl Instruction for End {
    unsafe fn call(context: &mut (impl Context + ?Sized), data: NonNull<()>) -> fmt::Result {
        Ok(())
    }
}
```

Then, we can implement the other instructions we need:

```rust
struct WriteStr<const OFFSET: usize, R>(R);

impl<const OFFSET: usize, R> Instruction for WriteStr<OFFSET, R>
where
    R: Instruction,
{
    unsafe fn call(context: &mut (impl Context + ?Sized), data: NonNull<()>) -> fmt::Result {
        let s: &str = todo!("Get the string from `data` and `OFFSET`.");
        context.write_str(s)?;
        R::call(context, data)
    }
}

struct WriteDisplay<T, const OFFSET: usize, R>(PhantomData<T>, R);

impl<T, const OFFSET: usize, R> Instruction for WriteDisplay<T, OFFSET, R>
where
    T: Display,
    R: Instruction,
{
    unsafe fn call(context: &mut (impl Context + ?Sized), data: NonNull<()>) -> fmt::Result {
        context.with_formatter(|f| {
            let s: &T = todo!("Get the reference to `T` from `data` and `OFFSET`.");
            Display::fmt(s, f)?;
            R::call(f, data)
        })
    }
}
```

We can also have other instructions in the same way, such as `WriteDebug`. After building the type for the instruction list, we can get the function pointer using `call_wrapper`:

```rust
let _ = Arguments {
    data: ...,
    f: WriteStr<..., WriteDisplay<u32, ..., WriteStr<..., WriteDebug<bool, ..., WriteStr<..., End>>>>>::call_wrapper,
    _phantom: PhantomData,
};
```

Runtime data

For the runtime data, it can be as simple as an array of:

```rust
union Item {
    usize: usize,     // For storing static string length, placeholder width and placeholder precision.
    ptr: NonNull<()>, // For storing object reference and static string address.
}
```

If we arrange the runtime data carefully (maybe in the reverse order), we can even allow format specifications with the same suffix to share some instructions:

```rust
format_args!("abc{}xyz{}uvw{:?}", 1_i32, 2_f64, false);
//                    ^^^^^^^^^^^^ <- The instructions of this part can be shared with the next one.
format_args!(   "{:?}foo{}bar{:?}", [4_u32], 3_f64, true);
//                      ^^^^^^^^^^^^ <- The instructions of this part can be shared with the previous one.
```
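A heavily simplified, safe model of the nested-type instruction list above can be compiled and run as follows. It is an assumption-laden sketch rather than the proposed design: instead of raw `data` pointers and byte offsets it indexes into plain slices, it uses `&dyn Display` rather than monomorphizing per argument type, and the names `StrArgStr` and `render` are invented here.

```rust
use std::fmt::{self, Display, Write};
use std::marker::PhantomData;

/// One step of the list; `call` runs this step and then the rest.
trait Instruction {
    fn call(out: &mut dyn Write, strs: &[&str], args: &[&dyn Display]) -> fmt::Result;
}

/// Marks the end of the instruction list.
struct End;

impl Instruction for End {
    fn call(_: &mut dyn Write, _: &[&str], _: &[&dyn Display]) -> fmt::Result {
        Ok(())
    }
}

/// Writes `strs[I]`, then runs the rest `R`.
struct WriteStr<const I: usize, R>(PhantomData<R>);

impl<const I: usize, R: Instruction> Instruction for WriteStr<I, R> {
    fn call(out: &mut dyn Write, strs: &[&str], args: &[&dyn Display]) -> fmt::Result {
        out.write_str(strs[I])?;
        R::call(out, strs, args)
    }
}

/// Writes `args[I]` with `Display`, then runs the rest `R`.
struct WriteDisplay<const I: usize, R>(PhantomData<R>);

impl<const I: usize, R: Instruction> Instruction for WriteDisplay<I, R> {
    fn call(out: &mut dyn Write, strs: &[&str], args: &[&dyn Display]) -> fmt::Result {
        write!(out, "{}", args[I])?;
        R::call(out, strs, args)
    }
}

/// The shape `<str>{}<str>`; every format string of this shape reuses it.
type StrArgStr = WriteStr<0, WriteDisplay<0, WriteStr<1, End>>>;

/// Runs the instruction list `L` over the given runtime data.
fn render<L: Instruction>(strs: &[&str], args: &[&dyn Display]) -> String {
    let mut s = String::new();
    L::call(&mut s, strs, args).expect("writing to a String does not fail");
    s
}

fn main() {
    // Two different format strings, one shared instruction list type.
    println!("{}", render::<StrArgStr>(&["x=", ";"], &[&1]));
    println!("{}", render::<StrArgStr>(&["[", "]"], &[&"hi"]));
}
```

Both calls in `main` go through the same monomorphized `StrArgStr::call`, which is the sharing effect the comment describes; only the runtime data differs.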
Edit: remove pointless quote @EFanZh That's more or less the spirit of my suggestion here. Mine just took a different approach to representation to squeeze out much better space savings:
Edit: Drop trivially-desugared The more I think about it, the more I feel something between my idea and your all's might be best: using a binary instruction format that special-cases some types while still remaining very concise. Part of this is also to try to minimize stack setup and usage as well: arguments require about 2 stack pointers per value, and their setup requires storing (and in PIC, looking up) several pointers. This format tries to bury that overhead into a shared blob of code in the standard library.
Runtime layout: 8/16 bytes in position-dependent code and 12/24 bytes in position-independent code.

```rust
struct Arguments<'a> {
    // Self-terminating
    bytecode: *const u8,
    // Used by bytecode by reference
    arguments: *const (),
    _phantom: PhantomData<&'a ()>,
    // Required for position-independent code for opcode `D1`
    // Value only read if static custom formatter used
    #[cfg(pic)]
    base_ptr: MaybeUninit<*const ()>,
}
```

Interpreter state + initial values:
Instructions (67 encodable, 43 used):
Other notes:
For some case studies (taken from here and Mara's blog post:
@m-ou-se @tgross35 Thoughts? Also, what all can/should we special-case? For one, it may be worth special-casing some string flags, and deferring to the full version only when some flags exist. As for stuff like
@tgross35 I tried to take your feedback from last time into consideration. Tried to address all those pain points this time around. Hopefully, this spells everything out a lot better. 🙂
perf: improve write_fmt to handle simple strings

Per `@dtolnay` suggestion in serde-rs/serde#2697 (comment) - attempt to speed up performance in the cases of a simple string format without arguments:

```rust
write!(f, "text") -> f.write_str("text")
```

```diff
+ #[inline]
  pub fn write_fmt(&mut self, f: fmt::Arguments) -> fmt::Result {
+     if let Some(s) = f.as_str() {
+         self.buf.write_str(s)
+     } else {
          write(self.buf, f)
+     }
  }
```

Hopefully it will improve the simple case for the rust-lang#99012

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
perf: improve write_fmt to handle simple strings

In case the format string has no arguments, simplify its implementation with a direct call to `output.write_str(value)`. This builds on `@dtolnay`'s original [suggestion](serde-rs/serde#2697 (comment)). This does not change any expectations because the original `fn write()` implementation calls `write_str` for parts of the format string.

```rust
write!(f, "text") -> f.write_str("text")
```

```diff
 /// [`write!`]: crate::write!
+#[inline]
 #[stable(feature = "rust1", since = "1.0.0")]
 pub fn write(output: &mut dyn Write, args: Arguments<'_>) -> Result {
+    if let Some(s) = args.as_str() { output.write_str(s) } else { write_internal(output, args) }
+}
+
+/// Actual implementation of the [`write`], but without the simple string optimization.
+fn write_internal(output: &mut dyn Write, args: Arguments<'_>) -> Result {
     let mut formatter = Formatter::new(output);
     let mut idx = 0;
```

* Hopefully it will improve the simple case for the rust-lang#99012
* Another related (original?) issue: rust-lang#10761
* Previous similar attempt to fix it by `@Kobzol`: rust-lang#100700

CC: `@m-ou-se` as probably the biggest expert in everything `format!`
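The same fast path can be tried in user code with the stable `fmt::Arguments::as_str` and `fmt::write` APIs. The wrapper below is a sketch; `write_fast` and `demo` are hypothetical names, and whether `as_str` returns `Some` for a given call is an optimization detail that should not be relied on for correctness.

```rust
use std::fmt::{self, Arguments, Write};

/// Writes `args` to `out`, skipping the formatting machinery when the
/// arguments are known to be a single plain string.
fn write_fast(out: &mut dyn Write, args: Arguments<'_>) -> fmt::Result {
    if let Some(s) = args.as_str() {
        // No placeholders to process: one direct call.
        out.write_str(s)
    } else {
        // General case: defer to the standard implementation.
        fmt::write(out, args)
    }
}

/// Exercises both paths; the output is the same whichever path is taken.
fn demo(n: usize) -> String {
    let mut s = String::new();
    write_fast(&mut s, format_args!("text")).expect("String write");
    write_fast(&mut s, format_args!(" {}", n)).expect("String write");
    s
}

fn main() {
    println!("{}", demo(4));
}
```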
I've been thinking about this issue while working to shrink a wasm module that necessarily has a lot of string formatting both for human-readable error messages and panic messages. In this context, performance is not very important, but shrinking code size and the static data structures would be very welcome (I think code size is a bit more important here than static data because the former has to be validated and often JIT-compiled to instantiate the module). The "bytecode" ideas presented in earlier comments are very interesting from this angle, as they go to great lengths to reduce the size of the data structures while keeping set-up code as small as anyone could ask for. However, I'm worried about pushing too much complexity in the code that has to interpret With respect to performance, I'm wary of the potential hidden (because it's difficult to pin down in typical benchmarks) cost of having too many distinct encodings that could plausibly be encountered somewhat regularly. This is distinct from just minimizing branches (e.g., Stream-VByte as opposed to other varint schemes), which is not always feasible. Consider modern, fast byte-aligned LZ compression like LZ4: decoding necessarily spends a lot of time on variable-size data copies, but can still run at multiple GB/s because data-dependent branches are minimized and the remaining ones carry as much weight as possible. For example, if you eat a mispredict because a literal or match length exceeded some threshold and falls off the common code path, that's usually a good thing because you get to spend more time in a single large memcpy, which is faster than having many short matches and copies. I think there's an interesting corner of the design space that can address both of these points: minimizing the number of different opcodes and encoding formats, just leaning heavily on the fact that most integers that occur in all the static data structures are almost always very small. 
So, something closer to the very first design @dead-claudia described than every other variant described since then. Without wanting to commit to a very concrete proposal, here's a sketch for a variant that prioritizes the size of a canonical "interpreter" loop while still packing the entire static data into a single, rather compact blob and probably having a shot at running decently fast:
Some notes:
Edit: I no longer remember why I decided to store the string pieces out-of-line, after the "bytecode". Intermingling the two would remove the need for the separate "string len" field and simplify
@hanna-kruppe To be clear, I only did a small subset that, in my experience, almost always ends up used anyways. Also, there's less cardinality than you'd think at first glance.
And in each of those, you're saving:
I appreciate that you've put a lot of thought into all of the designs you've described, and I don't have trouble believing that they can be implemented compactly with enough heroics. However, nobody needs to step up to actually perform such heroics if the project chooses an approach that can be implemented straightforwardly without any "cliffs" w.r.t. binary size or runtime performance. This is a bit of a worse-is-better sentiment, but here's some very tangible downsides of bundling the formatting code for certain types into a bytecode interpreter:
And that's not even touching on the many different "instructions" and how to decode + dispatch them. The straightforward implementations that come to mind when reading through them are either not very fast or not very compact -- even taking into account that most of the opcode space is occupied by ranges of related things that can just do something generic based on the lower N bits of the "opcode". I'm sure there are a lot of clever tricks possible beyond what I can imagine right now, but are all of them compatible with each other, and how do they balance the sort-of-competing concerns of binary size and runtime performance? With so many moving pieces, it's difficult (for me, at least) to get a feeling for these things without putting in the hard work of actually writing a complete implementation or two.
More core::fmt::rt cleanup. - Removes the `V1` suffix from the `Argument` and `Flag` types. - Moves more of the format_args lang items into the `core::fmt::rt` module. (The only remaining lang item in `core::fmt` is `Arguments` itself, which is a public type.) Part of rust-lang/rust#99012 Follow-up to rust-lang/rust#110616
Evaluate `std::fmt::Arguments::new_const()` during Compile Time

Fixes rust-lang#128709

This PR aims to optimize calls to string formatting macros without any arguments by evaluating `std::fmt::Arguments::new_const()` in a const context.

Currently, `println!("hola")` compiles to `std::io::_print(std::fmt::Arguments::new_const(&["hola\n"]))`.

With this PR, `println!("hola")` compiles to `std::io::_print(const { std::fmt::Arguments::new_const(&["hola\n"]) })`.

This is accomplished in two steps:

1. Const stabilize `std::fmt::Arguments::new_const()`.
2. Wrap calls to `std::fmt::Arguments::new_const()` in an inline const block when lowering the AST to HIR.

This reduces the generated code to a `memcpy` instead of multiple `getelementptr` and `store` instructions even with `-C no-prepopulate-passes -C opt-level=0`. Godbolt for code comparison: https://rust.godbolt.org/z/P7Px7de6c

This is a safe and sound transformation because `std::fmt::Arguments::new_const()` is a trivial constructor function taking a slice containing a `'static` string literal as input.

CC rust-lang#99012
Evaluate `std::fmt::Arguments::new_const()` during Compile Time

Fixes rust-lang#128709

This PR aims to optimize calls to string formatting macros without any arguments by evaluating `std::fmt::Arguments::new_const()` in a const context.

Currently, `println!("hola")` compiles to `std::io::_print(std::fmt::Arguments::new_const(&["hola\n"]))`.

With this PR, `println!("hola")` compiles to `std::io::_print(const { std::fmt::Arguments::new_const(&["hola\n"]) })`.

This is accomplished by wrapping calls to `std::fmt::Arguments::new_const()` in an inline const block when lowering the AST to HIR.

This reduces the generated code to a `memcpy` instead of multiple `getelementptr` and `store` instructions even with `-C no-prepopulate-passes -C opt-level=0`. Godbolt for code comparison: https://rust.godbolt.org/z/P7Px7de6c

This is a safe and sound transformation because `std::fmt::Arguments::new_const()` is a trivial constructor function taking a slice containing a `'static` string literal as input.

CC rust-lang#99012
Earlier this year in the libs team meeting, I presented several different ideas for alternative implementations of `std::fmt::Arguments` which could result in smaller binary size or higher performance. Now that #93740 is mostly done, I'll be shifting my focus to `fmt::Arguments` and exploring those ideas.

Currently, `fmt::Arguments` is the size of six pointers, and refers to three slices:
- `&'static [&'static str]` containing the literal parts around the formatting placeholders. E.g. for `"a{}b{}c"`, these are `["a", "b", "c"]`.
- `&[&(ptr, fn_ptr)]` which is basically a `&[&dyn Display]` (but can point to `Debug` or `Hex` etc. too), pointing to the arguments. This one is not `'static`, as it points to the actual arguments to be formatted.
- `Option<&'static [FmtArgument]>`, where `FmtArgument` is a struct containing all the options like precision, width, alignment, fill character, etc. This is unused (`None`) when all placeholders have no options, like in `"{} {}"`, but is used and filled in for all placeholders as soon as any placeholder uses any options, like in `"{:.5} {}"`.

Here's a visualisation of that, for a `"a{}b{:.5}c"` format string:

[Diagram of the three slices for `"a{}b{:.5}c"`]

An important part of this design is that most of it can be stored in `static` storage, to minimize the amount of work that a function that needs to create/pass a `fmt::Arguments` needs to do. It can just refer to the static data, and only fill in a slice of the arguments.

Some downsides:

- `"a{}b{}c"` needs a `&["a", "b", "c"]`, which is stored in memory as a (ptr, size) pair referencing three (ptr, size) pairs referencing one byte each, which is a lot of overhead. Small string literals with just a newline or a space are very common in formatting.
- As soon as a single placeholder uses any options, as in `"{:02x}"`, a relatively large array with all the (mostly default) formatting options is stored for all placeholders.
- Even for a `&str` argument with a simple `"{}"` placeholder, the full `Display` implementation for `&str` is pulled in, which includes code for all the unused options like padding, alignment, etc.

Issues like those are often a reason to avoid formatting in some situations, which is a shame.
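To make the size figures above concrete, here is a minimal sketch that mirrors the three slices with stand-in types and measures them. All names (`ArgumentsSketch`, `FmtArgumentSketch`, `FormatFn`, `piece_bytes`) are hypothetical and chosen for illustration; the real `core::fmt` internals are private and differ in detail.

```rust
use std::fmt;
use std::mem::size_of;

// Hypothetical stand-in for the per-placeholder options struct.
#[allow(dead_code)]
struct FmtArgumentSketch {
    fill: char,
    width: Option<usize>,
    precision: Option<usize>,
}

// Type-erased formatting function (Display, Debug, ...), an assumption
// modelled on the `(ptr, fn_ptr)` pair described in the text.
type FormatFn = fn(*const (), &mut fmt::Formatter<'_>) -> fmt::Result;

// Three slices, each a (ptr, len) pair: six pointer-sized words in total.
#[allow(dead_code)]
struct ArgumentsSketch<'a> {
    pieces: &'static [&'static str],
    args: &'a [&'a (*const (), FormatFn)],
    options: Option<&'static [FmtArgumentSketch]>,
}

// Bytes spent describing the literal pieces: one outer (ptr, len) pair,
// one (ptr, len) pair per piece, plus the text itself.
fn piece_bytes(pieces: &[&str]) -> usize {
    let outer = size_of::<&[&str]>();
    let inner = pieces.len() * size_of::<&str>();
    let text: usize = pieces.iter().map(|p| p.len()).sum();
    outer + inner + text
}

fn main() {
    let word = size_of::<usize>();
    assert_eq!(size_of::<ArgumentsSketch<'static>>(), 6 * word);
    // Eight words of pointers and lengths for three bytes of text.
    assert_eq!(piece_bytes(&["a", "b", "c"]), 8 * word + 3);
}
```

The second assertion is the overhead the first downside describes: the bookkeeping for `["a", "b", "c"]` dwarfs the three bytes of actual text.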
None of these things are trivial to fix, and all involve a trade-off between compile time, code size, runtime performance, and implementation complexity. It's also very tricky to make these trade-offs for many different use cases at once, as the ways in which formatting is used in a program differ vastly per type of Rust program.
Still, there are many ideas that are worth exploring. It's hard to predict which one will end up being best, so this will involve several different implementations to test and benchmark.
I'll explain the different ideas one by one in the comments below as I explore them.
To do:
- `alloc`. #101569
- `V1` suffix from the `ArgumentV1` and `FlagV1` types: More core::fmt::rt cleanup. #110766
- `evaluate_trait_predicate_recursively` in rustdoc when proving Send/Sync #106930
- `FormatArgsExpn` rust-clippy#10561
- `format_args.rs` to `rustc_ast::FormatArgs` rust-clippy#10484
- `FormatArgsExpn` rust-clippy#10561
- `FormatArgsExpn` rust-clippy#10561
- `write.rs` to `rustc_ast::FormatArgs` rust-clippy#10275
- `FormatArgsExpn` rust-clippy#10561
- `&str`: New fmt::Arguments representation. #115129
- `write!(f, "literal")` is just as efficient as `f.write_str("literal")` (Impact on compile time too big, because of all the extra code generation.)
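For reference, the two forms compared in the last item produce the same output; the to-do item is only about making them cost the same. A minimal sketch follows, where `ViaMacro` and `ViaWriteStr` are hypothetical types introduced for this example. It checks behaviour only and says nothing about generated code size or speed.

```rust
use std::fmt;

struct ViaMacro;
struct ViaWriteStr;

impl fmt::Display for ViaMacro {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Goes through `format_args!` and `Formatter::write_fmt`.
        write!(f, "literal")
    }
}

impl fmt::Display for ViaWriteStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Writes the string directly, with no `fmt::Arguments` involved.
        f.write_str("literal")
    }
}

fn main() {
    assert_eq!(ViaMacro.to_string(), ViaWriteStr.to_string());
}
```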