Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use into_string() instead of to_string() on string literals #19708

Closed
wants to merge 2 commits into from

Conversation

japaric
Copy link
Member

@japaric japaric commented Dec 10, 2014

The to_string() method goes through the formatting machinery and tends to over-allocate memory (always 2^N bytes). On the other hand into_string() does a memcpy and doesn't over-allocate. I don't see any reason to use the former if we can use the latter.

r? @alexcrichton

@alexcrichton
Copy link
Member

I requested @gankro remove his r+ because I have some reservations about this move. Specifically, I plan on deprecating the into_string method on &str in the upcoming std::str stabilization. It is not necessarily the correct name for the method as it's not actually consuming &str (which the into_ prefix implies).

The into_string method is a relic of an era where owned and borrowed strings were generically programmed over. Today we much prefer taking &str wherever possible (I believe it's even a convention) instead of String. What this is getting at is that from an API perspective, I believe into_string is not the right method for the job here.

I'll also note that calling .into_string clashes with the usage of fmt::Show to go from an arbitrary type to a String, when .to_string() must still be used.

In terms of performance, the to_string() problem definitely exists, but I would like to see concrete numbers about runtime or memory usage which strongly argue for a fix now. It's possible to improve the std::fmt system in the future, but it's not a priority at this time as we have many other parts of the standard library that need to be standardized. Out of curiosity, did you happen to collect some numbers along the way with respect to this change?

@thestinger
Copy link
Contributor

to_string() does many reallocations instead of reserving space with a size hint which is a major performance issue, especially for large strings. You don't need numbers specific to this case to know that it's a problem.

The only thing that's unknown is the additional cost of the formatting infrastructure on top of the memory allocations, but I think it's going to be insignificant relative to the cost of memory allocation.

@thestinger
Copy link
Contributor

The fact that it over-allocates up to the next power of 2 makes it easy to calculate the excess memory usage. Just take a look at the size classes here:

https://github.com/jemalloc/jemalloc/blob/9b41ac909facf4f09bb1b637b78ba647348e572e/doc/jemalloc.xml.in#L540-L655

An 85 byte string will consume 160 bytes instead of 96 bytes, a 4.5MiB string will consume 8MiB instead of 5MiB and so on.

@thestinger
Copy link
Contributor

The only thing that's unknown is the additional cost of the formatting infrastructure on top of the memory allocations, but I think it's going to be insignificant relative to the cost of memory allocation.

(If it only did 1 allocation, then this would actually matter)

@tbu-
Copy link
Contributor

tbu- commented Dec 10, 2014

A quick benchmark shows the opposite, I have no idea why:

$ ./a --bench

running 6 tests
test bench_long_direct       ... bench:  42474438 ns/iter (+/- 818109) = 17 MB/s
test bench_long_via_format   ... bench:   9322190 ns/iter (+/- 388832) = 80 MB/s
test bench_medium_direct     ... bench:     42862 ns/iter (+/- 1327) = 17 MB/s
test bench_medium_via_format ... bench:      9349 ns/iter (+/- 282) = 80 MB/s
test bench_short_direct      ... bench:       871 ns/iter (+/- 29) = 13 MB/s
test bench_short_via_format  ... bench:       344 ns/iter (+/- 9) = 34 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured

a.rs:

extern crate test;

const TEXT: &'static str = "Hello World!";
const LONG_TEXT: &'static str = "
There are not many persons who know what wonders are opened to them in the
stories and visions of their youth; for when as children we listen and dream,
we think but half-formed thoughts, and when as men we try to remember, we are
dulled and prosaic with the poison of life. But some of us awake in the night
with strange phantasms of enchanted hills and gardens, of fountains that sing
in the sun, of golden cliffs overhanging murmuring seas, of plains that stretch
down to sleeping cities of bronze and stone, and of shadowy companies of heroes
that ride caparisoned white horses along the edges of thick forests; and then
we know that we have looked back through the ivory gates into that world of
wonder which was ours before we were wise and unhappy.";

fn bench_direct(b: &mut test::Bencher, string: &str) {
    b.iter(|| {
        test::black_box(string.into_string());
    });
    b.bytes = string.as_bytes().len().to_u64().unwrap();
}

fn bench_via_format(b: &mut test::Bencher, string: &str) {
    b.iter(|| {
        test::black_box(string.to_string());
    });
    b.bytes = string.as_bytes().len().to_u64().unwrap();
}

fn very_long_string() -> String {
    let mut s = String::with_capacity(LONG_TEXT.len() * 1000);
    for _ in range(0u32, 1000) {
        s.push_str(LONG_TEXT);
    }
    s
}

#[bench] fn bench_short_direct(b: &mut test::Bencher) { bench_direct(b, TEXT); }
#[bench] fn bench_short_via_format(b: &mut test::Bencher) { bench_via_format(b, TEXT); }

#[bench] fn bench_medium_direct(b: &mut test::Bencher) { bench_direct(b, LONG_TEXT); }
#[bench] fn bench_medium_via_format(b: &mut test::Bencher) { bench_via_format(b, LONG_TEXT); }

#[bench] fn bench_long_direct(b: &mut test::Bencher) { bench_direct(b, very_long_string().as_slice()); }
#[bench] fn bench_long_via_format(b: &mut test::Bencher) { bench_via_format(b, very_long_string().as_slice()); }

@tbu-
Copy link
Contributor

tbu- commented Dec 10, 2014

Ah, it was the optimization level, my bad:

$ ./a --bench

running 6 tests
test bench_long_direct       ... bench:    174465 ns/iter (+/- 15410) = 4332 MB/s
test bench_long_via_format   ... bench:    823062 ns/iter (+/- 16784) = 917 MB/s
test bench_medium_direct     ... bench:        54 ns/iter (+/- 1) = 13999 MB/s
test bench_medium_via_format ... bench:       827 ns/iter (+/- 17) = 914 MB/s
test bench_short_direct      ... bench:        30 ns/iter (+/- 1) = 399 MB/s
test bench_short_via_format  ... bench:       116 ns/iter (+/- 2) = 103 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured

@tbu-
Copy link
Contributor

tbu- commented Dec 10, 2014

So it looks like it's >= 4 times as fast in this micro-benchmark, if we now knew the amount of these calls in the average Rust program and the Rust compiler, we could get to conclusions.

@thestinger
Copy link
Contributor

We can make conclusions about memory usage by calculating the waste. You don't need a benchmark to know that it increases the memory usage of many ranges of string sizes by up to 2x.

@alexcrichton
Copy link
Member

@tbu- thanks for collecting some data! I would be very interested in the impact on the compiler as well in terms of this patch.

@thestinger I agree that these are drawbacks of the current formatting implementation, but I do not want to jump to conclusions about memory usage or runtime without analyzing the data. For example the profile of @tbu-'s benchmark shows the hottest rust function is str::is_utf8, not allocation.

 52.22%  foo  [kernel.kallsyms]   [k] 0xffffffff8104f45a
 35.00%  foo  foo                 [.] str::is_utf8::hab1db396681c19080ds
  8.22%  foo  foo                 [.] vec::Vec$LT$T$GT$::push_all::h14954786383563052596
  3.82%  foo  foo                 [.] vec::Vec$LT$u8$GT$.fmt..FormatWriter::write::h576aa16a265866a1e3k
  0.17%  foo  foo                 [.] bench_long_direct::hf06e2cc845c083d2Xca
  0.14%  foo  foo                 [.] arena_purge_stashed

To be clear I'm not disagreeing with your technical points, I'd just like to emphasize that we still need data to draw conclusions from this change as its sometimes surprising. I do not expect, for example, that formatting with a size hint would reduce memory usage of the compiler by half. I would be, however, curious in the impact of the memory usage on the compiler if we inserted a shrink_to_fit after formatting.

@japaric
Copy link
Member Author

japaric commented Dec 11, 2014

@alexcrichton I agree with you in that into_string() is a bad/misleading name, and than it should be removed. However, I still don't think that we should use to_string(), for two reasons:

  • The method to_string() is tied to formatting (via Show). In all these operations, we don't want to "format a string literal", but rather produce an owned value from a slice, just like to_vec() does on &[T].
  • It has performance problems, as shown in other comments.

I'm going to leave this alternative as a suggestion.

  • Instead of using [in]to_string() on string literals use to_owned()
  • Implement to_owned() as String(self.as_bytes().to_vec()) instead of just calling to_string()

Advantages:

  • It conveys the right semantics: give me an owned version of this string literal
  • It's performant (no overallocation)
  • It's one character shorter than to_string() :P

Disadvantages:

  • The ToOwned trait is not in the prelude, so it'll need to be imported in several places. (Or we could just put it in the prelude :-))
  • If at some point we implement, e.g. ToOwned<Box<str>> for str, then type annotations will be required in some cases.

Feel free to close this PR, as I don't expect into_string() to survive.

@thestinger
Copy link
Contributor

@alexcrichton:

but I do not want to jump to conclusions about memory usage or runtime without analyzing the data.

Calculating the excess memory usage is not jumping to conclusions.

I do not expect, for example, that formatting with a size hint would reduce memory usage of the compiler by half.

The compiler is not representative of every application. It's enough to calculate the excess memory usage for the different sizes of strings and then apply that to the observed distribution of string sizes in a few domains (Rust or not) to figure out the increase in memory usage.

@thestinger
Copy link
Contributor

A naive micro-benchmark is unable to measure the cost of memory allocation. It will just be grabbing the same memory over and over again without the usual bookkeeping work and cache misses.

@thestinger
Copy link
Contributor

In-place reallocation rarely succeeds in the real world but it will succeed every time in a naive micro-benchmark because nothing else has been allocated so the cost won't be apparent. It will also be grabbing the same small allocations over and over again from the thread cache. This is a great example of why performing benchmarks can decrease your understanding of the performance rather than increasing it.

The suggestion to benchmark / measure with rustc isn't helpful either. It's not at all dependent on the performance on string manipulation... but many applications are.

@tbu-
Copy link
Contributor

tbu- commented Dec 11, 2014

@alexcrichton When does this call is_utf8? Shouldn't everything on the way be known to be UTF-8?

@alexcrichton
Copy link
Member

@japaric your suggestion of .to_owned() seems like a good one to me. It does, however, call into question the .to_vec() method as to whether it should become .to_owned() as well. It seems a little odd to me that we would have two traits for references => owned objects in the prelude (Clone + ToOwned), but they seem semantically core enough that I would be ok putting them into the prelude. With path slices and such I also suspect that the .to_owend() operation may be more common. cc @aturon on this topic.

@tbu- whenever formatting finishes it calls String::from_utf8 which calls is_utf8 at the top of the function.

@steveklabnik
Copy link
Member

I don't like 'to owned' because it's not specific enough. It used to refer to owned pointers but we don't have those anymore, and for good reason.=

@seanmonstar
Copy link
Contributor

Please don't completely remove it. I just completed a lint to catch exactly this https://github.com/hyperium/hyper/blob/pocketlint/hyperlint/src/lib.rs#L64

@japaric
Copy link
Member Author

japaric commented Dec 12, 2014

@seanmonstar into_string() is being removed in #19741. But, if @alexcrichton takes my to_owned() suggestion, then you can use to_owned() instead of into_string() as they would have the same implementation.

@alexcrichton
Copy link
Member

I'd like to reiterate that we should not be jumping to conclusions about how slow or fast code is without profiling and analyzing it. To be super clear about the is_utf8 check problem, I'd like to reiterate with an updated version of @tbu-'s benchmark with more numbers.

running 9 tests
test bench_long_direct       ... bench:    187995 ns/iter (+/- 6773) = 4021 MB/s
test bench_long_via_format   ... bench:    190527 ns/iter (+/- 6585) = 3967 MB/s
test bench_long_via_write    ... bench:    190592 ns/iter (+/- 7014) = 3965 MB/s
test bench_medium_direct     ... bench:        36 ns/iter (+/- 3) = 20999 MB/s
test bench_medium_via_format ... bench:       102 ns/iter (+/- 4) = 7411 MB/s
test bench_medium_via_write  ... bench:        61 ns/iter (+/- 5) = 12393 MB/s
test bench_short_direct      ... bench:        21 ns/iter (+/- 1) = 571 MB/s
test bench_short_via_format  ... bench:        66 ns/iter (+/- 3) = 181 MB/s
test bench_short_via_write   ... bench:        49 ns/iter (+/- 1) = 244 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 9 measured

Before collecting these numbers, I made the following changes:

  • I altered the one method of FormatWriter to write_str so we have a static guarantee that the contents are always valid UTF-8.
  • I altered the .to_string() implementation to call .shrink_to_fit() when it's done formatting.
  • I added a *_via_write variant of the tests where a String is pre-allocated with enough space for the result, and then I write!() into it. This notably should benchmark the overhead of the formatting system relative to just calling .to_owned() because the number of allocations are the same.

Primarily we can see a drastic improvement by avoiding the is_utf8 check. I plan on writing an RFC about this soon. From the numbers we can also conclude that the formatting system does indeed impose some overhead, but the claims made in this thread can seem much more drastic than the effects actually are. Note that very little work has been put into optimizing that work, however, and it is likely ripe for improvements to reduce the overhead.

@thestinger
Copy link
Contributor

As I pointed out above, the benchmark does not include the cost of reallocations because it's too naive. I guess you didn't read what I wrote. If you did, then it's blatant misinformation.

@thestinger
Copy link
Contributor

but the claims made in this thread can seem much more drastic than the effects actually are.

No, you just didn't take the time to read the claims in this thread:

A naive micro-benchmark is unable to measure the cost of memory allocation. It will just be grabbing the same memory over and over again without the usual bookkeeping work and cache misses.

In-place reallocation rarely succeeds in the real world but it will succeed every time in a naive micro-benchmark because nothing else has been allocated so the cost won't be apparent. It will also be grabbing the same small allocations over and over again from the thread cache. This is a great example of why performing benchmarks can decrease your understanding of the performance rather than increasing it.

@shepmaster
Copy link
Member

I ran into this a while back (based on profiler feedback about my full application), you can see the relevant change in my repo, but the core part was:

On a 100MB XML file, [replacing .to_string with String::from_str] reduced time to open, parse, and output it to /dev/null by 1.2 seconds (taking 87% of the previous time).

@alexcrichton
Copy link
Member

Now that #19741 has landed, I'm going to close this now that into_string has been deprecated. As mentioned, the combination of tweaking Formatter and size hints should address the performance concerns with to_string.

@japaric japaric deleted the into-string branch February 17, 2015 15:53
@japaric japaric restored the into-string branch February 17, 2015 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants