Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise std::vec's .sort_by for short vectors (remove/reduce allocations) #12011

Closed
huonw opened this issue Feb 3, 2014 · 2 comments
Closed
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@huonw
Copy link
Member

huonw commented Feb 3, 2014

For a vector of length n, it currently allocates 2 * n space, but for small vectors it only uses n... and should be able get away without allocations completely.

Note: the comparison function could unwind, so this will need to be careful about what it puts on the stack etc. to be failure safe... although there is a test that should help ensure that. [].swap is probably useful.

Random notes:

  • servo saw speed-ups from this: style: Quicksort rules to avoid allocation of temporary vectors. servo/servo#1567 (apparently for vectors of length at most 16)
  • there may be some advantage to switching what implementation is used based on the size of the elements, e.g. for &mut [uint], [].swap is cheap, but for &mut [HundredKBStruct], many swaps may be more expensive than what one can do with an allocation. (Pure speculation on my part.) Of course, we could just avoid doing this, and people could go faster by boxing it and storing ~HundredKBStruct.
bors added a commit that referenced this issue Feb 7, 2014
This pull request:
1) Changes the initial insertion sort to be in-place, and defers allocation of working set until merge is needed.
2) Increases the increases the maximum run length to use insertion sort for from 8 to 32 elements. This increases the size of vectors that will not allocate, and reduces the number of merge passes by two. It seemed to be the sweet spot in the benchmarks that I ran.

Here are the results of some benchmarks. Note that they are sorting u64s, so types that are more expensive to compare or copy may have different behaviors.
Before changes:
```
test vec::bench::sort_random_large      bench:    719753 ns/iter (+/- 130173) = 111 MB/s
test vec::bench::sort_random_medium     bench:      4726 ns/iter (+/- 742) = 169 MB/s
test vec::bench::sort_random_small      bench:       344 ns/iter (+/- 76) = 116 MB/s
test vec::bench::sort_sorted            bench:    437244 ns/iter (+/- 70043) = 182 MB/s
```

Deferred allocation (8 element insertion sort):
```
test vec::bench::sort_random_large      bench:    702630 ns/iter (+/- 88158) = 113 MB/s
test vec::bench::sort_random_medium     bench:      4529 ns/iter (+/- 497) = 176 MB/s
test vec::bench::sort_random_small      bench:       185 ns/iter (+/- 49) = 216 MB/s
test vec::bench::sort_sorted            bench:    425853 ns/iter (+/- 60907) = 187 MB/s
```

Deferred allocation (16 element insertion sort):
```
test vec::bench::sort_random_large      bench:    692783 ns/iter (+/- 165837) = 115 MB/s
test vec::bench::sort_random_medium     bench:      4434 ns/iter (+/- 722) = 180 MB/s
test vec::bench::sort_random_small      bench:       187 ns/iter (+/- 38) = 213 MB/s
test vec::bench::sort_sorted            bench:    393783 ns/iter (+/- 85548) = 203 MB/s
```

Deferred allocation (32 element insertion sort):
```
test vec::bench::sort_random_large      bench:    682556 ns/iter (+/- 131008) = 117 MB/s
test vec::bench::sort_random_medium     bench:      4370 ns/iter (+/- 1369) = 183 MB/s
test vec::bench::sort_random_small      bench:       179 ns/iter (+/- 32) = 223 MB/s
test vec::bench::sort_sorted            bench:    358353 ns/iter (+/- 65423) = 223 MB/s
```

Deferred allocation (64 element insertion sort):
```
test vec::bench::sort_random_large      bench:    712040 ns/iter (+/- 132454) = 112 MB/s
test vec::bench::sort_random_medium     bench:      4425 ns/iter (+/- 784) = 180 MB/s
test vec::bench::sort_random_small      bench:       179 ns/iter (+/- 81) = 223 MB/s
test vec::bench::sort_sorted            bench:    317812 ns/iter (+/- 62675) = 251 MB/s
```

This is the best I could manage with the basic merge sort while keeping the invariant that the original vector must contain each element exactly once when the comparison function is called. If one is not married to a stable sort, an in-place n*log(n) sorting algorithm may have better performance in some cases.

for #12011
cc @huonw
@pongad
Copy link
Contributor

pongad commented Feb 14, 2014

A potentially off-topic question: Is there any interest in unstable sort? It would get away without allocations for large vectors. For many things the stability is not needed.

@huonw
Copy link
Member Author

huonw commented Feb 14, 2014

@pongad it seems like a possibly useful thing, although collections::PriorityQueue gives an inplace heap sort for ~[]; feel free to open another issue.

(This one was fixed by #12029. Closing.)

@huonw huonw closed this as completed Feb 14, 2014
flip1995 pushed a commit to flip1995/rust that referenced this issue Dec 28, 2023
Remove me from `users_on_vacation`

Came back from surgery about 2 days ago, totally cool to review some more PRs.

changelog:none
r? ghost
flip1995 pushed a commit to flip1995/rust that referenced this issue Jan 25, 2024
…dnet

I'm not on vacation (again)

A few weeks ago I opened rust-lang#12011 removing me from `users_on_vacation`, it got merged. The subtree sync reverted this change (weirdly)

changelog: none

r? `@xFrednet`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

2 participants