Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

See if removing a vec allocation in animation code improves performance #11328

Closed
alice-i-cecile opened this issue Jan 13, 2024 · 3 comments · Fixed by #11330
Closed

See if removing a vec allocation in animation code improves performance #11328

alice-i-cecile opened this issue Jan 13, 2024 · 3 comments · Fixed by #11330
Labels
A-Animation Make things move and change over time C-Performance A change motivated by improving speed, memory usage or compile times D-Trivial Nice and easy! A great choice to get started with Bevy

Comments

@alice-i-cecile
Copy link
Member

          do we need to make a whole new vec or can we just add/remove elements until the len is the same?

Originally posted by @rodolphito in #11306 (comment)

This wasn't part of my PR, so I didn't change it there :)

@alice-i-cecile alice-i-cecile added D-Trivial Nice and easy! A great choice to get started with Bevy C-Performance A change motivated by improving speed, memory usage or compile times A-Animation Make things move and change over time labels Jan 13, 2024
@nicopap
Copy link
Contributor

nicopap commented Jan 13, 2024

To quote myself

It's possible to do a clear() followed by an extend(…). The Vec::new here doesn't cause an allocation until something is pushed to it.

@nicopap
Copy link
Contributor

nicopap commented Jan 13, 2024

I think I might have been misunderstood. the code in question is:

        animation.path_cache = vec![Vec::new(); animation_clip.paths.len()];

In this, we have to Vec constructors (1) vec![…], (2) Vec::new().

(2) doesn't allocate, while (1) will probably.

Replacing the code by:

        let new_len = animation_clip.paths.len();
        animation.path_cache.clear();
        animation.path_cache.extend((0..new_len).map(|_| Vec::new()));

Will likely result in better code generation.

A more "rusty" approach would use ….extend(iter::repeat_with(Vec::new).take(new_len)). While this does still eliminate the one allocation, it generally results in worse code generation. In this case Vec::new() may be translated into a bzero().

@nicopap
Copy link
Contributor

nicopap commented Jan 13, 2024

I'm an idiot, I misunderstood what the comment referred to. I've a PR coming where allocation is minimized as much as possible. What I suggest isn't the optimal solution.

github-merge-queue bot pushed a commit that referenced this issue Jan 13, 2024
Not always, but skip it if the new length is smaller.

For context, `path_cache` is a `Vec<Vec<Option<Entity>>>`.

# Objective

Previously, when setting a new length to the `path_cache`, we would:

1. Deallocate all existing `Vec<Option<Entity>>`
2. Deallocate the `path_cache`
3. Allocate a new `Vec<Vec<Option<Entity>>>`, where each item is an
empty `Vec`, and would have to be allocated when pushed to.

This is a lot of allocations!

## Solution

Use
[`Vec::resize_with`](https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.resize_with).

With this change, what occurs is:

1. We `clear` each `Vec<Option<Entity>>`, keeping the allocation, but
making the memory of each `Vec` re-usable
2. We only append new `Vec` to `path_cache` when it is too small.

* Fixes #11328 

### Note on performance

I didn't benchmark it, I just ran a diff on the generated assembly (ran
with `--profile stress-test` and `--native`). I found this PR has 20
less instructions in `apply_animation` (out of 2504).

Though on a purely abstract level, I can deduce this leads to less
allocation.

More information on profiling allocations in rust:
https://nnethercote.github.io/perf-book/heap-allocations.html

## Future work

I think a [jagged vec](https://en.wikipedia.org/wiki/Jagged_array) would
be much more pertinent. Because it allocates everything in a single
contiguous buffer.

This would avoid dancing around allocations, and reduces the overhead of
one `*mut T` and two `usize` per row, also removes indirection,
improving cache efficiency. I think it would both improve code quality
and performance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Animation Make things move and change over time C-Performance A change motivated by improving speed, memory usage or compile times D-Trivial Nice and easy! A great choice to get started with Bevy
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants