
Skip alloc when updating animation path cache #11330

Merged
merged 1 commit into bevyengine:main from nicopap:do-not-allocate-animation on Jan 13, 2024

Conversation

nicopap
Contributor

@nicopap nicopap commented Jan 13, 2024

Not always, but skip it if the new length is smaller.

For context, path_cache is a Vec<Vec<Option<Entity>>>.

Objective

Previously, when setting a new length on the path_cache, we would:

  1. Deallocate every existing Vec<Option<Entity>>
  2. Deallocate the path_cache itself
  3. Allocate a new Vec<Vec<Option<Entity>>>, where each item is an empty Vec that would have to be allocated again when pushed to.

This is a lot of allocations!
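The old pattern can be sketched like this (a standalone sketch, not the PR's actual code: the function name is hypothetical and Option&lt;Entity&gt; is modeled as Option&lt;u32&gt; to keep it self-contained):

```rust
// Hypothetical sketch of the previous reset pattern. Overwriting the cache
// drops every inner Vec (deallocating each one) and the outer Vec, then
// allocates a fresh outer Vec of empty rows.
fn reset_path_cache_old(path_cache: &mut Vec<Vec<Option<u32>>>, new_len: usize) {
    *path_cache = vec![Vec::new(); new_len];
}

fn main() {
    let mut cache = vec![vec![Some(1_u32), Some(2)], vec![Some(3)]];
    reset_path_cache_old(&mut cache, 3);
    assert_eq!(cache.len(), 3);
    // Every row is a brand-new empty Vec: all previous allocations were lost.
    assert!(cache.iter().all(|row| row.is_empty()));
    println!("rows: {}", cache.len());
}
```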

Solution

Use Vec::resize_with.

With this change, what occurs is:

  1. We clear each Vec<Option<Entity>>, keeping its allocation and making each Vec's memory reusable
  2. We only append new Vecs to path_cache when it is too small.
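The two steps above can be sketched as follows (a minimal standalone sketch, not the PR's exact code: the function name and clearing order are assumptions, and Option&lt;Entity&gt; is modeled as Option&lt;u32&gt;):

```rust
// Sketch of the resize_with-based reset.
fn reset_path_cache(path_cache: &mut Vec<Vec<Option<u32>>>, new_len: usize) {
    // Clear each row: its length drops to 0 but its heap allocation is
    // kept, so later pushes into the row need no new allocation.
    for row in path_cache.iter_mut() {
        row.clear();
    }
    // Only grow: append empty rows when the cache is too small, and
    // skip the resize entirely when the new length is smaller.
    if path_cache.len() < new_len {
        path_cache.resize_with(new_len, Vec::new);
    }
}

fn main() {
    let mut cache = vec![vec![Some(1_u32), Some(2)], vec![Some(3)]];
    let old_capacity = cache[0].capacity();
    reset_path_cache(&mut cache, 1); // smaller: no allocation work at all
    assert_eq!(cache.len(), 2);
    assert!(cache[0].is_empty());
    assert_eq!(cache[0].capacity(), old_capacity); // allocation reused
    reset_path_cache(&mut cache, 4); // larger: append empty rows
    assert_eq!(cache.len(), 4);
}
```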

Note on performance

I didn't benchmark it; I just ran a diff on the generated assembly (built with --profile stress-test and --native). With this PR, apply_animation has 20 fewer instructions (out of 2504).

Even without numbers, we can deduce from the allocation pattern that this performs fewer allocations.

More information on profiling allocations in rust: https://nnethercote.github.io/perf-book/heap-allocations.html

Future work

I think a jagged vec (https://en.wikipedia.org/wiki/Jagged_array) would be a much better fit, since it allocates everything in a single contiguous buffer.

This would avoid dancing around allocations, reduce the per-row overhead of one *mut T and two usize, and remove a level of indirection, improving cache efficiency. I think it would improve both code quality and performance.
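A minimal jagged-vec sketch, to make the idea concrete (this is an illustration of the data structure, not a proposed Bevy implementation; all names are hypothetical):

```rust
/// All rows live in one contiguous buffer; per-row state is a single
/// `usize` end offset instead of a whole Vec (ptr + len + capacity).
struct JaggedVec<T> {
    data: Vec<T>,
    // ends[i] is the exclusive end index of row i within `data`.
    ends: Vec<usize>,
}

impl<T> JaggedVec<T> {
    fn new() -> Self {
        Self { data: Vec::new(), ends: Vec::new() }
    }

    // Rows can only be appended, which is all a per-frame cache needs.
    fn push_row(&mut self, row: impl IntoIterator<Item = T>) {
        self.data.extend(row);
        self.ends.push(self.data.len());
    }

    fn row(&self, i: usize) -> &[T] {
        let start = if i == 0 { 0 } else { self.ends[i - 1] };
        &self.data[start..self.ends[i]]
    }
}

fn main() {
    let mut jagged = JaggedVec::new();
    jagged.push_row(vec![1, 2, 3]);
    jagged.push_row(vec![4]);
    assert_eq!(jagged.row(0), &[1, 2, 3][..]);
    assert_eq!(jagged.row(1), &[4][..]);
    println!("rows: {}", jagged.ends.len());
}
```

Because rows are slices into one buffer, iterating all entries walks memory linearly, which is where the cache-efficiency win comes from.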

@alice-i-cecile alice-i-cecile added the C-Performance (A change motivated by improving speed, memory usage or compile times) and A-Animation (Make things move and change over time) labels Jan 13, 2024
nicopap added a commit to nicopap/bevy that referenced this pull request Jan 13, 2024
In bevyengine#11330 I found out that `Parent::get` didn't get inlined, **even with
LTO on**! Not sure what's up with that, but marking functions that
consist of a single call as `inline(always)` has no downside.

`inline(always)` may increase compilation time proportionally to how
many times the function is called **and the size of the function marked
with `inline`**. Since we only mark no-op functions as `inline`, there is
no cost to it.

I also took the opportunity to `inline` other functions. I'm not as
confident that marking functions calling other functions as `inline`
works similarly to very simple functions, so I used `inline` over
`inline(always)`.
Contributor

@atlv24 atlv24 left a comment


Thanks!! Would love to see perf numbers, this is obviously better but by how much?

@mockersf
Member

This doesn't happen in any perf-sensitive scenario, so it probably doesn't have any impact... still, it's probably better

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jan 13, 2024
Merged via the queue into bevyengine:main with commit 78b5f32 Jan 13, 2024
25 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Jan 13, 2024
# Objective

In #11330 I found out that `Parent::get` didn't get inlined, **even with
LTO on**!

This means that just to access a field, we get an instruction cache
invalidation: we spill some registers to the stack, jump to the new
instructions, move the field into a register, then do the same dance in
reverse to get back to the call site.

## Solution

Mark trivial functions as `#[inline]`.

`inline(always)` may increase compilation time proportionally to how many
times the function is called **and the size of the function marked with
`inline`**. Since we mark as `inline` functions that consist of a
single instruction, the cost is absolutely negligible.

I also took the opportunity to `inline` other functions. I'm not as
confident that marking functions calling other functions as `inline`
works similarly to very simple functions, so I used `inline`, which
doesn't have the same downsides as `inline(always)`.

More information on inlining in rust:
https://nnethercote.github.io/perf-book/inlining.html
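The pattern described above can be illustrated with a standalone sketch (these are simplified stand-ins, not Bevy's actual `Entity` and `Parent` types):

```rust
// Simplified stand-ins for illustration only.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Entity(u64);

struct Parent(Entity);

impl Parent {
    /// The body is a single field access: `inline(always)` removes the
    /// call overhead without growing code size, since the inlined body
    /// is no larger than the call it replaces.
    #[inline(always)]
    fn get(&self) -> Entity {
        self.0
    }
}

fn main() {
    let parent = Parent(Entity(42));
    assert_eq!(parent.get(), Entity(42));
    println!("parent entity: {:?}", parent.get());
}
```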
@nicopap nicopap deleted the do-not-allocate-animation branch January 14, 2024 10:15
Labels
A-Animation (Make things move and change over time), C-Performance (A change motivated by improving speed, memory usage or compile times)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

See if removing a vec allocation in animation code improves performance
5 participants