-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove uses of Arc::make_mut to avoid clones #2934
Conversation
13dc931
to
73c1d1a
Compare
5b3bf7d
to
7b2cdb0
Compare
7b2cdb0
to
24ec704
Compare
// SAFETY: We're not inside a tile, there's nothing else writing to the encoder config here. | ||
unsafe { | ||
// Increment the film grain seed for the next frame | ||
if let Some(params) = arc_get_mut_unsafe(&mut fi.config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't be better to move the film grain information to the FrameInvariants? Also who is accessing the config
here?
fi, | ||
); | ||
// SAFETY: We know no other threads are accessing this because we are done with tiles. | ||
// The only references would be in ref frames, which are not being touched at the moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the gain from doing this? Maybe it should be moved to a different patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs.frame_me_stats
hits tile_iter_mut
with a count of 2, this is in itself fishy or possibly a bug that went unnoticed since a while.
Ideally get_mut() should always work.
This partially addresses the comments in xiph#2934 by removing the unnecessary field `coded_data.lookahead_me_stats`. This data is never modified between the time it is created in `compute_lookahead_motion_vectors` and the time it is used in `compute_block_importances`, so there is no need to keep a separate copy of it. Instead, we can use the main copy of the data in `fs.frame_me_stats`. Results in ~1% peak memory usage reduction.
This partially addresses the comments in xiph#2934 by removing the unnecessary field `coded_data.lookahead_me_stats`. This data is never modified between the time it is created in `compute_lookahead_motion_vectors` and the time it is used in `compute_block_importances`, so there is no need to keep a separate copy of it. Instead, we can use the main copy of the data in `fs.frame_me_stats`. Results in ~1% peak memory usage reduction.
This partially addresses the comments in #2934 by removing the unnecessary field `coded_data.lookahead_me_stats`. This data is never modified between the time it is created in `compute_lookahead_motion_vectors` and the time it is used in `compute_block_importances`, so there is no need to keep a separate copy of it. Instead, we can use the main copy of the data in `fs.frame_me_stats`. Results in ~1% peak memory usage reduction.
https://gist.github.com/lu-zero/ffc7152ab0aabc651f0fde3da42b4559 this fixes most of the problems
|
All the tests pass for me when I apply these changes against master. However, I still see the significant increase in memory usage, because this is basically the same as not having an |
You need 1 clone for the lookahead and 1 clone for the normal encoding in theory. |
Closed for #2959 |
Profiling revealed there to be a measurable number of clones coming out of
TileStateMut::new
, which was traceable to usage ofArc::make_mut
. A characteristic ofArc::make_mut
is that it will clone the inner data when a thread attempts to write to it. In theory, I have concerns that this could cause data inconsistencies, although that does not seem to have happened in practice in rav1e. However, the clones are avoidable, because each place where we useArc::make_mut
can be safely assumed to be independent--i.e. there is never more than one thread accessing the same portion of the data.640x360 with 1 tile: 1.5% memory usage reduction
640x360 with 4 tiles: 4% memory usage reduction
1920x1080 with 4 tiles: 6% memory usage reduction
No measurable runtime improvement with 1 tile.
~1% runtime improvement with 4 tiles.