One of the most time-consuming tasks in pl_mpeg is actually reading the buffers, especially because every single read checked whether the buffer still had enough data.
This change creates `_unchecked` versions of `plm_buffer_read` and `plm_buffer_skip`, which, as the name implies, do not check how much data is still available.
To compensate, `plm_buffer_has` has been added to many places where the amount of data needed can be figured out beforehand, so all `_unchecked` reads should be guaranteed to be safe.
I also added `plm_buffer_is_aligned`, which checks whether the current bit position is aligned to a byte; `plm_buffer_read_byte`, which checks both for enough available buffer data and for bit alignment; and `plm_buffer_read_byte_unchecked`, which reads the byte directly from the buffer without checking the remaining buffer length or the bit alignment.
A very small optimization to `plm_video_idct` was also added: it avoids a sign flip in the `y7` calculation by swapping all remaining signs.
Some warnings specific to Visual Studio were also removed.
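To illustrate the idea for reviewers, here is a minimal sketch of the check-once, read-unchecked pattern. It is not the code from this PR: `bitbuf_t`, `header_t`, `parse_header`, and all `bitbuf_*` functions are hypothetical stand-ins, and the failure values (`0` and `-1`) are assumptions made for the example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bit reader; NOT pl_mpeg's actual buffer type. */
typedef struct {
    const uint8_t *bytes;
    size_t length;    /* buffer size in bytes */
    size_t bit_index; /* position of the next unread bit */
} bitbuf_t;

/* Nonzero if at least `count` bits are still available. */
static int bitbuf_has(const bitbuf_t *b, size_t count) {
    return (b->length * 8) - b->bit_index >= count;
}

/* Reads 1..32 bits, MSB first, with no bounds check.
   The caller must have verified availability with bitbuf_has(). */
static uint32_t bitbuf_read_unchecked(bitbuf_t *b, int count) {
    uint32_t value = 0;
    while (count > 0) {
        uint8_t current = b->bytes[b->bit_index >> 3];
        int remaining = 8 - (int)(b->bit_index & 7); /* bits left in this byte */
        int take = remaining < count ? remaining : count;
        int shift = remaining - take;
        uint32_t mask = 0xFFu >> (8 - take);
        value = (value << take) | ((current >> shift) & mask);
        b->bit_index += (size_t)take;
        count -= take;
    }
    return value;
}

/* Checked variant: returns 0 when not enough data is left (assumed behavior). */
static uint32_t bitbuf_read(bitbuf_t *b, int count) {
    if (!bitbuf_has(b, (size_t)count)) return 0;
    return bitbuf_read_unchecked(b, count);
}

/* Nonzero if the read position sits on a byte boundary. */
static int bitbuf_is_aligned(const bitbuf_t *b) {
    return (b->bit_index & 7) == 0;
}

/* Fetches one whole byte directly; no length or alignment check. */
static uint8_t bitbuf_read_byte_unchecked(bitbuf_t *b) {
    uint8_t value = b->bytes[b->bit_index >> 3];
    b->bit_index += 8;
    return value;
}

/* Checked byte read: returns -1 on missing data or misalignment (assumed). */
static int bitbuf_read_byte(bitbuf_t *b) {
    if (!bitbuf_has(b, 8) || !bitbuf_is_aligned(b)) return -1;
    return bitbuf_read_byte_unchecked(b);
}

/* Hypothetical 16-bit header: one availability check covers three reads. */
typedef struct { uint32_t version, flags, kind; } header_t;

static int parse_header(bitbuf_t *b, header_t *out) {
    if (!bitbuf_has(b, 4 + 8 + 4)) return 0; /* single up-front check */
    out->version = bitbuf_read_unchecked(b, 4);
    out->flags   = bitbuf_read_unchecked(b, 8);
    out->kind    = bitbuf_read_unchecked(b, 4);
    return 1;
}
```

The point of the pattern is that `parse_header` pays for one comparison instead of three, and the byte-aligned path skips the per-bit loop entirely. The unchecked functions are only safe when every call site is covered by an earlier availability check.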
Overall, this yields a 5% to 7% performance improvement in my test cases.
As a note, I tried fiddling with SIMD, especially on `plm_video_idct`. I did get it to work, but the performance was either worse (using SSE4.1) or only marginally (<1%) better (with AVX2), so I scrapped that idea.