Slight performance optimizations #33

Open
wants to merge 1 commit into
base: master

crudelios
Contributor

One of the most time-consuming tasks in pl_mpeg is reading from the buffers, largely because every single read checked whether the buffer still had enough data.

This change adds `_unchecked` versions of `plm_buffer_read` and `plm_buffer_skip`, which, as the name implies, don't check how much data is left in the buffer.

To compensate, `plm_buffer_has` checks have been added in many places where the amount of data needed can be determined beforehand, so all `_unchecked` reads should be guaranteed to be safe.
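To illustrate the pattern, here is a minimal sketch of "check once, then read unchecked". It assumes a simplified bit reader: `bitbuf_t`, `bitbuf_has`, `bitbuf_read_unchecked`, `header_t` and `parse_header` are hypothetical names made up for this example and are not the actual pl_mpeg types or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bit reader (not pl_mpeg's plm_buffer_t). */
typedef struct {
    const uint8_t *bytes;
    size_t length;    /* number of bytes available */
    size_t bit_index; /* next bit to read, counted from the start */
} bitbuf_t;

/* Returns nonzero if at least `count` bits are still available. */
static int bitbuf_has(const bitbuf_t *b, size_t count) {
    return (b->length * 8 - b->bit_index) >= count;
}

/* Reads `count` bits (1..32), most significant bit first.
   No bounds check: the caller must have called bitbuf_has first. */
static uint32_t bitbuf_read_unchecked(bitbuf_t *b, int count) {
    uint32_t value = 0;
    assert(count >= 1 && count <= 32); /* debug-only sanity check */
    while (count > 0) {
        uint8_t current = b->bytes[b->bit_index >> 3];
        int remaining = 8 - (int)(b->bit_index & 7); /* unread bits in this byte */
        int take = remaining < count ? remaining : count;
        int shift = remaining - take;
        uint32_t mask = 0xffu >> (8 - take);
        value = (value << take) | (((uint32_t)current >> shift) & mask);
        b->bit_index += (size_t)take;
        count -= take;
    }
    return value;
}

/* Illustrative fixed-size header: the total size is known up front. */
typedef struct {
    uint32_t width, height, aspect, rate;
} header_t;

/* One bounds check covers all four reads. Returns 1 on success,
   0 (without consuming anything) if the buffer is too short. */
static int parse_header(bitbuf_t *b, header_t *h) {
    if (!bitbuf_has(b, 12 + 12 + 4 + 4)) {
        return 0;
    }
    h->width  = bitbuf_read_unchecked(b, 12);
    h->height = bitbuf_read_unchecked(b, 12);
    h->aspect = bitbuf_read_unchecked(b, 4);
    h->rate   = bitbuf_read_unchecked(b, 4);
    return 1;
}
```

The saving comes from replacing four per-read checks with a single one. The trade-off is that every `_unchecked` call site has to be audited against the check that precedes it.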

I also added three helpers. `plm_buffer_is_aligned` checks whether the current bit position is aligned to a byte boundary. `plm_buffer_read_byte` checks that enough buffer data is available and that the position is byte-aligned. `plm_buffer_read_byte_unchecked` reads the byte directly from the buffer without checking the remaining buffer length or the bit alignment.
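A rough sketch of how these three helpers could relate to each other, again using a hypothetical `bitbuf_t` rather than the real pl_mpeg buffer. The failure convention (returning 0 and leaving the output untouched) is an assumption made for this example, not something stated in the PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bit reader (not pl_mpeg's plm_buffer_t). */
typedef struct {
    const uint8_t *bytes;
    size_t length;    /* number of bytes available */
    size_t bit_index; /* next bit to read, counted from the start */
} bitbuf_t;

/* Nonzero if the read position sits exactly on a byte boundary. */
static int bitbuf_is_aligned(const bitbuf_t *b) {
    return (b->bit_index & 7) == 0;
}

/* Fast path: a plain array load, no shifting or masking.
   Checks neither alignment nor remaining length. */
static uint8_t bitbuf_read_byte_unchecked(bitbuf_t *b) {
    uint8_t value = b->bytes[b->bit_index >> 3];
    b->bit_index += 8;
    return value;
}

/* Checked variant: verifies alignment and remaining length first.
   Returns 1 and stores the byte in *out, or 0 on failure (assumed
   convention for this sketch). */
static int bitbuf_read_byte(bitbuf_t *b, uint8_t *out) {
    if (!bitbuf_is_aligned(b) || b->length * 8 - b->bit_index < 8) {
        return 0;
    }
    *out = bitbuf_read_byte_unchecked(b);
    return 1;
}
```

When the position is known to be byte-aligned, the unchecked read reduces to a single indexed load, which is where the speedup over a generic bit read would come from.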

A very small optimization to `plm_video_idct` was also added, avoiding a sign flip in the `y7` calculation by swapping all the remaining signs instead.
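The general idea can be shown with a toy example. The functions below are hypothetical and are not the actual `plm_video_idct` code: `a`, `x` and `t` stand in for intermediate terms. The point is only that a value computed as a negation can be stored with the opposite sign if every later use swaps `+` and `-`.

```c
/* Before: the intermediate is negated when it is computed. */
static void butterfly_negated(int a, int x, int t, int out[2]) {
    int y = -x - t; /* costs an extra negation */
    out[0] = a - y;
    out[1] = a + y;
}

/* After: store the positive value and swap the signs at each use. */
static void butterfly_folded(int a, int x, int t, int out[2]) {
    int y = x + t;  /* no negation needed */
    out[0] = a + y; /* was a - y */
    out[1] = a - y; /* was a + y */
}
```

Both functions produce identical outputs as long as the intermediate sums do not overflow. Whether this saves an instruction in practice depends on the compiler and target.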

Some warnings specific to Visual Studio were also removed.

Overall, this yields a 5% to 7% performance improvement in my test cases.

As a note, I tried fiddling with SIMD, especially in `plm_video_idct`. I did get it to work, but the performance was either worse (using SSE4.1) or only marginally better (<1%, with AVX2), so I scrapped that idea.

unwiredben added a commit to unwiredben/vector-video that referenced this pull request Nov 4, 2023