Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid needless gather in fast_integer_divide lowering #6441

Merged
merged 5 commits into from
Nov 23, 2021

Conversation

abadams
Copy link
Member

@abadams abadams commented Nov 22, 2021

fast_integer_divide did two lookups, one for a multiplier, and one for a
shift. It turns out you can just use count leading zeros to compute a
workable shift instead of having to do a lookup. This PR speeds up use
of fast_integer_divide in cases where the denominator varies across
vector lanes by ~70% or so by avoiding one of the two expensive gathers.

fast_integer_divide did two lookups, one for a multiplier, and one for a
shift. It turns out you can just use count leading zeros to compute a
workable shift instead of having to do a lookup. This PR speeds up use
of fast_integer_divide in cases where the denominator varies across
vector lanes by ~70% or so by avoiding one of the two expensive gathers.
@abadams abadams requested a review from shoaibkamil November 22, 2021 23:54
Buffer<uint8_t> integer_divide_table_u8() {
static std::mutex initialize_lock;
std::lock_guard<std::mutex> lock_guard(initialize_lock);
{
static Buffer<uint8_t> im(256, 2);
static Buffer<uint8_t> im(256);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated: since C++11 onwards default to promising thread-safe initialization of statics, we could rewrite this to be a bit terser like so:

Buffer<uint8_t> integer_divide_table_u8() {
    static Buffer<uint8_t> im(256) = []() {
        Buffer<uint8_t> im(256);
        for (uint32_t i = 0; i < 256; i++) {
            im(i) = table_runtime_u8[i][2];
            if (i > 1) {
                internal_assert(table_runtime_u8[i][3] == shift_for_denominator(i));
            }
        }
        return im;
    }();

    return im;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will fix.

@abadams
Copy link
Member Author

abadams commented Nov 23, 2021

No, that's compile-time code.

@abadams abadams merged commit 8b68f85 into master Nov 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants