Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Severe Performance Issue With std.rand.Random #10037

Closed
ominitay opened this issue Oct 26, 2021 · 1 comment · Fixed by #10045
Closed

Severe Performance Issue With std.rand.Random #10037

ominitay opened this issue Oct 26, 2021 · 1 comment · Fixed by #10045
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. optimization standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@ominitay
Copy link
Contributor

Zig Version

On 0.9.0-dev.1444+e2a2e6c14, though should apply to all versions of Zig with std which include this interface.

Steps to Reproduce

To reproduce this error, an example using the current Random interface, and a modified copy is provided. These can be benchmarked against each other using a tool such as hyperfine to show the significant performance difference. (The while loop is used just to increase the number of times we run the function so that the time taken is easily measurable).

Example 1 (Current interface):

const std = @import("std");

pub fn main() !void {
    var seed: u64 = undefined;
    try std.os.getrandom(std.mem.asBytes(&seed));
    var prng = std.rand.DefaultPrng.init(seed);
    var rand = &prng.random;

    var i: usize = 0;
    var r: i32 = undefined;
    while (i < 100000000) : (i += 1) {
        r = rand.int(i32);
        std.mem.doNotOptimizeAway(&r);
    }
}

Example 2 (Modified interface):

const std = @import("std");

pub fn main() !void {
    var seed: u64 = undefined;
    try std.os.getrandom(std.mem.asBytes(&seed));
    var prng = std.rand.DefaultPrng.init(seed);
    var rand = Random.init(&prng);

    var i: usize = 0;
    var r: i32 = undefined;
    while (i < 100000000) : (i += 1) {
        r = rand.int(i32);
        std.mem.doNotOptimizeAway(&r);
    }
}

// Modified `Random` interface by @SpexGuy 

const Random = struct {
    self: *c_void,
    fill: fn(*c_void, []u8) void,

    pub fn init(ptr: anytype) Random {
        const Ptr = @TypeOf(ptr);
        const gen = struct {
            fn fill_erased(erased: *c_void, bytes: []u8) void {
                const alignment = @typeInfo(Ptr).Pointer.alignment;
                const self = @ptrCast(Ptr, @alignCast(alignment, erased));
                self.fill(bytes);
            }
        };
        return .{
            .self = ptr,
            .fill = gen.fill_erased,
        };
    }

    /// Returns a random int `i` such that `0 <= i <= maxInt(T)`.
    /// `i` is evenly distributed.
    pub fn int(self: Random, comptime T: type) T {
        const bits = @typeInfo(T).Int.bits;
        const UnsignedT = std.meta.Int(.unsigned, bits);
        const ByteAlignedT = std.meta.Int(.unsigned, @divTrunc(bits + 7, 8) * 8);

        var rand_bytes: [@sizeOf(ByteAlignedT)]u8 = undefined;
        self.fill(self.self, rand_bytes[0..]);

        // use LE instead of native endian for better portability maybe?
        // TODO: endian portability is pointless if the underlying prng isn't endian portable.
        // TODO: document the endian portability of this library.
        const byte_aligned_result = std.mem.readIntSliceLittle(ByteAlignedT, &rand_bytes);
        const unsigned_result = @truncate(UnsignedT, byte_aligned_result);
        return @bitCast(T, unsigned_result);
    }
};

// These functions below are modified from the `Xoshiro256` implementation

fn fill(self: *std.rand.Xoshiro256, buf: []u8) void {
    var i: usize = 0;
    const aligned_len = buf.len - (buf.len & 7);

    // Complete 8 byte segments.
    while (i < aligned_len) : (i += 8) {
        var n = next(self);
        comptime var j: usize = 0;
        inline while (j < 8) : (j += 1) {
            buf[i + j] = @truncate(u8, n);
            n >>= 8;
        }
    }

    // Remaining. (cuts the stream)
    if (i != buf.len) {
        var n = next(self);
        while (i < buf.len) : (i += 1) {
            buf[i] = @truncate(u8, n);
            n >>= 8;
        }
    }
}

fn next(self: *std.rand.Xoshiro256) u64 {
    const r = std.math.rotl(u64, self.s[0] +% self.s[3], 23) +% self.s[0];

    const t = self.s[1] << 17;

    self.s[2] ^= self.s[0];
    self.s[3] ^= self.s[1];
    self.s[1] ^= self.s[2];
    self.s[0] ^= self.s[3];

    self.s[2] ^= t;

    self.s[3] = std.math.rotl(u64, self.s[3], 45);

    return r;
}

The examples were built with the command zig build-exe -O ReleaseFast example1.zig && zig build-exe -O ReleaseFast example2.zig, and benchmarked with the command hyperfine -w 10 ./example{1,2}.

Behaviour

The poor performance of the current Random interface has a significant impact on the performance of the random generators in the standard library. Example 2 performs significantly better than both example 1; on my desktop system, example 2 was measured to run at over 2 times the speed of example 1.

cc: @SpexGuy @cryptocode

@SpexGuy
Copy link
Contributor

SpexGuy commented Oct 26, 2021

Here's a godbolt showing the issue and a potential fix: https://zig.godbolt.org/z/E1ebnYMzM

The problem here is that dirty tracking in LLVM is relatively coarse-grained for functions which are not inlined. This is inevitable in any optimizer, using more data for dirty tracking gets multiplied by every parameter of every function in the codebase. In the current code, LLVM peels off the first iteration of the loop and runs it first. In that iteration, it devirtualizes fill but doesn't inline it. fill writes to state in prng, dirtying it. Unfortunately, that also dirties the function pointer in rand, which is stored inside of prng. With the function pointer dirtied, all future iterations of the loop must do a full virtual call.

The fix is to store the vtable in a separate provenance which is easier for the compiler to destructure. This would change the use of the random interface from const rand = &prng.random; to const rand = prng.random(), where random() constructs a temporary struct with an erased pointer to the source random and a pointer to fill. This allows the compiler to call fill without dirtying the vtable, which leads it to inline everything.

This problem applies to all interfaces in Zig which use the @fieldParentPtr idiom. We should switch them to use this temporary setup instead.

@SpexGuy SpexGuy added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library. labels Oct 26, 2021
ominitay added a commit to ominitay/zig that referenced this issue Oct 27, 2021
These changes have been made to resolve issue ziglang#10037. The `Random`
interface was implemented in such a way that causes significant slowdown
when calling the `fill` function of the rng used.

The `Random` interface is no longer stored in a field of the rng, and is
instead returned by the child function `random()` of the rng. This
avoids the performance issues caused by the interface.
ominitay added a commit to ominitay/zig that referenced this issue Oct 27, 2021
These changes have been made to resolve issue ziglang#10037. The `Random`
interface was implemented in such a way that causes significant slowdown
when calling the `fill` function of the rng used.

The `Random` interface is no longer stored in a field of the rng, and is
instead returned by the child function `random()` of the rng. This
avoids the performance issues caused by the interface.
@andrewrk andrewrk added this to the 0.9.0 milestone Oct 27, 2021
andrewrk pushed a commit that referenced this issue Oct 27, 2021
These changes have been made to resolve issue #10037. The `Random`
interface was implemented in such a way that causes significant slowdown
when calling the `fill` function of the rng used.

The `Random` interface is no longer stored in a field of the rng, and is
instead returned by the child function `random()` of the rng. This
avoids the performance issues caused by the interface.
@leecannon leecannon mentioned this issue Oct 28, 2021
@ghost ghost mentioned this issue Oct 28, 2021
@leecannon leecannon mentioned this issue Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. optimization standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants