-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port __clzsi2 from compiler_rt #4195
Conversation
@@ -177,7 +179,9 @@ comptime { | |||
@export(@import("compiler_rt/arm.zig").__aeabi_memclr, .{ .name = "__aeabi_memclr4", .linkage = linkage }); | |||
@export(@import("compiler_rt/arm.zig").__aeabi_memclr, .{ .name = "__aeabi_memclr8", .linkage = linkage }); | |||
|
|||
@export(@import("compiler_rt/arm.zig").__aeabi_read_tp, .{ .name = "__aeabi_read_tp", .linkage = linkage }); | |||
if (builtin.os == .linux) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's nothing linux-specific here, what error message did you get? I suppose it's because the GBA cpu doesn't support the mrc
opcode we're using here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only export that symbol under linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I can't see anything about that symbol being exclusive to Linux in the specs.
fn __clzsi2_generic(a: i32) callconv(.C) i32 { | ||
@setRuntimeSafety(builtin.is_test); | ||
|
||
var x: c_uint = @bitCast(c_uint, a); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the iN
/uN
types as it makes it easier to reason about their effective size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to use them as first but it caused too many interger overflow, once I noted that __udivsi3 was using c_uint and c_int to avoid this problem and be closer to the C original source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't sweep the overflow problems under the rug! Bitcast the i32
argument to u32
and use it.
@setRuntimeSafety(builtin.is_test); | ||
|
||
var x: c_uint = @bitCast(c_uint, a); | ||
var t: c_int = @intCast(c_int, @boolToInt((x & 0xFFFF0000) == 0)) << 4; // if (x is small) t = 16 else 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit, can you please put the comments on their own line if it becomes too long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can, but this is how it was in the original C code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is not the original C code :)
} | ||
|
||
fn __clzsi2_arm_clz(a: i32) callconv(.C) i32 { | ||
return asm volatile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you aim for efficiency you probably want something like
fn __clzsi2_arm_clz() callconv(.Naked) void {
asm volatile(
\\ clz r0, r0
\\ bx lr
);
unreachable;
}
This way you don't have any frame setup/teardown overhead and no redundant input/output/clobbers in the code (remember that the final set of clobbered registers is the union of the in/out/clobber indications).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that but the test suite would fail on ARM platforms because you can't call a naked function from a non-naked function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the problem in the test code?
I can't read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can't call a naked function from a non-naked function
You have to @ptrCast
the function to a different calling convention first. If the assembly code in your naked function matches one of the calling conventions then it is safe to cast the function to that and call it.
); | ||
} | ||
|
||
fn __clzsi2_arm32(a: i32) callconv(.C) i32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto as before WRT the use of the asm block.
This is the first builtin to have asm-optimized implementations, are you sure the generic Zig version doesn't compile down to something sensible (I'd say that since you're targeting GBA you're interested in code size) ?
The two algorithms are exactly the same with the exception of the the generic one being branch-less, and I guess LLVM should do a good job at optimizing it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, this is how it was in LLVM compiler-rt code, I did try implementing a generic code closer to the ARM non-clz version but I think the goal is to be closer to the original compiler-rt source
} | ||
|
||
const can_use_arm_clz = switch (builtin.arch) { | ||
.arm => |sub_arch| switch (sub_arch) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can merge the two branches as .arm, .armeb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here (search for "ACLE 6.4.5 CLZ"), it seems that v6m
doesn't support clz
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.v4t => false, | ||
else => true, | ||
}, | ||
.thumb => |sub_arch| switch (sub_arch) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can merge the .thumb, .thumbeb
branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking this is true for every revision that supports thumb2, but until #3927 lands we cannot simplify this :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and yeah can't wait for #3927 to be merged
ccbdff2
to
969fd01
Compare
My counter-proposal here is to use something simpe & generic such as export fn my_clz(i: i32) i32 {
@setRuntimeSafety(false);
var x = @bitCast(u32, i);
var n: i32 = 32;
// Count first bit set using binary search, from Hacker's Delight
var y = x >> 16;
if (y != 0) {
n = n - 16;
x = y;
}
y = x >> 8;
if (y != 0) {
n = n - 8;
x = y;
}
y = x >> 4;
if (y != 0) {
n = n - 4;
x = y;
}
y = x >> 2;
if (y != 0) {
n = n - 2;
x = y;
}
y = x >> 1;
if (y != 0) {
n = n - 1;
x = y;
}
return n - @bitCast(i32, x);
} This produces something sensible for ARM32 (with
|
Even better export fn my_clz(i: i32) i32 {
@setRuntimeSafety(false);
var x = @bitCast(u32, i);
var n: i32 = 32;
// Count first bit set using binary search, from Hacker's Delight
var y: u32 = 0;
inline for ([_] i32 { 16, 8, 4, 2, 1}) |shift| {
y = x >> shift;
if (y != 0) {
n = n - shift;
x = y;
}
}
return n - @bitCast(i32, x);
} |
Oh that's even nicer, I keep forgetting about |
When @LemonBoy gives this the 👌 then I'll hit merge |
…me ARM architecture.
969fd01
to
d563cf7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I only have a few nits that I can address in a PR later:
__clzsi2_arm32
is not needed as__clzsi2_generic
compiles down to something very similar- After some digging in the LLVM code
__clzsi2_arm_clz
won't be ever called as__clzsi2
is only called if 1) the target is ARM < 5 2) if the target only supports the Thumb1 instruction set and in both cases theclz
opcode is not available.
It's required for using std.fmt.format on some ARM architecture such as ARM7TDMI in Thumb mode. Also fix regression with __aeabi_read_tp that only works with OS Linux
Tested with ZigGBA and ran zig build test-compiler-rt on Windows, Linux and qemu on Linux.