-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix for #2721 #2723
fix for #2721 #2723
Conversation
self.reserve_unused_temp_gpr(GPR::R9); | ||
self.move_location(Size::S64, loc, Location::GPR(GPR::R9)); | ||
self.move_location( | ||
self.assembler.emit_xchg( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On x86, the xchg
instruction always does an atomic swap, which is very slow. Try to use a normal mov
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but I need to save the value of the register.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably cheaper to add another pair of push
/pop
for r9
than to use xchg
:
push rax (dummy)
push r9
mov r9, #imm64
mov [sp + 8], r9
pop r9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's not.
I have done this simple program:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
int main(int argc, const char* argv)
{
struct timespec t1, t2;
clock_gettime(CLOCK_MONOTONIC, &t1);
#define NUM 100000
for(int i=0; i<NUM; ++i)
asm volatile ("\n\
push %%r9\n\
mov $0x123456789abcdef, %%r9\n\
xchg (%%rsp), %%r9\n\
pop %%r9\n\
"::);
clock_gettime(CLOCK_MONOTONIC, &t2);
uint64_t t = t1.tv_sec*1000000000LL + t1.tv_nsec;
t = (t2.tv_sec*1000000000LL + t2.tv_nsec) - t;
printf("Xchg: %llu nsec for %d occurences\n", t, NUM);
for(int i=0; i<NUM; ++i)
asm volatile ("\n\
push %%rax\n\
push %%r9\n\
mov $0x123456789abcdef, %%r9\n\
mov 0x8(%%rsp), %%r9\n\
pop %%r9\n\
pop %%rax\n\
"::);
clock_gettime(CLOCK_MONOTONIC, &t2);
t = t1.tv_sec*1000000000LL + t1.tv_nsec;
t = (t2.tv_sec*1000000000LL + t2.tv_nsec) - t;
printf("Push: %llu nsec for %d occurences\n", t, NUM);
}
And while I have some variation, the xchg method is always faster:
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1911800 nsec for 100000 occurences
Push: 2216700 nsec for 100000 occurences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1156300 nsec for 100000 occurences
Push: 1452800 nsec for 100000 occurences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1156000 nsec for 100000 occurences
Push: 1452800 nsec for 100000 occurences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1345700 nsec for 100000 occurences
Push: 1668700 nsec for 100000 occurences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1499400 nsec for 100000 occurences
Push: 1898400 nsec for 100000 occurences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1185400 nsec for 100000 occurences
Push: 1493200 nsec for 100000 occurences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1276400 nsec for 100000 occurences
Push: 1598700 nsec for 100000 occurences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1276800 nsec for 100000 occurences
Push: 1596000 nsec for 100000 occurences
The implicit LOCK is not that slow, compare to a write+read in memory that the push/pop.
Please retain the authorship for the test case. Either cherry-pick the commit from my branch or if you'd rather not do so for some reason, I can submit a PR with the changes I've made as a separate PR once the fix here lands. |
Ok, I'll see what I can do so you keep the authorship, no problem. |
I'll let you do a PR when this fix is merged. |
bors r+ |
2725: Regression test for #2721 r=syrusakbary a=nagisa The wasm snippet in the test corresponds to roughly: ```rust let name = "bananapeach"; banana(a, b, c, name.len() as _, name.as_ptr() as _, f, g, h); ``` however sometime between 2.0 and 2.1 the name pointer is no longer being passed through as an argument. Instead a 0 gets passed in. To make things weirder, if `name.as_ptr()` is passed through multiple times, the second time the pointer will get passed correctly. The fix for this issue is in #2723. Co-authored-by: Simonas Kazlauskas <[email protected]> Co-authored-by: Syrus Akbary <[email protected]>
Co-authored-by: ptitSeb <[email protected]>
Description
Fix for #2721, regression introduce with Windows backend for Singlepass.