
fix for #2721 #2723

Merged
merged 2 commits into from
Dec 14, 2021

Conversation


@ptitSeb ptitSeb commented Dec 14, 2021

Description

Fix for #2721, a regression introduced with the Windows backend for Singlepass.

```rust
self.reserve_unused_temp_gpr(GPR::R9);
self.move_location(Size::S64, loc, Location::GPR(GPR::R9));
self.move_location(
self.assembler.emit_xchg(
```
Contributor
On x86, the xchg instruction always does an atomic swap, which is very slow. Try to use a normal mov instead.

Contributor Author

I know, but I need to save the value of the register.

Contributor

It's probably cheaper to add another pair of push/pop for r9 than to use xchg:

```
push rax (dummy)
push r9
mov r9, #imm64
mov [sp + 8], r9
pop r9
```

Contributor Author

No, it's not.
I tried this simple program:

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

int main(int argc, const char** argv)
{
    struct timespec t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    #define NUM 100000
    for(int i=0; i<NUM; ++i)
        asm volatile ("\n\
            push %%r9\n\
            mov $0x123456789abcdef, %%r9\n\
            xchg (%%rsp), %%r9\n\
            pop  %%r9\n\
        " : : : "memory");
    clock_gettime(CLOCK_MONOTONIC, &t2);
    uint64_t t = t1.tv_sec*1000000000LL + t1.tv_nsec;
    t = (t2.tv_sec*1000000000LL + t2.tv_nsec) - t;
    printf("Xchg: %llu nsec for %d occurrences\n", (unsigned long long)t, NUM);
    clock_gettime(CLOCK_MONOTONIC, &t1); /* restart the clock for the second measurement */
    for(int i=0; i<NUM; ++i)
        asm volatile ("\n\
            push %%rax\n\
            push %%r9\n\
            mov $0x123456789abcdef, %%r9\n\
            mov %%r9, 0x8(%%rsp)\n\
            pop  %%r9\n\
            pop %%rax\n\
        " : : : "memory");
    clock_gettime(CLOCK_MONOTONIC, &t2);
    t = t1.tv_sec*1000000000LL + t1.tv_nsec;
    t = (t2.tv_sec*1000000000LL + t2.tv_nsec) - t;
    printf("Push: %llu nsec for %d occurrences\n", (unsigned long long)t, NUM);
}
```

And while there is some run-to-run variation, the xchg method is always faster:

```
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1911800 nsec for 100000 occurrences
Push: 2216700 nsec for 100000 occurrences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1156300 nsec for 100000 occurrences
Push: 1452800 nsec for 100000 occurrences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1156000 nsec for 100000 occurrences
Push: 1452800 nsec for 100000 occurrences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1345700 nsec for 100000 occurrences
Push: 1668700 nsec for 100000 occurrences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1499400 nsec for 100000 occurrences
Push: 1898400 nsec for 100000 occurrences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1185400 nsec for 100000 occurrences
Push: 1493200 nsec for 100000 occurrences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1276400 nsec for 100000 occurrences
Push: 1598700 nsec for 100000 occurrences
seb@Seb-PC:~/git/wasmer$ ./test
Xchg: 1276800 nsec for 100000 occurrences
Push: 1596000 nsec for 100000 occurrences
```

The implicit LOCK is not that slow compared to the extra memory write+read that the push/pop approach has to do.


nagisa commented Dec 14, 2021

Please retain the authorship for the test case. Either cherry-pick the commit from my branch or, if you'd rather not do so for some reason, I can submit the changes I've made as a separate PR once the fix here lands.


ptitSeb commented Dec 14, 2021

Ok, I'll see what I can do so you keep the authorship, no problem.


ptitSeb commented Dec 14, 2021

> Please retain the authorship for the test case. Either cherry-pick the commit from my branch or, if you'd rather not do so for some reason, I can submit the changes I've made as a separate PR once the fix here lands.

I'll let you do a PR when this fix is merged.


ptitSeb commented Dec 14, 2021

bors r+

@ptitSeb ptitSeb changed the title fix #2721, also adding the test in the regression test suite fix #2721 Dec 14, 2021
@ptitSeb ptitSeb changed the title fix #2721 fix for #2721 Dec 14, 2021

bors bot commented Dec 14, 2021

@bors bors bot merged commit 068ba62 into master Dec 14, 2021
@bors bors bot deleted the fix_call_paramter_lost_x64_linux branch December 14, 2021 13:55
bors bot added a commit that referenced this pull request Dec 14, 2021
2725: Regression test for #2721 r=syrusakbary a=nagisa

The wasm snippet in the test corresponds to roughly:

```rust
let name = "bananapeach";
banana(a, b, c, name.len() as _, name.as_ptr() as _, f, g, h);
```

however, sometime between 2.0 and 2.1, the name pointer stopped being
passed through as an argument. Instead, a 0 gets passed in.

To make things weirder, if `name.as_ptr()` is passed through multiple
times, the second time the pointer will get passed correctly.

The fix for this issue is in #2723.

Co-authored-by: Simonas Kazlauskas <[email protected]>
Co-authored-by: Syrus Akbary <[email protected]>
nagisa referenced this pull request in near/wasmer Dec 14, 2021