Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A #4872

Merged
merged 1 commit into from
Aug 14, 2024

Conversation

chenx97
Copy link
Contributor

@chenx97 chenx97 commented Aug 14, 2024

This fix is similar to
2d80641.

@chenx97
Copy link
Contributor Author

chenx97 commented Aug 14, 2024

I first ran into this bug in CTEST(potrf, smoketest_trivial), where a constant like 1e-5, with underlying bits 0x3ee4f8b588e368f1, can become 1.1346735896e-314 because of the lost upper 32 bits, making the number significantly smaller than the actual error produced by the smoke test and leads to test failure.

I don't plan to add LOONGSON3A to the mips64 CI workflow because qemu doesn't work with this test while my Loongson 3B4000 does...

@martin-frbg
Copy link
Collaborator

so this is the real solution for what I papered over in #4810 ?

@martin-frbg martin-frbg added this to the 0.3.29 milestone Aug 14, 2024
@chenx97
Copy link
Contributor Author

chenx97 commented Aug 14, 2024

so this is the real solution for what I papered over in #4810 ?

My fix is for the old mips64el loongson. I haven't checked loongarch64 yet.

@chenx97
Copy link
Contributor Author

chenx97 commented Aug 14, 2024

TEST 99/103 potrf:smoketest_trivial [FAIL]
  ERR: test_potrs.c:535  L s(0,0) difference: 1.19209e-07

I see. This is exactly what I saw when the test was failing for the 3B4000. Maybe you can give it a try.

@chenx97
Copy link
Contributor Author

chenx97 commented Aug 14, 2024

    PROLOGUE

    addi.d     $sp,    $sp,   -128
    SDARG      $r23,   $sp,   0
    SDARG      $r24,   $sp,   8
    SDARG      $r25,   $sp,   16
    SDARG      $r26,   $sp,   24
    SDARG      $r27,   $sp,   32
    ST         $f23,   $sp,   40
    ST         $f24,   $sp,   48
    ST         $f25,   $sp,   56
    ST         $f26,   $sp,   64
    ST         $f27,   $sp,   72
    ST         $f28,   $sp,   80
    ST         $f29,   $sp,   88
    ST         $f30,   $sp,   96
    ST         $f31,   $sp,   104
    ST         ALPHA_R,$sp,   112
    ST         ALPHA_I,$sp,   120

oof...

@martin-frbg
Copy link
Collaborator

I see @fengrl tried to explain this to (a younger) me six years ago but I did not understand :(
Indeed looks like the loongarch64 kernel development followed the same scheme, people getting tricked by the LD/ST macro "sometimes" expanding to the correct fld.d/fst.d instruction.
The spurious error in the potrs test appears to be gone after updating the LOONGSON3R5 gemm kernels but I need to do some more testing before I can put that into a PR...

@martin-frbg martin-frbg merged commit a388c4b into OpenMathLib:develop Aug 14, 2024
77 of 78 checks passed
@chenx97 chenx97 deleted the ls3a-fix-stack-fpr-len branch November 11, 2024 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants