Guidelines for efficient faer dynamic library #108
Comments
yeah, that looks reasonable enough to me. im surprised about the results though. could you share your benchmark setup?
Thanks for your input! I am multiplying dense f64 rectangular matrices of sizes (20,000 x 8,000) and (8,000 x 4,000), and I preallocate the result matrix before the benchmark. The benchmark macro […]. Regarding the results I talked about, sorry, I missed that. I will do more rigorous and thorough benchmarking later in the week, on Intel hardware as well.

Hardware and software:
- 12-thread run
- MKL chooses to run on 6 threads according to benchmark.jl
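The measurement pattern described above (preallocate the output, then time only the kernel) can be sketched roughly as follows. This is my illustration, not code from the benchmark; the helper name `mean_time` is made up.

```rust
use std::time::Instant;

/// Run `kernel` once as an untimed warmup (pages in buffers, spins up
/// thread pools), then return the mean wall time over `reps` timed runs.
fn mean_time<F: FnMut()>(mut kernel: F, reps: u32) -> f64 {
    kernel(); // warmup iteration, deliberately not timed
    let start = Instant::now();
    for _ in 0..reps {
        kernel();
    }
    start.elapsed().as_secs_f64() / reps as f64
}
```

Timing only the repeated kernel, with the output buffer allocated once outside the closure, keeps allocator and first-touch costs out of the measurement.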
one thing that could make a difference is building faer with the […]
I switched to my desktop (AMD Ryzen 9 7950X3D, 16C/32T, 64 GB DDR5-6000) because I think my laptop may thermal throttle and artificially lower the results.
(20,000 x 8,000) * (8,000 x 4,000):
- rustc 1.78.0-nightly: […]
- rustc 1.76.0: […]

(40,000 x 16,000) * (16,000 x 8,000):
- rustc 1.78.0-nightly: […]
- rustc 1.76.0: […]
Are those results reasonable? They look good to me, but I don't know the expected performance of […]. I will do different benchmarks when I have more time. Are you interested? If so, where should I share them?
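For context (my addition, not from the thread): a dense m x k by k x n multiply performs about 2·m·k·n floating-point operations, so achieved GFLOP/s can be computed from the wall time and compared against the CPU's theoretical peak (cores x SIMD lanes x 2 for FMA x clock). A tiny helper:

```rust
/// Achieved GFLOP/s for an m x k by k x n dense multiply that took
/// `seconds` of wall time (2 flops per multiply-accumulate).
fn gflops(m: u64, k: u64, n: u64, seconds: f64) -> f64 {
    (2 * m * k * n) as f64 / seconds / 1e9
}
```

For the (20,000 x 8,000) * (8,000 x 4,000) case that is 1.28e12 flops in total, so a hypothetical 1-second run would correspond to 1280 GFLOP/s.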
the results look pretty reasonable to me. it's hard to know exactly what is making faer slower without taking a closer look.
FYI I ran the same benchmark on Intel hardware. I fixed my thread count problem: everything effectively ran on 8 threads here.
Hardware and software:
(20,000 x 8,000) * (8,000 x 4,000):
- rustc 1.78.0-nightly: […]
- rustc 1.76.0: […]
what happens if you initialize the matrix instead of using […]?
i just got an idea! what happens if you benchmark faer without any of the other libraries running? i vaguely remember some issues with openmp's threadpool interfering with rayon's, which caused significant slowdowns on faer's side of things. i would be curious to see those, as well as single-threaded results, if that's alright with you
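One way to rule out thread-pool interference like this is to pin each library's pool explicitly through its environment variable before the benchmark process starts. `RAYON_NUM_THREADS` (rayon, used by faer), `OMP_NUM_THREADS` (OpenMP, used by OpenBLAS) and `MKL_NUM_THREADS` are the real knobs; the helper below is my own sketch, and the launched program is whatever your benchmark script is.

```rust
use std::process::Command;

/// Build a command that runs `program` with every BLAS/threading
/// library pinned to a single thread, so none of the pools can
/// interfere with the others.
fn single_threaded_cmd(program: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("RAYON_NUM_THREADS", "1") // rayon pool (faer)
        .env("OMP_NUM_THREADS", "1") // OpenMP pool (OpenBLAS)
        .env("MKL_NUM_THREADS", "1"); // MKL pool
    cmd
}
```

Running each library's benchmark in a fresh process this way also avoids one library's already-initialized pool affecting the next measurement.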
No change
No change
Large performance difference here!

Hardware and software:
(20,000 x 8,000) * (8,000 x 4,000):
- rustc 1.78.0-nightly: […]
- rustc 1.76.0: […]
- rustc 1.78.0-nightly: […]
- rustc 1.76.0: […]
yeah, no idea what's happening then. if you can share your full benchmark i can see if i can reproduce the results.
https://github.com/guiburon/faer-api

FYI something seems odd right now with […]. I don't know if you are familiar with Julia. Don't hesitate to ask if you want some pointers.
i tried the benchmark and im getting close results for all 3 libraries
one thing i noticed though, was that […]
I ran the benchmark single-threaded ([…]).
So the only hardware where […]
Hi!
I am really impressed by your colossal work on this math kernel!
I am writing a Julia wrapper to benchmark faer against OpenBLAS and MKL.
So far I have only studied the dense matrix-matrix multiplication. My preliminary results show faer approximately 50% slower than OpenBLAS and 25% slower than MKL on an AMD Ryzen 5 7640U on 8 threads.
This is basically my first Rust project and I want to be fair to faer: is this a reasonable dynamic library exposing faer's in-place matrix multiplication through the C ABI?
I am not sure if opening an issue is the right way to ask, but the faer documentation is very sparse at the moment on how to import external matrices.
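For readers wondering what such a C-ABI surface looks like, here is a minimal sketch. The symbol name `faer_dgemm_inplace` is hypothetical, and the body uses a naive triple loop as a stand-in for faer's actual matmul kernel so the example stays self-contained; a real wrapper would forward to faer instead.

```rust
/// Hypothetical C-ABI entry point: c = a * b, with a preallocated output.
/// All matrices are column-major (matching Julia's layout): `a` is m x k,
/// `b` is k x n, and `c` is m x n.
///
/// # Safety
/// The caller must pass valid, correctly sized, non-aliasing pointers.
#[no_mangle]
pub unsafe extern "C" fn faer_dgemm_inplace(
    a: *const f64,
    b: *const f64,
    c: *mut f64,
    m: usize,
    k: usize,
    n: usize,
) {
    let a = std::slice::from_raw_parts(a, m * k);
    let b = std::slice::from_raw_parts(b, k * n);
    let c = std::slice::from_raw_parts_mut(c, m * n);
    // Naive stand-in for the faer kernel, column-major indexing throughout.
    for j in 0..n {
        for i in 0..m {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[p * m + i] * b[j * k + p];
            }
            c[j * m + i] = acc;
        }
    }
}
```

Compiled with `crate-type = ["cdylib"]`, a symbol like this could then be invoked from Julia via `ccall` with `Ptr{Float64}` and `Csize_t` arguments, writing into a Julia-owned preallocated result matrix.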