-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for ARM64 #570
Support for ARM64 #570
Conversation
The sse flags are not supported by ARM64 (this was a potential issue also for new macs, but I don't have a way to test it). Signed-off-by: Marcello Seri <[email protected]>
Signed-off-by: Marcello Seri <[email protected]>
Signed-off-by: Marcello Seri <[email protected]>
How much does -sse impact the performance? Did @jzstark evaluate this before? |
No idea, that is why I also added him as reviewer. In any case I think this is not supported on ARM-based macs, so I think we will need to find a solution.
I think it must be possible, but I don't know how to check the ARCH. Unfortunately I don't have the time to study that right now. |
Looking at gcc documentation it seems to me that this should make a difference only on i386 and maybe old compilers, all of |
I have not done evaluation about the We have some code to detecting the cpu architecture at C code level, but I think we need something at the OCaml level here. I don't know about that, but perhaps some similar method as we use in configure.ml to detect different OS could potentially be utilised. |
The post you link confirms what I was saying though: “ -msse -msse2 -mfpmath=sse is already the default for x86-64, but not for 32-bit i386”. So this change should not make any difference: Owl doesn’t really support i386, it has tests failing on i386 since the opam-repository has been testing it. I think we should aim to add arm64 support sooner rather than later (and maybe add a remark on i386 to enable those flags and keep an eye out for bugs if somebody wants to try) |
This is how we can add arch specific flags in configurator: https://github.com/mirage/mirage-crypto/blob/master/config/cfg.ml#L5-L16 |
I know it is not a proper benchmark but I did run all the owl examples and simulations that I have running around with and without this code (all 64 bits machines, ubuntu and macos), and there is no apparent difference in the running time (measured with time), confirming what the documentation says (i.e., these flags should be redundant in 64bit systems). |
Until owlbarn#570 is accepted Signed-off-by: Marcello Seri <[email protected]>
Until #570 is accepted Signed-off-by: Marcello Seri <[email protected]>
Greetings. I am building owl for the M1 chip on macOS. I have effectively applied this patch as I was starting the basic port. It stops when we get to an assembly call for cpuid, which ARM64 lacks. The caller is trying to determine the cache size available.
|
@jhgorse Thanks for reporting this problem. When thus cache size function was introduced, it was mainly for the optimization of convolution operations. I vaguely recall that if non-compatible architectures are used, it is supposed returns some default values, but apparently M1 was not covered. Unfortunately currently I lack hardware and time to revise the code base and add extra architectures on the whitelist. If you could propose a solution and make a PR, it would be really appreciated! |
Okay, I got past that. The compiler doesn't like the potential to need that object code so it dies. The preprocessor needs to filter out that intel or amd code for cpuid. Now we fail on link with:
|
Would be happy to add a commit or two off of your PR here once we get things ship shape. =) |
Digging into the generated outputs with -v, I see
Then finally the
|
libgcc_s.dylib is implicitly getting included due to libopenblas, specifically liblapacke. Something which likely shouldn't be grabbing libgcc is pulling it in the LAPACKE build. LAPACK, which we appear to sometimes bundle in OpenBLAS, depends on fortran, which requires gfortran, or a commercial compiler, or potentially flang llvm for the apple-m1. xref: Reference-LAPACK/lapack#643 x86_64 emulation through Rosetta2 is a stop-gap which may suffice while a better solution is designed and implemented. It requires a separate homebrew install, e.g. Apple has their own port of LAPACK https://developer.apple.com/documentation/accelerate/solving_systems_of_linear_equations_with_lapack And BLAS via vecLIB What would it take to use these to satisfy the dependency? What is lost in these libraries that are not in the latest open source? Cheers, |
I built LAPACK natively and statically linked the libraries and their included BLAS rather than OpenBLAS. This works and gets us past the libgcc_s requirement from before. Now I am probably breaking some glue that is in OpenBLAS. Build fails with:
Here is the build log at the break in full: https://termbin.com/8hrr Next steps are to either get the Owl build to work from here: or patch up OpenBLAS to make me some static LAPACK libs as they do by default, which avoids implicit inclusion of libgcc_s. @mseri @jzstark Which path should I take? What have I broken by not using OpenBLAS? |
Fantastic. I don't think you are breaking anything, those errors are actually warnings from recent ocaml versions. It is just compiled with too restrictive flags. If you add PA Are you working on my branch or on owl master? In the first case, the warning should disappear if you rebase it on the current owl master. At least I though the warning was silenced recently |
@mseri Great! Got a little further. We are trying to link now. =)
Thoughts on this guy? Seems like I may have unbalanced the owl stub generator. |
Off of your branch, which seems rebased. The first warning did disappear after rebasing to master. My fork and branch live here: |
@jhgorse jhgorse im working off your repo, I'm using the new clang-15 compiler that came out last week it seems like everything builds but the new clang has an issue that the stdlibc++ is not longer supported on apple devices, so it seems like the linker is getting into issues there. I'm using just dune build without any custom flags. |
@xinslu what is the linker error? |
@jhgorse Standard Library for c and cpp are not found linker is not able to connect -lm and -lSystem. I understand this has nothing to do with owl but i think its the furthest I gotten with the the base config and build. All the other files compile without an error I have errors when I reach the linker stage. |
@xinslu can you show me the log? From just before the error (the last build command that fails) to the end of the output. Even though it is not owl, we can still try to get your paths right for linking. If you find success, we will be interested in reproducing it. Cheers, |
As you see, I cannot link to the Standard Library. I tried using different stdlib defs by invoking a simple program using math.h and running -v. To check the path invocation, I've tried many different CFLAG combinations but none work. Any help is appreciated. |
@xinslu try the dune command with We need to drill down and see what -L library paths are being used. Then we can add things like |
@jhgorse I tried use the verbose flag nothing much related to clang linker invocation came out of it so i reverted to using a simple program using <math.h>. Here is the result of
|
This is independent of 4.14 and I believe is still relevant if you want to support arm64. It is only a preliminary step though |
OK, thanks. Let's come back to this and find a solution after I clean up the code. Looks like changing from master to main automatically closed the PR somewhow. |
@xinslu and team, I have built owl for the m1 successfully. Here are the steps to reproduce:
Changing
to
in Next, the performance tests. Cheers, |
@jhgorse if you merge my arm branch into your changes, you can avoid deleting the x86 tweak. I added some macros to disable that only on ARM |
The sse flags are not supported by ARM64 (this was a potential issue also for new macs, but I don't have a way to test it). I suggest disabling them.
This should improve the situation (or even fix) #569 and should fix (at least for now) #568
I also added ARM to the tests, to double-check...