Support for ARM64 #570

Closed
wants to merge 4 commits into from

@mseri (Member) commented Jan 5, 2021

The SSE flags are not supported on ARM64 (this is potentially also an issue for the new Macs, but I have no way to test it). I suggest disabling them.

This should improve (or even fix) #569, and should fix #568 (at least for now).

I also added ARM to the tests to double-check.

Signed-off-by: Marcello Seri <[email protected]>
@mseri mseri requested review from jzstark and ryanrhymes January 5, 2021 21:57
@ryanrhymes ryanrhymes requested review from mor1 and tachukao January 5, 2021 22:06
@ryanrhymes (Member)

How much do the SSE flags impact performance? Has @jzstark evaluated this before?
Can this switch be made conditional on the architecture?

@mseri (Member, Author) commented Jan 5, 2021

> How much do the SSE flags impact performance? Has @jzstark evaluated this before?

No idea; that is why I also added him as a reviewer. In any case, I think these flags are not supported on ARM-based Macs, so we will need to find a solution.
By the way, we don't use SSE in eigen or AEOS either, so that may need to be revisited and adjusted as well.

> Can this switch be made conditional on the architecture?

It must be possible, but I don't know how to check the architecture, and unfortunately I don't have time to look into that right now.
If this PR works, anybody can pick it up and add the feature before merging.

@mseri (Member, Author) commented Jan 6, 2021

Looking at the GCC documentation, it seems to me that this should make a difference only on i386 and perhaps with old compilers: -mfpmath=sse, -msse and -msse2 should all be enabled by default on x86_64.
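The point about defaults can be checked from the compiler itself. Below is a minimal C sketch, assuming GCC or Clang: both define `__SSE2__` when SSE2 code generation is enabled, `__i386__` on 32-bit x86 and `__x86_64__` on 64-bit x86. The helpers `sse2_enabled` and `extra_fp_flags` are hypothetical, not part of Owl; the idea is that a configure step could compile a probe like this and add the SSE flags only where they are not already the default.

```c
#include <stdbool.h>

/* True when the compiler already generates SSE2 code for this target
   (GCC and Clang define __SSE2__ in that case). */
bool sse2_enabled(void) {
#ifdef __SSE2__
  return true;
#else
  return false;
#endif
}

/* Hypothetical helper: the floating-point flags a build script might add
   for the current target, chosen from predefined architecture macros. */
const char *extra_fp_flags(void) {
#if defined(__i386__)
  /* 32-bit x86: SSE math is not the default, so request it explicitly */
  return "-mfpmath=sse -msse -msse2";
#else
  /* x86_64: SSE/SSE2 are baseline; arm64 and others: these flags do not exist */
  return "";
#endif
}
```

On an x86_64 machine `sse2_enabled()` should return true with no extra flags, while on arm64 it returns false and `extra_fp_flags()` is empty, so nothing x86-specific reaches the compiler.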

@jzstark (Collaborator) commented Jan 7, 2021

I have not evaluated the SSE flags, but I think this StackOverflow post shows that they do affect code performance in general. Personally, I'm in no hurry to remove them until we can be sure that doing so has no impact on Owl's performance. As Marcello pointed out, clarifying how these flags are used by default and how they interact with other flags on various platforms would be helpful.

We have some code for detecting the CPU architecture at the C level, but I think we need something at the OCaml level here. I'm not sure how, but perhaps a method similar to the one we use in configure.ml to detect the OS could be used.

@mseri (Member, Author) commented Jan 7, 2021

The post you linked confirms what I was saying, though: “-msse -msse2 -mfpmath=sse is already the default for x86-64, but not for 32-bit i386”. So this change should not make any difference: Owl doesn’t really support i386, and its tests have been failing on i386 ever since opam-repository started testing it. I think we should aim to add arm64 support sooner rather than later (and perhaps add a note for anyone who wants to try i386 to enable those flags and keep an eye out for bugs).

@mseri (Member, Author) commented Jan 9, 2021

This is how we can add architecture-specific flags in the configurator: https://github.com/mirage/mirage-crypto/blob/master/config/cfg.ml#L5-L16

@mseri mseri changed the title from “Improve support for ARM64” to “Support for ARM64” Mar 30, 2021
@mseri (Member, Author) commented Apr 10, 2021

I know it is not a proper benchmark, but I ran all the Owl examples and simulations I have lying around with and without this change (all on 64-bit machines, Ubuntu and macOS), and there is no apparent difference in running time (measured with time). This confirms what the documentation says, i.e., these flags should be redundant on 64-bit systems.
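Timing whole programs with time is coarse. A more controlled, though still informal, comparison is to compile a single floating-point kernel twice, once with and once without the flags, and time each build. The kernel and the names `axpy_kernel` and `seconds_now` below are illustrative assumptions, not Owl code, and the timer assumes a POSIX system with `clock_gettime`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict C modes */
#endif
#include <time.h>

/* Illustrative floating-point kernel: reps passes of y += 2*x over n doubles.
   Returns the sum of y so the work cannot be optimised away entirely. */
double axpy_kernel(const double *x, double *y, int n, int reps) {
  for (int r = 0; r < reps; r++)
    for (int i = 0; i < n; i++)
      y[i] += 2.0 * x[i];
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += y[i];
  return sum;
}

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
double seconds_now(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
}
```

A small driver would call `seconds_now()` before and after `axpy_kernel` on large arrays. Building it once with `cc -O2` and once with `cc -O2 -mfpmath=sse -msse -msse2` on an x86_64 machine should, if those flags really are the default there, give timings that agree within noise.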

mseri added a commit to mseri/owl that referenced this pull request Apr 10, 2021
Until owlbarn#570 is accepted

Signed-off-by: Marcello Seri <[email protected]>
@mseri mseri mentioned this pull request Apr 10, 2021
ryanrhymes pushed a commit that referenced this pull request Apr 14, 2021
Until #570 is accepted

Signed-off-by: Marcello Seri <[email protected]>
@jhgorse commented Jan 2, 2022

Greetings. I am building Owl for the M1 chip on macOS, and I effectively applied this patch when starting the basic port. The build stops at an inline assembly call to cpuid, an x86 instruction that ARM64 lacks.

The caller is trying to determine the available cache size.

⌁ [jhg:~/Work/owl] mseri-patch-1 ± make
dune external-lib-deps --missing @install @runtest
dune build @install
          cc src/owl/owl_ndarray_contract_stub.o
In file included from src/owl/core/owl_ndarray_contract_stub.c:17:
src/owl/core/owl_ndarray_contract_impl.h:88:9: warning: incompatible pointer types assigning to 'int64_t *' (aka 'long long *') from 'intnat []' [-Wincompatible-pointer-types]
  cp->n = X->dim;
        ^ ~~~~~~
...
4 warnings generated.
          cc src/owl/owl_core_utils.o (exit 1)
(cd _build/default/src/owl && /usr/bin/cc -O2 -fno-strict-aliasing -fwrapv -pthread -D_FILE_OFFSET_BITS=64 -O2 -fno-strict-aliasing -fwrapv -pthread -I/opt/homebrew/Cellar/openblas/0.3.19/include -g -O3 -Ofast -mcpu=apple-m1 -funroll-loops -ffast-math -DSFMT_MEXP=19937 -fno-strict-aliasing -Wno-tautological-constant-out-of-range-compare -Wno-logical-op-parentheses -g -I /Users/jhg/.opam/default/lib/ocaml -I /Users/jhg/.opam/default/lib/bigarray-compat -I /Users/jhg/.opam/default/lib/bytes -I /Users/jhg/.opam/default/lib/camlzip -I /Users/jhg/.opam/default/lib/ctypes -I /Users/jhg/.opam/default/lib/eigen -I /Users/jhg/.opam/default/lib/eigen/cpp -I /Users/jhg/.opam/default/lib/integers -I /Users/jhg/.opam/default/lib/npy -I /Users/jhg/.opam/default/lib/zip -I ../base -o owl_core_utils.o -c owl_core_utils.c)
src/owl/core/owl_core_utils.c:218:5: error: invalid output constraint '=a' in asm
    CPUID(cpuinfo, 0x4, cache_id);
    ^
src/owl/core/owl_macros.h:97:36: note: expanded from macro 'CPUID'
    __asm__ __volatile__ ("cpuid": "=a" (cpuinfo[0]), "=b" (cpuinfo[1]), "=c" (cpuinfo[2]), "=d" (cpuinfo[3]) : "0" (func), "2" (id) );
                                   ^
...
                                   ^
4 errors generated.
File "src/base/dense/owl_base_dense_ndarray_generic.ml", line 432, characters 20-24:
Error (warning 16 [unerasable-optional-argument]): this optional argument cannot be erased.
...
Error (warning 5 [ignored-partial-application]): this function application is partial,
maybe some arguments are missing.
make: *** [build] Error 1

@jzstark (Collaborator) commented Jan 2, 2022

@jhgorse Thanks for reporting this problem. When this cache size function was introduced, it was mainly for optimizing convolution operations. I vaguely recall that it is supposed to return some default values on non-compatible architectures, but apparently the M1 was not covered. Unfortunately, I currently lack the hardware and the time to revise the code base and add extra architectures to the whitelist. If you could propose a solution and make a PR, it would be really appreciated!
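The behaviour described here (fall back to default values on unsupported architectures) can be sketched with a preprocessor guard, so that the x86 inline assembly is never even compiled on arm64. This is a minimal C sketch assuming GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid_max` and `__cpuid_count`; it is not Owl's actual `owl_core_utils.c` code, and the name `query_cache_size` and the 256 KiB default are hypothetical. On x86 it walks CPUID leaf 4 (Intel's deterministic cache parameters) and computes ways × partitions × line size × sets for each data or unified cache; CPUs that report nothing through leaf 4 (many AMD parts, for example) also get the default.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helpers for the x86 CPUID instruction */
#define HAVE_X86_CPUID 1
#endif

/* Hypothetical fallback used when the cache size cannot be queried. */
#define DEFAULT_CACHE_SIZE (256L * 1024L)

/* Size in bytes of the largest data or unified cache reported by CPUID
   leaf 4, or DEFAULT_CACHE_SIZE when that information is unavailable. */
long query_cache_size(void) {
#ifdef HAVE_X86_CPUID
  unsigned int eax, ebx, ecx, edx;
  long best = 0;
  if (__get_cpuid_max(0, NULL) < 4)
    return DEFAULT_CACHE_SIZE;  /* leaf 4 not supported */
  for (unsigned int id = 0; id < 16; id++) {
    __cpuid_count(4, id, eax, ebx, ecx, edx);
    unsigned int type = eax & 0x1f;  /* 0 = no more caches, 2 = instruction */
    if (type == 0) break;
    if (type == 2) continue;
    long ways  = ((ebx >> 22) & 0x3ff) + 1;
    long parts = ((ebx >> 12) & 0x3ff) + 1;
    long line  = (ebx & 0xfff) + 1;
    long sets  = (long) ecx + 1;
    long size  = ways * parts * line * sets;
    if (size > best) best = size;
  }
  (void) edx;  /* EDX is not needed for the size computation */
  return best > 0 ? best : DEFAULT_CACHE_SIZE;
#else
  /* arm64 and other targets: no x86 assembly is compiled at all */
  return DEFAULT_CACHE_SIZE;
#endif
}
```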

@jhgorse commented Jan 2, 2022

Okay, I got past that. The compiler rejects the x86 inline assembly even though that code would never be needed on ARM64, so the build dies. The preprocessor needs to filter out the Intel/AMD cpuid code.

Now the build fails at link time with:

⌁ ⌂6.43 [jhg:~/Work/owl] mseri-patch-1(+0/-82) 12.s 2 ± make
dune external-lib-deps --missing @install @runtest
dune build @install
  ocamlmklib src/owl/dllowl_stubs.so,src/owl/libowl_stubs.a (exit 2)
(cd _build/default && /Users/jhg/.opam/default/bin/ocamlmklib.opt -g -o src/owl/owl_stubs src/owl/SFMT.o src/owl/airy.o src/owl/airyf.o src/owl/bdtr.o src/owl/beta.o src/owl/btdtr.o src/owl/cbrt.o src/owl/chbevl.o src/owl/chbevlf.o src/owl/chdtr.o src/owl/const.o src/owl/constf.o src/owl/dawsn.o src/owl/dawsnf.o src/owl/ellie.o src/owl/ellik.o src/owl/ellpe.o src/owl/ellpj.o src/owl/ellpk.o src/owl/exp10.o src/owl/exp2.o src/owl/expn.o src/owl/fdtr.o src/owl/fresnl.o src/owl/gamma.o src/owl/gammaf.o src/owl/gdtr.o src/owl/gels.o src/owl/hyp2f1.o src/owl/hyperg.o src/owl/hypergf.o src/owl/i0.o src/owl/i0f.o src/owl/i1.o src/owl/i1f.o src/owl/igam.o src/owl/igami.o src/owl/incbet.o src/owl/incbi.o src/owl/ivf.o src/owl/j0.o src/owl/j0f.o src/owl/j1.o src/owl/j1f.o src/owl/jnf.o src/owl/jv.o src/owl/jvf.o src/owl/k0.o src/owl/k0f.o src/owl/k1.o src/owl/k1f.o src/owl/kn.o src/owl/kolmogorov.o src/owl/lanczos.o src/owl/mtherr.o src/owl/nbdtr.o src/owl/ndtr.o src/owl/ndtri.o src/owl/owl_cblas_generated_stub.o src/owl/owl_core_utils.o src/owl/owl_dcdflib.o src/owl/owl_distribution_common_c.o src/owl/owl_fftpack_float32.o src/owl/owl_fftpack_float64.o src/owl/owl_ipmpar.o src/owl/owl_lapacke_generated_stub.o src/owl/owl_maths_special_gamma.o src/owl/owl_maths_special_impl.o src/owl/owl_maths_special_stub.o src/owl/owl_matrix_check_stub.o src/owl/owl_matrix_swap_stub.o src/owl/owl_ndarray_contract_stub.o src/owl/owl_ndarray_conv_stub.o src/owl/owl_ndarray_fma_stub.o src/owl/owl_ndarray_maths_stub.o src/owl/owl_ndarray_pool_stub.o src/owl/owl_ndarray_repeat_stub.o src/owl/owl_ndarray_slide_stub.o src/owl/owl_ndarray_sort_stub.o src/owl/owl_ndarray_transpose_stub.o src/owl/owl_ndarray_upsampling_stub.o src/owl/owl_ndarray_utils_stub.o src/owl/owl_slicing_basic_stub.o src/owl/owl_slicing_fancy_stub.o src/owl/owl_stats_dist_beta.o src/owl/owl_stats_dist_binomial.o src/owl/owl_stats_dist_cauchy.o src/owl/owl_stats_dist_chi2.o src/owl/owl_stats_dist_dirichlet.o 
src/owl/owl_stats_dist_exponential.o src/owl/owl_stats_dist_exponpow.o src/owl/owl_stats_dist_f.o src/owl/owl_stats_dist_gamma.o src/owl/owl_stats_dist_gaussian.o src/owl/owl_stats_dist_gennorm.o src/owl/owl_stats_dist_geometric.o src/owl/owl_stats_dist_gumbel1.o src/owl/owl_stats_dist_gumbel2.o src/owl/owl_stats_dist_hypergeometric.o src/owl/owl_stats_dist_laplace.o src/owl/owl_stats_dist_logistic.o src/owl/owl_stats_dist_lognormal.o src/owl/owl_stats_dist_logseries.o src/owl/owl_stats_dist_lomax.o src/owl/owl_stats_dist_multinomial.o src/owl/owl_stats_dist_negative_binomial.o src/owl/owl_stats_dist_noncentral_chi2.o src/owl/owl_stats_dist_noncentral_f.o src/owl/owl_stats_dist_poisson.o src/owl/owl_stats_dist_power.o src/owl/owl_stats_dist_rayleigh.o src/owl/owl_stats_dist_stub.o src/owl/owl_stats_dist_t.o src/owl/owl_stats_dist_triangular.o src/owl/owl_stats_dist_uniform.o src/owl/owl_stats_dist_vonmises.o src/owl/owl_stats_dist_wald.o src/owl/owl_stats_dist_weibull.o src/owl/owl_stats_dist_zipf.o src/owl/owl_stats_extend_misc.o src/owl/owl_stats_extend_shuffle.o src/owl/owl_stats_extend_stub.o src/owl/owl_stats_prng_stub.o src/owl/owl_stats_ziggurat.o src/owl/pdtr.o src/owl/polevlf.o src/owl/psi.o src/owl/rgamma.o src/owl/round.o src/owl/scipy_iv.o src/owl/sf_error.o src/owl/shichi.o src/owl/sici.o src/owl/sincos.o src/owl/sindg.o src/owl/spence.o src/owl/sqrtf.o src/owl/stdtr.o src/owl/struve.o src/owl/struvef.o src/owl/tandg.o src/owl/tukey.o src/owl/unity.o src/owl/yn.o src/owl/ynf.o src/owl/zeta.o src/owl/zetac.o -L/opt/homebrew/Cellar/openblas/0.3.19/lib -lopenblas -lm)
ld: file not found: @rpath/libgcc_s.1.1.dylib for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
File "src/base/dense/owl_base_dense_ndarray_generic.ml", line 432, characters 20-24:
Error (warning 16 [unerasable-optional-argument]): this optional argument cannot be erased.

@rpath/libgcc_s.1.1.dylib not found? That's odd; I thought I was using clang.

@jhgorse commented Jan 2, 2022

> If you could propose a solution and make a PR, it would be really appreciated!

I would be happy to add a commit or two on top of your PR here once we get things shipshape. =)

@jhgorse commented Jan 2, 2022

~/.opam/default/bin/ocamlmklib.opt will need to be updated to handle .tbd files, which are text-based (ASCII) stub library files.
xref: ziglang/zig#8935

Digging into the generated output with -v, I see this cc invocation:

cc -shared                    -flat_namespace -undefined suppress -Wl,-no_compact_unwind                     -g -o src/owl/dllowl_stubs.so src/owl/SFMT.o src/owl/airy.o src/owl/airyf.o src/owl/bdtr.o src/owl/beta.o src/owl/btdtr.o src/owl/cbrt.o src/owl/chbevl.o src/owl/chbevlf.o src/owl/chdtr.o src/owl/const.o src/owl/constf.o src/owl/dawsn.o src/owl/dawsnf.o src/owl/ellie.o src/owl/ellik.o src/owl/ellpe.o src/owl/ellpj.o src/owl/ellpk.o src/owl/exp10.o src/owl/exp2.o src/owl/expn.o src/owl/fdtr.o src/owl/fresnl.o src/owl/gamma.o src/owl/gammaf.o src/owl/gdtr.o src/owl/gels.o src/owl/hyp2f1.o src/owl/hyperg.o src/owl/hypergf.o src/owl/i0.o src/owl/i0f.o src/owl/i1.o src/owl/i1f.o src/owl/igam.o src/owl/igami.o src/owl/incbet.o src/owl/incbi.o src/owl/ivf.o src/owl/j0.o src/owl/j0f.o src/owl/j1.o src/owl/j1f.o src/owl/jnf.o src/owl/jv.o src/owl/jvf.o src/owl/k0.o src/owl/k0f.o src/owl/k1.o src/owl/k1f.o src/owl/kn.o src/owl/kolmogorov.o src/owl/lanczos.o src/owl/mtherr.o src/owl/nbdtr.o src/owl/ndtr.o src/owl/ndtri.o src/owl/owl_cblas_generated_stub.o src/owl/owl_core_utils.o src/owl/owl_dcdflib.o src/owl/owl_distribution_common_c.o src/owl/owl_fftpack_float32.o src/owl/owl_fftpack_float64.o src/owl/owl_ipmpar.o src/owl/owl_lapacke_generated_stub.o src/owl/owl_maths_special_gamma.o src/owl/owl_maths_special_impl.o src/owl/owl_maths_special_stub.o src/owl/owl_matrix_check_stub.o src/owl/owl_matrix_swap_stub.o src/owl/owl_ndarray_contract_stub.o src/owl/owl_ndarray_conv_stub.o src/owl/owl_ndarray_fma_stub.o src/owl/owl_ndarray_maths_stub.o src/owl/owl_ndarray_pool_stub.o src/owl/owl_ndarray_repeat_stub.o src/owl/owl_ndarray_slide_stub.o src/owl/owl_ndarray_sort_stub.o src/owl/owl_ndarray_transpose_stub.o src/owl/owl_ndarray_upsampling_stub.o src/owl/owl_ndarray_utils_stub.o src/owl/owl_slicing_basic_stub.o src/owl/owl_slicing_fancy_stub.o src/owl/owl_stats_dist_beta.o src/owl/owl_stats_dist_binomial.o src/owl/owl_stats_dist_cauchy.o src/owl/owl_stats_dist_chi2.o 
src/owl/owl_stats_dist_dirichlet.o src/owl/owl_stats_dist_exponential.o src/owl/owl_stats_dist_exponpow.o src/owl/owl_stats_dist_f.o src/owl/owl_stats_dist_gamma.o src/owl/owl_stats_dist_gaussian.o src/owl/owl_stats_dist_gennorm.o src/owl/owl_stats_dist_geometric.o src/owl/owl_stats_dist_gumbel1.o src/owl/owl_stats_dist_gumbel2.o src/owl/owl_stats_dist_hypergeometric.o src/owl/owl_stats_dist_laplace.o src/owl/owl_stats_dist_logistic.o src/owl/owl_stats_dist_lognormal.o src/owl/owl_stats_dist_logseries.o src/owl/owl_stats_dist_lomax.o src/owl/owl_stats_dist_multinomial.o src/owl/owl_stats_dist_negative_binomial.o src/owl/owl_stats_dist_noncentral_chi2.o src/owl/owl_stats_dist_noncentral_f.o src/owl/owl_stats_dist_poisson.o src/owl/owl_stats_dist_power.o src/owl/owl_stats_dist_rayleigh.o src/owl/owl_stats_dist_stub.o src/owl/owl_stats_dist_t.o src/owl/owl_stats_dist_triangular.o src/owl/owl_stats_dist_uniform.o src/owl/owl_stats_dist_vonmises.o src/owl/owl_stats_dist_wald.o src/owl/owl_stats_dist_weibull.o src/owl/owl_stats_dist_zipf.o src/owl/owl_stats_extend_misc.o src/owl/owl_stats_extend_shuffle.o src/owl/owl_stats_extend_stub.o src/owl/owl_stats_prng_stub.o src/owl/owl_stats_ziggurat.o src/owl/pdtr.o src/owl/polevlf.o src/owl/psi.o src/owl/rgamma.o src/owl/round.o src/owl/scipy_iv.o src/owl/sf_error.o src/owl/shichi.o src/owl/sici.o src/owl/sincos.o src/owl/sindg.o src/owl/spence.o src/owl/sqrtf.o src/owl/stdtr.o src/owl/struve.o src/owl/struvef.o src/owl/tandg.o src/owl/tukey.o src/owl/unity.o src/owl/yn.o src/owl/ynf.o src/owl/zeta.o src/owl/zetac.o    -L/opt/homebrew/Cellar/openblas/0.3.19/lib -lopenblas -lm

Then finally the ld command:

Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: arm64-apple-darwin21.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -dynamic -dylib -arch arm64 -flat_namespace -platform_version macos 12.0.0 12.1 -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -undefined suppress -undefined suppress -o src/owl/dllowl_stubs.so -L/opt/homebrew/Cellar/openblas/0.3.19/lib -L/usr/local/lib -no_compact_unwind src/owl/SFMT.o src/owl/airy.o src/owl/airyf.o src/owl/bdtr.o src/owl/beta.o src/owl/btdtr.o src/owl/cbrt.o src/owl/chbevl.o src/owl/chbevlf.o src/owl/chdtr.o src/owl/const.o src/owl/constf.o src/owl/dawsn.o src/owl/dawsnf.o src/owl/ellie.o src/owl/ellik.o src/owl/ellpe.o src/owl/ellpj.o src/owl/ellpk.o src/owl/exp10.o src/owl/exp2.o src/owl/expn.o src/owl/fdtr.o src/owl/fresnl.o src/owl/gamma.o src/owl/gammaf.o src/owl/gdtr.o src/owl/gels.o src/owl/hyp2f1.o src/owl/hyperg.o src/owl/hypergf.o src/owl/i0.o src/owl/i0f.o src/owl/i1.o src/owl/i1f.o src/owl/igam.o src/owl/igami.o src/owl/incbet.o src/owl/incbi.o src/owl/ivf.o src/owl/j0.o src/owl/j0f.o src/owl/j1.o src/owl/j1f.o src/owl/jnf.o src/owl/jv.o src/owl/jvf.o src/owl/k0.o src/owl/k0f.o src/owl/k1.o src/owl/k1f.o src/owl/kn.o src/owl/kolmogorov.o src/owl/lanczos.o src/owl/mtherr.o src/owl/nbdtr.o src/owl/ndtr.o src/owl/ndtri.o src/owl/owl_cblas_generated_stub.o src/owl/owl_core_utils.o src/owl/owl_dcdflib.o src/owl/owl_distribution_common_c.o src/owl/owl_fftpack_float32.o src/owl/owl_fftpack_float64.o src/owl/owl_ipmpar.o src/owl/owl_lapacke_generated_stub.o src/owl/owl_maths_special_gamma.o src/owl/owl_maths_special_impl.o src/owl/owl_maths_special_stub.o src/owl/owl_matrix_check_stub.o src/owl/owl_matrix_swap_stub.o src/owl/owl_ndarray_contract_stub.o src/owl/owl_ndarray_conv_stub.o src/owl/owl_ndarray_fma_stub.o src/owl/owl_ndarray_maths_stub.o 
src/owl/owl_ndarray_pool_stub.o src/owl/owl_ndarray_repeat_stub.o src/owl/owl_ndarray_slide_stub.o src/owl/owl_ndarray_sort_stub.o src/owl/owl_ndarray_transpose_stub.o src/owl/owl_ndarray_upsampling_stub.o src/owl/owl_ndarray_utils_stub.o src/owl/owl_slicing_basic_stub.o src/owl/owl_slicing_fancy_stub.o src/owl/owl_stats_dist_beta.o src/owl/owl_stats_dist_binomial.o src/owl/owl_stats_dist_cauchy.o src/owl/owl_stats_dist_chi2.o src/owl/owl_stats_dist_dirichlet.o src/owl/owl_stats_dist_exponential.o src/owl/owl_stats_dist_exponpow.o src/owl/owl_stats_dist_f.o src/owl/owl_stats_dist_gamma.o src/owl/owl_stats_dist_gaussian.o src/owl/owl_stats_dist_gennorm.o src/owl/owl_stats_dist_geometric.o src/owl/owl_stats_dist_gumbel1.o src/owl/owl_stats_dist_gumbel2.o src/owl/owl_stats_dist_hypergeometric.o src/owl/owl_stats_dist_laplace.o src/owl/owl_stats_dist_logistic.o src/owl/owl_stats_dist_lognormal.o src/owl/owl_stats_dist_logseries.o src/owl/owl_stats_dist_lomax.o src/owl/owl_stats_dist_multinomial.o src/owl/owl_stats_dist_negative_binomial.o src/owl/owl_stats_dist_noncentral_chi2.o src/owl/owl_stats_dist_noncentral_f.o src/owl/owl_stats_dist_poisson.o src/owl/owl_stats_dist_power.o src/owl/owl_stats_dist_rayleigh.o src/owl/owl_stats_dist_stub.o src/owl/owl_stats_dist_t.o src/owl/owl_stats_dist_triangular.o src/owl/owl_stats_dist_uniform.o src/owl/owl_stats_dist_vonmises.o src/owl/owl_stats_dist_wald.o src/owl/owl_stats_dist_weibull.o src/owl/owl_stats_dist_zipf.o src/owl/owl_stats_extend_misc.o src/owl/owl_stats_extend_shuffle.o src/owl/owl_stats_extend_stub.o src/owl/owl_stats_prng_stub.o src/owl/owl_stats_ziggurat.o src/owl/pdtr.o src/owl/polevlf.o src/owl/psi.o src/owl/rgamma.o src/owl/round.o src/owl/scipy_iv.o src/owl/sf_error.o src/owl/shichi.o src/owl/sici.o src/owl/sincos.o src/owl/sindg.o src/owl/spence.o src/owl/sqrtf.o src/owl/stdtr.o src/owl/struve.o src/owl/struvef.o src/owl/tandg.o src/owl/tukey.o src/owl/unity.o src/owl/yn.o src/owl/ynf.o src/owl/zeta.o 
src/owl/zetac.o -lopenblas -lm -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/13.0.0/lib/darwin/libclang_rt.osx.a
ld: file not found: @rpath/libgcc_s.1.1.dylib for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
...
@(#)PROGRAM:ld  PROJECT:ld64-711
BUILD 21:57:24 Nov 17 2021
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
Library search paths:
	/opt/homebrew/Cellar/openblas/0.3.19/lib
	/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib
Framework search paths:
	/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/
ld: file not found: @rpath/libgcc_s.1.1.dylib for architecture arm64

@jhgorse commented Jan 3, 2022

libgcc_s.dylib is being included implicitly because of libopenblas, specifically its LAPACKE part: something in the LAPACKE build that likely shouldn't need libgcc is pulling it in.

LAPACK, which OpenBLAS sometimes appears to bundle, is written in Fortran, so it requires gfortran, a commercial compiler, or potentially LLVM flang on the Apple M1. xref: Reference-LAPACK/lapack#643

x86_64 emulation through Rosetta 2 is a stopgap that may suffice while a better solution is designed and implemented. It requires a separate Homebrew install (e.g. in /usr/local/homebrew), run from a shell started with arch -x86_64 $(echo $SHELL).

Apple has its own port of LAPACK: https://developer.apple.com/documentation/accelerate/solving_systems_of_linear_equations_with_lapack

And BLAS via vecLIB
https://developer.apple.com/documentation/accelerate/veclib

What would it take to use these to satisfy the dependency? What do these libraries lack compared with the latest open-source releases?

Cheers,
Joe

@jhgorse commented Jan 4, 2022

I built LAPACK natively and statically linked its libraries, together with the BLAS included with them, instead of OpenBLAS. This works and gets us past the earlier libgcc_s requirement, but I am probably breaking some glue that OpenBLAS provides.

The build now fails with:

dune external-lib-deps --missing @install @runtest
dune build @install
File "src/base/dense/owl_base_dense_ndarray_generic.ml", line 432, characters 20-24:
Error (warning 16 [unerasable-optional-argument]): this optional argument cannot be erased.

Here is the full build log at the point of failure: https://termbin.com/8hrr

The next step is either to get the Owl build working from here:
https://github.com/jhgorse/owl/tree/mseri-patch-1

or to patch up OpenBLAS so that it builds static LAPACK libraries, as they do by default, which avoids the implicit inclusion of libgcc_s.

@mseri @jzstark Which path should I take? What have I broken by not using OpenBLAS?

@mseri (Member, Author) commented Jan 4, 2022

Fantastic. I don't think you are breaking anything; those errors are actually warnings from recent OCaml versions, which only fail the build because it is compiled with overly restrictive flags.

If you add --profile=release to the dune invocation it should work fine.

PS: Are you working on my branch or on Owl master? If it's my branch, the warning should disappear once you rebase it on the current Owl master. At least, I thought the warning was silenced recently.

@jhgorse commented Jan 4, 2022

@mseri Great! Got a little further. We are trying to link now. =)

Undefined symbols for architecture arm64:
  "_LAPACKE_clagge", referenced from:
      _owl_stub_393_LAPACKE_clagge in libowl_stubs.a(owl_lapacke_generated_stub.o)
      _owl_stub_393_LAPACKE_clagge_byte9 in libowl_stubs.a(owl_lapacke_generated_stub.o)
     (maybe you meant: _owl_stub_393_LAPACKE_clagge_byte9, _owl_stub_393_LAPACKE_clagge )

Thoughts on this guy? Seems like I may have unbalanced the owl stub generator.

@jhgorse commented Jan 4, 2022

@mseri

> PS: Are you working on my branch or on Owl master? If it's my branch, the warning should disappear once you rebase it on the current Owl master. At least, I thought the warning was silenced recently.

Off your branch, which seems to be rebased. The first warning did disappear after rebasing onto master. My fork and branch live here:
https://github.com/jhgorse/owl/tree/mseri-patch-1

@xinslu commented Mar 30, 2022

@jhgorse I'm working off your repo with the new clang-15 compiler that came out last week. Everything seems to build, but the new clang no longer supports libstdc++ on Apple devices, so the linker seems to run into problems there. I'm using plain dune build without any custom flags.
With Xcode's clang instead, I get a "symbol not found for architecture arm64" error. Apparently the new clang update enables -march=native, which fixes the error I was facing when building eigen. I'm not sure how to fix the linker error; perhaps a native version of the standard library might work?

@jhgorse commented Mar 30, 2022

@xinslu what is the linker error?

@xinslu commented Mar 31, 2022

> @xinslu what is the linker error?

@jhgorse The C and C++ standard libraries are not found: the linker is not able to resolve -lm and -lSystem. I understand this has nothing to do with Owl, but it's the furthest I've gotten with the base config and build. All the other files compile without errors; the errors only appear at the linking stage.

@jhgorse commented Mar 31, 2022

@xinslu can you show me the log? From just before the error (the last build command that fails) to the end of the output.

Even though it is not an Owl problem, we can still try to get your linker paths right. If you succeed, we will be interested in reproducing it.

Cheers,
Joe

@xinslu commented Apr 1, 2022

@jhgorse

❯ dune build --profile=release
File "src/aeos/config/dune", line 2, characters 7-16:
2 |  (name configure)
           ^^^^^^^^^
ld: library not found for -lm
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
File "caml_startup", line 1:
Error: Error during linking (exit code 1)
File "src/owl/config/dune", line 2, characters 7-16:
2 |  (name configure)
           ^^^^^^^^^
ld: library not found for -lm
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
File "caml_startup", line 1:
Error: Error during linking (exit code 1)
File "src/base/dune", line 23, characters 0-89:
23 | (library
24 |  (name owl_base)
25 |  (public_name owl-base)
26 |  (wrapped false)
27 |  (libraries bigarray))
ld: library not found for -lSystem
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
File "caml_startup", line 1:

As you can see, I cannot link against the standard library. To check the search paths, I compiled a simple program that uses math.h with -v, and I've tried many different CFLAGS combinations, but none of them work. Any help is appreciated.

@jhgorse commented Apr 1, 2022

@xinslu try the dune command with --verbose
https://dune.readthedocs.io/en/stable/usage.html

We need to drill down and see what -L library paths are being used. Then we can add things like /Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk/System/Library/Frameworks and such.

@xinslu commented Apr 1, 2022

@jhgorse I tried the verbose flag, but nothing much related to the clang linker invocation came out of it, so I went back to a simple program using <math.h>. Here is the result of clang -Xlinker -v:

@(#)PROGRAM:ld  PROJECT:ld64-762
BUILD 06:28:58 Feb 18 2022
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
ld: warning: directory not found for option '-L/Applications/Xcode.app/Content'
ld: warning: directory not found for option '-L/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include'
Library search paths:
	/usr/local/include
	/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1
	/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/13.1.6/include
	/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include
	/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks
	/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib
	/usr/lib
	/usr/local/lib
Framework search paths:
	/Library/Frameworks/
	/System/Library/Frameworks/

@ryanrhymes (Member)

I am planning to clean up the code and migrate to OCaml 4.14 in the coming months. Is this patch still relevant? Should we merge or drop it? @mseri @jzstark @tachukao

@ryanrhymes ryanrhymes deleted the branch master April 10, 2022 07:11
@ryanrhymes ryanrhymes closed this Apr 10, 2022
@mseri (Member, Author) commented Apr 10, 2022

This is independent of 4.14, and I believe it is still relevant if you want to support arm64. It is only a preliminary step, though.

@mseri mseri deleted the mseri-patch-1 branch April 10, 2022 08:36
@ryanrhymes (Member)

> This is independent of 4.14, and I believe it is still relevant if you want to support arm64. It is only a preliminary step, though.

OK, thanks. Let's come back to this and find a solution after I clean up the code. It looks like changing from master to main closed the PR automatically.

@jhgorse commented Apr 11, 2022

@xinslu and team,

I have successfully built Owl for the M1. Here are the steps to reproduce:

brew install opam ocaml openblas eigen

# pkg-config
export PKG_CONFIG_PATH=/opt/homebrew/opt/openblas/lib/pkgconfig

# eigen cflags
export EIGEN_FLAGS="-O3 -Ofast -mcpu=apple-m1 -funroll-loops -ffast-math"
export EIGENCPP_OPTFLAGS="-Ofast -mcpu=apple-m1 -funroll-loops -ffast-math"

# owl cflags
export OWL_CFLAGS="-g -O3 -Ofast -mcpu=apple-m1 -funroll-loops -ffast-math -DSFMT_MEXP=19937 -fno-strict-aliasing -Wno-tautological-constant-out-of-range-compare"
export OWL_AEOS_CFLAGS="-g -O3 -Ofast -mcpu=apple-m1 -funroll-loops -ffast-math -DSFMT_MEXP=19937 -fno-strict-aliasing"

mkdir -p ~/Work && cd ~/Work
# src/owl/core/owl_core_utils.c issues:
# removed the x86 code that does not compile on arm64
# see the offending code here: https://github.com/jhgorse/owl/commit/bab252a413db946560eb11e3cd54a94100a46a69
# note: this code should instead be excluded only when building for arm64
git clone https://github.com/jhgorse/owl
opam pin owl ~/Work/owl

dune test fails on warnings, though:

% dune test
In file included from src/owl/core/owl_ndarray_contract_stub.c:17:
src/owl/core/owl_ndarray_contract_impl.h:88:9: warning: incompatible pointer types assigning to 'int64_t *' (aka 'long long *') from 'intnat []' [-Wincompatible-pointer-types]
  cp->n = X->dim;
        ^ ~~~~~~
In file included from src/owl/core/owl_ndarray_contract_stub.c:26:
src/owl/core/owl_ndarray_contract_impl.h:88:9: warning: incompatible pointer types assigning to 'int64_t *' (aka 'long long *') from 'intnat []' [-Wincompatible-pointer-types]
  cp->n = X->dim;
        ^ ~~~~~~
In file included from src/owl/core/owl_ndarray_contract_stub.c:35:
src/owl/core/owl_ndarray_contract_impl.h:88:9: warning: incompatible pointer types assigning to 'int64_t *' (aka 'long long *') from 'intnat []' [-Wincompatible-pointer-types]
  cp->n = X->dim;
        ^ ~~~~~~
In file included from src/owl/core/owl_ndarray_contract_stub.c:44:
src/owl/core/owl_ndarray_contract_impl.h:88:9: warning: incompatible pointer types assigning to 'int64_t *' (aka 'long long *') from 'intnat []' [-Wincompatible-pointer-types]
  cp->n = X->dim;
        ^ ~~~~~~
4 warnings generated.
File "test/unit_signal.ml", line 78, characters 14-15:
78 |       |> fun (a, b) -> b
                   ^
Error (warning 27 [unused-var-strict]): unused variable a.

Changing

      |> fun (a, b) -> b

to

      |> fun (_a, b) -> b

in test/unit_signal.ml gets rid of the warning 27 error, and dune test succeeds. I have added that change to my fork for convenience.

Next, the performance tests.

Cheers,
Joe
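The C warning in the dune test log above (`int64_t *` vs `intnat []`) arises because OCaml's `intnat` is `long`, while macOS defines `int64_t` as `long long`: both are 64 bits wide on arm64 macOS, but they are distinct types. On x86_64 Linux, glibc defines `int64_t` as `long`, so the same assignment compiles silently there. A minimal sketch of one possible fix, assuming `intnat` is 64-bit on every supported target, is a compile-time size check plus an explicit cast. The structs are hypothetical, simplified stand-ins for the ones used in `owl_ndarray_contract_impl.h`, and `intnat` is redefined locally instead of including the OCaml headers.

```c
#include <stdint.h>

/* Local stand-in for OCaml's intnat (a long on common 64-bit targets);
   real code would get it from the OCaml headers instead. */
typedef long intnat;

/* Hypothetical, simplified versions of the bigarray and contraction structs. */
struct fake_bigarray { int num_dims; intnat dim[4]; };
struct contract_params { int64_t *n; };

/* Reinterpreting the array is only reasonable when both types have the same width. */
_Static_assert(sizeof(intnat) == sizeof(int64_t),
               "intnat and int64_t must have the same size");

void set_dims(struct contract_params *cp, struct fake_bigarray *X) {
  /* long and long long are distinct types even when both are 64-bit,
     so an explicit cast is needed to silence -Wincompatible-pointer-types */
  cp->n = (int64_t *) X->dim;
}
```

Reading `long` objects through an `int64_t *` is only safe here because both types share the same representation and Owl compiles with `-fno-strict-aliasing` (visible in the compile commands earlier in this thread); copying the dimensions into a separate `int64_t` array would avoid that question entirely.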

@mseri (Member, Author) commented Apr 11, 2022

@jhgorse If you merge my ARM branch into your changes, you can avoid deleting the x86 tweak: I added some macros that disable it only on ARM.


5 participants