Should we allow C++? #988
I suspect this will create portability problems for some. Finding a Fortran compiler for some platforms is hard enough, but having both Fortran and C++ as dependencies in a project, just from the reference implementation of a standard API, may be a bit much (and by the time everything gets rewritten, users may have moved on to newer APIs that drop the Fortran naming constraints).
Thanks for the feedback
While this is true, I kind of assumed that most platforms that have a Fortran compiler also have a C++ compiler, maybe not a cutting edge one with C++20 support, but we could restrict which features we use. Do you have examples? I assume you know more about those issues than me from your work on OpenBLAS.
I never thought I would see this day 😃 and I am quite conflicted about it. Getting out of F77 is, indeed, an absolute win. However, going into C++ scares me and smells like yet another trap that the ten-years-later versions of ourselves will regret. I definitely do not want to start a language flamewar, but C++ does not feel like an archival language the way Fortran or C does, and it is getting larger and weirder with each standard update. Having said that, there is a great deal of industry-grade C++ code out there, hence I would refrain from dismissing it. In the same breath, I would also argue that C++ is not an array-friendly number-crunching language, though there is even a new addition accepted for C++23 to address this.

Over at SciPy, for many reasons, we started rewriting the whole codebase out of F77 (scipy/scipy#18566), and we have covered quite some distance. We chose C precisely to keep the codebase neutral (with all error and warning flags enabled, and carefully avoiding the opinionated parts of C), in case we jump to another language should one arise in the near future. In fact, I am gathering the courage to write BLAS in C/Rust myself as a pet project once the SciPy work is finalized. So please let me know if you could use an extra hand; the times seem somehow ripe for it.

I know Rust is not ready for everything, but these new languages offer quite robust codebases with very little room for the strange memory issues we always have with lwork et al. I am not proposing that we jump on the bandwagon, but native support for threading, SIMD, GPUs and other goodies seems to me worth shooting for. The critical issue, I can imagine, is the actual experts not knowing enough Rust. There are also multiple native attempts in Rust by different folks that make the language itself feel competitive without too much low-level wizardry, for example https://github.com/sarah-ek/faer-rs

Anyways, thank you even for considering this option.
The major problems with Fortran are the massively error-prone code duplication (real/complex, single/double), error-prone variable initialization, and the lack of conditional compilation, including assertions and initializing memory in debug mode with, e.g., NaN. Even to this day, gfortran allows me to compile code without warnings that reads undeclared and uninitialized variables. I support C++ as a replacement because it fixes all of the mentioned problems (see for example my C++ header-only library generating uniformly distributed floating-point numbers, for which there exists only one generic implementation). C++11 and C++14 compilers are widely available, also on the supercomputers that I had access to. A major challenge with C++ is the number of language features (virtual inheritance, concepts, variadic template arguments, undefined vs. unspecified behavior...) and the complex semantics. In this regard I suggest limiting use to a subset of modern C++ ("modern" meaning C++11 or newer).
In my experience, finding a Fortran compiler is the hard part, especially when building for Android. LLVM became the default toolchain in Android native development kit (NDK) release 13b in 2016 (see the NDK release notes); GCC was finally removed in r18b (2018). The f18 Fortran compiler became part of LLVM only in 2020, see Flang and F18. Finding a feature-complete(!) C++20/C++23 compiler is harder though, cf. Cppreference: C++ Compiler Support. One question comes to mind though: What is the difference between LAPACK with C++ and the tlapack project then?
As one of the main authors, I know TLAPACK quite well. While I love the project, I don't think the direction in which it is currently headed makes it suitable as a replacement for reference-lapack. The value of TLAPACK lies more in being able to use completely different layouts relatively seamlessly (not just row/column major, but more like tiled, block cyclic, distributed through StarPU, ...). The same templated nature that makes it powerful also means that codes need to work for both owning and non-owning matrix classes, half precision, ... I also realized that limiting ourselves to C++14 means no `if constexpr` from C++17. That has already proven essential in TLAPACK for dealing with real and complex code. Maybe we can still use that.
I have been thinking about this for quite a while now, from a rather amateurish point of view. Regardless of my starting point, it always boils down to translating things out of F77, because pretty much everything stops at packaging and compiler availability for it. Hence I am becoming more convinced that at some point we have to do this necessary evil. My audacity originates from doing a big chunk of translation on the SciPy side (scipy/scipy#18566, so far successfully), touching codebases like QUADPACK, MINPACK, ARPACK and so on that netlib graciously hosts too. I've also written a diatribe about this, specifically in the Python context, in scientific-python/faster-scientific-python-ideas#7; there you can see the breakdown of the work that awaits, in terms of SLOC, compared to the work we have done so far on the SciPy F77 codebase.

I don't know when (or whether ever), but I am planning to take a stab at translating LAPACK to C out of desperation, hoping to spiral out from the basics. There is no language on the horizon so far that would give us the array-manipulation flexibility of Fortran while taking us closer to contemporary toolchain availability the way C does, so I am not sure success is guaranteed, but I'd like to hear your thoughts about this preconceived attempt. If there are any showstoppers known ahead of time, that would be fantastic to know.
[Context: Ilhan and I are both SciPy maintainers] I think the first half of this statement is obviously true, but the latter half is changing on a much more manageable timeline. LLVM 19 will ship in 1-2 weeks with a flang that's basically production-ready and available on all major platforms (including Windows!). On the unixes, gfortran is a valid (and freely-available) choice too, of course.

That does not take anything away from the impressive work Ilhan has been doing in SciPy (which is beneficial to SciPy in more ways than just getting rid of F77); however, I wanted to share my slightly different perspective on the toolchain situation. One of my other hats is as a maintainer of the conda-forge ecosystem, and I'm planning to switch our default Fortran compilers on Windows to flang 19 basically as soon as it comes out (we've been using it in limited fashion - e.g. for SciPy - already since LLVM 17, and I've been testing many packages with the release candidates).

As I ran into an issue while compiling lapack with flang, I was looking around the issue tracker here and found the mention here. But that's something for a separate issue.

PS. Regarding the question posed in the title, I'll note that building 3.12.0 already fails if there's no C++ compiler present.
I do not think this is true, unless you enabled the options to fetch and build the BLAS++ and/or LAPACK++ bindings? (Which, btw, still appear to download the original 2020 versions of the respective sources from ICL's Bitbucket.)
Sorry, actually this appears to be an unintended side effect of #834, where the intention must have been to remove the project's global dependency on Fortran in order to allow building just LAPACKE. However, the CMake default for a …
Another Fortran problem is the lack of young Fortran programmers. Every Fortran programmer that I know learned the language because they had to work on an existing code base.
C++23 features multi-dimensional array views.
I have been looking into the C++26 standard additions, and they seem seriously complicated to me. But I don't have any affinity for such new features, hence I don't have a firm opinion about them; the slicing syntax seems very noisy, etc. Another question I have been thinking about: if C++ or similar C-family code is getting in, is it also going to work with column-major array order? Soon we will have an opportunity to test a few functions written in C on top of MKL, BLIS and OpenBLAS, and see whether this effort is feasible even in a limited scope. Thus any feedback is very welcome and highly appreciated.
Some perhaps more provoking thoughts: why C++, and why not a fashionable, assumed-to-be-safe-by-design language like Rust? And what would the continued claim to "Reference" status be based on?
In the discussion linked above (scientific-python/faster-scientific-python-ideas#7) I did not consider C++, but I would consider either C, for its true archival value, or something new that has proper and established complex-number and 2D-array support. Hence Zig is also an important candidate, if its complex-number situation improves. Memory safety won't be as important, since these are established functions and there are no surprises in the way we use memory; if there is a memory issue, it should be rather easy to find anyway (either a leak or out-of-bounds access).
Yes, the layout can be set to row- and column-major just like with, say, NumPy.
C++ has generics (templates) which are really useful when writing linear algebra code and playing with mixed precision.
Rust has functionality to use C libraries (libraries that follow the C naming and calling conventions, generated by compiling, e.g., C code or C++ code with C linkage). The major challenge that I see is the build system, which is an integral part of the Rust experience: Cargo is introduced early in the Rust Book. Cargo downloads all dependencies and builds them with the build options given by the dependee. This is the opposite of the conventional workflow with C/C++/Fortran code, where a dependency is built with user options of choice and the dependee only checks whether the dependency is present and whether it was built with the required options. Cargo works very well if your code relies only on the Rust ecosystem. It does not seem to work well when you want to re-use existing libraries, or to customize the builds of individual dependencies; for example, it is not possible to have build flags specific to one dependency, which is very easily achievable with C and C++ code.

Another problem might be the strict type-checking of generics, which may make writing linear algebra code unpleasant, but I would have to crank out an example and see the compiler errors before I could be more specific about this point.
In any other project I would agree with you, but the type of memory issue Rust protects against is not something that is very likely to happen in the way LAPACK routines work (I think). The main memory issue we deal with is out-of-bounds access, and I don't think Rust can easily protect against that, or at least not in a way that C++ can't. As for "Reference", I think LAPACK as a library serves different functions. Yes, it is a reference interface that different implementations share, but it also functions as a kind of "default" implementation, and I think that second function is almost more important than the first. I think the reference implementation will be more valuable if we can actually assert that no out-of-bounds access is occurring...
I have been working on a pet project. What I would like to do is introduce some C++ code in LAPACK, with a matrix class to represent the matrices, not just a pointer. This is not meant as a way for people to have a nice API to call LAPACK (there are plenty of libraries that provide nice wrappers), but to make things easier for developers in the future. It would allow for comprehensive bounds checks that even work for subarrays (via asserts, of course; we should disable those in a production build).
The way I want to achieve this is by mixing C++ and Fortran, so that we don't have to suffer through the immense task it would be to translate all of LAPACK at once. We could just stick to C++ for new algorithms or for routines that need to be reworked/debugged (for example, dgesdd, see #672).
A proof of concept implementation is available at https://github.com/thijssteel/lapackv4
I appreciate all feedback.