-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite LAPACK so it's not in F77 #7
Comments
Indeed this needs some explanation because 99% of the responses will be "yeah right, sure" which is not the audience I care about so I don't feel discouraged by any of such responses hence feel free if you feel like it. AcknowledgementsI think current and past authors and contributors of BLAS and LAPACK should be knighted and placed in chateaus and mansions with 4 lambos and 2 yachts each. There is no way to measure their contributions to the scientific community. It is in fact make me feel a bit uneasy given how much money is poured into AI and not to this fundamental library. So I will make some remarks below but this is purely from future-proof perspective and nothing related to past decisions. It is what it is. And in fact I know maintainers of Reference LAPACK already contemplate about similar points. But if I summarize them here, it is going to open up a different discussion which I want to skip completely. Meat of the argumentThe goal is to store LAPACK in a format that is both
and currently it is neither with fortran77. Very brief and opinionated personal summary;Currently, LAPACK is the fundamental reason why we need fortran compilers or even we need to care about fortran in general if we are not using fortran. BLAS has been ported to C/Assembly multiple times in various forms and except the reference implementation, nobody uses it in its native form. If BLAS is to NumPy, LAPACK is to SciPy in our case; LAPACK heavily uses BLAS operations but essentially it is just a never-ending stream of for loops and index olympics to do clever stuff with arrays. Thus it has no special property that blocks such porting to whichever language we would be using. Therefore fundamentally, F77 is the gatekeeper of taking that code and implementing anywhere that has a suitable compiler of any other language. In other words, if the reference implementation was in C, it would have been substantially easier to port it to almost all platforms under the blue sky. And I would speculate that it would have been translated to many languages. In fact the totality of the numerical community is forced to either find a fortran compiler or use a compiled library but still forced to have a Fortran layer on top to prevent ABI compatibility issues. As every SciPy maintainer by now knows, it is just an endless pain point. Plus, initiatives such as Pyodide are always getting stuck at wrapping fortran code and anytime SciPy rewrites anything from F77 to C they remove a good chunk of unnecessary code that was meant for handling f77. Besides, Fortran community is dwarfed by C community and also when it comes to low-level optimizations, lot of folks who can perform SSE optimizations cannot do anything about the LAPACK code since it is in Fortran. So it is a loss of capacity and skill. Anything to be done is already being done to BLAS such as BLIS or GotoBLAS/OpenBLAS, since it is a smaller codebase and also simple to follow the functionality goals, but for LAPACK very little has been done, say, OpenBLAS having a custom parallelized So far there have been only two, battle tested, and proven, software languages that has true archival properties. First one is obvious, Fortran77 and the other one is C (I'm talking about multi-decades). Hence we are left with C. Now, it would be unfair to call C as a better fortran alternative when it comes to numerical array computing. Because it absolutely is not. It has less unergonomic parts however it is just tedious to have straightforward code. But a C array is just a pacifier to hide pointer arithmetic. And we are talking about a huge array library. Some might say well what is wrong with The most problematic issue for C is Microsoft and its relentless insistence on not conforming to C99 standards. Therefore, it is always some stupid boilerplate code that needs to be included at the top of the project structure and can hurt in the long run. However, it has been done countless times and can stay as a necessary evil. Or can be left for the folks who want to use MSVC. If open source C compilers become widely adopted in Windows ecosystem then probably the best strategy might be is to drop MSVC completely and be done with it. But until then, it would hurt majority of the users. This might sound bad but it is much worse for Windows if Fortran is used already which is why we are here. An ode to fortran thoughFortran, old or new, regardless of its age, has superior array ergonomics, hands down, no competition. It is fantastic in that regard, so much so that, NumPy borrowed almost all identically. But that's where good things about F77 end for us. Because it is typically 1-indexed and column major memory layout, it does not play well with anything else (except julia and matlab). It is possible to change the indexing in fortran however LAPACK code did not. Hence one of the biggest annoyances of such rewrite will be 2D array index bonanza if C to be selected but the advantages so far outweigh this very important annoyance. Language designers completely miss the number crunching community and we are always piggybacking on regular langs such as Python's In summary, fortran is a very good language to stick with, if you don't interface with other languages ( Obligatory Rust and Zig sectionWhy not Zig or Rust? Rust already has a couple of attempts, a very nice one is https://github.com/sarah-ek/faer-rs and a there are few others via ndalgebra crate. But its impossible looking syntax is killing the starting rationale of this rewrite. Zig has multidimensional arrays but it is no different than C and also does not come with complex numbers which is in my opinion two very important trains missed. It is now available in Why not others langs?I don't know them. Put it below if you want to make a case. But we might as well use C to get all the middle man out despite all the problems given above.
|
In case it will avoid pointless debates for the future, I don't like working with C especially after using C3, Zig and bunch of other langs which fix "a different part of C and not the parts I need". I am just forced into it because of the lack of another specialized array language. My unicorn wish is to have a truly "arrays are at the center" language. Think of the ever so wonderful Lua and tables, but instead |
A few small point to summarize the pages of thoughts I could have written about this topic (not enough time, sorry): (1) For SciPy, once we get rid of all our own Fortran code, we have no fundamental problem with Reference LAPACK being written in F77. The problem we have with BLAS and LAPACK is an ABI problem - multiple problems actually, from LAPACKE not being well-enough supported to the routines like (2) I'm honestly not sure if it would help for LAPACK to be rewritten. Whether some vendor or open source effort writes an accelerated implementation is also based on need. The payoff for an individual routine to be written with SIMD/assembly/etc. is much smaller for LAPACK than it is for BLAS, so the chances of that effort being done are also much lower. |
We should get to a retreat in a cabin somewhere so we can compare pages. (1) I can also go the other way, the reason why you have ABI problems is actually f77. The very reason for LAPACKE existing is also f77. I can give you another word that hides a saga; (2) This one I don't follow yet. Whatever makes BLIS fast can be applied to intermediate steps in LAPACK too which I think what https://github.com/flame/libflame was supposed to be in the first place. It is just not as tidy as BLAS routines. If you can unroll things in BLAS, you can still unroll in LAPACK too. Also this assumes that there is no room for improvement in LAPACK which is quite not true. Even LU can do much better. That's what Julia/Rust folks are trying to generalize in many different codebases. But the code cannot be modified as is and has to be rewritten. The main problem is that it is just not feasible to decode all that old crust and reorganize the new code and squeeze it back into F77 without causing mayhem. After quite some thousands of line of code translation, I can guarantee you, there are low-hanging fruits left and right everywhere. You can do the same exercise, just look at our translation of The point I wanted to make is that it is possible to do optimizations without knowing the actual algorithm if you are using C if you encounter a for loop. In f77 you need a surgery. Much easier in modern fortran but not in f77. Also it is not "practically" possible to do it in F77 unless you know everything about the algorithm because the algorithm is baked into indices such as if We were holding the same idea about our fortran codebase being not feasible to translate, needs unreasonable human hours, they are battle-tested, crazily optimized yada yada. And I am not that blind or that stupid, I saw them everywhere too. We had all the reasons not to do it but now we are where we are. If we do not do this step, regardless of the bugs that will seep through, we are not going to go anywhere and still be discussing ABI issues in the decades to come which is a big waste of time to treat the symptom. To be honest, I am starting to think that everybody is making too much of a deal about these old algorithms. They are not magical code written in Ocaml or Assembly but just flat out, |
@ilayn What do you think should be the next step? A couple more questions:
|
I don't have a concrete plan yet to start the work but the following are something that would help understanding the situation;
and many more questions. However, before all this a few examples should be done I think such as
Not sure, but there is some. They are quite strict (up to a few ULP) hence when there is a test I'd expect it to have good coverage. But this is never guaranteed.
I don't know any direct Rust versions. But many rewrites are providing the same functionality by re-coding (not translating). But I am not sure, I did not really sit down for a comprehensive research. Note that NumPy used to avoid LAPACK by using the so-called "LAPACK LITE" in a similar fashion but it was just f2c'd LAPACK. |
Keeping in mind that I am not a linear algebra person: I tend to think of C as legacy language at this point, and more broadly any language that lacks memory safety by default. So personally I'd go with Rust for any new infrastructure projects; syntax can be learnt, a C ABI can be exposed... My impression is that https://docs.rs/num/latest/num/complex/struct.Complex.html is the standardized Rust-specific API/ABI for complex numbers, it's used by A random thought: https://github.com/immunant/c2rust transpiler is apparently successful enough to be used in some real world projects, e.g. as the starting point for a video codec (https://www.memorysafety.org/blog/porting-c-to-rust-for-av1/). One can imagine the equivalent for Fortran 77. |
As I tried to argue above, C is indeed the legacy language however memory safety is not an issue on these decades old code. Hence it is a matter of whether all the necessary details are included in the language. The complex number issue is one, like you point out, but also the 2D array libraries are still competing so it seems to me that there is a very high cost of choosing the wrong one.
This only exists for F77->Modern Fortran with partial success so far and I don't think there is any help coming anytime soon. |
Have you guys seen that there is a Modern Fortran implementation? https://github.com/perazz/fortran-lapack It is already included on the Fortran stdlib with preprocessing for different types: https://github.com/fortran-lang/stdlib/blob/master/src/stdlib_linalg_lapack.fypp |
I do and have discussed this on fortran-lang discourse. It is a reformulation of the modern function declaration with little or no code rewrite (their parsing python script is also available in case interested) hence it is not solving the problems I have mentioned above. |
From @ilayn, who can hopefully expand:
The text was updated successfully, but these errors were encountered: