Support for row- and column-major order #3

Open
grothesque opened this issue Sep 17, 2024 · 2 comments


@grothesque

I reference a discussion from rust-ndarray/ndarray#1272 (comment):

@grothesque wrote:

Is your choice motivated by BLAS/LAPACK being (marginally) more efficient for column-major data?

Do I understand correctly that mdarray is column major in the sense that the restricted layouts are column major? But the fully strided layout can accept any (fixed rank) strided array, right? Right now in Rust we cannot have a fully generic ndspan like in C++, but it should be possible to have a set of useful layouts for both column-major and row-major within a single library, or do you see a problem with this?

@fre-hu replied:

The choice is only to have a convention, and then column major is common for linear algebra. It is used both for memory layout and to give the order of dimensions in iteration.

Using strided layout with row major data will work, but operations that depend on iteration order will have worse access pattern. It works fine for interfacing though, and internally one could make a copy or reverse indices.

To have full support for both row and column major would require one more generic parameter for the order. I had it in an earlier version, but removed it as it made both the library and interface more complex. C++ mdspan gets around this since it is quite thin.
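The access-pattern point in the reply above can be made concrete with a minimal sketch. It is plain Rust and does not use mdarray's API; all function names are hypothetical. It computes dense strides for both orders and lists the memory offsets visited when a 2-D array is traversed with the first index varying fastest (the column-major iteration order).

```rust
/// Strides (in elements) of a dense row-major array: the last index varies fastest.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Strides (in elements) of a dense column-major array: the first index varies fastest.
fn col_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in 1..shape.len() {
        strides[i] = strides[i - 1] * shape[i - 1];
    }
    strides
}

/// Offsets visited when iterating a 2-D array with the first index fastest.
fn offsets_first_index_fastest(shape: [usize; 2], strides: [usize; 2]) -> Vec<usize> {
    let mut out = Vec::new();
    for j in 0..shape[1] {
        for i in 0..shape[0] {
            out.push(i * strides[0] + j * strides[1]);
        }
    }
    out
}
```

For shape `[2, 3]`, column-major strides give a contiguous walk through memory under this iteration order, while row-major strides make the same loop jump back and forth. That is the "worse access pattern" a strided view over row-major data would see in a library that iterates in column-major order.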

From my point of view, row-major is arguably more relevant for a Rust array library than column major:

  • Rust is much more a spiritual heir to C/C++ than to Fortran.
  • expr![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]] having shape (3, 2) and not (2, 3) by default seems surprising.
  • NumPy is and likely will remain the array library to which most people are first exposed. Despite its MATLAB lineage, but consistent with the Python/C world to which it belongs, NumPy uses row major by default. Moreover, seamless interoperability with NumPy (at least potentially) seems like an important feature of a Rust array library. (Cf. the success of Polars.)

So, if there can be only one, I'd vote for row major 😇...

However, as far as memory layout goes, it should be possible to have both without an additional generic parameter, right? Just like C++ mdspan has layout_right, layout_left, and layout_stride.
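As a rough illustration of what I mean, here is a minimal sketch (fixed to rank 2 for brevity) in which the order lives entirely in the layout type, loosely modelled on mdspan's `layout_right`, `layout_left`, and `layout_stride`. The trait and type names are hypothetical and are not mdarray's actual API.

```rust
/// Maps a 2-D index to an offset into the underlying buffer.
trait Layout {
    fn offset(&self, shape: [usize; 2], index: [usize; 2]) -> usize;
}

/// Dense row-major (last index fastest), like mdspan's layout_right.
struct LayoutRight;
/// Dense column-major (first index fastest), like mdspan's layout_left.
struct LayoutLeft;
/// Arbitrary strides, like mdspan's layout_stride.
struct LayoutStride {
    strides: [usize; 2],
}

impl Layout for LayoutRight {
    fn offset(&self, shape: [usize; 2], [i, j]: [usize; 2]) -> usize {
        i * shape[1] + j
    }
}

impl Layout for LayoutLeft {
    fn offset(&self, shape: [usize; 2], [i, j]: [usize; 2]) -> usize {
        i + j * shape[0]
    }
}

impl Layout for LayoutStride {
    fn offset(&self, _shape: [usize; 2], [i, j]: [usize; 2]) -> usize {
        i * self.strides[0] + j * self.strides[1]
    }
}

/// A borrowed 2-D view; the layout is the only order-related parameter.
struct View<'a, T, L: Layout> {
    data: &'a [T],
    shape: [usize; 2],
    layout: L,
}

impl<'a, T: Copy, L: Layout> View<'a, T, L> {
    fn get(&self, index: [usize; 2]) -> T {
        self.data[self.layout.offset(self.shape, index)]
    }
}
```

The view type carries the same number of generic parameters regardless of how many layouts exist. This sketch only covers element access; it says nothing yet about iteration order or subarray types.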

The problem seems to be more about ensuring efficient order of dimensions when iterating. One possibility would be to have both ("iterate_left_to_right", and "iterate_right_to_left"), and then only one (or none) would be efficient for a given array.

To treat the general case efficiently, there could be a function to (statically or dynamically) reorder dimensions into either layout (if possible).

All of this would not require an additional generic parameter (I believe).
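The dynamic variant of the reordering idea could look roughly like the sketch below: sort the axes by decreasing stride, permute shape and strides accordingly, and check whether the result is a dense row-major layout. Everything here is hypothetical helper code in plain Rust, not an existing mdarray function.

```rust
/// Returns the axis permutation that orders strides from largest to smallest.
fn axes_by_decreasing_stride(strides: &[usize]) -> Vec<usize> {
    let mut axes: Vec<usize> = (0..strides.len()).collect();
    axes.sort_by(|&a, &b| strides[b].cmp(&strides[a]));
    axes
}

/// Applies an axis permutation to a shape or stride vector.
fn permute(values: &[usize], axes: &[usize]) -> Vec<usize> {
    axes.iter().map(|&a| values[a]).collect()
}

/// Checks whether (shape, strides) describes a dense row-major array.
/// Axes of length 1 are ignored, since their stride is never used.
fn is_dense_row_major(shape: &[usize], strides: &[usize]) -> bool {
    let mut expected = 1;
    for (&n, &s) in shape.iter().zip(strides).rev() {
        if n != 1 && s != expected {
            return false;
        }
        expected *= n;
    }
    true
}
```

A column-major array of shape `[2, 3]` has strides `[1, 2]`. Reordering its axes gives shape `[3, 2]` with strides `[2, 1]`, which passes the dense row-major check, so a row-major iteration over the permuted view walks memory contiguously.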

@fre-hu
Owner

fre-hu commented Sep 22, 2024

I have switched to row-major order as it makes more sense; see the latest commit. I have also reduced the layout types to dense and strided, since these are the most important. Better to keep it simple and add the others back if they are really needed.

Regarding having both row- and column-major order: yes, the order could be merged into the layout type. The complexity is still there though, and there must be rules about iteration order, how to derive types for subarrays, how to do broadcasting, etc. I will think some more about it.

@grothesque
Author

Sounds great. Looking forward to your ideas about merging order into layout.

I fully agree that an array crate for Rust should strive to be minimalist. I think that the right approach is to think of it as providing common concepts and glue, while actual numerical algorithms are to be provided by other crates.

This is the approach that has been successful with Fortran, and that C++ is finally pursuing with its mdspan. I consider the monolithic approach of NumPy/SciPy, despite their success, a technical liability rooted in the shortcomings of Python's packaging.
